# Intro

This plugin implements tracking of document statuses (as defined by the client).
A status identifies a processing flow step, for example "sent for review", "needs lodgement", "archived", etc.

Functional requirements:
- A status has a handful of attributes (name, description, timestamp, ...).
- The list of statuses and their permissions should be user-configurable, not hard-coded.
- The document status trail records all status changes (append-only).
- There are two major document categories, BBCA and non-BBCA; every document type (Reports, Assessments, Contracts, ...) falls into one or the other.
- Not all roles have access to *view* the status trail history.
- Different user roles might have different status *update* permissions, based on the document type and current status. The system must enforce these permissions.
- Some users have more than one role. The plugin should not allow the same user to perform successive updates under different roles (for example, setting status 001 as Maker and then status 002 as Checker).
- The plugin should be able to export the status trail as a PDF.
- There is no limit on the number of status updates.
- Some statuses are marked as "complete"; no further processing is expected after them.
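
The append-only trail and the rule against successive updates by the same user can be sketched as follows. This is only an illustration of the requirements: the `StatusUpdate` fields, `append_update`, and `SuccessiveUpdateError` are hypothetical names, not the plugin's actual API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Tuple


@dataclass(frozen=True)
class StatusUpdate:
    """One immutable entry in the status trail (field names are hypothetical)."""

    status_code: str
    user_id: str
    role: str
    timestamp: datetime


class SuccessiveUpdateError(Exception):
    """The same user tried to perform two updates in a row."""


def append_update(
    trail: Tuple[StatusUpdate, ...], status_code: str, user_id: str, role: str
) -> Tuple[StatusUpdate, ...]:
    """Return a new trail with one more entry; existing entries are never changed."""
    # Compare users, not roles: a user holding both Maker and Checker roles
    # must not follow up on their own update under a different role.
    if trail and trail[-1].user_id == user_id:
        raise SuccessiveUpdateError(
            f"{user_id} made the previous update and cannot make the next one"
        )
    entry = StatusUpdate(status_code, user_id, role, datetime.now(timezone.utc))
    return trail + (entry,)


trail = append_update((), "001", "alice", "Maker")
trail = append_update(trail, "002", "bob", "Checker")
```

Calling `append_update(trail, "003", "bob", "Maker")` at this point raises `SuccessiveUpdateError`, because `bob` made the previous update, regardless of the role used.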

# Design

The plugin uses SQL to store all data. Status configuration contains status definitions along with permissions required for updates:
```json
"permissions": {
  "draft": {
    "reviewed" : [
      {"editor": ["BBCA"]},
      {"senior_editor": ["BBCA","non-BBCA"]}
    ]
  }
}
```
For a draft document, users with the `editor` role can update the status to `reviewed` only for `BBCA` documents, whilst users with the `senior_editor` role can do so for both document categories.
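
A permission check against this structure could look like the sketch below. It assumes the nesting shown in the fragment (current status, then target status, then a list of role-to-categories mappings); `can_update` is a hypothetical helper, not the plugin's actual function.

```python
from typing import Any, Dict, Iterable

# Same shape as the "permissions" fragment above:
# current status -> target status -> list of {role: [document categories]}
PERMISSIONS: Dict[str, Any] = {
    "draft": {
        "reviewed": [
            {"editor": ["BBCA"]},
            {"senior_editor": ["BBCA", "non-BBCA"]},
        ]
    }
}


def can_update(
    permissions: Dict[str, Any],
    current_status: str,
    new_status: str,
    user_roles: Iterable[str],
    category: str,
) -> bool:
    """Return True if any of the user's roles allows this transition."""
    roles = set(user_roles)
    # A transition that is not configured at all is denied.
    rules = permissions.get(current_status, {}).get(new_status, [])
    for rule in rules:
        for role, categories in rule.items():
            if role in roles and category in categories:
                return True
    return False


print(can_update(PERMISSIONS, "draft", "reviewed", ["editor"], "non-BBCA"))
```

Denying unknown transitions by default keeps the check safe when the configuration is incomplete.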

Not all roles should have access to the status tracking history. The `access_roles` config option controls who can fetch the status trail:
```json
  "access_roles": [
    "Workbench: CSSupportChecker",
    "Workbench: CSSupportMaker",
    "iLMS:BBCACSOChecker",
    "iLMS:BBCACSOMaker",
    "iLMS:BLTOfficer",
    "Workbench:Admin",
    "Workbench:SystemAdmin"
  ],
```
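
A minimal sketch of the corresponding access check, assuming role names are compared as exact strings; `can_view_trail` and the shortened `ACCESS_ROLES` list are illustrative only.

```python
from typing import Iterable

# Subset of the "access_roles" option above.
ACCESS_ROLES = ["Workbench:Admin", "iLMS:BLTOfficer", "iLMS:BBCACSOChecker"]


def can_view_trail(user_roles: Iterable[str], access_roles: Iterable[str]) -> bool:
    """Return True if the user holds at least one role listed in access_roles."""
    # Exact string match: "Workbench:Admin" and "Workbench: Admin" would differ.
    return not set(user_roles).isdisjoint(access_roles)


print(can_view_trail(["iLMS:BLTOfficer", "SomeOtherRole"], ACCESS_ROLES))
```

Because the match is exact, whitespace inside a role name matters when editing the list.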

To accommodate flexible status configuration and an ever-growing list of status updates,
we'll use a JSON column to store both. Both PostgreSQL and MySQL accept JSON values of at least
a few megabytes, which is enough to store tens of thousands of status updates.
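
The size claim can be sanity-checked with a quick estimate. The entry layout below is an assumption made for illustration (the real trail entries may carry different fields), so treat the result as an order of magnitude only.

```python
import json
from datetime import datetime, timedelta, timezone


def estimate_trail_size(num_updates: int) -> int:
    """Return the UTF-8 size in bytes of a JSON trail with num_updates entries."""
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    trail = [
        {
            # Hypothetical entry layout, loosely based on the status attributes.
            "status": "sent for review",
            "description": "Document was sent to the checker for review",
            "user": f"user-{i % 50}",
            "role": "iLMS:BBCACSOMaker",
            "timestamp": (start + timedelta(minutes=i)).isoformat(),
        }
        for i in range(num_updates)
    ]
    return len(json.dumps(trail).encode("utf-8"))


size = estimate_trail_size(10_000)
print(f"10,000 updates take about {size / 1_000_000:.1f} MB of JSON")
```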

Squirro has home-made database migration support, but we decided to give Alembic a try.
There was a major change in SQLAlchemy v2 regarding type checking:
- https://docs.sqlalchemy.org/en/14/orm/extensions/mypy.html
- https://docs.sqlalchemy.org/en/20/orm/extensions/mypy.html

Once the platform upgrades to SQLAlchemy v2, make sure to remove `sqlalchemy2-stubs`.

## Data consistency

We have a distributed system (Elasticsearch and SQL) whose stores should be kept in sync and self-recover from inconsistencies.
These operations currently write data to both storage systems:

1. Document upload → write to Elasticsearch; write to SQL (create document); write current status code to Elasticsearch
2. Status update → write to SQL; write current status code to Elasticsearch
3. Bulk import → write to SQL; post-SQL write to Elasticsearch

We make a best-effort attempt for the second scenario: checking the db connection and constraints
before anything is committed reduces the likelihood of failures:

```python
db.flush()  # send pending SQL now, so connection and constraint errors surface early
try:
    update_elastic()
    db.commit()  # if this fails we have an inconsistency
except Exception:
    db.rollback()
```
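
The same ordering can be tried out in isolation. The sketch below swaps SQLAlchemy for `sqlite3` from the standard library, so executing the statement plays the role of `db.flush()`; the `status_trail` table, `update_status`, and the dict standing in for Elasticsearch are all hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def update_status(
    conn: sqlite3.Connection,
    doc_id: str,
    status: Optional[str],
    update_elastic: Callable[[str, str], None],
) -> bool:
    """Write the status to SQL and Elasticsearch; return True on success."""
    try:
        # Executing the statement surfaces connection and constraint errors
        # before Elasticsearch is touched.
        conn.execute(
            "INSERT INTO status_trail (doc_id, status) VALUES (?, ?)",
            (doc_id, status),
        )
        update_elastic(doc_id, status)
        conn.commit()  # a failure here would still leave the stores inconsistent
        return True
    except Exception:
        conn.rollback()
        return False


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE status_trail (doc_id TEXT NOT NULL, status TEXT NOT NULL)")
conn.commit()

elastic = {}  # stand-in for the Elasticsearch index
ok = update_status(conn, "doc-1", "reviewed", elastic.__setitem__)
bad = update_status(conn, "doc-2", None, elastic.__setitem__)  # violates NOT NULL
print(ok, bad, elastic)
```

The second call fails on the constraint before the Elasticsearch stand-in is touched, which is the point of validating in SQL first.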

The first scenario is asynchronous (the ingestion pipeline calls the Studio API at some point),
so we've decided to use a queue (Redis) to eventually recover:
```python
from rq import Queue, Retry  # assumption: Queue and Retry below are the rq classes

consumer = RedisQueueProcessor(
    redis=redis_client,
    queues=[error_queue],
)
run_background_processes(
    count=1,
    queues=None,
    pool=None,
    task_processor_cls=consumer,
    name=error_queue,
)
errors_queue = Queue(error_queue, connection=redis_client)
...
# then later, if creating a db record fails, put it on the queue:
try:
    session.add(doc_record)
    session.commit()
except Exception:
    session.rollback()
    errors_queue.enqueue(
        err_handler.retry_document_create,
        request_data,
        ttl=300,
        retry=Retry(
            max=5,
            interval=[10, 20, 60],  # seconds between retry attempts
        ),
    )
```
The message consumer then retries the SQL write and the Elasticsearch update.

Job parameters can be configured in the `config.ini`:
```ini
[status_tracking]
error_redis_queue_name = status_tracking_sync
# HTTP request timeout, in seconds:
http_request_timeout_seconds = 5
# how many times to try to recover:
error_retry_max_attempts = 100
# max queue message lifetime, about 1 week (600000 s is roughly 6.9 days) so we have time to investigate:
error_retry_ttl_seconds = 600000
# time between retry attempts, must be larger than http request timeout.
error_retry_interval_seconds_csv = 60,120,300
```
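
A sketch of how these options could be read and validated with `configparser`. The `RetrySettings` class and `load_retry_settings` function are hypothetical; only the section and option names come from the fragment above.

```python
import configparser
from dataclasses import dataclass
from typing import List

EXAMPLE_INI = """
[status_tracking]
error_redis_queue_name = status_tracking_sync
http_request_timeout_seconds = 5
error_retry_max_attempts = 100
error_retry_ttl_seconds = 600000
error_retry_interval_seconds_csv = 60,120,300
"""


@dataclass
class RetrySettings:
    queue_name: str
    http_timeout: int
    max_attempts: int
    ttl: int
    intervals: List[int]


def load_retry_settings(ini_text: str) -> RetrySettings:
    """Parse the [status_tracking] section and check the interval constraint."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["status_tracking"]
    intervals = [
        int(value) for value in section["error_retry_interval_seconds_csv"].split(",")
    ]
    timeout = section.getint("http_request_timeout_seconds")
    # Every retry interval must be larger than the HTTP request timeout.
    if min(intervals) <= timeout:
        raise ValueError("retry intervals must be larger than the HTTP request timeout")
    return RetrySettings(
        queue_name=section["error_redis_queue_name"],
        http_timeout=timeout,
        max_attempts=section.getint("error_retry_max_attempts"),
        ttl=section.getint("error_retry_ttl_seconds"),
        intervals=intervals,
    )


settings = load_retry_settings(EXAMPLE_INI)
print(settings)
```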

### Type checking in alembic

todo

## Deployment

Upload the code and then restart the frontend service:
```bash
# see upload.sh script:
squirro_asset studio_plugin upload -f ../document_status_tracking --token "$TOKEN" --cluster "$CLUSTER"
# ssh $CLUSTER systemctl restart sqfrontendd.service
```

Every time the plugin loads it runs [alembic upgrade](https://alembic.sqlalchemy.org/en/latest/tutorial.html#running-our-first-migration) - [plugin.py:run_db_migrations()](./plugin.py#L58).

If the database is empty, the next step is to provide the status tracking configuration:
status definitions and update permissions, which define the document statuses that exist,
the valid transitions between them, and the permissions required to trigger status updates.
The [StatusTrackingConfig](./status_tracking/models.py) model class implements persistence
and business logic related to status tracking.

The plugin exposes two API endpoints for managing status tracking configuration,
and a studio panel in the Server space `/app/#studio/<project_id>/document_status_tracking`.

The initial status configuration version can be found in [./status-configuration.json](./status-configuration.json).

Each update creates a new version, and old versions are not removed. This might help reconcile problems
when incompatible status changes are introduced.
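
The versioning behaviour can be pictured with a small in-memory stand-in. `ConfigHistory` is purely illustrative and does not reflect how `StatusTrackingConfig` actually stores its versions.

```python
import copy
from typing import Any, Dict, List, Optional


class ConfigHistory:
    """In-memory stand-in for versioned configuration storage (hypothetical)."""

    def __init__(self) -> None:
        self._versions: List[Dict[str, Any]] = []

    def save(self, config: Dict[str, Any]) -> int:
        """Store a copy of config as a new version and return its 1-based number."""
        self._versions.append(copy.deepcopy(config))
        return len(self._versions)

    def get(self, version: Optional[int] = None) -> Dict[str, Any]:
        """Return the requested version, or the latest one when version is None."""
        if not self._versions:
            raise LookupError("no configuration has been saved yet")
        index = len(self._versions) - 1 if version is None else version - 1
        if not 0 <= index < len(self._versions):
            raise LookupError(f"unknown configuration version: {version}")
        return copy.deepcopy(self._versions[index])


history = ConfigHistory()
v1 = history.save({"statuses": ["draft", "reviewed"]})
v2 = history.save({"statuses": ["draft", "reviewed", "archived"]})
print(v2, history.get(v1))
```

Keeping every version makes it possible to look up the configuration that was in force when an older status update was recorded.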

### Prod deployment

The prod instance does not have internet access for security reasons so all dependencies must be bundled.
When adding a new dependency raise a ticket or ensure that engineering builds it into an RPM.

## Local development

Install the plugin's Python dependencies from [requirements.txt](./requirements.txt).

The plugin needs an SQL database. Here's how to run one using Docker:

```bash
# MySQL
docker run --name mysql-studio -d -p 3306:3306 -e MYSQL_ROOT_PASSWORD=superdupersecret -e MYSQL_DATABASE=studio -e MYSQL_USER=studio -e MYSQL_PASSWORD=superdupersecret mysql:8.0
export DATABASE_URL="mysql+pymysql://studio:superdupersecret@127.0.0.1:3306/studio?charset=utf8"
```

```bash
# PostgreSQL
docker run --name ocbc-status-pg -d -p 5432:5432 -e POSTGRES_PASSWORD=superdupersecret -e POSTGRES_DB=studio -e POSTGRES_USER=studio postgres:14
export DATABASE_URL="postgresql+psycopg2://studio:superdupersecret@127.0.0.1:5432/studio"
```

With the database running, run Alembic to create the db schema:
```bash
alembic upgrade head
```

## Troubleshooting

To enable detailed logging, edit `/etc/squirro/frontend.ini`:
```ini
[logger_root]
level = DEBUG
[studio]
log_db = True
```

### DependencyNotFound: Could not instantiate db_studio

```bash
ERROR    squirro.integration.frontend.studio Cannot load the studio plugin from file: 'squirro/topic/assets/studio_plugin/_global/document_status_tracking/plugin.py'
...
  File "alembic/env.py", line 33, in get_engine
    session = get_injected("db_studio")
  File "/opt/squirro/virtualenv38/lib64/python3.8/site-packages/squirro/common/dependency.py", line 123, in get_injected
    raise DependencyNotFound("Could not instantiate %s" % dep)
squirro.common.dependency.DependencyNotFound: Could not instantiate db_studio
```

Make sure the db connection works: check `frontend.ini`, check the password, and make sure you can connect using a command-line client.

`db_studio` is registered in [`squirro-integration/frontend/main.py`](https://github.com/squirro/squirro-integration/blob/3.9.4/frontend/squirro/integration/frontend/main.py#L304). It tries to detect SQLAlchemy models, and if it doesn't find any, it skips `register_db`:

```python
    @staticmethod
    def _setup_studio_db():
        """Setup database access for Studio plugins"""
        try:
            models = get_studio_models()
            if models:
                register_db(
                    section="studio",
                    description="Studio DB",
                    base=models.values(),
                    dependency="db_studio",
                )

```

Instrument `frontend/main.py` to check if that's the case. When it works, the log messages look like this:

```
INFO     squirro.integration.frontend.studio Loading model.py file: '.../document_status_tracking/model.py'
INFO     squirro.integration.frontend.studio Base of Studio plugin 'document_status_tracking' was loaded successfully
2024-02-05 08:16:03,414 Thread-2 DEBUG    squirro.common.db Dialect name in use: 'mysql'
2024-02-05 08:16:03,415 Thread-2 INFO     squirro.common.context executing Base.metadata.create_all with engine: Engine(mysql+pymysql://studio:***@localhost/studio?

squirro.common.db New connection. Setting transaction isolation level to READ COMMITTED
```

### Document mapping is not valid

Make sure each item has a `bbca_document` attribute.

## Manually run alembic

If you want to test database migrations manually:

```bash
export PYTHONPATH=.../document_status_tracking/pkg/opt/squirro/virtualenv38/lib/python3.8/site-packages/
export DATABASE_URL="mysql+pymysql://<user:pass>@localhost/studio?charset=utf8"

./pkg/opt/squirro/virtualenv38/bin/alembic upgrade head
```
