Database Migrations | Data Schema Evolution | Fluid Attacks Help

Database Migrations

Caution
Make sure you read this entire document before running a migration.

What are migrations?

As Integrates and the business evolve, it is natural for the structure of the data to change. In order to keep backwards compatibility, it is necessary to run data migrations that change all existing data so it complies with the latest data schema.

For example:
  1. We have a cars database and are storing two attributes, color and brand.
  2. At some point in time, we decide to also store the price attribute.
  3. When this happens, we have to go through all the already-created cars and add the new price attribute accordingly.
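The backfill in step 3 can be sketched as follows. The in-memory `cars` list and the `DEFAULT_PRICE` value are illustrative assumptions standing in for the real datastore, not actual Integrates code:

```python
# Sketch of a backfill migration: add a "price" attribute to every
# existing car record. The list below stands in for the real datastore.
DEFAULT_PRICE = 0  # hypothetical default; a real migration would compute it

cars = [
    {"color": "red", "brand": "Toyota"},
    {"color": "blue", "brand": "Mazda", "price": 15000},  # already migrated
]


def backfill_price(records: list[dict]) -> int:
    """Add the new attribute to records that lack it; return count updated."""
    updated = 0
    for record in records:
        if "price" not in record:
            record["price"] = DEFAULT_PRICE
            updated += 1
    return updated


updated_count = backfill_price(cars)
```

Making the backfill idempotent (skipping records that already have the attribute) lets you safely re-run the migration if it is interrupted halfway.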

Writing migration scripts

Note
All migration scripts are kept in the repo for traceability purposes, but they do not need to be updated alongside the rest of the codebase when breaking changes occur.

You can find all the already-executed migrations in the GitLab repository. The latest of them may be helpful as inspiration when creating your own migration.

Basic properties

All migration scripts have a comment including:
  1. A basic description of what the script does.
  2. An Execution Time that specifies when it started running.
  3. A Finalization Time that specifies when it finished running.
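A minimal header following those properties might look like this (the description and timestamps are placeholders):

```python
# Migration: add the "price" attribute to all existing cars.
#
# Execution Time:    2024-01-15 at 10:00:00 UTC
# Finalization Time: 2024-01-15 at 10:05:00 UTC
```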

The main function

Your migration script should contain a main function, which will be called when the migration runs.

import logging
import logging.config
import time

from settings import (
    LOGGING,
)

logging.config.dictConfig(LOGGING)
LOGGER_CONSOLE = logging.getLogger("console")


async def main() -> None:
    """Your code goes here"""


You can call dataloaders, domain functions, data model functions, and even direct calls to the corresponding datastore module, depending on the level of abstraction best suited to achieve the intended change.
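A complete, runnable sketch of this structure is shown below. The logging setup is replaced by standard-library defaults, and `fetch_item_ids` is a hypothetical stand-in for a real dataloader or datastore call:

```python
import asyncio
import logging

logging.basicConfig(level=logging.INFO)
LOGGER_CONSOLE = logging.getLogger("console")


# Hypothetical stand-in for a dataloader or datastore call.
async def fetch_item_ids() -> list[str]:
    return ["item-1", "item-2", "item-3"]


processed: list[str] = []


async def migrate_item(item_id: str) -> None:
    # A real migration would update the record here.
    processed.append(item_id)
    LOGGER_CONSOLE.info("Migrated %s", item_id)


async def main() -> None:
    item_ids = await fetch_item_ids()
    await asyncio.gather(*(migrate_item(item_id) for item_id in item_ids))


asyncio.run(main())
```

Using `asyncio.gather` processes records concurrently; for very large datasets you may prefer batching to avoid overwhelming the datastore.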

Running migrations

Note
Data migrations tend to be risky operations, as they may introduce inconsistencies and errors. It is therefore advisable to request a review on the Merge Request before running them.

Dry runs

As migrations affect production data, it is very important that you take all necessary measures so they work as expected.

A very useful measure is dry runs, which let you run migrations in your local environment.

To execute a dry run:
  1. Write your migration.
  2. Turn on your development environment.
  3. Run integrates-db-migration dev name_of_script
Tip
The dev argument is what tells the tooling to perform a dry run.
Tip
For a file named _0608_remove_root_url.py, the script would be integrates-db-migration dev 0608_remove_root_url.

This approach allows you to locally test your migration until you feel comfortable enough to run it on production.

Running locally

If you have the required role to modify the database, migrations can be executed from your machine by running: integrates-db-migration prod name_of_script.

Quality check

Use the default Integrates linter to lint migration scripts:

integrates-back-lint

Our core linter also runs import-linter, which verifies that forbidden modules are not imported in migration scripts; this is done to avoid direct usage of modules that modify the database.

Unless strictly necessary, prefer the existing utilities in db_model; otherwise, add exception rules for your migration script.

If you require adding a rule exception, then modify the integrates/back/import-linter.cfg config file.
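As a rough illustration of what such an exception could look like (the contract name, module paths, and option layout here are assumptions; check the existing integrates/back/import-linter.cfg for the actual structure):

```ini
[importlinter:contract:migrations-forbidden]
name = Migrations must not import datastore modules directly
type = forbidden
source_modules =
    migrations
forbidden_modules =
    dynamodb.operations
ignore_imports =
    migrations._0608_remove_root_url -> dynamodb.operations
```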

Running on AWS Batch

Once you know that your migration does what it is supposed to do, it is recommended to execute it using a Batch schedule:
  1. Write your migration.
  2. Create a batch schedule that executes the migration.
  3. Deploy both changes to production.
  4. Wait until the schedule executes.
  5. Access the AWS console to review the logs of the migration.
This runs the migration in an environment external to your own machine, which is faster and more reliable.

Restoring to a previous state

If something goes wrong, you have the option to restore data from a backup.
  1. Follow these instructions to restore a Point In Time into a new table.
  2. Restore the data by reading from the recovery table and writing into the main table.
  3. Remove the recovery table.
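The copy in step 2 can be sketched generically. Here, dictionaries keyed by record id stand in for the real tables; with DynamoDB you would typically scan the recovery table and batch-write into the main one:

```python
# Restore records from a recovery table into the main table.
# Dictionaries keyed by record id stand in for the real datastore tables.
recovery_table = {
    "car-1": {"color": "red", "brand": "Toyota", "price": 20000},
    "car-2": {"color": "blue", "brand": "Mazda", "price": 15000},
}
main_table = {
    "car-1": {"color": "red", "brand": "Toyota"},  # corrupted by the migration
}


def restore(main: dict, recovery: dict) -> int:
    """Overwrite main-table records with their recovery-table versions."""
    restored = 0
    for key, record in recovery.items():
        main[key] = dict(record)  # copy so the tables stay independent
        restored += 1
    return restored


restored_count = restore(main_table, recovery_table)
```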

Delete migrations

Migrations should be kept in the repository for at least one year. After that, they should be deleted to avoid compatibility issues.

There is no need to delete a migration immediately after one year has passed (we have no CI test enforcing this). The usual procedure is to delete them in bulk at least twice a year, once in January and once in July.

Learn more

  1. Database design
  2. Data model
  3. Patterns
  4. Streams
Tip
Have an idea to simplify our architecture or noticed docs that could use some love? Don't hesitate to open an issue or submit improvements.