12. Clickhouse vs dbt#

Status#

Accepted

Context#

Most of the use cases for Aspects require complex transformations to be performed on the data stored in Clickhouse before it can be displayed in Superset. While xAPI is a good format for communicating learning-related events as they happen, it’s not the best format for understanding the effect of these events over time. In order to analyse and draw conclusions for what these events signify over time, Aspects needs efficient views of the data that are designed specifically for the queries being performed.

Clickhouse provides some features for transforming complex data, but creating efficient views and managing schema changes over time can be difficult. Plus, some of these features (like named collections) are not available on all hosted Clickhouse services.

This ADR describes the division of responsibilities in Aspects between the Clickhouse database and data build tool (dbt), a data transformation tool.

Decision#

Store in Clickhouse, transform with dbt

Raw event and event sink data tables should continue to be created using 7. Alembic migrations.

Data transformations on this raw data should be made with data build tool (dbt), including:

  • materialized views

  • partitions

  • dictionaries (pending support, see tutor-contrib-aspects#565)

  • fields extracted from event JSON

Note

We use dbt to manage the database schema that performs the transformations, but the transformations themselves happen in the ClickHouse process. As data is inserted it is immediately transformed and stored in the various query-efficient tables.

Consequences#

  1. Contribute upstream to dbt-clickhouse where support for required features is missing, where possible.

  2. Move transformations made by the “query” and “dataset” Aspects Superset assets to aspects-dbt.

  3. Move dictionaries and partitions originally created using 7. Alembic migrations to aspects-dbt.

  4. Squash remaining alembic migrations.

Rejected Alternatives#

Use native Clickhouse transforms instead of dbt

This option was rejected for maintainability reasons: data build tool (dbt) was designed to manage data transformations with its package and test framework, and so is more modular and reusable, and better suited to Aspects’ long-term goals.

References#