12. Clickhouse vs dbt#
Status#
Accepted
Context#
Most of the use cases for Aspects require complex transformations to be performed on the data stored in Clickhouse before it can be displayed in Superset. While xAPI is a good format for communicating learning-related events as they happen, it’s not the best format for understanding the effect of these events over time. In order to analyse and draw conclusions for what these events signify over time, Aspects needs efficient views of the data that are designed specifically for the queries being performed.
Clickhouse provides some features for transforming complex data, but creating efficient views and managing schema changes over time can be difficult. Plus, some of these features (like named collections) are not available on all hosted Clickhouse services.
This ADR describes the division of responsibilities in Aspects between the Clickhouse database and data build tool (dbt), a data transformation tool.
Decision#
Store in Clickhouse, transform with dbt
Raw event and event sink data tables should continue to be created using 7. Alembic migrations.
Data transformations on this raw data should be made with data build tool (dbt), including:
materialized views
partitions
dictionaries (pending support, see tutor-contrib-aspects#565)
fields extracted from event JSON
Note
We use dbt to manage the database schema that performs the transformations, but the transformations themselves happen in the ClickHouse process. As data is inserted it is immediately transformed and stored in the various query-efficient tables.
Consequences#
Contribute upstream to dbt-clickhouse where support for required features is missing, where possible.
Move transformations made by the “query” and “dataset” Aspects Superset assets to aspects-dbt.
Move dictionaries and partitions originally created using 7. Alembic migrations to aspects-dbt.
Squash remaining alembic migrations.
Rejected Alternatives#
Use native Clickhouse transforms instead of dbt
This option was rejected for maintainability reasons: data build tool (dbt) was designed to manage data transformations with its package and test framework, and so is more modular and reusable, and better suited to Aspects’ long-term goals.
References#
tutor-contrib-aspects#546 Only recreate materialized views when necessary
tutor-contrib-aspects#565 Add dictionary support to dbt-clickhouse