Extending dbt#
As noted in data build tool (dbt), you can install your own custom dbt package to apply your own transforms to the event data in Aspects.
This guide demonstrates how to create and use a custom dbt package in Aspects by building the aspects-dbt-example repo. See aspects-dbt-example#1 to follow along with each of these steps.
See Building dbt packages and Best practice guides for dbt’s official documentation.
Step 0. Install dbt core#
The easiest way to install dbt core is to use pip in a python3 virtual environment.
See aspects-dbt requirements.txt for the specific package versions used by Aspects.
# Create and activate a python3 virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
# Install the dbt package versions used by Aspects
pip install dbt-clickhouse==1.7.2 dbt-core==1.7.0
See Install dbt for more ways to install dbt.
Step 1. Initialize your dbt package#
Create a new dbt package by following the prompts given by the dbt init
tool.
dbt init
# Enter a name for your project (letters, digits, underscore): aspects_dbt_example
# Which database would you like to use? [1] clickhouse
ls aspects_dbt_example
# analyses/ dbt_project.yml macros/ models/ README.md seeds/ snapshots/ tests/
See About dbt init for more options.
Step 2. Use the Aspects profile#
Update the generated dbt_project.yml
to use the aspects
profile:
# This setting configures which "profile" dbt uses for this project.
profile: 'aspects'
Also set the location for compiled SQL to match that used by aspects
:
# directory which will store compiled SQL files
target-path: "target"
Step 3. Link to aspects-dbt#
Aspects charts depend on the transforms in aspects-dbt, so it’s important that your dbt package also installs the same version of aspects-dbt as your version of the Aspects Tutor plugin.
To do this, add a packages.yml
file to your dbt package at the top level where:
git
url matches the default value ofDBT_REPOSITORY
in tutor-contrib-aspects plugin.pyrevision
matches the default value ofDBT_BRANCH
in tutor-contrib-aspects plugin.py
packages:
- git: "https://github.com/openedx/aspects-dbt.git"
revision: v3.4.1
Step 4. Add your custom transforms#
Here is where you will need an understanding of dbt, Clickhouse, Aspects’ data schemas, and the specific transforms you want to create.
If you need any python dependencies beyond what is provided by aspects-dbt, add these to a requirements.txt
file at
the top level of your repository.
Note
You can use Aspects to debug your custom SQL:
Login to Superset as an Open edX superuser.
Using the menus at the top of the page, navigate to the “SQL -> SQL Lab” UI.
Browse the schemas and run read-only SQL queries on your data.
For this tutorial, we added a new model which will be materialized by dbt into a view in Clickhouse. Our new model calculates the average number of attempts made by users on each problem by referencing the int_problem_results model created by the base aspects-dbt package (see dbt ref):
select
problem_id, AVG(attempts) as average_attempts
from
(
select
problem_id,
max(attempts) as attempts
from
{{ ref('int_problem_results') }}
group by
actor_id,
problem_id
)
group by problem_id
Next, make sure your model is configured, for example in the db_project.yml
. If you forget this step, dbt will warn
you when running your package.
models:
problem_responses:
# Config indicated by + and applies to all files under models/problem_responses/
+materialized: view
See Configure dbt models to learn more.
Step 5. Add tests#
Writing tests for your transforms is important: not only can tests validate and document your intended changes, they can be used to guard against data edge cases and regressions from future code changes.
dbt generic tests are defined as SQL files, where the goal of the SQL statement is to return zero records.
Because our new average_attempts model aggregates on actor_id and problem_id, it should only have 1 entry for each problem_id. So our test can be:
-- average_attempts should only have one record for each problem_id.
select
count(*) as num_rows
from
{{ ref('average_attempts') }}
group by
problem_id
having num_rows > 1
See Writing data tests for more examples.
Step 6. Install and use your dbt package#
Once you’ve pushed all the changes to your custom dbt package repo, now we’re ready to use it.
Use tutor config save
to update the following Tutor variables to use your custom package instead of the Aspects
default.
DBT_REPOSITORY
: A git repository URL to clone and use as the dbt project.Set this to the URL for your custom dbt package.
Default:
https://github.com/openedx/aspects-dbt
DBT_BRANCH
: The branch to use when cloning the dbt project.Set this to the hash/branch/tag of the custom dbt package that you wish to use.
Default: varies between versions of Aspects.
EXTRA_DBT_PACKAGES
: Add any python packages that your dbt project requires here.Default:
[]
DBT_PROFILE_*
: variables used in the Aspectsdbt/profiles.yml
file, including several Clickhouse connection settings.DBT_SSH_KEY
: The private SSH key to use when cloning the dbt project. Only necessary if you are using a private repository.
Once your package is configured in Tutor, you can run dbt commands directly on your deployment.
See dbt commands for a full list of available commands.
# Build and test your package
tutor dev do dbt -c "build"
# Deploy your customizations
tutor dev do dbt -c "run"
# Run tests on the data
tutor dev do dbt -c "test"
Step 7. Troubleshooting#
You may need to repeat steps 4-6 a few times to resolve any warnings or errors that dbt reports with your package.
Don’t forget to push your changes to your repo before running the tutor dbt command: it fetches a clean copy of your configured package repo + branch each time it runs.
See dbt debugging and The missing guide to debug() in dbt for more information on how to debug issues with your package.
References#
Building dbt packages: dbt’s guide to building packages
Best practice guides: dbt’s guidelines on project structure, style, and setup.
About dbt models: dbt’s guide to creating SQL or Python model transforms
Writing data tests: dbt’s guide to writing package tests
dbt commands: list of all dbt commands
dbt debugging: guide for debugging issues with dbt
The missing guide to debug() in dbt: detailed advice for debugging issues with dbt
aspects-dbt: Aspects’ dbt package
aspects-dbt-example: the demo custom dbt package used in this tutorial.
eduNEXT/dbt-aspects-unidigital: a real custom dbt package running in production Aspects