Extend dbt#
As noted in data build tool (dbt), you can install your own custom dbt package to apply your own transforms to the event data in Aspects.
This guide demonstrates how to create and use a custom dbt package in Aspects by building the sample-aspects-dbt repo.
Step 1. Install dbt#
The easiest way to install dbt is to use pip in a python3 virtual environment.
# Create and activate a python3 virtual environment
python3 -m venv venv
source venv/bin/activate
pip install --upgrade pip
Check the aspects-dbt requirements.txt for the specific package versions used by Aspects. Both dbt-core and dbt-clickhouse are required.
# Install the dbt package versions used by Aspects
pip install dbt-clickhouse==x.x.x dbt-core==x.x.x
See Install dbt for more ways to install dbt.
Step 2. Set up new dbt package#
Create a new repository for your custom dbt package, and clone it to your local machine.
In the root of your local repository, create a new dbt package by following the prompts given by the
dbt inittool. See About dbt init for more options.# Use the profile name from your ``dbt/profiles.yml`` file or 'aspects' dbt init --profile aspects # Enter a name for your project (letters, digits, underscore): sample_aspects_dbt ls sample_aspects_dbt # analyses/ dbt_project.yml macros/ models/ README.md seeds/ snapshots/ tests/
Attention
Possible error when running
dbt init:No sample profile found for clickhouse
Make sure the profile matches your local
dbt/profiles.ymlprofile name.# This setting configures which "profile" dbt uses for this project. profile: "aspects"
In
dbt_project.yml, set the location for compiled SQL to match the aspects target path:# directory which will store compiled SQL files target-path: "target"
Create a requirements.txt file at the top level of your repository with the same version of dbt-core as in Step 1. This is needed when building the aspects image.
# requirements.txt dbt-core~=x.x.x
Note
If you need any python dependencies beyond what is provided by aspects-dbt, add these to the requirements file.
Step 3. Link to aspects-dbt#
Aspects charts depend on the transforms in aspects-dbt, so it’s important that your dbt package also installs the same version of aspects-dbt as your version of the Aspects Tutor plugin.
To do this, add a packages.yml file to your dbt package at the top level where:
giturl matches the default value ofDBT_REPOSITORYin tutor-contrib-aspects plugin.pyrevisionmatches the default value ofDBT_BRANCHin tutor-contrib-aspects plugin.py
packages:
- git: "https://github.com/openedx/aspects-dbt.git"
revision: vX.X.X
Step 4. Test dbt connection#
Before adding your custom transforms, it’s a good idea to test that your dbt package can connect to the Clickhouse database and run the base transforms from aspects-dbt.
Run
dbt debugto test the connection to your database and the validity of your dbt project.Run
dbt runto run the base transforms from aspects-dbt.You may need to run
dbt run --full-refreshif the previous step fails.
Attention
Possible error when running dbt commands:
Compilation Error
dbt found 1 package(s) specified in packages.yml, but only 0 package(s) installed in dbt_packages.
Run "dbt deps" to install package dependencies.
Run dbt deps to install the dependencies for your package, including aspects-dbt.
Step 5. Create your custom transforms#
Here is where you will need an understanding of dbt, Clickhouse, Aspects’ data schemas, and the specific transforms you want to create.
Tip
You can use Aspects to debug your custom SQL:
Login to Superset as an Open edX superuser.
Using the menus at the top of the page, navigate to the “SQL -> SQL Lab” UI.
Browse the schemas and run read-only SQL queries on your data.
For this tutorial, we created two new models - course_enrollments and learner_responses and which will be materialized by dbt into a view and materialized view in Clickhouse. More information on materialized views here.
Step 6. Add dbt tests#
Writing tests for your transforms is important to validate and document your intended changes, and guard against data edge cases and regressions from future code changes.
There are two types of dbt tests; data tests and unit tests. Run dbt test to run all tests for your package.
Data tests#
Data tests can be defined in the schema.yml file for each model, and are used to validate properties of the data such as types, accepted values, uniqueness, and relationships between tables.
- name: enrollment_mode
description: "The enrollment mode of the user"
data_tests:
- accepted_values:
values: ["audit", "honor", "verified"]
Data tests can also be defined in a SQL file, where the goal of the SQL statement is to return zero records.
select count(*) as num_rows
from {{ ref('enrollment_mode') }}
group by org, course_key, enrollment_mode, course_name, course_run
having num_rows > 1
Unit tests#
Unit tests are used to validate the logic of your dbt models in isolation from the underlying data. They are defined in a unit_tests.yaml file within the models directory.
See also
Other unit test resources:
Step 7. Install and use your dbt package#
Once you’ve pushed all the changes to your custom dbt package repo, you’re ready to use it.
Use tutor config save to update the following Tutor variables to use your custom package instead of the Aspects default.
DBT_REPOSITORY: The git repository URL of your custom dbt package.(Default:
https://github.com/openedx/aspects-dbt)DBT_BRANCH: The hash/branch/tag of your custom dbt package that you wish to use.(Default: the latest tagged version of aspects-dbt)
- Optional
EXTRA_DBT_PACKAGES: Add any python packages that your dbt project requires here.DBT_PROFILE_*: variables used in the Aspectsdbt/profiles.ymlfile, including several Clickhouse connection settings.DBT_SSH_KEY: The private SSH key to use when cloning the dbt project. Only necessary if you are using a private repository.
Once your package is configured in Tutor, you can run dbt commands directly on your deployment.
# Build and test your package
tutor dev do dbt -c "build"
# Deploy your customizations
tutor dev do dbt -c "run"
# Run data tests on the data
tutor dev do dbt -c "test"
# Run unit tests on the data
tutor dev do dbt -c "test --selector unit_tests"
# To push your new transformations to Superset SQL Lab
tutor dev do import-assets
See dbt commands for a full list of available commands.
Troubleshooting#
Tutor dbt commands may need to be run with
--only_changed Falseto force a full dbt run if you have made changes to your dbt package that are not being picked up.Don’t forget to push your changes to your repo before running the tutor dbt command: it fetches a clean copy of your configured package repo + branch each time it runs.
Next Steps#
With your custom dbt package, you can create new charts and dashboards in Superset using the new models created by your dbt transforms. See Add extra Superset assets for more information.
References#
aspects-dbt: Aspects’ dbt package
sample-aspects-dbt: the demo custom dbt package used in this tutorial
Building dbt packages: dbt’s guide to building packages
Best practice guides: dbt’s guidelines on project structure, style, and setup
About dbt models: dbt’s guide to creating SQL or Python model transforms
dbt debugging: guide for debugging issues with dbt
The missing guide to debug() in dbt: detailed advice for debugging issues with dbt