
Dagster Guide

OSO uses Dagster to perform all data orchestration in our backend infrastructure. Most of the time, contributors will be working in Dagster to connect new data sources. If the source data already exists in OSO, take a look at contributing models via sqlmesh.

To set up Dagster in your development environment, check out the getting started guide.

Data Architecture

As much as possible, we store data in an Iceberg cluster, making it the center of gravity of the OSO data lake. Thus, data ingest will typically go into Iceberg tables, and pyoso is the best way to query any data in these tables. Keeping data in Iceberg will make it significantly easier to decentralize OSO infrastructure in the future.

Job schedules

All automated job schedules can be found on our public Dagster dashboard.

Currently, our main data pipeline runs once per week on Sundays.

Alert system

Dagster alert sensors are configured in warehouse/oso_dagster/factories/alerts.py.

Right now, alerts are reported to #alerts in the OSO Discord server.

Secrets Management

When you are creating new data sources for OSO, you may need to handle secrets (e.g. passwords, access keys, DB connection strings).

warning

DO NOT CHECK SECRETS INTO THE REPOSITORY!

Instead, please use the OSO SecretResolver to properly handle your secrets.

Local secrets

While you are developing your code, the right place to store secrets is in your root .env file. The OSO SecretResolver organizes secrets by (prefix, group, key). Dagster automatically loads secrets from your environment using the naming convention PREFIX__GROUP__KEY, with the parts separated by double underscores. By default, all secrets in Dagster use the prefix dagster.

For example, we store all Clickhouse secrets under the clickhouse group. Thus, these are the environment variables we'd set for Clickhouse:

DAGSTER__CLICKHOUSE__HOST=
DAGSTER__CLICKHOUSE__USER=
DAGSTER__CLICKHOUSE__PASSWORD=
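
To make the naming convention concrete, here is a minimal sketch of how a (prefix, group, key) triple maps to an environment variable name. The helper secret_env_var is hypothetical and written only for illustration; it is not part of the OSO codebase, and the real SecretResolver may build names differently.

```python
def secret_env_var(group: str, key: str, prefix: str = "dagster") -> str:
    """Build an environment variable name following PREFIX__GROUP__KEY.

    Hypothetical helper for illustration only; not the OSO implementation.
    """
    # Upper-case each part and join with a double underscore.
    return "__".join(part.upper() for part in (prefix, group, key))


print(secret_env_var("clickhouse", "password"))
# prints DAGSTER__CLICKHOUSE__PASSWORD
```

The double underscore keeps the separator unambiguous even when a group or key itself contains a single underscore.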

You can reference a secret using SecretReference and resolve it using SecretResolver.

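from ..utils import SecretReference, SecretResolver

# secret_resolver is a SecretResolver instance, provided to your code as a Dagster resource.
# With the default prefix, this reference corresponds to DAGSTER__CLICKHOUSE__PASSWORD.
password_ref = SecretReference(group_name="clickhouse", key="password")
password = secret_resolver.resolve_as_str(password_ref)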

To get a reference to the SecretResolver, accept it as a Dagster resource. See definitions.py and clickhouse.py for examples.
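
The pattern of receiving a resolver and asking it for a secret can be sketched without Dagster or the OSO utilities. Everything below is a stand-in written for illustration: DemoSecretReference and DemoEnvSecretResolver are hypothetical classes that only mimic the shape of the calls shown above, and the assumption that secrets are read straight from os.environ is ours. For the real resource wiring, refer to definitions.py and clickhouse.py.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoSecretReference:
    """Stand-in for a secret reference (hypothetical, not the OSO class)."""
    group_name: str
    key: str


class DemoEnvSecretResolver:
    """Stand-in resolver that reads PREFIX__GROUP__KEY from the environment."""

    def __init__(self, prefix: str = "dagster") -> None:
        self.prefix = prefix

    def resolve_as_str(self, ref: DemoSecretReference) -> str:
        parts = (self.prefix, ref.group_name, ref.key)
        name = "__".join(part.upper() for part in parts)
        try:
            return os.environ[name]
        except KeyError:
            # Report the variable name only, never a secret value.
            raise KeyError(f"secret not set: {name}") from None


def clickhouse_password(secret_resolver: DemoEnvSecretResolver) -> str:
    """Receive the resolver as an argument instead of reading env vars directly."""
    ref = DemoSecretReference(group_name="clickhouse", key="password")
    return secret_resolver.resolve_as_str(ref)
```

Passing the resolver in as an argument, rather than calling os.environ inside each function, is what allows the same code to run against a local .env file during development and a different secret backend in production.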

Production secrets

When you are ready to run your assets in production, please reach out to the core OSO team on Discord. We will arrange a secure way to share your secrets into our production keystore.