Dagster Guide
OSO uses Dagster to perform all data orchestration in our backend infrastructure. Most of the time, contributors will be working in Dagster to connect new data sources. If the source data already exists in OSO, take a look at contributing models via sqlmesh.
To set up Dagster in your development environment, check out the getting started guide.
Data Architecture
As much as possible, we store data in an Iceberg cluster, making it the center of gravity of the OSO data lake. Thus, data ingest will typically go into Iceberg tables. pyoso is the best way to query any data in these tables. This will make it significantly easier to decentralize OSO infrastructure in the future.
Job schedules
All automated job schedules can be found on our public Dagster dashboard.
Currently, our main data pipeline runs once per week on Sundays.
Alert system
Dagster alert sensors are configured in warehouse/oso_dagster/factories/alerts.py. Right now, alerts are reported to #alerts in the OSO Discord server.
Secrets Management
When you are creating new data sources for OSO, you may need to handle secrets (e.g. passwords, access keys, DB connection strings).
DO NOT CHECK SECRETS INTO THE REPOSITORY!
Instead, please use the OSO SecretResolver to properly handle your secrets.
Local secrets
While you are developing your code, the right place to store secrets is in your root .env file.
The OSO SecretResolver organizes secrets by (prefix, group, key). Dagster will automatically load secrets from your environment using the following convention: PREFIX__GROUP__KEY. By default, all secrets in Dagster use the prefix dagster.
For example, we store all Clickhouse secrets under the clickhouse group. Thus, these are the environment variables we'd set for Clickhouse:
DAGSTER__CLICKHOUSE__HOST=
DAGSTER__CLICKHOUSE__USER=
DAGSTER__CLICKHOUSE__PASSWORD=
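The naming convention above can be sketched in a few lines of Python. This is only an illustration, not the actual SecretResolver implementation: `secret_env_name` is a hypothetical helper, and upper-casing each part is an assumption inferred from the Clickhouse example.

```python
import os


def secret_env_name(group: str, key: str, prefix: str = "dagster") -> str:
    """Build a PREFIX__GROUP__KEY variable name (hypothetical helper)."""
    # Assumption: each part is upper-cased and joined with double
    # underscores, as in DAGSTER__CLICKHOUSE__HOST.
    return "__".join(part.upper() for part in (prefix, group, key))


name = secret_env_name("clickhouse", "password")
print(name)  # DAGSTER__CLICKHOUSE__PASSWORD

# Look the value up in the environment; None if it has not been set.
value = os.environ.get(name)
```

Keeping the prefix as a defaulted argument mirrors the statement that secrets in Dagster use the dagster prefix by default.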
You can reference a secret using SecretReference and resolve it using SecretResolver.
from ..utils import SecretReference, SecretResolver
# `secret_resolver` is a SecretResolver instance received as a Dagster resource.
password_ref = SecretReference(group_name="clickhouse", key="password")
password = secret_resolver.resolve_as_str(password_ref)
To get a reference to the SecretResolver, accept it as a Dagster resource. See definitions.py and clickhouse.py for examples.
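To illustrate the pattern without depending on the OSO codebase, the sketch below passes a resolver into a function rather than constructing it inside. `EnvSecretResolver` and this `SecretReference` dataclass are hypothetical stand-ins that only mirror the names used above; the real classes and the Dagster resource wiring live in the files just mentioned and may differ.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SecretReference:
    # Stand-in mirroring the fields used in the snippet above.
    group_name: str
    key: str


class EnvSecretResolver:
    """Hypothetical resolver reading PREFIX__GROUP__KEY from the environment."""

    def __init__(self, prefix: str = "dagster") -> None:
        self.prefix = prefix

    def resolve_as_str(self, ref: SecretReference) -> str:
        parts = (self.prefix, ref.group_name, ref.key)
        name = "__".join(part.upper() for part in parts)
        # Raises KeyError if the variable is not set.
        return os.environ[name]


def clickhouse_password(secret_resolver: EnvSecretResolver) -> str:
    # The resolver arrives as an argument, the way an asset would receive
    # it as a resource, instead of being created here.
    ref = SecretReference(group_name="clickhouse", key="password")
    return secret_resolver.resolve_as_str(ref)


os.environ["DAGSTER__CLICKHOUSE__PASSWORD"] = "example-only"  # demo value
print(clickhouse_password(EnvSecretResolver()))  # example-only
```

Receiving the resolver as a parameter keeps secret lookup out of the asset code itself, so the same function can run against a local .env file or a production keystore.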
Production secrets
When you are ready to run your assets in production, please reach out to the core OSO team on Discord. We will arrange a secure way to share your secrets into our production keystore.