Skip to main content

Dagster Local Development

Dagster is a data orchestrator that allows you to define data pipelines in a declarative way. It is a powerful tool that allows you to define the flow of data from source to destination, and to define the transformations that data undergoes along the way.

At OSO, we use Dagster to process data from various sources, transform it, and load it into BigQuery. This quickstart guide will help you set up our Dagster instance locally, with a duckdb backend, in order to follow along with our tutorials in the next sections.

Check pipeline status

https://dagster.opensource.observer

Use this to view the entire data infrastructure, as well as the current status of every stage of the pipeline.

Admins can trigger runs here

Setting up Dagster

First, we need to clone the OSO GitHub repository and install the required dependencies.

git clone [email protected]:opensource-observer/oso.git .

Install the dependencies and create a virtual environment with poetry:

poetry install && poetry shell

Now, let's fill the .env file with the required environment variables:

GOOGLE_PROJECT_ID=<your-google-project-id>
DAGSTER_DBT_PARSE_PROJECT_ON_LOAD=1
DAGSTER_HOME=/tmp/dagster-home

After setting the environment variables, Dagster needs $DAGSTER_HOME to be created before running the Dagster instance.

mkdir /tmp/dagster-home
info

Lastly, we need to configure dagster.yaml to disable concurrency. Our example is located at /tmp/dagster-home/dagster.yaml:

This is currently a limitation with our duckdb integration. Please check out this issue for more information.

run_queue:
max_concurrent_runs: 1

Running Dagster

Now that we have everything set up, we can run the Dagster instance:

dagster dev

After a little bit of time, you should see the following message:

2024-09-10 22:35:31 +0200 - dagster.daemon - INFO - Instance is configured with the following daemons: ['AssetDaemon', 'BackfillDaemon', 'QueuedRunCoordinatorDaemon', 'SchedulerDaemon', 'SensorDaemon']
2024-09-10 22:35:31 +0200 - dagster-webserver - INFO - Serving dagster-webserver on http://127.0.0.1:3000 in process 1095

Head over to http://localhost:3000 to access Dagster's UI. Et voilà! You have successfully set up Dagster locally.

This is just the beginning. Check out how to create a DLT Dagster Asset next and start building!