dbt Setup
We use dbt (Data Build Tool) to build our data warehouse. You can view every model on OSO here: https://models.opensource.observer.
This guide walks you through setting up dbt for OSO development.
Prerequisites
- Python >= 3.11
- Python Poetry >= 1.8
- git
- A GitHub account
- BigQuery access
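Before installing, a short script can confirm that the command-line prerequisites are on your PATH. The following is a POSIX sh sketch, not part of the OSO repo. version_ge and check_prereqs are hypothetical helper names. The script checks only the Python and Poetry versions plus the presence of git and gcloud. It cannot verify your GitHub account or BigQuery access.

```shell
#!/bin/sh
# Sketch: check the prerequisites listed above (POSIX sh).
# version_ge and check_prereqs are hypothetical helpers, not part of OSO.

# Succeed if dotted version $1 >= dotted version $2; missing parts count as 0.
version_ge() {
  local have="$1." want="$2." h w
  while [ -n "$want" ]; do
    h=${have%%.*}; w=${want%%.*}
    have=${have#*.}; want=${want#*.}
    [ "${h:-0}" -gt "$w" ] && return 0
    [ "${h:-0}" -lt "$w" ] && return 1
  done
  return 0
}

check_prereqs() {
  local rc=0 cmd ver
  for cmd in python3 poetry git gcloud; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      rc=1
    fi
  done
  if command -v python3 >/dev/null 2>&1; then
    ver=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    version_ge "$ver" 3.11 || { echo "python3 is $ver, need >= 3.11" >&2; rc=1; }
  fi
  if command -v poetry >/dev/null 2>&1; then
    # `poetry --version` prints something like "Poetry (version 1.8.3)".
    ver=$(poetry --version | sed 's/[^0-9]*\([0-9][0-9.]*\).*/\1/')
    version_ge "$ver" 1.8 || { echo "poetry is $ver, need >= 1.8" >&2; rc=1; }
  fi
  return "$rc"
}

if check_prereqs; then
  echo "All prerequisites found."
else
  echo "Install or upgrade the tools listed above before continuing." >&2
fi
```

The version comparison is numeric and goes part by part, so 3.10 is treated as older than 3.11. A plain string comparison would get this wrong.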
gcloud CLI
Installing gcloud CLI
For macOS users:
brew install --cask google-cloud-sdk
For other platforms, follow the official instructions.
Installation
- Follow the installation instructions in our monorepo README.
- Enter the poetry environment:
  poetry shell
- Verify dbt is installed:
  which dbt
- Authenticate with gcloud:
  gcloud auth application-default login
- Run the setup wizard:
  poetry install && poetry run oso_lets_go
The wizard will create a GCP project and BigQuery dataset if needed, copy a subset of OSO data for development, and configure your dbt profile.
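You can check that the installation and authentication steps above took effect with a sketch like the following. It assumes the default gcloud configuration directory on macOS and Linux, which is ~/.config/gcloud unless CLOUDSDK_CONFIG overrides it. adc_file and check_install are hypothetical helper names.

```shell
#!/bin/sh
# Sketch: confirm the installation steps above took effect (POSIX sh).
# adc_file and check_install are hypothetical helpers, not part of OSO.

# Where `gcloud auth application-default login` stores credentials on macOS
# and Linux; CLOUDSDK_CONFIG overrides the gcloud config directory if set.
adc_file() {
  echo "${CLOUDSDK_CONFIG:-$HOME/.config/gcloud}/application_default_credentials.json"
}

check_install() {
  local rc=0
  if command -v dbt >/dev/null 2>&1; then
    echo "dbt found at $(command -v dbt)"
  else
    echo "dbt is not on PATH; are you inside 'poetry shell'?" >&2
    rc=1
  fi
  if [ -f "$(adc_file)" ]; then
    echo "application default credentials found"
  else
    echo "no credentials at $(adc_file); run 'gcloud auth application-default login'" >&2
    rc=1
  fi
  return "$rc"
}

check_install || echo "Fix the issues above, then rerun the setup wizard." >&2
```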
Configuration
dbt Profile Setup
Create or edit ~/.dbt/profiles.yml:
opensource_observer:
  outputs:
    production:
      type: bigquery
      dataset: oso
      job_execution_timeout_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
    playground:
      type: bigquery
      dataset: oso_playground
      job_execution_timeout_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
  target: playground
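Once the profile is in place, dbt debug can validate it and test the BigQuery connection. The sketch below first confirms that an opensource_observer profile exists. profiles_file and has_profile are hypothetical helpers. The grep is a rough text check, not a YAML parse. The script looks only in DBT_PROFILES_DIR or ~/.dbt, although dbt can also read profiles.yml from --profiles-dir or the current directory. It also assumes the project's dbt_project.yml refers to the opensource_observer profile. Run it from the dbt project directory so dbt debug can also find dbt_project.yml.

```shell
#!/bin/sh
# Sketch: sanity-check the dbt profile, then let dbt validate it (POSIX sh).
# profiles_file and has_profile are hypothetical helpers, not part of OSO.

# Only covers DBT_PROFILES_DIR and the default ~/.dbt location.
profiles_file() {
  echo "${DBT_PROFILES_DIR:-$HOME/.dbt}/profiles.yml"
}

# Succeed if file $1 has a top-level key $2 (a rough text check, not a YAML parse).
has_profile() {
  grep -q "^$2:" "$1" 2>/dev/null
}

profiles=$(profiles_file)
if has_profile "$profiles" opensource_observer; then
  echo "found opensource_observer in $profiles"
  if command -v dbt >/dev/null 2>&1; then
    # dbt debug checks the profile and tests the BigQuery connection.
    dbt debug --target playground || echo "dbt debug reported problems" >&2
  fi
else
  echo "opensource_observer profile not found in $profiles" >&2
fi
```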
VS Code Setup
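- Install the Power User for dbt Core extension.
- Get your poetry environment path:
  poetry env info --path
- In VS Code:
  - Open the command palette (Cmd+Shift+P on macOS, Ctrl+Shift+P elsewhere)
  - Select "Python: Select Interpreter"
  - Choose "Enter interpreter path..."
  - Enter the Python executable inside that environment: the path from the previous step followed by /bin/python on macOS and Linux, or \Scripts\python.exe on Windows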
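VS Code's "Enter interpreter path..." expects the Python executable, not the environment directory. The sketch below derives the executable path from the Poetry environment path. It assumes a standard virtualenv layout: bin/python on macOS and Linux, Scripts/python.exe on Windows. interpreter_for is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: derive the interpreter path VS Code needs from the Poetry env path.
# interpreter_for is a hypothetical helper; it assumes a standard virtualenv layout.

interpreter_for() {
  local env_dir="$1"
  if [ -f "$env_dir/Scripts/python.exe" ]; then
    echo "$env_dir/Scripts/python.exe"   # Windows layout
  else
    echo "$env_dir/bin/python"           # macOS / Linux layout
  fi
}

# Only print a path if Poetry reports an environment for this project.
if env_dir=$(poetry env info --path 2>/dev/null) && [ -n "$env_dir" ]; then
  interpreter_for "$env_dir"
fi
```

Paste the printed path into the "Enter interpreter path..." prompt.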
Running dbt
Run every model in the project:
dbt run
Run a specific model:
dbt run --select {model_name}
By default, both commands write to the opensource-observer.oso_playground dataset, because target is set to playground in your profile.
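The sketch below wraps these commands and uses dbt's node selection syntax to include upstream (+model) or downstream (model+) models. run_model, DRY_RUN, and my_model are hypothetical names. --select and --target are standard dbt options.

```shell
#!/bin/sh
# Sketch: wrap `dbt run --select` for the playground target (POSIX sh).
# run_model, DRY_RUN and my_model are hypothetical; --select and --target are dbt flags.

run_model() {
  if [ -n "${DRY_RUN:-}" ]; then
    # Print the command instead of executing it.
    echo "dbt run --select $1 --target playground"
  else
    dbt run --select "$1" --target playground
  fi
}

# dbt node selection syntax:
#   my_model    only that model
#   +my_model   the model and everything upstream of it
#   my_model+   the model and everything downstream of it
DRY_RUN=1 run_model "my_model+"
```

Because DRY_RUN is set, the last line prints dbt run --select my_model+ --target playground instead of running it. To see which models a selector matches without building anything, use dbt ls --select my_model+.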
For more details on working with dbt models, see our Data Models Guide.