dbt Setup

We use dbt to build our data warehouse. You can view every model on OSO here: https://models.opensource.observer.

This guide walks you through setting up dbt (Data Build Tool) for OSO development.

Prerequisites

  • Python >= 3.11
  • Poetry >= 1.8
  • git
  • A GitHub account
  • BigQuery access
  • gcloud CLI
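
A quick way to see which of the command-line prerequisites are already present is a small shell check. This is a minimal sketch: the function name check_prereqs is illustrative and not part of the OSO repository, and it only confirms that each tool is on PATH. Compare the printed versions against the minimums above by eye.

```shell
#!/usr/bin/env bash
# check_prereqs is a hypothetical helper, not part of the OSO repo.
# It reports which of the given tools are on PATH and returns 1 if any is missing.
check_prereqs() {
  local missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

# Command-line tools named in the prerequisites list.
check_prereqs python3 poetry git gcloud || echo "Install the missing tools before continuing."

# Print versions for the tools with a minimum version (Python >= 3.11, Poetry >= 1.8).
python3 --version 2>/dev/null || true
poetry --version 2>/dev/null || true
```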

Installing gcloud CLI

For macOS users:

brew install --cask google-cloud-sdk

For other platforms, follow the official instructions.

Installation

  1. Follow the installation instructions in our monorepo README.

  2. Enter the poetry environment:

poetry shell

  3. Verify dbt is installed:

which dbt

  4. Authenticate with gcloud:

gcloud auth application-default login

  5. Run the setup wizard:

poetry install && poetry run oso_lets_go

Tip: The wizard creates a GCP project and BigQuery dataset if needed, copies a subset of OSO data for development, and configures your dbt profile.
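
If a later step fails with an authentication error, it can help to confirm that the application-default credentials from the gcloud login step are usable. The sketch below assumes only the standard gcloud command `gcloud auth application-default print-access-token`. The helper name adc_status is hypothetical.

```shell
# adc_status is a hypothetical helper name, not part of gcloud or the OSO repo.
# It reports whether application-default credentials can produce an access token.
adc_status() {
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH"
  elif gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "application-default credentials OK"
  else
    echo "no usable credentials; re-run: gcloud auth application-default login"
  fi
}

adc_status
```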

Configuration

dbt Profile Setup

Create or edit ~/.dbt/profiles.yml:

opensource_observer:
  outputs:
    production:
      type: bigquery
      dataset: oso
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
    playground:
      type: bigquery
      dataset: oso_playground
      job_execution_time_seconds: 300
      job_retries: 1
      location: US
      method: oauth
      project: opensource-observer
      threads: 32
  target: playground
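
After editing the profile, one way to check it is to confirm that the file defines the opensource_observer profile and then let dbt test the connection with `dbt debug`. This is a sketch under a few assumptions: has_profile is a hypothetical helper, the file lives in ~/.dbt unless the DBT_PROFILES_DIR environment variable points elsewhere, and `dbt debug` is run from inside the dbt project directory.

```shell
# has_profile is a hypothetical helper: it succeeds if profiles file $1
# contains a top-level key named $2.
has_profile() {
  [ -f "$1" ] && grep -q "^$2:" "$1"
}

# dbt reads profiles.yml from DBT_PROFILES_DIR when set, otherwise from ~/.dbt.
PROFILES="${DBT_PROFILES_DIR:-$HOME/.dbt}/profiles.yml"
if has_profile "$PROFILES" opensource_observer; then
  echo "profile found in $PROFILES"
else
  echo "profile opensource_observer not found in $PROFILES"
fi

# dbt debug validates the project, the profile, and the warehouse connection.
if command -v dbt >/dev/null 2>&1; then
  dbt debug --target playground || echo "dbt debug reported a problem"
fi
```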

VS Code Setup

  1. Install the Power User for dbt core extension

  2. Get your poetry environment path:

poetry env info --path

  3. In VS Code:
    • Open command palette
    • Select "Python: Select Interpreter"
    • Choose "Enter interpreter path..."
    • Enter the poetry path

Running dbt

Basic usage:

dbt run

Target a specific model:

dbt run --select {model_name}

Tip: With the playground target selected in profiles.yml, dbt run writes to the opensource-observer.oso_playground dataset by default.
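
The --select flag also accepts dbt's graph operators, and --target overrides the default target from the profile. The sketch below wraps both in a hypothetical helper, run_model, that prints the command first and executes it only when DBT_EXECUTE=1 is set, so a selection can be reviewed before it touches the warehouse. The model name my_model is a placeholder.

```shell
# run_model is a hypothetical wrapper, not part of dbt or the OSO repo.
# It prints the dbt command for a selector and runs it only when DBT_EXECUTE=1.
run_model() {
  selector="$1"
  target="${2:-playground}"   # same default as the target in profiles.yml
  echo "dbt run --select $selector --target $target"
  if [ "${DBT_EXECUTE:-0}" = "1" ]; then
    dbt run --select "$selector" --target "$target"
  fi
}

run_model my_model     # only this model
run_model +my_model    # the model plus everything upstream of it
run_model my_model+    # the model plus everything downstream of it
```

With DBT_EXECUTE unset, the three calls only print their commands, for example `dbt run --select +my_model --target playground`.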

For more details on working with dbt models, see our Data Models Guide.