Access via BigQuery

OSO uses BigQuery as one means of sharing large datasets with the community. This page will guide you through the process of subscribing to the OSO production dataset and making your first query.

warning

BigQuery should only be used if you need direct access to source data or mart models for your own data pipeline or for integration into third-party tools.

  • If you are a data scientist and you want to run your own analysis, we recommend using pyoso.
  • If you are an application developer and you want to integrate OSO data into your application, we recommend using the API.

Sign up for Google Cloud

Navigate to Google Cloud and log in. If this is your first time here, you can sign up for a free cloud account using your existing Google account. If you already have a GCP account, skip ahead to "Subscribe to the OSO production dataset".

GCP Signup

First, select a country and agree to the terms of service. Then, you need to enter your payment information for verification and answer a few marketing questions.

GCP Billing

tip

GCP offers a free tier that includes $300 in credits. Once the credits are used, it is easy to stay in the free tier provided you remain under the 1 TB per month limit for BigQuery data processed.
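
To get a feel for the monthly limit, the arithmetic can be sketched in a few lines of Python. This is a rough illustration rather than an official calculator: it assumes the limit corresponds to 2**40 bytes of data processed per month, and the function names and the 5 GiB example query are hypothetical.

```python
# Assumption: the monthly free quota for query data processed is 1 TiB (2**40 bytes).
FREE_TIER_BYTES_PER_MONTH = 2**40


def free_tier_share(bytes_processed: int,
                    monthly_limit: int = FREE_TIER_BYTES_PER_MONTH) -> float:
    """Return the fraction of the monthly free quota that one query consumes."""
    if bytes_processed < 0:
        raise ValueError("bytes_processed must be non-negative")
    return bytes_processed / monthly_limit


def queries_per_month(bytes_processed: int,
                      monthly_limit: int = FREE_TIER_BYTES_PER_MONTH) -> int:
    """Return how many times a query of this size fits in the monthly quota."""
    if bytes_processed <= 0:
        raise ValueError("bytes_processed must be positive")
    return monthly_limit // bytes_processed


# Hypothetical example: a query that processes 5 GiB each time it runs.
scanned = 5 * 2**30
print(f"One run uses {free_tier_share(scanned):.2%} of the monthly quota")
print(f"The query fits {queries_per_month(scanned)} times in one month")
```

The byte count for a real query is shown by the BigQuery console before you run it, so the numbers above only need to be replaced with that estimate.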

Finally, you will be brought to the admin console where you can create a new project. Feel free to name this GCP project anything you'd like. (Or you can simply leave the default project name 'My First Project'.)

Note: You won't be able to create a new project if you're not an administrator of your Google organization.

GCP Create

Subscribe to the OSO production dataset

Go to the BigQuery Console. Navigate to BigQuery from the left-hand menu and then click on BigQuery Studio from the hover menu.

GCP Admin

The console features an Explorer frame on the left-hand side, which lists all the datasets available to you, and a Studio Console which has tabs for organizing your work. This will be your workspace for querying the OSO dataset.

GCP Welcome

Click on the following link to subscribe to the OSO production dataset:

Subscribe on BigQuery

Create a linked dataset in your own GCP project.

link dataset

Make your first query

Open a new query tab by clicking the + icon (Create SQL Query) at the top right of the console.

From here you can run any SQL you'd like against any OSO dataset. For example, you can query the oso_production dataset for all available projects like this:

SELECT *
FROM `YOUR_PROJECT_NAME.oso_production.projects_v1`

Remember to replace YOUR_PROJECT_NAME in the query with the ID of your own GCP project.

Click Run to execute your query. The results will appear in a table at the bottom of the console.

GCP Query

The console will help you complete your query as you type, and will also provide you with a preview of the results and computation time. You can save your queries, download the results, and even make simple visualizations directly from the console.
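
If you prefer to prepare queries in a script and paste them into the console, the project substitution from the example query can be sketched as below. The helper name `build_projects_query` and the project ID `my-gcp-project` are hypothetical; only the `oso_production.projects_v1` table path comes from the example above. The added `LIMIT` trims the number of rows returned, which is convenient for a first look, but it typically does not reduce the amount of data BigQuery processes.

```python
# Template mirroring the example query, with the project left as a placeholder.
QUERY_TEMPLATE = (
    "SELECT *\n"
    "FROM `{project}.oso_production.projects_v1`\n"
    "LIMIT {limit}"
)


def build_projects_query(project: str, limit: int = 10) -> str:
    """Fill in the GCP project ID and a row limit for the projects_v1 query."""
    if not project or "`" in project:
        raise ValueError("project must be a non-empty ID without backticks")
    if limit <= 0:
        raise ValueError("limit must be positive")
    return QUERY_TEMPLATE.format(project=project, limit=int(limit))


# Hypothetical project ID; replace it with the project that holds your linked dataset.
print(build_projects_query("my-gcp-project", limit=5))
```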

tip

oso_production contains all production data. Some of its tables can be quite large, so a single query may process a lot of data.
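
One way to check how much data a query would process before running it is a dry run. The sketch below assumes you have installed the `google-cloud-bigquery` Python package and configured credentials for your GCP project, neither of which is covered on this page. The helper names `estimate_query_bytes` and `format_bytes` are hypothetical.

```python
def format_bytes(num_bytes: int) -> str:
    """Render a byte count in binary units, e.g. 1536 -> '1.5 KiB'."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(num_bytes)
    for unit in units:
        # Stop at the first unit that keeps the value below 1024,
        # or at the largest unit available.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


def estimate_query_bytes(sql: str, project: str) -> int:
    """Dry-run a query and return the bytes it would process, without running it."""
    # Imported lazily so format_bytes works even without the client library.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)
    job = client.query(sql, job_config=job_config)
    return job.total_bytes_processed


# Example call (needs credentials, so it is left commented out):
# sql = "SELECT * FROM `YOUR_PROJECT_NAME.oso_production.projects_v1`"
# print(format_bytes(estimate_query_bytes(sql, "YOUR_PROJECT_NAME")))
```

A dry run does not execute the query, so it is a cheap way to compare alternative queries against the same table.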

To explore all the OSO datasets available, see the Data Overview.