Connect Pyoso to Google Sheets
This guide shows you how to set up a Google Sheets connector in the Open Source Observer (OSO) app and then use pyoso to query that sheet. We'll cover:
- Creating a service account and metadata sheet in Google Cloud.
- Configuring the Google Sheets connection in OSO.
- Querying your sheet from a Jupyter notebook using pyoso.
Prerequisites​
- An active OSO account with access to the Organization & Connections page.
- A Google Cloud project where you can create a service account.
- A Google Sheet to serve as the metadata index and one or more data sheets.
- Python (Jupyter or your environment of choice) with
pyoso
installed.
Step 1: Create a Service Account​
- Go to your Google Cloud Console and select (or create) the project you'll use.
- Navigate to IAM & Admin → Service Accounts.
- Click Create Service Account, give it any name (no special permissions required), and finish.
- After creating the service account, click Actions → Manage Keys → Add Key → Create New Key (JSON).
- Download and save the JSON key file; you'll Base64-encode this in a later step.
Step 2: Create a Metadata Sheet​
-
In Google Sheets, create a new spreadsheet that will serve as the "metadata sheet."
-
The metadata sheet must have at least these columns (in row 1):
- table_name
- sheet_id_and_tab
- owner
- notes (optional)
-
Populate row 2 (and beyond) with entries like:
table_name sheet_id_and_tab owner notes my_test_sheet
1A2B3C4D5E6F7G8H9I#Sheet1
your-email@… "Demo metadata" - table_name: How the sheet will appear when you query via SQL (e.g.,
my_test_sheet
). - sheet_id_and_tab: The spreadsheet’s ID (from its URL) followed by a
#
and the exact tab name. - owner: Your email (or the service account’s email).
- table_name: How the sheet will appear when you query via SQL (e.g.,
-
Share this metadata sheet with your service account’s email (the one you just created). Grant at least Viewer access.
-
You will also need to share the spreadsheet you want to query with the service account email.
Step 3: Configure the Google Sheets Connector in OSO​
- In the OSO web app, go to Organization → Connections.
- Click Add Connection and choose Google Sheets.
- Fill out the form:
- Name: Any unique, descriptive name (e.g.,
my-gsheets-connector
). - Service Account Key (Base64):
- Open the JSON key file you downloaded in Step 1.
- Base64-encode its contents (for example, run
base64 service-account.json
). - Copy the Base64 string and paste it into this field.
- Metadata Sheet ID and Tab: Copy your metadata sheet’s ID and tab name (e.g.,
1A2B3C4D5E6F7G8H9I#Sheet1
).
- Name: Any unique, descriptive name (e.g.,
- Click Submit. If everything is configured correctly, OSO will register your Google Sheets connector and create a schema named
[your_org_name]_gsheets.default
.
Note: The schema name will be based on your organization name. For example, if your organization is named "acme", the schema would be
acme_gsheets.default
. Replace[your_org_name]
in the queries below with your actual organization name.
Step 4: Verify in Pyoso​
Below is a sample Jupyter notebook workflow to confirm your sheet is visible and to join it with OSO data. Make sure you have your OSO_API_KEY
set in a .env
file or environment variable.
# Install dependencies (if you haven't already)
# pip install pyoso python-dotenv pandas
from dotenv import load_dotenv
import os
import pandas as pd
from pyoso import Client
# Load your OSO_API_KEY from a .env file or environment
load_dotenv()
OSO_API_KEY = os.environ["OSO_API_KEY"]
client = Client(api_key=OSO_API_KEY)
Check That the Sheet Is Present​
Use a simple SELECT
to preview up to 5 rows from your metadata-indexed sheet. Replace my_test_sheet
with your table_name
from the metadata sheet and [your_org_name]
with your organization name.
df_preview = client.to_pandas("""
SELECT *
FROM [your_org_name]_gsheets.default.my_test_sheet
LIMIT 5
""")
df_preview
Join the Sheet Data with OSO Projects​
This query will return rows where the oso_slug
from your Google Sheet matches a project_name
in OSO’s projects_v1
table.
df_projects = client.to_pandas("""
SELECT
p.project_id,
sheet.oso_slug,
sheet.project_name AS application_name,
sheet.season,
sheet.op_total_amount
FROM [your_org_name]_gsheets.default.my_test_sheet AS sheet
JOIN oso.projects_v1 p
ON sheet.oso_slug = p.project_name
WHERE sheet.oso_slug IS NOT NULL
""")
df_projects.tail()
That query will return rows where the oso_slug
from your Google Sheet matches a project_name
in OSO’s projects_v1
table. Example output:
project_id | oso_slug | application_name | some_value | |
---|---|---|---|---|
395 | abc123XYZ456ghi789JKLmnOPqrsTTuvWXy0123 | example-project | Example Project Name | DemoValue |
396 | def456ABC123stu789VWXyzLMNoPQRjkl345 | another-project | Another Project Example | DemoValue2 |
397 | ghi789DEF456klm123OPQrst456UVWxYZ789 | sample-project | Sample Project for Testing | DemoValue3 |
398 | jkl012GHI789nop345RSTuvw012XYZabc678 | test-project | Test Project in OSO | DemoValue4 |
399 | mno345JKL012qrs678TUVwxy901ABCdef234 | pilot-project | Pilot Project Data | DemoValue5 |
(The actual values will depend on your sheet’s contents and matching OSO projects.)
Pull Key Metrics for Those Projects​
If you want to enrich your sheet data with metrics from OSO, you can join to key_metrics_by_project_v0
. For example, assuming your sheet again has oso_slug
values:
df_joined = client.to_pandas("""
SELECT
p.project_id,
sheet.oso_slug,
sheet.project_name AS application_name,
sheet.another_column AS some_value,
m.display_name,
km.amount
FROM [your_org_name]_gsheets.default.my_test_sheet AS sheet
JOIN oso.projects_v1 AS p
ON sheet.oso_slug = p.project_name
JOIN oso.key_metrics_by_project_v0 AS km
ON p.project_id = km.project_id
JOIN oso.metrics_v0 AS m
ON km.metric_id = m.metric_id
WHERE
sheet.oso_slug IS NOT NULL
AND m.display_name IN (
'Active Contracts',
'Repositories',
'Contributors',
'Commits',
'Contract Invocations',
'Gas Fees'
)
""")
df_joined
The result will list each matching project with its key metrics. Example:
project_id | oso_slug | application_name | some_value | display_name | amount | |
---|---|---|---|---|---|---|
0 | abc123XYZ456ghi789JKLmnOPqrsTTuvWXy0123 | example-project | Example Project Name | DemoValue | Commits | 12345.0 |
1 | abc123XYZ456ghi789JKLmnOPqrsTTuvWXy0123 | example-project | Example Project Name | DemoValue | Contributors | 200.0 |
2 | def456ABC123stu789VWXyzLMNoPQRjkl345 | another-project | Another Project Example | DemoValue2 | Active Contracts | 10.0 |
3 | def456ABC123stu789VWXyzLMNoPQRjkl345 | another-project | Another Project Example | DemoValue2 | Repositories | 5.0 |
4 | ghi789DEF456klm123OPQrst456UVWxYZ789 | sample-project | Sample Project for … | DemoValue3 | Gas Fees | 42.5 |
… | … | … | … | … | … | … |
Pivoting Metrics​
To pivot this DataFrame so each project is one row and each metric gets its own column:
df_pivot = df_joined.pivot_table(
index=['project_id', 'oso_slug', 'application_name', 'some_value'],
columns=['display_name'],
values='amount',
aggfunc='sum',
fill_value=0
)
# Show top 5 by descending Gas Fees (if present)
df_pivot.sort_values(by='Gas Fees', ascending=False).head(5)
Example pivoted output:
project_id | oso_slug | application_name | some_value | Active Contracts | Commits | Contributors | Gas Fees | Repositories |
---|---|---|---|---|---|---|---|---|
abc123XYZ456ghi789JKLmnOPqrsTTuvWXy0123 | example-project | Example Project Name | DemoValue | 15 | 12,345.0 | 200 | 0 | 8 |
def456ABC123stu789VWXyzLMNoPQRjkl345 | another-project | Another Project Example | DemoValue2 | 10 | 0 | 50 | 42.5 | 5 |
ghi789DEF456klm123OPQrst456UVWxYZ789 | sample-project | Sample Project for … | DemoValue3 | 5 | 1,234.0 | 20 | 10.0 | 2 |
jkl012GHI789nop345RSTuvw012XYZabc678 | test-project | Test Project in OSO | DemoValue4 | 0 | 0 | 5 | 0 | 1 |
mno345JKL012qrs678TUVwxy901ABCdef234 | pilot-project | Pilot Project Data | DemoValue5 | 2 | 500.0 | 10 | 5.0 | 3 |
(Actual values will align with your sheet and OSO metrics.)
Step 5: Make Updates to the Sheet​
You can make updates to the sheet and they will be reflected in pyoso. For example, if you add a new row to the sheet, it will be reflected in the df_projects
DataFrame. You can even change column names.
Updates to your spreadsheet might take up to 10 minutes to be reflected in pyoso.
Troubleshooting​
-
Permission Denied
- Ensure you shared both the metadata sheet and each data sheet with your service account email.
- Verify the Base64-encoded key matches the JSON key for that service account.
-
Connector Not Found
- Check that you used the exact sheet ID (not the full URL) when you created the metadata sheet entry.
- Confirm the
#TabName
suffix matches a tab that actually exists in the spreadsheet.
-
Empty Results
- Double-check the
table_name
in your metadata sheet matches your SQLFROM
clause (including capitalization). - Run
SELECT * FROM [your_org_name]_gsheets.default.__TABLES__
to list all available sheets.
- Double-check the
Next Steps​
- Explore other pyoso tutorials to learn more query patterns.
- See the official Trino docs on Google Sheets connector for advanced settings:
Trino: Google Sheets Connector.
That's it! You’ve now connected a Google Sheet to OSO and pulled its data into Python via pyoso. Feel free to customize queries, join with other OSO tables, and build dashboards on top of your sheet data.