Subscribe to Source Datasets on BigQuery

OSO curates a variety of public datasets from the open source ecosystem. Check out our BigQuery data exchange to explore available data. If you don't have a BigQuery account, start here.

Note: If you want to explore data and metrics computed in the OSO data pipeline, we recommend that you use pyoso instead.

Here are some of our featured public datasets:

Updated weekly

OSO Production Pipeline

Subscribe on BigQuery

License: CC BY-SA 4.0

These are the final product from the data pipeline, which is served from our API.

For example, you can get a list of oss-directory projects

SELECT
  project_id,
  project_name,
  display_name,
  description
FROM `YOUR_PROJECT_NAME.oso_production.projects_v1` LIMIT 10

or all contracts on Base deployed on 2025-01-01.

SELECT count(*) as cnt
FROM `YOUR_PROJECT_NAME.oso_production.contracts_v0`
WHERE contract_namespace = 'BASE' AND deployment_date = '2025-01-01'

Note: Unless the model name is versioned with a v1 or higher, expect that the model is unstable and should not be depended on in a live production application.

Some of our most popular v1 models include:

projects_v1: a list of all projects in the OSO data pipeline
artifacts_by_project_v1: a list of all artifacts owned by a given project
projects_by_collection_v1: a list of all projects in a given collection

Some of the v0 (WIP) models that we include in analysis but may be a bit unstable:

contracts_v0: a list of all discovered contracts, downstream from their deployers and/or factories
metrics_v0: a list of all metrics available in OSO
timeseries_metrics_by_project_v0: a list of all metrics for a given project

External Datasets

GitHub Archive

View on BigQuery

Code License: MIT.
Data governed by the GitHub terms of service.
Updated hourly

GitHub data is predominantly provided by the incredible GH Archive project, which publishes a public archive of historical events to GitHub.

For example, to count the number of issues opened, closed, and reopened on 2020/01/01:

SELECT event as issue_status, COUNT(*) as cnt FROM (
  SELECT type, repo.name, actor.login,
    JSON_EXTRACT(payload, '$.action') as event,
  FROM `githubarchive.day.20200101`
  WHERE type = 'IssuesEvent'
)
GROUP by issue_status;

Open Source Insights

View on BigQuery

Reference documentation
Data License: CC-BY 4.0

Open Source Insights (aka deps.dev) is a service developed and hosted by Google to help developers better understand the structure, construction, and security of open source software packages. The service examines each package, constructs a full, detailed graph of its dependencies and their properties, and makes the results available to anyone who could benefit from them. The goal is to provide developers with a picture of how their software is put together, how that changes as dependencies change, and what the consequences might be.

For example, to determine which packages have the most dependents:

DECLARE
  Sys STRING DEFAULT 'CARGO';

SELECT
  Name,
  Version,
FROM (
  SELECT
    Name,
    Version,
    ROW_NUMBER()
      OVER (PARTITION BY
              Name
            ORDER BY
              VersionInfo.Ordinal DESC) AS RowNumber
    FROM
      `bigquery-public-data.deps_dev_v1.PackageVersionsLatest`
    WHERE
      System = Sys
      AND VersionInfo.IsRelease)
WHERE RowNumber = 1;

Ethereum

View on BigQuery

Code License: MIT.

The Google Cloud team maintains a public Ethereum dataset. This is backed by the ethereum-etl project.

For example, to get 10 transactions from the latest block

SELECT
  `hash`,
  block_number,
  from_address,
  to_address,
  value,
  gas,
  gas_price
FROM `bigquery-public-data.crypto_ethereum.transactions` AS transactions
ORDER BY block_number DESC
LIMIT 10

Superchain

Subscribe on BigQuery

Data on every OP Stack chain, maintained by OP Labs PBC code of conduct
For more details on the ETL process please visit the op-analytics documentation site.
Visit the Optimism docs to learn more about the Superchain.

All tables are partitioned by date (dt) and clustered by chain name (chain). Data is updated daily with an approximate latency of 24 hrs. Data is sourced from Goldsky using the Mirror platform, which fetches and delivers onchain data from RPC nodes to ClickHouse. From there, the OP Labs team runs an incremental ingestion process based on block number.

For example, to get all transactions to the 4337 EntryPoint contracts for all OP Stack chains:

  SELECT
    dt,
    chain,
    COUNT(*) AS txns
  FROM `optimism_superchain_raw_onchain_data.transactions`
  WHERE
    dt >= '2024-10-01'
    AND to_address IN (
      "0x5ff137d4b0fdcd49dca30c7cf57e578a026d2789",
      "0x0000000071727de22e5e9d8baf0edac6f37da032"
    )
  GROUP BY dt, chain
  ORDER BY dt, chain

Superchain 4337

Subscribe on BigQuery

The Superchain 4337 dataset provides comprehensive data about ERC-4337 (Account Abstraction) activity across all OP Stack chains. This dataset is crucial for understanding the adoption and usage patterns of smart contract wallets, bundlers, and paymasters in the Superchain ecosystem. The data helps track which applications are leading the way in improving onchain UX through account abstraction.

All tables are partitioned by date (dt) and clustered by chain name (chain). The dataset includes decoded UserOps and Paymaster IDs to trace events from unique user smart contract wallet addresses to their wallet providers, paymasters, bundlers, and the applications they interact with.

For example, to see which paymasters are handling the most UserOperations across all chains:

SELECT
  chain,
  paymaster,
  APPROX_COUNT_DISTINCT(userophash) AS user_ops
FROM `YOUR_PROJECT_NAME.optimism_superchain_4337_account_abstraction_data.useroperationevent_logs_v2`
WHERE dt > '2025-02-01'
GROUP BY 1,2
ORDER BY 3 DESC

Arbitrum One

Subscribe on BigQuery

Arbitrum is a Layer 2 scaling solution for Ethereum. We currently provide blocks, transactions, and traces for Arbitrum's flagship network, Arbitrum One. This public dataset is maintained by our partners at Goldsky.

For example, to get the average gas per transaction for each block on Arbitrum One on September 1st, 2024:

SELECT
  block_number,
  receipt_gas_used / 1 AS gas_per_txn
FROM `YOUR_PROJECT_NAME.arbitrum_one.arbitrum_transactions`
WHERE block_timestamp BETWEEN 1693526400 AND 1693612800
ORDER BY block_number

Filecoin Lily

Subscribe on BigQuery

Filecoin is a decentralized storage network designed to store humanity's most important information. This dataset mirrors the dataset offered by Lily for use in the OSO data pipeline. It includes storage deals, miners, FVM transactions, and much more.

For example, this is how to get how many messages were sent to the network in the last month:

SELECT
  COUNT(*) AS total_messages
FROM `YOUR_PROJECT_NAME.filecoin_lily.parsed_messages`
WHERE
  CAST(timestamp_seconds((height * 30) + 1598306400) AS DATE) > CURRENT_DATE() - INTERVAL 1 MONTH

Farcaster

Subscribe on BigQuery

Farcaster is a decentralized social network built on Ethereum. This dataset mirrors the dataset offered by Indexing for use in the OSO data pipeline. It includes casts, links, reactions, verifications, and profiles.

For example, to get the users with the most lifetime reactions:

SELECT
  r.target_cast_fid as fid,
  json_value(p.data, "$.display") as display_name,
  COUNT(*) as reaction_count
FROM `YOUR_PROJECT_NAME.farcaster.reactions` as r
LEFT JOIN `YOUR_PROJECT_NAME.farcaster.profiles` as p ON r.target_cast_fid = p.fid
GROUP BY fid, display_name
ORDER BY reaction_count DESC

Here's another use case, showing how to derive all the verified Ethereum addresses owned by a Farcaster user:

WITH
  profiles AS (
    SELECT
      v.fid,
      v.address,
      p.custody_address,
      JSON_VALUE(p.data, "$.username") AS username
    FROM `YOUR_PROJECT_NAME.farcaster.verifications` v
    JOIN `YOUR_PROJECT_NAME.farcaster.profiles` p ON v.fid = p.fid
    WHERE v.deleted_at IS NULL
  ),
  eth_addresses AS (
    SELECT
      fid,
      username,
      address
    FROM profiles
    WHERE LENGTH(address) = 42
    UNION ALL
    SELECT
      fid,
      username,
      custody_address AS address
    FROM profiles
  )
SELECT DISTINCT
  fid,
  username,
  address
FROM eth_addresses

Lens

View on BigQuery

Lens Protocol is an open social network. This dataset is well-documented by Lens. It includes data from Lens Chain Mainnet.

SELECT 
    address
FROM `lens-chain-mainnet.public.addresses`
LIMIT 5

Note: the Lens Polygon dataset that was previously available on BigQuery has been deprecated.

Gitcoin

Subscribe on BigQuery

Gitcoin is the hub of grantmaking in the Ethereum ecosystem. All project, round, and donation data from regendata.xyz is available in this dataset and updated daily.

You can also access the raw Gitcoin data directly on BigQuery:

SELECT 
  round_num,
  round_name,
  donor_address,
  amount_in_usd,
  recipient_address,
  project_name,
  project_id
FROM `YOUR_PROJECT_NAME.gitcoin.all_donations`

and:

SELECT 
  round_num,
  round_name,
  donor_address,
  amount_in_usd,
  recipient_address,
  project_name,
  project_id
FROM `YOUR_PROJECT_NAME.gitcoin.all_matching`

Open Collective

Subscribe on BigQuery

Open Collective is a platform for transparent finances and governance for open source projects.

The Open Collective datasets contains all transactions realized on the platform since its inception. Separate datasets are available for expenses and deposits.

For example, you can get the total amount of donations in USD made to the pandas project:

SELECT
  SUM(CAST(JSON_VALUE(amount, "$.value") AS FLOAT64)) AS total_amount
FROM
  `YOUR_PROJECT_NAME.open_collective.deposits`
WHERE
  JSON_VALUE(amount, "$.currency") = "USD"
  AND JSON_VALUE(to_account, "$.id") = "ov349mrw-gz75lpy9-975qa08d-jeybknox"

OP Atlas

Subscribe on BigQuery

OP Atlas contains data about projects contributing to the Optimism Collective and applying for Retro Funding, shepherded by the fine folks of Agora.

For example, you can get a list of all projects and their associated repository URLs:

SELECT
  p.name,
  p.description,
  r.url AS repo_url
FROM `YOUR_PROJECT_NAME.op_atlas.project` AS p
JOIN `YOUR_PROJECT_NAME.op_atlas.project_repository` AS r
  ON p.id = r.project_id

For more information, go to https://atlas.optimism.io/ or check out the OP Atlas GitHub.

OpenRank

Subscribe on BigQuery

OpenRank is a reputation protocol that enables verifiable compute for a large class of reputation algorithms, in particular those that (a) operate on a graph, (b) are iterative in nature and (c) have a tendency toward convergence. Such algorithms include EigenTrust, Collaborative Filtering, Hubs and Authorities, Latent Semantic Analysis, etc.

In this dataset, Farcaster users reputations are scored in 2 ways:

With globaltrust, we calculate global reputation scores, seeded by the trust of Optimism badgeholders.
With localtrust, you can get reputation scores of other users relative to specified user.

For example, you can get the globaltrust reputation score of vitalik.eth

SELECT
  strategy_id,
  i,
  v,
  date
FROM `YOUR_PROJECT_NAME.karma3.globaltrust`
WHERE i = 5650

Open Labels Initiative

Subscribe on BigQuery

A standardized framework and data model for EVM address labeling, maintained by growthepie.xyz.

For example, you can get project ownership labels for addresses:

SELECT
  address,
  chain_id,
  tag_value AS project_name
FROM `YOUR_PROJECT_NAME.openlabelsinitiative.labels_decoded`
WHERE tag_id = 'owner_project'

Ethereum Attestation Service

Subscribe on BigQuery

Ethereum Attestation Service (EAS) is an infrastructure public good for making attestations onchain or offchain about anything.

This dataset mirrors the dataset offered by EAS for use in the OSO data pipeline, currenty showing all attestations on the Optimism network.

For example, you can get a list of all * official * RetroPGF badgeholders:

SELECT
  recipient AS badgeholder,
  JSON_VALUE(decoded_data_json, "$[0].value.value") AS rpgfRound
FROM `YOUR_PROJECT_NAME.ethereum_attestation_service_optimism.attestations`
WHERE
  revoked = False
  AND schema_id = '0xfdcfdad2dbe7489e0ce56b260348b7f14e8365a8a325aef9834818c00d46b31b'
  AND attester = '0x621477dBA416E12df7FF0d48E14c4D20DC85D7D9'

1. Data exchange listings

For datasets listed on the OSO public data exchange, click on the "Subscribe on BigQuery" button to create a new dataset that is linked to OSO.

This has a few benefits:

Data is automatically kept live and real-time with OSO
You keep a reference to the data in your own GCP project
This gives OSO the ability to track public usage of models

2. Direct access to datasets

For datasets without a listing on the OSO public data exchange, we make the dataset open to public queries for direct queries. Click on the "View on BigQuery" button to go straight to the dataset.

You can star the dataset to keep it in your project.

star

Cost Estimation

BigQuery on-demand pricing charges based on the number of bytes scanned, with the first 1 TB free every month.

To keep track of your usage, check the bytes scanned in the top right corner before running your query.

cost estimate

BigQuery costs can rack up quickly if you are not careful optimizing your queries. For more on how you can optimize your queries, check out these guide:

Next steps

Once you've subscribed our datasets to your own Google project, you can start to run queries and analyses with a variety of tools.

OSO Production Pipeline

GitHub Archive

Open Source Insights

Ethereum ETL

Superchain

Superchain 4337

Arbitrum One

Filecoin Lily

Farcaster

Lens

Gitcoin

Open Collective

OP Atlas

OpenRank

EAS

Open Labels Initiative

OSO Production Pipeline​

External Datasets​

GitHub Archive​

Open Source Insights​

Ethereum​

Superchain​

Superchain 4337​

Arbitrum One​

Filecoin Lily​

Farcaster​

Lens​

Gitcoin​

Open Collective​

OP Atlas​

OpenRank​

Open Labels Initiative​

Ethereum Attestation Service​

Subscribe to a dataset​

1. Data exchange listings​

2. Direct access to datasets​

Cost Estimation​

Next steps​

OSO Production Pipeline

External Datasets

GitHub Archive

Open Source Insights

Ethereum

Superchain

Superchain 4337

Arbitrum One

Filecoin Lily

Farcaster

Lens

Gitcoin

Open Collective

OP Atlas

OpenRank

Open Labels Initiative

Ethereum Attestation Service

Subscribe to a dataset

1. Data exchange listings

2. Direct access to datasets

Cost Estimation

Next steps