Skip to main content

Request for Impact Metrics

· 10 min read
Carl Cervone
Co-Founder

Over the past few months, we've been hard at work creating the infrastructure to collect and analyze impact metrics for open source projects. We're excited to announce that we're ready to start doing some new analysis on data from open source projects ... and we need your help!

This post includes some of the domains we're interested in exploring as well as an initial list of impact metrics we'd like to collect. We're looking for feedback on these metrics and suggestions for additional metrics we should consider. We're also looking for contributors to help us apply these impact metrics.

Get Involved

If you'd like to get involved, here's what to do:

  1. Apply to join the Data Collective. It only takes a few minutes. We'll review, and reach out to schedule an onboarding call.
  2. Join our Discord server and say hello.
  3. Get inspiration from our Colab directory of starter notebooks for impact metrics. We also have of them in our Insights repo if you prefer to run them locally.
  4. Consult our docs and especially our impact metric spec as you prepare your analysis.

The rest of this post includes more details on the types of impact metrics we're interested in.

Areas of Interest

At a high level, we're interested in the following types of impact metrics being applied to the OSO dataset:

  • Project Health: Metrics that capture the momentum and activity of a project's contributor community and codebase.
  • Onchain Reputation: Metrics that attempt to differentiate onchain users based on behavior and trust scores.
  • Contributor Reputation: Metrics that attempt to differentiate code contributors based on behavior and PageRank-style scoring.
  • Ecosystem Impact: Metrics that capture the impact of a project on the broader health of an open source ecosystem, ie, as a "talent feeder" or "dependency" for other projects.
  • Project Metadata: Metrics that capture the licensing and open sourciness of a project.
  • Domain-specific Impact: Metrics that are relevant only to a specific domain, such as "defi" or "consumer", for onchain projects.

Metrics by Category

Here's a working list of impact metrics we're interested in collecting, including links or ideas for how to calculate them. We've flagged a few that are good first issues for contributors to work on.

Project Health

Metrics that capture the momentum and activity of a project's contributor community.

The inspiration for these metrics comes from the CHAOSS Community, which has cataloged a large suite of metrics for open source projects but is not prescriptive in how they should be calculated. We're interested in applying these same metrics to the OSO dataset and experimenting with different calculation and filtering approaches. A number of these are "good first issues" for contributors to work on because the data is ready to go and they are well-documented on the CHAOSS website. Have a look at Bus Factor and Burstiness for inspiration.

Impact MetricDescriptionComments
Bus FactorHow high is the risk to a project should the most active people leave?Good first issue
BurstinessHow are short timeframes of intense activity, followed by a corresponding return to a typical pattern of activity, observed in a project?Good first issue
VelocityWhat is the development speed for an organization?Good first issue
Self-merge ratesWhat is the rate of self-merges in a project?Requires more indexing of PR threads
Issue Response TimeHow much time passes between the opening of an issue and a response in the issue thread from another contributor?Requires more indexing of issue threads
tip

Check out our starter notebooks for impact metrics including Bus Factor and Velocity. The current Bus Factor for Open Source Observer is 0.36 - a good deal lower than the average in our dataset (0.60), but not as low as Ethereum / Protocol Guild (0.05).

Onchain Reputation

Metrics that attempt to differentiate onchain users based on behavior and trust scores.

This is a hot topic right now, with a number of projects attempting to create reputation systems for onchain users. We're integrating with many of the leading projects and bringing the data they generate into the OSO data warehouse. From there, there are all sorts of directions you can take the analysis!

The Onchain Activity metric is a good starting point for new analysts. Have a look at our working definitions for onchain users and help us create some better ones. The Network Loyalty is a spicy metric that will get even spicier as we add more networks to the OSO dataset!

Impact MetricDescriptionComments
Onchain ActivityHow active is the user onchain?Good first issue
Network LoyaltyHow much of a user's activity is on a single network?Good first issue
User SegmentWhat kind of dapps is the user most interested in?Good first issue
ENS HistoryDoes the user have a rich profile on ENS?Requires indexing of ENS data
Gitcoin PassportDoes the user have a high Gitcoin passport score?Requires indexing of Gitcoin Passport data

When working with these metrics, because they deal with LOTS of transactions, it may be helpful to grab a bunch of data in a single query request and then do further analysis in a notebook. For example, this query will grab each user's transactions with each project on each network in the last 90 days:

-- Get the number of txns by user / project / network / day
SELECT
project_id,
from_name AS user_address,
to_namespace AS network,
time,
SUM(amount) AS txns
FROM `oso.int_events_to_project`
WHERE time >= DATE_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
AND event_type = 'CONTRACT_INVOCATION_DAILY_COUNT'
GROUP BY 1,2,3,4
tip

Check out our starter notebooks for impact metrics including onchain activity and network loyalty.

Contributor Reputation

Metrics that attempt to differentiate code contributors based on behavior and PageRank-style scoring.

These metrics are useful for measuring impact within a project and for measuring the influence of a contributor across projects. For inspiration, check out the CHAOSS Community and SourceCred.

Impact MetricDescriptionComments
Contributor ConcentrationHow much of a contributor's activity is on a single project?Good first issue
Conversion RateWhat are the rates at which new contributors become more sustained contributors?Good first issue
Maintainer ScoreWhat characteristics of a contributor's activity make them a reliable maintainer?Good first issue
PageRankWhat is the PageRank of a contributor? Choose your own implementation!Good first issue
MVPRWhat is the most valuable pull request?Requires more indexing of related GitHub event data

The Contributor Concentration is a good first issue for new analysts, as it's less open-ended and can take advantage of the built-in developer categories included in the int_devs table:

-- Get the number of months each developer has been active on each project
SELECT
project_id,
from_id AS user_id,
SUM(CASE WHEN user_segment_type = 'FULL_TIME_DEV' THEN amount END) AS full_time_dev_months,
SUM(CASE WHEN user_segment_type = 'PART_TIME_DEV' THEN amount END) AS part_time_dev_months,
SUM(CASE WHEN user_segment_type = 'OTHER_CONTRIBUTOR' THEN amount END) AS other_contributor_months
FROM `oso.int_devs`
GROUP BY 1,2

The PageRank metric is another fun rabbithole to go down. For inspiration, check out some of the work we've done previously in our Insights repo and the pioneering work of SourceCred.

tip

Check out our starter notebooks for impact metrics including contributor concentration and page rank.

Ecosystem Impact

Metrics that capture the impact of a project on the broader health of an open source ecosystem, ie, as a "talent feeder" or "dependency" for other projects.

These metrics zoom out and seek to understand a project's influence within a broader open source ecosystem. One angle to consider is the role of "feeder" projects, ie, projects where users and/or contributors first enter the ecosystem and then go on to contribute to or use other projects. Another angle is the role of "dependency" projects, ie, projects that are imported by other projects in their dependency tree. As of now, we don't have our dependency data indexed, but we're working on it!

Impact MetricDescriptionComments
Contributor FeederHow many contributors has the project onboarded that go on to contribute to other projects?Good first issue
User FeederHow many onchain users has the project onboarded that go on to use other projects?Good first issue
Elephant FactorHow many projects depend on this project?Requires indexing dependency data
Downstream FeesHow much in sequencer fees are contributed by projects that are dependents of this project?Requires indexing dependency data
Downstream UsersHow many onchain users are using this project as a dependency?Requires indexing dependency data
Downstream DevelopersHow many developers are using this project as a dependency?Requires indexing dependency data
Network CentralityHow central is this project to the network?Requires indexing dependency data

The contributor and user "feeder" metrics are good first issues for new analysts.

tip

Check out our starter notebooks for contributor and user "feeders".

Project Metadata

Metrics that capture relevant metadata about the licensing and open sourciness of a project.

Project metadata metrics consider the licensing practices and the open source nature of projects. By examining License Coverage, OSI Approved Licenses, and Drips Access, we gain insights into a project's commitment to open source principles and its accessibility to the community. Drips is an amazing new way of easily streaming money to any open source project if they have a funding.json file in their repo.

Impact MetricDescriptionComments
License CoverageHow much of the code base has declared licenses?Requires more indexing of repo metadata
OSI Approved LicensesWhat percentage of a project’s licenses are OSI approved open source licenses?Requires more indexing of repo metadata
Drips AccessDoes the project have a funding.json file?Requires more indexing of repo metadata

Domain-specific Impact

Metrics that are relevant only to a specific domain, such as "DeFi" or "consumer", for onchain projects.

The first step in this process is to define a domain and create a list of projects that are relevant to that domain. For example:

-- Sample of DeFi projects
SELECT *
FROM `oso.onchain_metrics_by_project_v1`
WHERE project_name IN ('Uniswap', '1inch', 'Sushi')

From there, any of the above metrics (as well as our existing onchain impact metrics) can be applied to the projects in that domain.

This is a good first issue for people with domain expertise who are able to curate a list of projects relevant to a specific domain.

Submitting Your Impact Metrics

If you've created a new impact metric, we'd love to see it! We'll implement the best ones as new data models and make them available to everyone in the community.

You can submit your impact metric to the OSO core team by posting it in our Discussions forum. Contributions should include the following:

  • Name of impact metric
  • Brief description of your calculation steps
  • Link to your code (eg, Colab or Jupyter notebook)
  • Chart or table showing the metric applied to a set of projects

If you run into issues or have questions, let us know on Discord.

We're excited to see what you come up with!