Measuring Repo Community Health with GitHub’s API

I’m on record saying that GitHub is your Landing Page, and when I think about companies having open source profiles, I think about how many developers will have their first contact with them on GitHub. If it’s a code example you’re looking for then, like it or not, GitHub is considered a search engine by many developers.

With that in mind, I wanted to look at GitHub’s Community Health measure of the repositories I’m responsible for. You can view each repo’s community page separately through the web interface (look under “Insights”) but that’s not especially scalable if you have a lot of projects to track.

Screenshot of the project's /community page, showing low completion

So for a more repeatable approach, I wanted to use the GitHub API. Frustratingly, the community health metric isn’t in the collection endpoints; you need to fetch data per-repository to get it. Here’s the script that I use: it fetches the first 100 public repos of the given org or user (if you have more than 100, you’ll need to loop and fetch additional pages of results). It prints every repo with its number of stars, and for repos with 10 or more stars it also fetches and outputs the community health measure.

The whole thing outputs some sort of pipe-separated-sort-of format … one day I will make my hacky code more perfect before I share it, but today is not that day. LibreOffice had no problem ingesting the result when stored in a text file, so I’m calling it “good enough”!
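If the hand-rolled format ever becomes a nuisance, Python's csv module can write the same sort of pipe-delimited rows. This is only a minimal sketch: `write_rows` is a made-up helper name, the sample repos are invented, and it assumes you have already collected (project, stars, health) tuples, with health left as None where it wasn't fetched.

```python
import csv
import io


def write_rows(rows, out):
    """Write (project, stars, health) tuples as pipe-delimited text."""
    writer = csv.writer(out, delimiter="|", lineterminator="\n")
    writer.writerow(["Project", "Stars", "Health"])
    for project, stars, health in rows:
        # health is None for repos where the community profile was not fetched
        writer.writerow([project, stars, "" if health is None else health])


# Invented sample data, purely for illustration
buffer = io.StringIO()
write_rows(
    [("example-org/repo-a", 42, 71), ("example-org/repo-b", 3, None)],
    buffer,
)
print(buffer.getvalue(), end="")
```

Passing an open file instead of the StringIO buffer would give you something to import into a spreadsheet with "|" as the separator.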

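Anyway, here’s the script. You should set a GitHub access token in the GITHUB_TOKEN environment variable and the org to use in GITHUB_ORG. As written, the script calls the orgs/ endpoint, so for a personal account you would need to point it at users/ instead: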

import json
import os
import requests

# read settings from the environment; a missing variable raises KeyError here
token = os.environ["GITHUB_TOKEN"]
org = os.environ["GITHUB_ORG"]

headers = {
    "Authorization": "token " + token,
    "Accept": "application/vnd.github.v3+json"
}

# first page (up to 100) of the org's public repos
base_url = "https://api.github.com/"
repos_url = base_url + "orgs/" + org + "/repos?type=public&per_page=100"

repos_req = requests.get(repos_url, headers=headers)
repos_req.raise_for_status()  # stop early on a bad token or unknown org
repos_list = json.loads(repos_req.content)

# print header row
print("Project | Stars | Health")

i = 0
for r in repos_list:
    label = r['full_name'] + " | " + str(r['stargazers_count'])
    if r['stargazers_count'] >= 10:
        url = base_url + "repos/" + r['full_name'] + "/community/profile"
        req = requests.get(url, headers=headers)
        data = json.loads(req.content)

        print(label + " | " + str(data['health_percentage']) + "% health")
    else:
        print(label)
    i = i + 1

# if this is 100, it's time to build pagination
print(str(i) + " public repos in total")
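
For anyone who does hit the 100-repo limit, the pagination could be sketched roughly like this. It's not what I run, just one way to do it: `fetch_all_repos` and `github_get_page` are names I've made up for illustration, and the loop assumes that a page with fewer than `per_page` entries is the last one. The `page` query parameter is GitHub's own.

```python
import os


def fetch_all_repos(org, get_page, per_page=100):
    """Collect every public repo for `org`, one page at a time.

    `get_page(url)` is a hypothetical helper that takes a URL and
    returns the decoded JSON list for that page.
    """
    repos = []
    page = 1
    while True:
        url = (
            "https://api.github.com/orgs/" + org
            + "/repos?type=public&per_page=" + str(per_page)
            + "&page=" + str(page)
        )
        batch = get_page(url)
        repos.extend(batch)
        # Assumption: a short (or empty) page means there is nothing left
        if len(batch) < per_page:
            break
        page += 1
    return repos


def github_get_page(url):
    """Fetch one page with requests, using the same headers as the script."""
    import requests  # imported here so the paging loop can be tried offline

    headers = {
        "Authorization": "token " + os.environ["GITHUB_TOKEN"],
        "Accept": "application/vnd.github.v3+json",
    }
    resp = requests.get(url, headers=headers)
    resp.raise_for_status()
    return resp.json()
```

With that in place, `repos_list = fetch_all_repos(org, github_get_page)` would stand in for the single `requests.get` call, and the rest of the loop stays the same.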

I was surprised when I looked around that I couldn’t find an existing script for this, so I thought I had better share mine. That’s how the open source community works, after all! Tweaks, suggestions and additions are all welcome via the comments box; I’d be happy to hear whether this is useful and how you evolved it for your own needs.
