Here's an example of the kind of GraphQL query that Code.gov might submit to the GitHub GraphQL endpoint. This write-up also explains why the switch to GraphQL (in GitHub API version 4) makes things a ton easier than all the REST calls we were doing before (in GitHub API version 3).
Try this query out right now! This link will load the GitHub GraphQL Explorer with the non-annotated query ready to go.
What is this query?
This is a query I cooked up when first exploring GraphQL while doing some work on Code.gov.
What this query is for: Get specific metadata for several thousand repositories, across 30+ US Government organizations, without hitting GitHub API query limits.
Why was this a problem in GitHub API version 3?
Previously, doing this with v3 of the GitHub API involved several thousand REST HTTP calls. One call would get us a list of 100 repos. We'd then need 100 * X more HTTP calls, where X is the number of different kinds of metadata we needed for each repo: one call for the repo languages, one call for the open pull requests, etc. And then we needed code to glue it all together into a document to feed into our Elasticsearch index. (Admittedly, if we were storing this data relationally, this splitting of metadata would make much more sense.)
Not only was it all deeply inefficient, but we butted up against GitHub's API request limits.
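To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. The function name and the sample numbers (3,000 repos, 3 kinds of metadata) are illustrative assumptions, not figures from our actual harvest.

```python
import math


def rest_call_count(total_repos: int, metadata_kinds: int, page_size: int = 100) -> int:
    """Estimate the number of v3 REST calls needed.

    One "list repos" call per page of `page_size` repos, plus one extra
    call per repo for each kind of metadata (languages, open PRs, etc.).
    """
    list_calls = math.ceil(total_repos / page_size)
    detail_calls = total_repos * metadata_kinds
    return list_calls + detail_calls


# Hypothetical numbers: 3,000 repos, 3 kinds of metadata per repo.
print(rest_call_count(3000, 3))  # 9030
```

The per-repo term is what hurts: the list calls stay small, but every extra kind of metadata adds another call for every single repo.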
Why is this much easier in GitHub API version 4?
In version 4, the query method has changed from "make lots of little REST API requests" to "make fewer, bigger GraphQL requests".
Not only were we making many more REST API requests before, but they were of many different types, because we needed many different types of data. Now we can specify all the different types of data we need in one query. (We still need to run that query multiple times, because GitHub only lets us fetch 100 repos at a time.)
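The "run the query multiple times" part is cursor-based pagination. Here is a minimal sketch of that loop, under some assumptions: it queries a single organization by login (a simplification, not the query this document annotates), the helper names `post_graphql` and `fetch_all_repos` are made up for illustration, and the caller supplies the function that actually sends the request.

```python
import json
import urllib.request
from typing import Callable, Iterator, Optional

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Simplified, illustrative query: one organization, 100 repos per page.
REPOS_QUERY = """
query ($login: String!, $cursor: String) {
  organization(login: $login) {
    repositories(first: 100, after: $cursor) {
      pageInfo { hasNextPage endCursor }
      nodes {
        name
        primaryLanguage { name }
        pullRequests(states: OPEN) { totalCount }
      }
    }
  }
}
"""


def post_graphql(payload: dict, token: str) -> dict:
    """Send one GraphQL request to GitHub and return the decoded JSON body."""
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_repos(login: str, send: Callable[[dict], dict]) -> Iterator[dict]:
    """Yield every repo node for an organization, one page of 100 at a time."""
    cursor: Optional[str] = None  # None means "start from the first page"
    while True:
        payload = {
            "query": REPOS_QUERY,
            "variables": {"login": login, "cursor": cursor},
        }
        repos = send(payload)["data"]["organization"]["repositories"]
        yield from repos["nodes"]
        if not repos["pageInfo"]["hasNextPage"]:
            break
        cursor = repos["pageInfo"]["endCursor"]
```

With a personal access token in hand, usage would look something like `fetch_all_repos("some-org", lambda payload: post_graphql(payload, token))`. Each trip through the loop is one HTTP request that returns up to 100 repos with all of their metadata already attached.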
Here are the GitHub GraphQL API (v4) docs: https://developer.github.com/v4/
You can try this query out in the GitHub GraphQL Explorer: https://developer.github.com/v4/explorer/
This query provides a list of organization IDs, then asks for a list of repositories for each organization, with metadata about each.
One of the things I like about GraphQL is that the structure of the query describes the structure of the results object. However, while the results come back as JSON, note that the GraphQL query itself is not valid JSON (e.g. commas aren't required, comments are allowed, etc.). I'll put more things to note inline.