MichaelCurrin/README.md

## README.md

      
Get GitHub Files


Get the metadata and content of all files in a given GitHub repo using the GraphQL API

You might want to get a tree summary of the files in a repo without downloading the repo, or look up the contents of a file again without downloading the whole repo.
The approach here is to query data from GitHub using the GitHub V4 GraphQL API.
About the query

Fields

In the sample GQL file in this gist, I included some useful attributes about files in a GitHub repo. The query can be modified to work with any repo you have read access to.

name

File or path name.


mode

Git file mode in decimal. Usually 16384 (octal 040000, a directory) or 33188 (octal 100644, a regular file).


type

blob for text or binary files.
tree for a directory path.


text

This is the content of your file. For larger files, this field will of course make your JSON response very long.
From the schema: "UTF8 text data or null if the Blob is binary".
Line breaks appear as \n in the text. Note that your code might also contain a literal "\n" inside strings.


isBinary

Useful if you want to separate file types, or to avoid counting lines in a binary file.
Binary files might be images or compiled files.
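The mode values listed above are Git file modes shown in decimal, so they are easier to read in octal. Here is a minimal Python sketch of that conversion. The `describe_mode` helper and the labels in its table are names chosen for this example, not part of the GitHub API.

```python
# Git tree entry modes, keyed by their octal form.
# The labels are descriptive names chosen for this sketch.
GIT_MODES = {
    0o040000: "directory (tree)",
    0o100644: "regular file",
    0o100755: "executable file",
    0o120000: "symbolic link",
    0o160000: "submodule (gitlink)",
}


def describe_mode(mode: int) -> str:
    """Return the octal form of a decimal mode plus a label for it."""
    label = GIT_MODES.get(mode, "unknown")
    return f"{mode:06o} {label}"


print(describe_mode(16384))  # 040000 directory (tree)
print(describe_mode(33188))  # 100644 regular file
```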


Notes


Unfortunately I could not find summary values for number of files or a count of the number of lines, so you have to work those out yourself.
Regarding the expression value for object:

See expression or GitObject in the Object reference docs.

"A Git revision expression suitable for rev-parse".


Choose a commit reference and add a colon, e.g. "HEAD:". You can use master or a commit ID instead of HEAD.
You will only get objects at the repo root though, unless you use a nested query or choose a path, e.g. "master:docs/".
You can also use a nested query to go multiple levels down, as in the second GQL file below. But I can't see a way to nest this recursively, and a fragment cannot reference itself.
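Since the query cannot recurse on its own, one workaround is to recurse on the client: run the query once per directory and change the expression each time. The Python sketch below assumes a hypothetical `fetch_entries(path)` callable that would run the query with the expression "HEAD:" plus the path and return the `entries` list. Here it is replaced by an in-memory stand-in so the example runs offline.

```python
from typing import Callable, Dict, Iterator, List

Entry = Dict[str, str]


def walk(fetch_entries: Callable[[str], List[Entry]], path: str = "") -> Iterator[str]:
    """Yield the full path of every non-tree entry under `path`, depth first."""
    for entry in fetch_entries(path):
        full_path = f"{path}{entry['name']}"
        if entry["type"] == "tree":
            # A directory needs its own query, e.g. expression "HEAD:docs/".
            yield from walk(fetch_entries, f"{full_path}/")
        else:
            # Anything that is not a tree is reported as a file path.
            yield full_path


# Stand-in for the API: maps a directory path to its entries.
FAKE_REPO = {
    "": [{"name": "README.md", "type": "blob"}, {"name": "docs", "type": "tree"}],
    "docs/": [{"name": "index.md", "type": "blob"}],
}


def fake_fetch(path: str) -> List[Entry]:
    """Hypothetical fetcher; a real one would send the GraphQL query."""
    return FAKE_REPO[path]


print(list(walk(fake_fetch)))  # ['README.md', 'docs/index.md']
```

This costs one request per directory, so for deep repos the fixed-depth nested query may still be the cheaper choice.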


How to use the query

Explorer

Try the query out in the explorer.

Go to the explorer and sign in - V4 explorer
Paste the GQL query from get_github_files.gql into the main pane.
Paste the sample JSON from sample_params.json into the query variables pane.
Press the play/arrow button to run.

Command-line

Use curl, or a library in Python, Ruby, etc.
Here is a generic example from the GitHub docs. It will fail as written, because the auth token is only a placeholder. You must generate an auth token and pass it with every GraphQL request. The REST API, by contrast, lets you make requests without an auth token (within limits).
$ curl https://api.github.com/graphql \
  -d '{ "query": "query { viewer { login } }" }' \
  -H "Authorization: bearer token" 
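The same request can be sketched in Python with only the standard library. The helper names `build_request` and `run` are illustrative, and "YOUR_TOKEN" is a placeholder you would replace with a real token. The block only builds the request, so it runs without network access.

```python
import json
import urllib.request

API_URL = "https://api.github.com/graphql"


def build_request(query: str, variables: dict, token: str) -> urllib.request.Request:
    """Build a POST request carrying a GraphQL query and its variables."""
    body = json.dumps({"query": query, "variables": variables}).encode("utf-8")
    headers = {
        "Authorization": f"bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(API_URL, data=body, headers=headers, method="POST")


def run(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response (needs a valid token)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Build, but do not send, the viewer query from the curl example.
request = build_request("query { viewer { login } }", {}, token="YOUR_TOKEN")
print(request.get_method(), request.full_url)
```

To run the file query instead, pass the contents of get_github_files.gql as `query` and the dictionary from sample_params.json as `variables`, then call `run(request)`.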
Sample output

After executing get_github_files.gql.

Simplified JSON output
{
  "entries": [
    {
      "name": ".gitignore",
      "type": "blob",
      "object": {
        "byteSize": 32,
        "text": "node_modules/\npackage-lock.json\n"
      }
    },
    {
      "name": ".vscode",
      "type": "tree",
      "object": {}
    },
    {
      "name": "CONTRIBUTING.md",
      "type": "blob",
      "object": {
        "byteSize": 1520,
        "text": "..."
      }
    }
  ]
}
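Because the API does not return file or line counts, they have to be derived from the response. Here is a minimal Python sketch that assumes the `entries` shape shown in the simplified output above. The `summarize` helper is a name chosen for this example.

```python
def summarize(entries):
    """Return (file_count, line_count) for a list of tree entries."""
    file_count = 0
    line_count = 0
    for entry in entries:
        if entry["type"] != "blob":
            continue  # Skip directories; they have no content of their own.
        file_count += 1
        text = entry["object"].get("text")
        if text:  # `text` is null (None in Python) for binary blobs.
            line_count += len(text.splitlines())
    return file_count, line_count


entries = [
    {
        "name": ".gitignore",
        "type": "blob",
        "object": {"byteSize": 32, "text": "node_modules/\npackage-lock.json\n"},
    },
    {"name": ".vscode", "type": "tree", "object": {}},
]

print(summarize(entries))  # (1, 2)
```

Only the entries returned by the query are counted, so files inside directories are included only if a nested query or a per-directory query fetched them.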

Resources


Thanks to this gist by @johndevs, for getting me going with using the Tree and Blob structure.
See Intro to GraphQL on the GraphQL website.
GitHub GraphQL in my Dev Resources.


## get_github_files.gql
query RepoFiles($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: "HEAD:") {
      ... on Tree {
        entries {
          name
          type
          mode

          object {
            ... on Blob {
              byteSize
              text
              isBinary
            }
          }
        }
      }
    }
  }
}

## get_github_files_nested.gql
query RepoFiles($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    object(expression: "HEAD:") {
      # Top-level.
      ... on Tree {
        entries {
          name
          type
          object {
            ... on Blob {
              byteSize
            }

            # One level down.
            ... on Tree {
              entries {
                name
                type
                object {
                  ... on Blob {
                    byteSize
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}

## sample_params.json
{ "owner": "MichaelCurrin", "name": "python-twitter-guide" }