This uses the GitHub API to walk the file tree of a GitHub repo.
To run:
GITHUB_TOKEN="XXX" python repotree.py
Algorithm:
- Get the file tree for a given repo
- Walk every file
- Look for files with the extension .md
- Use the requests library to download the content of each of these files (the API returns it base64-encoded)
- Decode the base64 into text
- Count the number of characters
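The steps above could be sketched roughly as follows. This is a minimal illustration, not the actual contents of repotree.py: the function names, the `ref="main"` default, and the choice of the Git Trees and Git Blobs endpoints are assumptions made for the example.

```python
import base64
import os

API = "https://api.github.com"


def markdown_blobs(tree):
    """Return the tree entries that are files (blobs) with the .md extension."""
    return [e for e in tree if e.get("type") == "blob" and e["path"].endswith(".md")]


def count_characters(b64_content):
    """Decode base64 blob content as UTF-8 text and return its character count."""
    # The API wraps base64 content across lines; b64decode ignores the newlines.
    text = base64.b64decode(b64_content).decode("utf-8", errors="replace")
    return len(text)


def count_markdown_characters(owner, repo, ref="main"):
    """Hypothetical driver: map each .md path in the repo to its character count."""
    # Imported here so the pure helpers above can be used without requests installed.
    import requests

    headers = {
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        "Accept": "application/vnd.github+json",
    }
    # recursive=1 asks for the whole tree in one response (large trees may be truncated).
    resp = requests.get(
        f"{API}/repos/{owner}/{repo}/git/trees/{ref}",
        headers=headers,
        params={"recursive": "1"},
        timeout=30,
    )
    resp.raise_for_status()

    counts = {}
    for entry in markdown_blobs(resp.json()["tree"]):
        # Each blob entry carries the API URL of its base64-encoded content.
        blob = requests.get(entry["url"], headers=headers, timeout=30)
        blob.raise_for_status()
        counts[entry["path"]] = count_characters(blob.json()["content"])
    return counts
```

Note that this makes one request per Markdown file, so the number of API calls grows with the size of the repo.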
Caveats:
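- Beware the GitHub API rate limits.
- Unauthenticated requests are rate limited per IP address; requests made with a token are limited per account. Either way, a large repo can exhaust the quota, so plan for it (for example, by pausing until the limit resets).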
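One way to stay within the limits is to read the rate-limit headers GitHub sends with each response and pause once the quota is used up. The helper below is a sketch under that assumption; `seconds_to_wait` is a hypothetical name and is not part of repotree.py.

```python
import time


def seconds_to_wait(headers, now=None):
    """Return how long to sleep before the next request, or 0.0 if quota remains."""
    # X-RateLimit-Remaining: requests left in the current window.
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    # X-RateLimit-Reset: when the window resets, in UTC epoch seconds.
    reset_at = int(headers.get("X-RateLimit-Reset", 0))
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

With requests, this could be called as `time.sleep(seconds_to_wait(resp.headers))` after each response, since `resp.headers` behaves like a case-insensitive dictionary.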