@george-hawkins
Last active January 27, 2024 15:05
Forcing GitHub to reindex your repos

Apparently, GitHub stops indexing repos that have not had a commit for over a year. They say this is to ensure that search limits itself to providing the most relevant results.

But this is a pain if you know your own repos contain some particular information but GitHub turns up no search results because of this constraint.

One way to force a reindex of a repo is to search it. You may get no results at first, but GitHub will put the repo in a queue to be reindexed, and about 5 minutes later the same search may produce a result if the repo contains the relevant term.

However, if you don't know the relevant repo and want to search all your repos, it's somewhat inconvenient to have to manually search them all to force a reindex.

You can automate the process with the GitHub CLI tool gh.

Install it as per the GitHub instructions.

Then run gh auth login.

I skipped the public key step as I'd already uploaded my public key to GitHub.
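Before running anything that loops over many repos, it can be worth confirming that gh is installed and logged in. The following is a minimal sketch, not part of the original steps; `require_gh` is a hypothetical helper name, and it assumes that `gh auth status` exits with a non-zero status when you are not authenticated.

```shell
# Hypothetical helper: fail early if gh is missing or not logged in.
require_gh() {
    # command -v succeeds if gh is found on the PATH
    if ! command -v gh > /dev/null 2>&1; then
        echo "gh is not installed" >&2
        return 1
    fi
    # Assumption: gh auth status returns non-zero when not authenticated
    if ! gh auth status > /dev/null 2>&1; then
        echo "gh is not logged in; run: gh auth login" >&2
        return 1
    fi
}
```

Calling `require_gh || exit 1` at the top of a script would stop it before any searches are attempted.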

Then retrieve the names of all your repos (replace george-hawkins below with your GitHub username) and pipe that through a loop that triggers a search on each repo:

$ owner=george-hawkins
$ gh search repos --limit 512 --owner "$owner" --json fullName --jq '.[].fullName' | while read -r fullName
do
    echo "$fullName"
    # Throwaway search; only its side effect (queuing a reindex) matters
    gh search code foo --repo "$fullName" --json path > /dev/null
    sleep 30
done

The search term (foo above) can be anything - the important thing is to trigger a search. The --json path is just to be nice: as we're not interested in the results, it minimizes the amount of data that GitHub has to return. The sleep 30 is there because GitHub has an extremely low rate limit for searches, and you really will hit it if you sleep for less than about 20 seconds.
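If you expect to rerun this, the loop can be wrapped in a function with the search term and delay as parameters. This is only a sketch under assumptions: `reindex_repos` is a hypothetical name, the defaults simply mirror the values used above, and a failed search (for example, because of the rate limit) is reported rather than retried.

```shell
# Hypothetical wrapper around the loop above.
# Reads repo names ("owner/name"), one per line, from stdin and runs a
# throwaway code search on each one to queue it for reindexing.
reindex_repos() {
    local term="${1:-foo}"     # any search term will do
    local delay="${2:-30}"     # seconds to wait between searches
    local fullName
    while read -r fullName
    do
        [ -n "$fullName" ] || continue    # skip blank lines
        echo "$fullName"
        # Redirect stdin so gh cannot consume the list of repo names.
        gh search code "$term" --repo "$fullName" --json path > /dev/null < /dev/null \
            || echo "search failed for $fullName" >&2
        sleep "$delay"
    done
}
```

It would be fed from the same repo listing as before, for example `gh search repos --limit 512 --owner "$owner" --json fullName --jq '.[].fullName' | reindex_repos foo 30`.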
