@jimrutherford
Created July 3, 2012 22:39
Generate a set of JSON files from GitHub Archive, one per popular language, each listing that language's 20 most-pushed repositories.
#!/bin/bash
# Requires the BigQuery command-line tool (bq) to be installed:
# http://code.google.com/p/google-bigquery-tools/
echo Running Query to grab most popular languages
bq --format=csv --quiet query 'SELECT TOP(repository_language, 10), count(*) FROM [githubarchive:github.timeline]' > tmpLanguages.csv
echo Removing top two lines
awk 'NR > 2' tmpLanguages.csv > ttmpLanguages.csv
echo Extracting languages
awk -F "\"*,\"*" '{print $1}' ttmpLanguages.csv > tttmpLanguages.csv
# Read one language per line so names containing spaces stay intact; the list
# is read on fd 3 so that bq cannot consume it from stdin
while IFS= read -r language <&3
do
	echo "Running query for $language"
	bq --format=prettyjson --quiet query "SELECT repository_name, count(repository_name) as pushes, repository_description, repository_url FROM [githubarchive:github.timeline] WHERE type='PushEvent' AND repository_language='$language' GROUP BY repository_name, repository_description, repository_url ORDER BY pushes DESC LIMIT 20" > "$language.json"
done 3< tttmpLanguages.csv
# Remove the intermediate files
rm -f tmpLanguages.csv ttmpLanguages.csv tttmpLanguages.csv
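The script writes intermediate temp files only to trim and split the CSV output. Below is a minimal sketch of the same extraction done as a single pipeline. It assumes bq's CSV output has the layout the script expects: two lines to skip, with the language in the first column. The function name `extract_languages` and the sample input are hypothetical. It also uses a simpler quote-stripping rule than the original `awk -F "\"*,\"*"`.

```shell
#!/bin/bash
# Hypothetical helper: reads the CSV produced by the TOP() query on stdin,
# skips the first two lines (as the script does), and prints the first
# column with any double quotes removed.
extract_languages() {
	tail -n +3 | awk -F ',' '{ gsub(/"/, "", $1); print $1 }'
}

# Hypothetical sample input standing in for real bq output.
printf '%s\n' 'f0_,f1_' 'skipped,1' 'JavaScript,100' 'Ruby,90' '"C++",80' | extract_languages
```

With the real query, the output could be piped straight in, e.g. `bq --format=csv --quiet query 'SELECT TOP(repository_language, 10), count(*) FROM [githubarchive:github.timeline]' | extract_languages`, and fed to the per-language loop without writing any intermediate files.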