@JustinBeckwith
Created July 27, 2016 21:35
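-- Warning: this query scans ~2 TB of data and will cost ~$10 to run.
-- Written in BigQuery legacy SQL (bracketed table names, inline JS UDF).
-- Counts how often each package appears under "dependencies" in package.json files.
SELECT
  COUNT(*) AS cnt, package
FROM
  JS(
    -- Input: contents of every file whose path ends in "package.json"
    (SELECT content FROM [bigquery-public-data:github_repos.contents] WHERE id IN (
      SELECT id FROM [bigquery-public-data:github_repos.files] WHERE RIGHT(path, 12) = "package.json"
    )),
    -- Input column passed to the UDF
    content,
    -- Output schema
    "[{ name: 'package', type: 'string'}]",
    -- UDF: emit one row per dependency name; unparseable files are skipped
    "function(row, emit) {
      try {
        var x = JSON.parse(row.content);
        if (x.dependencies) {
          Object.keys(x.dependencies).forEach(function(dep) {
            emit({ package: dep });
          });
        }
      } catch (e) {}
    }"
  )
GROUP BY package
ORDER BY cnt DESC
LIMIT 1000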
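
Because the query is expensive to run, it may help to check the UDF logic locally first. The sketch below lifts the function body out of the query so it can run under Node.js on a few sample rows. The names `extractDependencies` and `countPackages` and the sample data are hypothetical and not part of the original gist; `countPackages` only approximates what `GROUP BY package` with `COUNT(*)` does in BigQuery.

```javascript
// Same logic as the inline UDF in the query, as a named function.
// `emit` stands in for the callback BigQuery passes to the UDF.
function extractDependencies(row, emit) {
  try {
    var manifest = JSON.parse(row.content);
    if (manifest.dependencies) {
      Object.keys(manifest.dependencies).forEach(function (dep) {
        emit({ package: dep });
      });
    }
  } catch (e) {
    // Unparseable or null content: skip the row, as the query does.
  }
}

// Hypothetical helper: tally emitted package names across rows,
// roughly mimicking GROUP BY package with COUNT(*).
function countPackages(rows) {
  // Null-prototype object so names like "constructor" are counted safely.
  var counts = Object.create(null);
  rows.forEach(function (row) {
    extractDependencies(row, function (out) {
      counts[out.package] = (counts[out.package] || 0) + 1;
    });
  });
  return counts;
}

// Made-up sample rows: two valid manifests, one invalid, one without dependencies.
var sampleRows = [
  { content: '{"dependencies":{"express":"^4.0.0","lodash":"^4.0.0"}}' },
  { content: '{"dependencies":{"express":"^4.0.0"}}' },
  { content: 'not json' },
  { content: '{"name":"no-deps"}' }
];

console.log(countPackages(sampleRows));
```

Only `dependencies` is inspected, so packages listed solely under `devDependencies` or `peerDependencies` are not counted, matching the query as written.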