Last active
February 6, 2017 16:47
-
-
Save bostelk/c9a9a820cdb251e3a25bbdadfd119364 to your computer and use it in GitHub Desktop.
Use BigQuery to find the Unity editor versions in use.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
SELECT | |
line AS version, | |
COUNT(*) AS count | |
FROM | |
FLATTEN( ( | |
SELECT | |
SPLIT(REGEXP_EXTRACT(content, r'm_EditorVersion: (.+)'), '\n') AS line, | |
FROM ( | |
SELECT | |
id, | |
content | |
FROM | |
[bigquery-public-data:github_repos.sample_contents] | |
WHERE | |
REGEXP_MATCH(content, r'm_EditorVersion: (.+)')) AS C | |
JOIN ( | |
SELECT | |
id | |
FROM | |
[bigquery-public-data:github_repos.sample_files] | |
WHERE | |
RIGHT(path, 34) = 'ProjectSettings/ProjectVersion.txt' | |
GROUP BY | |
id) AS F | |
ON | |
C.id = F.id), line) | |
GROUP BY | |
version | |
HAVING | |
version IS NOT NULL | |
ORDER BY | |
count DESC |
2 notes:
- [sample_contents] has only a sample of all [contents], so you can test your queries on smaller tables.
- identical files are de-duplicated, hence you'll need to account for all copies when counting:
Note for example that id='c4684cd58231b8d76c633a2bc45068594ed3961e' has 549 copies:
SELECT id, COUNT(*) copies
FROM [bigquery-public-data:github_repos.files]
WHERE RIGHT(path, 34) = 'ProjectSettings/ProjectVersion.txt'
GROUP BY id
ORDER BY copies DESC
This is one of those 549 files, all identical:
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Using the github_repos dataset, this query will return ~6k files but only ~220 contents. Is the contents table incomplete?