@marianogappa
Created September 4, 2018 01:55
BigQuery (legacy SQL) query to count how many Go files on GitHub declare package main vs. a non-main package
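#legacySQL
-- Count Go files by whether they declare package main, and sample up to
-- 100 characters of the distinct package names in each group.
SELECT
  IF(package_line = 'main', 'main', 'non_main') AS is_main,
  COUNT(1) AS count,
  LEFT(GROUP_CONCAT(UNIQUE(package_line)), 100) AS package_names
FROM (
  SELECT
    -- First "package <name>" occurrence in the file, including any in comments.
    REGEXP_EXTRACT(content, r'package\s+([^\s]+)\s') AS package_line
  FROM (
    SELECT
      id,
      content
    FROM
      [bigquery-public-data:github_repos.sample_contents]
    WHERE
      REGEXP_MATCH(content, r'package\s+[^\s]+\s')
  ) AS C
  JOIN (
    -- Ids of .go files whose path does not contain "vendor/", deduplicated.
    SELECT
      id
    FROM
      [bigquery-public-data:github_repos.sample_files]
    WHERE
      path LIKE '%.go' AND path NOT LIKE '%vendor/%'
    GROUP BY
      id
  ) AS F
  ON C.id = F.id
) lines
GROUP BY is_main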
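
The query above uses BigQuery legacy SQL (square-bracket table names, REGEXP_MATCH, GROUP_CONCAT(UNIQUE(...))). For anyone running it where standard SQL is the default dialect, a rough port might look like the sketch below. It is not the author's query: the `package_name` and `file_count` aliases are renamed here for illustration, and the port assumes `STRING_AGG(DISTINCT ...)` plus `SUBSTR` are an acceptable stand-in for the legacy `LEFT(GROUP_CONCAT(UNIQUE(...)), 100)`.

```sql
#standardSQL
-- Sketch of a standard SQL port; aliases package_name and file_count are
-- illustrative renames, not names from the original query.
SELECT
  IF(package_name = 'main', 'main', 'non_main') AS is_main,
  COUNT(*) AS file_count,
  -- First 100 characters of the comma-joined distinct package names.
  SUBSTR(STRING_AGG(DISTINCT package_name, ','), 1, 100) AS package_names
FROM (
  SELECT
    REGEXP_EXTRACT(c.content, r'package\s+([^\s]+)\s') AS package_name
  FROM
    `bigquery-public-data.github_repos.sample_contents` AS c
  JOIN (
    -- Ids of .go files whose path does not contain "vendor/", deduplicated.
    SELECT DISTINCT id
    FROM `bigquery-public-data.github_repos.sample_files`
    WHERE path LIKE '%.go' AND path NOT LIKE '%vendor/%'
  ) AS f
  ON c.id = f.id
  WHERE REGEXP_CONTAINS(c.content, r'package\s+[^\s]+\s')
)
GROUP BY is_main
```

As in the legacy version, the regular expression picks up the first `package <name>` text anywhere in the file, so a match inside a leading comment would be counted as the package name.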