Skip to content

Instantly share code, notes, and snippets.

@sAbakumoff
Created January 30, 2017 06:40
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sAbakumoff/6c892cf07586a067ea416e692688f6be to your computer and use it in GitHub Desktop.
Save sAbakumoff/6c892cf07586a067ea416e692688f6be to your computer and use it in GitHub Desktop.
SELECT
files.id AS id,
FIRST(files.repo_name) AS repo,
REGEXP_EXTRACT(FIRST(files.ref), r"refs/heads/(.*)$") AS branch,
FIRST(files.path) AS path,
REGEXP_EXTRACT(FIRST(files.path), r'\.([^\.]*)$') AS ext,
FIRST(lines.line) AS line,
REGEXP_EXTRACT(FIRST(lines.line), r'stackoverflow.com/(?:q|questions)/([0-9]+)') question_id,
REGEXP_EXTRACT(FIRST(lines.line), r'stackoverflow.com/(?:a|answers)/([0-9]+)') answer_id,
STRING(FIRST(lines.pos)) AS line_number
FROM
[bigquery-public-data:github_repos.files] files
JOIN (
SELECT
line,
id,
POSITION(line) AS pos
FROM (
SELECT
id,
SPLIT(REPLACE(content, "\n", " \n"), "\n") AS line
FROM
[fh-bigquery:github_extracts.contents_top_repos_top_langs] )) lines
ON
lines.id = files.id
WHERE
lines.line LIKE '%stackoverflow.com%'
GROUP BY
id
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment