Skip to content

Instantly share code, notes, and snippets.

@jiagengliu
Forked from jennynz/gharchive_bq_example.sql
Created October 4, 2023 00:43
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save jiagengliu/07b59b7638391b3254660298d803a2c7 to your computer and use it in GitHub Desktop.
Save jiagengliu/07b59b7638391b3254660298d803a2c7 to your computer and use it in GitHub Desktop.
Query for getting PR and review-related fields from GHArchive on BigQuery
SELECT
repo.name as repo,
type,
created_at,
actor.login,
JSON_VALUE(payload, '$.action') as action,
-- *** PR columns
JSON_VALUE(payload, '$.pull_request.node_id') as pr_node_id,
JSON_VALUE(payload, '$.pull_request.state') as pr_state,
JSON_VALUE(payload, '$.pull_request.user.login') as pr_user_login,
JSON_VALUE(payload, '$.pull_request.user.type') as pr_user_type,
JSON_VALUE(payload, '$.pull_request.number') as pr_number,
JSON_VALUE(payload, '$.pull_request.title') as pr_title,
-- Option to include PR description (can really blow out the output size)
-- JSON_VALUE(payload, '$.pull_request.body') as pr_body,
JSON_VALUE(payload, '$.pull_request.created_at') as pr_created_at,
JSON_VALUE(payload, '$.pull_request.updated_at') as pr_updated_at,
JSON_VALUE(payload, '$.pull_request.closed_at') as pr_closed_at,
JSON_VALUE(payload, '$.pull_request.merged_at') as pr_merged_at,
JSON_VALUE(payload, '$.pull_request.draft') as pr_draft,
JSON_VALUE(payload, '$.pull_request.requested_reviewers') as pr_requested_reviewers,
-- Option to include PR labels (can be quite big)
-- JSON_EXTRACT_ARRAY(payload, '$.pull_request.labels') as pr_labels,
JSON_VALUE(payload, '$.pull_request.merged') as pr_merged,
JSON_VALUE(payload, '$.pull_request.merged_by') as pr_merged_by,
JSON_VALUE(payload, '$.pull_request.commits') as pr_commits,
JSON_VALUE(payload, '$.pull_request.changed_files') as pr_changed_files,
-- *** Comment columns (includes PR comments, but not review body comments)
JSON_VALUE(payload, '$.comment.pull_request_review_id') as comment_review_id,
JSON_VALUE(payload, '$.comment.node_id') as comment_node_id,
JSON_VALUE(payload, '$.comment.commit_id') as commit_id,
JSON_VALUE(payload, '$.comment.user.login') as comment_user,
JSON_VALUE(payload, '$.comment.user.type') as comment_user_type,
-- Option to include comment body (can really blow out the output size)
-- JSON_VALUE(payload, '$.comment.body') as comment_body,
JSON_VALUE(payload, '$.comment.created_at') as comment_created_at,
JSON_VALUE(payload, '$.comment.updated_at') as comment_updated_at,
-- *** Review columns
JSON_VALUE(payload, '$.review.node_id') as review_node_id,
JSON_VALUE(payload, '$.review.user.login') as review_user,
JSON_VALUE(payload, '$.review.state') as review_state,
JSON_VALUE(payload, '$.review.submitted_at') as review_submitted_at,
JSON_VALUE(payload, '$.review.body') as review_body,
id as event_id
FROM `githubarchive.month.202205`
WHERE
repo.name in ('vuejs/vue', 'tensorflow/tensorflow', 'ansible/ansible', 'kubernetes/kubernetes', 'Microsoft/vscode', 'facebook/react', 'python/cpython', 'rust-lang/rust')
and type in ('PullRequestEvent', 'PullRequestReviewCommentEvent', 'PullRequestReviewEvent')
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment