Jiageng jiagengliu

  • MIT
  • Cambridge, MA
jennynz / gharchive_bq_example.sql
Created June 22, 2022 01:56
Query for getting PR and review-related fields from GHArchive on BigQuery
SELECT
repo.name as repo,
type,
created_at,
actor.login,
JSON_VALUE(payload, '$.action') as action,
-- *** PR columns
JSON_VALUE(payload, '$.pull_request.node_id') as pr_node_id,
JSON_VALUE(payload, '$.pull_request.state') as pr_state,
JSON_VALUE(payload, '$.pull_request.user.login') as pr_user_login,
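In the BigQuery tables, `repo` and `actor` are nested records, while `payload` is stored as a JSON string. That is why the query reaches into it with `JSON_VALUE` and JSONPath-style paths. The same projection can be sketched locally in Python over the raw hourly `.json.gz` files from https://data.gharchive.org/ (gzipped JSON lines, one event per line), where `payload` is an ordinary nested object. This is a sketch only, not the gist's method. `json_value`, `extract_pr_fields`, `read_events` and `COLUMNS` are hypothetical names, and only the columns visible in the truncated preview are covered.

```python
import gzip
import json
from typing import Any, Dict, Iterator, Optional


def json_value(doc: Any, path: str) -> Optional[str]:
    """Rough local analogue of BigQuery JSON_VALUE for simple '$.a.b' paths.

    Returns None when a key is missing, the value is JSON null, or the
    target is an object/array; otherwise renders the scalar as a string.
    """
    node = doc
    for key in path.removeprefix("$.").split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    if node is None or isinstance(node, (dict, list)):
        return None
    if isinstance(node, bool):
        return "true" if node else "false"  # JSON spelling, not Python's
    return str(node)


# Output column -> path in the raw event, mirroring the preview's SELECT list.
COLUMNS = {
    "repo": "$.repo.name",
    "type": "$.type",
    "created_at": "$.created_at",
    "login": "$.actor.login",
    "action": "$.payload.action",
    "pr_node_id": "$.payload.pull_request.node_id",
    "pr_state": "$.payload.pull_request.state",
    "pr_user_login": "$.payload.pull_request.user.login",
}


def extract_pr_fields(event: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Project one raw GH Archive event onto the preview's columns."""
    return {col: json_value(event, path) for col, path in COLUMNS.items()}


def read_events(path: str) -> Iterator[Dict[str, Any]]:
    """Yield events from a gzipped JSON-lines hourly dump at a local path."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

For events that carry no pull request (for example a `WatchEvent`), the PR columns simply come back as `None`, much as `JSON_VALUE` yields `NULL` for a missing path.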
gousiosg / README.md
Last active November 8, 2023 05:20
Restoring the GHTorrent MongoDB database

This is a collection of scripts to restore a full GHTorrent MongoDB database from the dumps available at http://ghtorrent-downloads.ewi.tudelft.nl.

To do the restore:

  1. Open a MongoDB shell and run the createCollections.js script to create the necessary collections. You can set block_compressor to either snappy or zlib to compress your databases. I am using none here, because I rely on compression at the filesystem level.

  2. Run restore-cummulative-dumps.sh to restore the cumulative dumps, then wait 3-4 days for the restore to finish.
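Step 1 relies on WiredTiger's per-collection `block_compressor` setting, which is passed to MongoDB's `create` command through the `storageEngine` option. Below is a minimal Python sketch of that idea using PyMongo, assuming a MongoDB server running the WiredTiger storage engine. It is not the contents of createCollections.js. The helper names, the database name and any collection names you pass in are hypothetical; the real collection list lives in that script.

```python
from typing import Any, Dict, Iterable

# Compressor values named in the README; "none" leaves blocks uncompressed
# (useful when the filesystem already compresses the data files).
ALLOWED_COMPRESSORS = {"none", "snappy", "zlib"}


def collection_options(compressor: str) -> Dict[str, Any]:
    """Build the `storageEngine` option for MongoDB's create command."""
    if compressor not in ALLOWED_COMPRESSORS:
        raise ValueError(f"unsupported block_compressor: {compressor!r}")
    return {
        "storageEngine": {
            "wiredTiger": {"configString": f"block_compressor={compressor}"}
        }
    }


def create_collections(uri: str, db_name: str, names: Iterable[str],
                       compressor: str = "none") -> None:
    """Create each collection with the chosen block compressor."""
    # Third-party dependency, imported lazily so the option builder above
    # can be used without PyMongo installed.
    from pymongo import MongoClient

    db = MongoClient(uri)[db_name]
    opts = collection_options(compressor)
    for name in names:
        # Extra keyword arguments are forwarded to the `create` command.
        db.create_collection(name, **opts)
```

For example, `create_collections("mongodb://localhost:27017", "github", ["commits", "events"], compressor="snappy")` would create two snappy-compressed collections. Passing `"none"` matches the setup described above, where compression is handled at the filesystem level.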