@ohlol
Forked from rjurney/sample_jsonl.py
Last active June 10, 2018 08:09
Create a random sample of records from a JSON Lines file
import json
import math
import random

# Load every record into memory; each line of a JSON Lines file is one JSON document.
with open("data/repos.jsonl") as f:
    records = [json.loads(x) for x in f]

# Sample 1% of the records, rounded up.
count = len(records)
sample_count = math.ceil(.01 * count)
sample_records = random.sample(records, sample_count)

# Write the sample back out, one JSON document per line.
with open("data/repos_sample_0.01.jsonl", "w") as f:
    for record in sample_records:
        f.write(json.dumps(record) + "\n")

print("Sampled {} records from {} original records.".format(
    sample_count,
    count
))
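
The script above reads the whole file into a list before sampling, which may not fit in memory for very large inputs. As a hedged alternative, here is a minimal sketch of reservoir sampling (Algorithm R), which streams the file once and keeps only k records at a time. This is a different technique from the one the gist uses; the function name `reservoir_sample_jsonl` and its parameters `path`, `k`, and `seed` are hypothetical and chosen only for illustration. It also assumes a fixed sample size k rather than a 1% fraction, since the total record count is not known until the file has been read.

```python
import json
import random


def reservoir_sample_jsonl(path, k, seed=None):
    """Return (sample, seen): up to k records drawn uniformly at random
    from a JSON Lines file, and the number of records read.

    Hypothetical helper for illustration; not part of the original gist.
    """
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # ignore blank lines
            seen += 1
            if len(reservoir) < k:
                # Fill the reservoir with the first k records.
                reservoir.append(json.loads(line))
            else:
                # Replace a random slot with probability k / seen.
                j = rng.randrange(seen)
                if j < k:
                    reservoir[j] = json.loads(line)
    return reservoir, seen
```

A call such as `reservoir_sample_jsonl("data/repos.jsonl", k=1000)` would return the sampled records together with the total count, which can then be written out the same way as in the script above. Parsing only the lines that enter the reservoir avoids decoding records that are discarded immediately.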