@cj2001
Last active February 9, 2021 23:31
Load arXiv data
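import json

import pandas as pd
from tqdm import tqdm

file = "./arxiv-metadata-oai-snapshot.json"
metadata = []
lines = 100000  # read only the first 100k records for testing
# Each line of the snapshot is parsed as one standalone JSON record.
with open(file, 'r') as f:
    for line in tqdm(f):
        metadata.append(json.loads(line))
        lines -= 1
        if lines == 0:
            break
df = pd.DataFrame(metadata)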
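
The manual countdown can also be written with `itertools.islice`, which stops after a fixed number of lines. The sketch below is one possible variant, not part of the original gist: the helper name `load_jsonl` and its `limit` parameter are hypothetical, and it assumes the file holds one JSON object per line. It skips blank lines, so `limit` counts lines read rather than records returned.

```python
import json
from itertools import islice


def load_jsonl(path, limit=None):
    """Parse the first `limit` lines of a JSON Lines file (all lines if None)."""
    with open(path, "r", encoding="utf-8") as f:
        # islice stops after `limit` lines without reading the rest of the file.
        return [json.loads(line) for line in islice(f, limit) if line.strip()]
```

Under those assumptions, `pd.DataFrame(load_jsonl(file, limit=100000))` should produce the same DataFrame as the loop above, without the progress bar.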