
@97-109-107
Created November 18, 2014 11:41
Save 97-109-107/bf9211c4a160deb4ee15 to your computer and use it in GitHub Desktop.
A tiny Python script to split big JSON files into smaller chunks.
#!/usr/bin/env python
# based on http://stackoverflow.com/questions/7052947/split-95mb-json-array-into-smaller-chunks
# usage: python json-split filename.json
# produces multiple filename_0.json of 1.49 MB size
import json
import sys

with open(sys.argv[1], 'r') as infile:
    o = json.load(infile)  # load the whole top-level JSON array into memory

chunkSize = 4550  # number of array elements per output file
for i in xrange(0, len(o), chunkSize):
    with open(sys.argv[1] + '_' + str(i // chunkSize) + '.json', 'w') as outfile:
        json.dump(o[i:i + chunkSize], outfile)
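For a quick sanity check, the same chunking loop can be run on a small in-memory sample (the sample data, chunk size, and file names below are illustrative, not from the gist):

```python
import json
import os
import tempfile

# Sketch of the gist's chunking loop on a small sample list:
# 10 elements split into chunks of 4 -> 3 output files.
data = list(range(10))
chunk_size = 4

tmpdir = tempfile.mkdtemp()
base = os.path.join(tmpdir, "sample")
for i in range(0, len(data), chunk_size):
    with open(base + "_" + str(i // chunk_size) + ".json", "w") as out:
        json.dump(data[i:i + chunk_size], out)

print(sorted(os.listdir(tmpdir)))  # sample_0.json, sample_1.json, sample_2.json
```

Each output file holds a slice of the original array, so the last file may be smaller than the rest.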
@anthnyprschka

Hi there, thanks for the script! I was trying it on an 800 MB JSON file, but unfortunately it didn't work. Do you have any idea why it only spat out two files, filename_0 with about 700 MB and filename_1 with 200 MB? Might it be something with len(o) (since it is not specified whether it's counting bytes, lines, etc.)?

And I don't have to call #!/usr/bin/env python if I want to run it in the terminal, right?

Also, I believe the terminal command needs a ".py" after json-split, before the filename, right?

@dilipbobby

What if I don't know what chunkSize value to use?
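One rough way to choose chunkSize (a sketch, not part of the gist): estimate the average serialized size of one array element, then divide the target output-file size by that average. The helper name and sample data below are hypothetical:

```python
import json

def estimate_chunk_size(items, target_bytes):
    """Rough chunkSize estimate: serialize the list once, take the
    average bytes per element, and fit target_bytes worth of elements.
    Works best when elements are similar in size."""
    avg = len(json.dumps(items)) / max(len(items), 1)
    return max(1, int(target_bytes / avg))

# Hypothetical sample: 1000 similar dicts, targeting ~1.5 MB per file.
items = [{"id": n, "value": "x" * 10} for n in range(1000)]
print(estimate_chunk_size(items, 1500000))
```

Since it averages over the whole list, the estimate can be off when element sizes vary a lot; in that case a smaller target leaves headroom.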
