@abraham
Created July 7, 2018 23:25
Split a large JSON file into multiple files
[
{"one":"two"},
{"three":"four"}
]
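require 'fileutils'

source_file = 'example.json'
output_dir = 'data'
page_size = 25_000 # lines of the source file per output file

# File.write does not create missing directories.
FileUtils.mkdir_p(output_dir)

# Appends to the output file, so delete chunks from an earlier run first.
def write(file_name, data)
  File.write(file_name, data, mode: 'a')
end

# Stream the source line by line; expects one array element per line.
File.foreach(source_file).with_index do |line, line_number|
  page = line_number / page_size
  file_name = "#{output_dir}/#{page}.json"
  closing_bracket = line.strip == ']'

  # Every page after the first needs its own opening bracket.
  write(file_name, "[\n") if line_number.positive? && (line_number % page_size).zero?

  if ((line_number + 1) % page_size).zero?
    # Last line of this page: drop the trailing comma and close the array.
    write(file_name, line.chomp(",\n"))
    write(file_name, "\n]") unless closing_bracket
    puts "Finished file #{page + 1}"
  else
    write(file_name, line)
  end
end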
@Lowell130

Does it work with nested JSON files?

@abraham
Author

abraham commented Jan 20, 2020

This only works with flat arrays, where each element sits on its own line.
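
For input that does not keep one element per line, such as pretty-printed nested JSON, one option is to parse the array instead of splitting it by line. The following is a minimal sketch, not part of the gist: `split_json_array` and its parameters are hypothetical names, and it assumes the whole file fits in memory, which the line-based script above does not require.

```ruby
require 'json'
require 'fileutils'

# Hypothetical helper (not from the gist): parse the whole array, then write
# it back out in slices of page_size elements. Nested values are safe because
# elements are parsed rather than split by line. The entire source file is
# loaded into memory.
def split_json_array(source_file, out_dir, page_size)
  FileUtils.mkdir_p(out_dir)
  records = JSON.parse(File.read(source_file))
  raise ArgumentError, 'top-level JSON value must be an array' unless records.is_a?(Array)

  # Returns the list of files written, in page order.
  records.each_slice(page_size).with_index.map do |slice, page|
    file_name = File.join(out_dir, "#{page}.json")
    File.write(file_name, JSON.pretty_generate(slice))
    file_name
  end
end
```

Calling `split_json_array('example.json', 'data', 25_000)` would write `data/0.json`, `data/1.json`, and so on. Unlike the script above, it overwrites existing chunks rather than appending to them.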
