@chriswhong
Last active March 10, 2017 18:06
Chunk a CSV into many files
#!/bin/bash
FILENAME=cpdb_spending.csv
HDR=$(head -n 1 "$FILENAME") # Pick up CSV header line to apply to each file
split -l 200000 "$FILENAME" xyz # Split the file into chunks of 200,000 lines each
n=1
for f in xyz* # Go through all newly created chunks
do
if [ "$n" -gt 1 ] # The first chunk already starts with the header line
then
echo "$HDR" > "Part${n}.csv" # Write out header to new file called "Part(n)"
fi
cat "$f" >> "Part${n}.csv" # Add in the lines from the "split" command
zip -r "Part${n}.zip" "Part${n}.csv" # Compress the finished part
rm "$f" # Remove temporary file
rm "Part${n}.csv" # Keep only the zipped copy
((n++)) # Increment name of output part
done
# Adapted from this Quora post: https://www.quora.com/How-can-I-parse-a-CSV-string-with-Javascript
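The script above leaves the header inside the first chunk, so Part1 carries one fewer data row than the later parts. A minimal sketch of one possible variation follows. It strips the header before splitting so every part holds the same number of data rows. The function name `chunk_csv`, its arguments, and the `chunk_` temp prefix are hypothetical and not part of the original gist. It also assumes one record per line (no quoted fields with embedded newlines) and leaves the zip step out.

```shell
#!/bin/bash
# Hypothetical helper (not from the original gist):
#   chunk_csv FILE ROWS_PER_PART [OUT_PREFIX]
# Writes OUT_PREFIX1.csv, OUT_PREFIX2.csv, ... each starting with the header row.
chunk_csv() {
  local file=$1 rows=$2 prefix=${3:-Part}
  local header tmpdir f n=1
  header=$(head -n 1 "$file")        # Header row, repeated in every part
  tmpdir=$(mktemp -d) || return 1    # Scratch directory for the raw chunks
  # Split only the data rows (everything after line 1); "-" makes split read stdin
  tail -n +2 "$file" | split -l "$rows" - "$tmpdir/chunk_"
  for f in "$tmpdir"/chunk_*; do
    [ -e "$f" ] || break             # No data rows means nothing to write
    { printf '%s\n' "$header"; cat "$f"; } > "${prefix}${n}.csv"
    n=$((n + 1))
  done
  rm -rf "$tmpdir"                   # Remove temporary chunks
}
```

Under these assumptions, `chunk_csv cpdb_spending.csv 200000` would produce Part1.csv, Part2.csv, and so on in the current directory, which could then be zipped the same way as in the loop above.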