@jonathan-dejong
Last active September 18, 2019 18:35
Split CSV files by megabyte and retain head row (row 1)
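#!/bin/bash
# Split every CSV in the current directory into chunks of N lines
# (default 2000, override with the first argument) and repeat the
# header row in each chunk. Originals are moved to ./backup.
mkdir -p backup
lines=2000
for i in *.csv; do
    [ -e "$i" ] || continue              # no CSV files matched
    # -l splits by line count (-b would split by bytes)
    split -l "${1:-$lines}" "$i" "${i%.csv}-"
    for j in "${i%.csv}"-*; do
        [[ "$j" == *.csv ]] && continue  # skip files that are already CSVs
        if [[ "$j" != *-aa ]]; then
            # Only the first chunk (-aa) contains the header; prepend it to the rest
            head -n 1 "$i" | cat - "$j" > "$j.tmp" && mv "$j.tmp" "$j"
        fi
        mv "$j" "$j.csv"
    done
    mv "$i" "backup/$i"
done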
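#!/bin/bash
# Split every CSV in the current directory into chunks of N megabytes
# (default 20, override with the first argument) and repeat the
# header row in each chunk. Originals are moved to ./backup.
mkdir -p backup
megabyte=20
for i in *.csv; do
    [ -e "$i" ] || continue              # no CSV files matched
    # -b cuts at a byte offset, so a chunk may end in the middle of a row
    split -b "${1:-$megabyte}m" "$i" "${i%.csv}-"
    for j in "${i%.csv}"-*; do
        [[ "$j" == *.csv ]] && continue  # skip files that are already CSVs
        if [[ "$j" != *-aa ]]; then
            # Only the first chunk (-aa) contains the header; prepend it to the rest
            head -n 1 "$i" | cat - "$j" > "$j.tmp" && mv "$j.tmp" "$j"
        fi
        mv "$j" "$j.csv"
    done
    mv "$i" "backup/$i"
done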
@jonathan-dejong
Author

Caveat: when splitting by size, the file is cut at a byte offset rather than at a record boundary, so the split point may land in the middle of a record instead of cleanly between two. The same can happen if your CSV contains newline characters within the actual records.

If you're only dealing with a few split files, I found it easiest to open each one, copy the missing data over from the neighboring file, and delete the faulty rows.
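
One way to avoid most of the manual cleanup is to let `split` cut only at line boundaries. The sketch below is not part of the original scripts: `split_csv_whole_lines` is a hypothetical helper name, and it assumes GNU `split`, whose `-C` (`--line-bytes`) option fills each chunk with at most the given number of bytes of complete lines. It still treats every newline as the end of a record, so records with newlines inside quoted fields can be split all the same.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original scripts.
# Assumes GNU split: -C SIZE puts at most SIZE bytes of complete lines
# into each chunk, so no line is cut in half unless it is longer than SIZE.
split_csv_whole_lines() {
    local src=$1 size=${2:-20m}
    local base=${src%.csv}
    local part

    # Split only the body (everything after the header row).
    tail -n +2 "$src" | split -C "$size" - "$base-"

    for part in "$base"-*; do
        [ -f "$part" ] || continue
        # Skip files that already end in .csv (e.g. chunks from an earlier run).
        case $part in *.csv) continue ;; esac
        # Write header + chunk to <chunk>.csv, then drop the headerless chunk.
        head -n 1 "$src" | cat - "$part" > "$part.csv" && rm "$part"
    done
}
```

Calling `split_csv_whole_lines data.csv 20m` would leave `data.csv` in place and write `data-aa.csv`, `data-ab.csv`, and so on, each starting with the header row.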
