jonathan-dejong/2db9170ac6159350b2f8c291860f114e
Last active September 18, 2019 18:35
Split CSV files by megabyte (or by line count) and retain the header row (row 1)
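#!/bin/bash
# Split every *.csv in the current directory into pieces of N lines
# (default 2000, or the first argument), repeat the header row at the
# top of every piece, and move the original file into ./backup.
shopt -s nullglob
mkdir -p backup
lines=2000
for i in *.csv; do
  prefix="${i%.csv}-"
  # -l splits by line count; pieces are named <prefix>aa, <prefix>ab, ...
  split -l "${1:-$lines}" "$i" "$prefix"
  for j in "$prefix"[a-z][a-z]; do
    if [[ "$j" != *-aa ]]; then
      # Prepend the header row (the first piece already has it).
      { head -n 1 "$i"; cat "$j"; } > "$j.csv" && rm "$j"
    else
      mv "$j" "$j.csv"
    fi
  done
  mv "$i" "backup/$i"
done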
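Because `split` treats the header as an ordinary line of the first piece, the pieces do not all carry the same number of data rows. If that matters, a single awk pass is one alternative: it remembers the header, then starts a new file every ROWS data rows. This is a sketch, not part of the original gist; the function name `split_csv_lines` and the `<name>-001.csv` output naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split_csv_lines FILE [ROWS]
# Writes FILE's data rows into pieces of ROWS rows each (default 2000),
# named <name>-001.csv, <name>-002.csv, ..., each starting with the header.
split_csv_lines() {
  local file=$1 rows=${2:-2000}
  awk -v rows="$rows" -v prefix="${file%.csv}-" '
    NR == 1 { header = $0; next }   # remember the header row
    (NR - 2) % rows == 0 {          # every ROWS data rows, open a new piece
      if (out) close(out)
      out = sprintf("%s%03d.csv", prefix, ++piece)
      print header > out
    }
    { print > out }                 # copy the data row into the current piece
  ' "$file"
}
```

For example, `split_csv_lines data.csv 2000` leaves `data.csv` untouched and writes `data-001.csv`, `data-002.csv`, and so on.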
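#!/bin/bash
# Split every *.csv in the current directory into pieces of at most N
# megabytes (default 20, or the first argument), repeat the header row at
# the top of every piece, and move the original file into ./backup.
# -C is a GNU coreutils option: it puts at most SIZE bytes of whole lines
# into each piece, whereas -b would cut through the middle of a row.
# The repeated header makes later pieces slightly larger than the limit.
shopt -s nullglob
mkdir -p backup
megabyte=20
for i in *.csv; do
  prefix="${i%.csv}-"
  split -C "${1:-$megabyte}m" "$i" "$prefix"
  for j in "$prefix"[a-z][a-z]; do
    if [[ "$j" != *-aa ]]; then
      # Prepend the header row (the first piece already has it).
      { head -n 1 "$i"; cat "$j"; } > "$j.csv" && rm "$j"
    else
      mv "$j" "$j.csv"
    fi
  done
  mv "$i" "backup/$i"
done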
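To check the output of either script, a small helper can confirm that every piece begins with the original header row and ends on a line boundary. A piece whose last byte is not a newline was cut mid-row, which is what splitting by raw byte count with `split -b` does. This is a sketch; `check_pieces` is a hypothetical name, and the last piece may legitimately lack a trailing newline if the source file did.

```shell
#!/bin/bash
# Hypothetical helper: check_pieces ORIGINAL PIECE...
# Prints a warning and returns non-zero if any piece does not start with
# ORIGINAL's header row or does not end with a newline.
check_pieces() {
  local original=$1
  shift
  local header piece status=0
  header=$(head -n 1 "$original")
  for piece in "$@"; do
    if [ "$(head -n 1 "$piece")" != "$header" ]; then
      echo "missing header row: $piece" >&2
      status=1
    fi
    # tail -c 1 prints the last byte; $(...) strips it to "" if it is a newline.
    if [ -n "$(tail -c 1 "$piece")" ]; then
      echo "does not end at a line boundary: $piece" >&2
      status=1
    fi
  done
  return "$status"
}
```

For example, after splitting `data.csv`: `check_pieces backup/data.csv data-[a-z][a-z].csv`.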
Caveat: if your CSV contains newline characters inside records (for example, quoted multi-line fields), a split point may fall in the middle of a record rather than cleanly between two records.
If you're only dealing with a few split files, I found it easiest to open each one, copy the correct data across between them, and delete the faulty rows.
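A quote-aware split can avoid that manual clean-up. The sketch below assumes RFC 4180 style quoting (an embedded quote is written as `""`) and a single-line header. It tracks whether a quoted field is still open and only starts a new piece on a record boundary. It splits by record count rather than by megabytes, and the function name `split_csv_records` and the `<name>-001.csv` output naming are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split_csv_records FILE [RECORDS]
# Splits FILE into pieces of RECORDS records each (default 2000), copying
# the header row into every piece. A new piece only starts where no quoted
# field is open, so a record with embedded newlines is never cut in half.
# Assumes RFC 4180 style quoting and a single-line header row.
split_csv_records() {
  local file=$1 per=${2:-2000}
  awk -v per="$per" -v prefix="${file%.csv}-" '
    NR == 1 { header = $0; next }
    {
      # A line that begins outside any quoted field starts a new record.
      if (!open) {
        if (count % per == 0) {
          if (out) close(out)
          out = sprintf("%s%03d.csv", prefix, ++piece)
          print header > out
        }
        count++
      }
      print > out
      # Each quote character toggles in/out of a quoted field; a doubled
      # quote toggles twice, so escaped quotes leave the state unchanged.
      open = (open + gsub(/"/, "&")) % 2
    }
  ' "$file"
}
```

For example, `split_csv_records data.csv 50000` writes `data-001.csv`, `data-002.csv`, and so on, each holding up to 50000 records plus the header.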