@steezeburger
Last active November 10, 2022 09:55
Bash script for splitting a large CSV file into files of 100 lines each while keeping the header.
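#!/bin/bash
FILENAME=file-to-split.csv
HDR=$(head -n 1 "${FILENAME}")      # first line of the input: the header row
split -l 100 "${FILENAME}" xyz       # chunks xyzaa, xyzab, ... of 100 lines each
n=1
for f in xyz*
do
  if [[ ${n} -ne 1 ]]; then          # chunk 1 already starts with the header
    echo "${HDR}" > "part-${n}-${FILENAME}.csv"
  fi
  cat "${f}" >> "part-${n}-${FILENAME}.csv"
  rm "${f}"
  ((n++))
done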
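If GNU coreutils is available, split can add the header itself: its --filter option pipes each chunk through a shell command, with $FILE set to the name of the output file. A minimal sketch, assuming GNU split (--filter, -d and --additional-suffix are GNU extensions that BSD/macOS split lacks). The function name split_csv_gnu and the part-<name>-NN.csv naming are illustrative choices, not part of the script above:

```shell
#!/bin/bash
# Assumes GNU coreutils split: --filter, -d and --additional-suffix are
# GNU extensions and are not available in BSD/macOS split.
set -euo pipefail

split_csv_gnu() {
  local filename="$1"        # input CSV, e.g. file-to-split.csv
  local lines="${2:-100}"    # data rows per output file (default: 100)
  local base header
  base=$(basename "$filename" .csv)   # e.g. file-to-split
  header=$(head -n 1 "$filename")

  # Only the data rows (line 2 onward) go to split, so the header is not
  # counted against the first chunk. For each chunk, split runs the --filter
  # command in a shell with the chunk on stdin and $FILE set to the output
  # name (here part-<base>-00.csv, part-<base>-01.csv, ...). HDR reaches
  # that shell through the environment.
  tail -n +2 "$filename" |
    HDR="$header" split -l "$lines" -d --additional-suffix=.csv \
      --filter='{ printf "%s\n" "$HDR"; cat; } > "$FILE"' \
      - "part-${base}-"
}
```

Usage: `split_csv_gnu file-to-split.csv 100` writes part-file-to-split-00.csv, part-file-to-split-01.csv, and so on. Because tail drops the header before splitting, each file holds the header plus up to 100 data rows, and the extension is added only once.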
@arobinski

This writes the header twice in the first file.

@arobinski

Also, there's an error in line 9: missing .csv at the end.

@steezeburger
Author

@arobinski Thanks for catching those errors! I've updated the script.

@bauerdLucd

Thanks for posting this; it saved me some time.

@madurapa

Thanks.

A couple of improvements could be made, though:

  1. The first chunk counts the header as one of its 100 lines, so it always holds one data row fewer (99 instead of 100) than the others.
  2. Adding the extension on lines 9 and 11 doubles it in the output file names (e.g. part-1-file-to-split.csv.csv).
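A sketch of the script with both of these changes applied, written as a function so it can be reused. Assumptions: the name split_csv, the optional second argument for the chunk size, and the temporary directory (used instead of xyz* in the current directory, so unrelated files starting with xyz are never picked up and deleted) are illustrative, not from the original gist. It only uses split -l with stdin, so it should work with both GNU and BSD split:

```shell
#!/bin/bash
# Sketch: split a CSV into files of N data rows each, repeating the header
# row at the top of every output file. Output names: part-<n>-<name>.csv
set -euo pipefail

split_csv() {
  local filename="$1"        # input CSV, e.g. file-to-split.csv
  local lines="${2:-100}"    # data rows per output file (default: 100)
  local base header tmpdir n f

  base=$(basename "$filename" .csv)   # drop directory and .csv so the extension is not doubled
  header=$(head -n 1 "$filename")
  tmpdir=$(mktemp -d)                 # private scratch dir instead of xyz* in the current dir

  # Split only the data rows (line 2 onward) so the header is not counted
  # against the first chunk: every chunk gets up to $lines data rows.
  tail -n +2 "$filename" | split -l "$lines" - "$tmpdir/chunk_"

  n=1
  for f in "$tmpdir"/chunk_*; do
    [[ -e "$f" ]] || continue         # no data rows: the glob matched nothing
    # Header first, then this chunk's data rows.
    { printf '%s\n' "$header"; cat "$f"; } > "part-${n}-${base}.csv"
    n=$((n + 1))
  done

  rm -r "$tmpdir"
}
```

Usage: `split_csv file-to-split.csv 100` produces part-1-file-to-split.csv, part-2-file-to-split.csv, ..., each starting with the header followed by up to 100 data rows. Writing the header with printf and a quoted variable also keeps it byte-for-byte intact, whereas an unquoted `echo ${HDR}` would collapse repeated spaces.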
