
@oeon
Last active August 8, 2019 21:51

This has been slightly modified for my use cases. ~joe

I've had to upload large CSV files a lot. It's a pain when you run into timeouts and upload limits, and sometimes your only option is to split the file into smaller files.

I want to show you how to do this in 3 easy steps with the Terminal!

Splitting the file:

split -l 10000 companies.csv ./split-files/chunk-

(10000 is the number of lines you want for each file.)

(./split-files/chunk- is the directory and filename prefix for the output files. split automatically appends an incrementing two-letter suffix to each file name, e.g. aa, ab, ac. The split-files directory must already exist, so create it first with mkdir split-files.)
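To see what split produces before running it on real data, here is a minimal sketch that builds a hypothetical companies.csv (a header plus 24,999 rows of made-up data, 25,000 lines in total) in a throwaway directory, creates split-files first, and then runs the same split command. With -l 10000 you should end up with chunk-aa and chunk-ab at 10,000 lines each and chunk-ac holding the remaining 5,000.

```shell
#!/bin/sh
# Work in a throwaway directory so nothing real is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample data: 1 header line + 24,999 rows = 25,000 lines.
{
  echo "id,name"
  seq 1 24999 | sed 's/.*/&,company&/'
} > companies.csv

# split won't create the output directory, so make it first.
mkdir -p split-files
split -l 10000 companies.csv ./split-files/chunk-

# List each chunk with its line count.
wc -l split-files/chunk-*
```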

Appending '.csv':

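None of the files generated by the split will have a file extension. We'll fix that with the following commands. Run them from inside the split-files directory (cd split-files), because the loop renames every file in the current directory.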

TEST (this only prints the mv commands that would run):

for f in *; do echo mv "$f" "$f.csv"; done

Remove the 'echo' to complete the change:

for f in *; do mv "$f" "$f.csv"; done  
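Because for f in * renames everything in the current directory (including companies.csv or any other file that happens to be there), a slightly safer variant matches only the chunk files and skips anything that already ends in .csv, so running it twice doesn't produce names like chunk-aa.csv.csv. This is a minimal sketch on hypothetical empty chunk files in a throwaway directory; the chunk- prefix matches the split command above, and notes.txt is a made-up bystander file that should be left alone. As in the original, you can put echo in front of mv for a dry run.

```shell
#!/bin/sh
# Throwaway directory with hypothetical extension-less chunks plus an
# unrelated file that should NOT be renamed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p split-files
touch split-files/chunk-aa split-files/chunk-ab split-files/notes.txt

for f in ./split-files/chunk-*; do
  [ -f "$f" ] || continue               # glob matched nothing: skip
  case "$f" in *.csv) continue ;; esac  # already renamed: skip
  mv "$f" "$f.csv"
done
```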

Adding the header:

This is where you add the header (a comma-delimited list of column names) to each file. When you split the CSV file, the header only ends up in the first file, so you may want to remove it from that file before running this command; otherwise the first file will have the header twice.
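One portable way to drop the original header from the first chunk is to copy everything from line 2 onward into a temporary file and move it back, which behaves the same with GNU and BSD tools (sed -i '' '1d' chunk-aa.csv does the same thing on macOS). Minimal sketch, assuming the first chunk is named chunk-aa.csv after the renaming step; the contents here are hypothetical.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical first chunk that still carries the original header row.
printf 'id,name\n1,company1\n2,company2\n' > chunk-aa.csv

# Keep everything from line 2 onward, then replace the original file.
tail -n +2 chunk-aa.csv > chunk-aa.csv.tmp && mv chunk-aa.csv.tmp chunk-aa.csv
```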

for i in *.csv; do sed -i '' '1i\
First column name,Second column name,etc
' "$i"; done

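NOTE: This command has to be broken across multiple lines, because sed's 'i\' command expects the text to insert on the line after the backslash, so keep the backslash as the last character on its line. Also avoid spaces after the commas in the header, since they become part of the column names. The -i '' form is for the BSD sed that ships with macOS; with GNU sed (Linux), drop the '' and use sed -i.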
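A different approach (not the sed method above) that avoids both the sed -i differences and the manual header removal: save the header with head, split only the data rows (tail -n +2, with split reading from standard input via -), then write each chunk back out with the header in front and the .csv extension added. Minimal sketch on a hypothetical 26-line companies.csv (header plus 25 rows) split into 10-row chunks; adjust the file name and -l value to your data. Each resulting file should start with the header: chunk-aa.csv and chunk-ab.csv with 11 lines each and chunk-ac.csv with 6.

```shell
#!/bin/sh
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample input: header + 25 data rows.
{
  echo "id,name"
  seq 1 25 | sed 's/.*/&,company&/'
} > companies.csv

mkdir -p split-files

# Save the header, then split only the data rows (line 2 onward).
header=$(head -n 1 companies.csv)
tail -n +2 companies.csv | split -l 10 - ./split-files/chunk-

# The glob is expanded once, before the loop runs, so the new .csv files
# are not picked up. Write header + rows to <chunk>.csv, then remove the
# extension-less original.
for f in ./split-files/chunk-*; do
  { printf '%s\n' "$header"; cat "$f"; } > "$f.csv"
  rm "$f"
done
```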

I worked this one out from a number of different sources.

Hopefully this saves you some time!

For Windows, this seems to work: https://stackoverflow.com/questions/20602869/batch-file-to-split-csv-file/20603219#20603219
