@mathcass
Last active February 14, 2016 02:57
This is a simple Makefile that converts every .tsv file under the current directory to a .csv file using a given tsv2csv.py Python script.
# You can call this with:
# make -j N all_csv
# where N can be the number of processors you wish to use in parallel
# This pattern rule runs the Python script `tsv2csv.py` on the prerequisite ($<, the .tsv file)
# and redirects its output to the target ($@, the .csv file)
%.csv: %.tsv
	python tsv2csv.py $< > $@
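# Directory to search; defaults to the current directory if not set
SOURCEDIR ?= .
# This finds every file under $(SOURCEDIR), including subdirectories, that ends in .tsv
TSV_SOURCES:=$(shell find $(SOURCEDIR) -name '*.tsv')
# This maps each .tsv name to the matching .csv name and stores the list in a variable
CSV_SOURCES:=$(TSV_SOURCES:.tsv=.csv)
# This is a phony target that depends on every .csv file, so building it converts each .tsv
.PHONY: all_csv
all_csv: $(CSV_SOURCES)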
# Essentially copied from here: https://gist.github.com/nsonnad/7598574
import sys
import csv
csv.field_size_limit(sys.maxsize)
# tabin = csv.reader(sys.stdin, dialect=csv.excel_tab)
file_in = sys.argv[1]
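# Handle newlines properly: newline='' lets the csv module parse line endings itself
# (Python 3; the old 'rU' mode was removed in Python 3.11)
with open(file_in, newline='') as tsv_file:
    tabin = csv.reader(tsv_file, dialect=csv.excel_tab)
    commaout = csv.writer(sys.stdout, dialect=csv.excel)
    for row in tabin:
        commaout.writerow(row)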
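The script above goes through the csv module rather than simply swapping tabs for commas, because a field may itself contain a comma and then has to be quoted in the output. Below is a minimal in-memory sketch of the same reader/writer pairing, assuming Python 3. The function name `tsv_to_csv` and the sample data are hypothetical and are not part of the gist.

```python
import csv
import io


def tsv_to_csv(tsv_text):
    """Convert tab-separated text to comma-separated text (hypothetical helper)."""
    # newline='' stops StringIO from translating line endings, so the csv
    # module controls them on both the reading and the writing side.
    source = io.StringIO(tsv_text, newline='')
    out = io.StringIO(newline='')
    writer = csv.writer(out, dialect=csv.excel)
    for row in csv.reader(source, dialect=csv.excel_tab):
        writer.writerow(row)
    return out.getvalue()


# Made-up sample: the second field of the data row contains a comma.
sample = "name\tnote\nAda\thello, world\n"
print(tsv_to_csv(sample), end="")
```

The `csv.excel` dialect ends each row with `\r\n` and quotes only the fields that need it, so `hello, world` comes out wrapped in double quotes while the other fields are left bare.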