@tomleo
Created June 10, 2014 17:04
Cut down the amount of data in a CSV data set by keeping the header row and every 41st data row.
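import csv
import os

fin = 'quick-statistics.csv'
threshold = 40  # number of data rows skipped between kept rows

lines_to_keep = []
with open(fin, 'r', newline='') as f:
    reader = csv.reader(f)
    # Always keep the header row.
    lines_to_keep.append(next(reader))
    keep_line = 0
    for line in reader:
        if keep_line == threshold:
            # `threshold` rows were skipped, so keep this one and reset.
            lines_to_keep.append(line)
            keep_line = 0
        else:
            keep_line += 1

# 'quick-statistics.csv' -> 'quick-statistics-out.csv'
fout = '{}-out.csv'.format(os.path.splitext(fin)[0])
with open(fout, 'w', newline='', encoding='utf-8') as f:
    # csv.writer quotes fields that contain commas or quotes,
    # which a plain ','.join(line) would not.
    csv.writer(f, lineterminator='\n').writerows(lines_to_keep)

print("Complete")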
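
The script above collects every kept row in a list before writing. For larger files, the same sampling could be done lazily. The following is a minimal sketch, not part of the original gist: `downsample_rows` and its `skip` parameter are hypothetical names, and it assumes the first row is a header that should always be kept.

```python
import csv
import io
from itertools import islice


def downsample_rows(rows, skip):
    """Yield the header row, then every (skip + 1)-th data row.

    `rows` is any iterable of rows (for example a csv.reader);
    `skip` is the number of data rows dropped between kept rows.
    Both names are illustrative assumptions.
    """
    rows = iter(rows)
    try:
        header = next(rows)
    except StopIteration:
        return  # empty input: nothing to yield
    yield header
    # Start at data-row index `skip`, then step by `skip + 1`,
    # which matches the counter logic of the script above.
    yield from islice(rows, skip, None, skip + 1)


# Small in-memory demonstration with ten data rows and skip=3.
sample = io.StringIO(
    "id,value\n" + "".join(f"{i},{i * 10}\n" for i in range(10))
)
kept = list(downsample_rows(csv.reader(sample), skip=3))
print(kept)
```

Because the function is a generator, its output can be passed straight to `csv.writer(...).writerows(...)` without holding the whole file in memory. With `skip=40` it should select the same rows as the script above.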