@thinrhino
Created April 20, 2014 19:19
Kaggle: Acquire Valued Shoppers Challenge: reducing the dataset from 22GB to 1GB
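import pandas

# Categories that appear in the offers file; a set makes membership tests fast.
df = pandas.read_csv('offers.csv.gz', compression='gzip')
categories = set(df.category.tolist())

# Stream the transactions file line by line so it never has to fit in memory.
with open('transactions.csv', 'r') as fl, open('subset.csv', 'w') as subset:
    fl.readline()  # skip the header row (it is not copied to subset.csv)
    for l in fl:
        # Field 3 holds the category; keep only rows whose category has an offer.
        if int(l.split(',')[3]) in categories:
            subset.write(l)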
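
The same filtering idea can be sketched as a small reusable generator, which makes it easy to try on a few rows before running it against the full file. This is only an illustration under assumptions: `filter_by_category` is a hypothetical helper name, the sample rows are made up rather than taken from the real dataset, and the category is assumed to sit in field 3 as in the script. Unlike the script, this variant also passes the header row through.

```python
import io


def filter_by_category(lines, categories, column=3):
    """Yield the header line, then every row whose `column` value is in `categories`."""
    lines = iter(lines)
    header = next(lines, None)
    if header is None:
        return  # empty input: nothing to yield
    yield header
    wanted = set(categories)  # set lookup is O(1) per row
    for line in lines:
        fields = line.rstrip('\n').split(',')
        if int(fields[column]) in wanted:
            yield line


# Hypothetical sample data standing in for transactions.csv.
sample = io.StringIO(
    "id,chain,dept,category,company\n"
    "1,205,7,707,1078778070\n"
    "2,205,63,6319,107654575\n"
    "3,205,97,9753,1022027929\n"
)
kept = list(filter_by_category(sample, [707, 9753]))
```

Because the function accepts any iterable of lines, an open file object can be passed in directly and the result written out with `subset.writelines(...)`.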