@eyaltrabelsi
Created December 19, 2016 09:25
Read a small random sample from a big CSV file into a Python data frame
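# The Python way
import random

import pandas

n = 1000000  # number of data records in the file (header line not counted)
s = 10000  # desired sample size
filename = "data.csv"
# File line 0 is assumed to be the header, so the data records are lines 1..n.
# Pick the n - s record lines to skip; the remaining s lines form the sample.
skip = sorted(random.sample(range(1, n + 1), n - s))
df = pandas.read_csv(filename, skiprows=skip)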
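The snippet above needs the record count `n` up front. When that count is unknown, one possible alternative is reservoir sampling (Algorithm R), which draws a uniform sample of `s` rows in a single pass. The sketch below is not part of the original gist: the function name `reservoir_sample_csv` is hypothetical, it assumes the first line is a header, and it uses a tiny in-memory CSV in place of `data.csv` so that it runs as-is.

```python
import csv
import io
import random


def reservoir_sample_csv(handle, s, seed=None):
    """Return (header, rows): a uniform random sample of up to s data rows.

    Reads the input once and never needs the total row count
    (Algorithm R reservoir sampling). Hypothetical helper, not a pandas API.
    """
    rng = random.Random(seed)
    reader = csv.reader(handle)
    header = next(reader)  # assumption: the first line is the header
    reservoir = []
    for i, row in enumerate(reader):
        if i < s:
            reservoir.append(row)  # fill the reservoir with the first s rows
        else:
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < s:
                reservoir[j] = row  # keep this row with probability s / (i + 1)
    return header, reservoir


# Tiny in-memory stand-in for data.csv so the sketch runs without a file.
demo = io.StringIO("id,value\n" + "".join(f"{k},{k * k}\n" for k in range(100)))
header, rows = reservoir_sample_csv(demo, s=10, seed=0)
print(header, len(rows))
```

With a real file, the handle would come from `open(filename, newline="")`, and the result can be turned into a data frame with `pandas.DataFrame(rows, columns=header)`. All values arrive as strings in that case, so column types would still need converting.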
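# The bash way (shuf is part of GNU coreutils)
# Copy the header line through unchanged, then sample 100000 of the data records.
{ head -n 1 data.csv; tail -n +2 data.csv | shuf -n 100000; } > data_sample.csv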