@GuidoTournois
Last active December 11, 2018 07:35
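import random

import pandas

filename = "data.csv"
# Count the data rows: total lines minus the header line.
with open(filename) as f:
    n = sum(1 for _ in f) - 1
s = n // 10  # sample size: 10% of the data rows
# Choose n - s rows to skip; the range starts at 1 so the header (row 0) is never skipped.
skip = sorted(random.sample(range(1, n + 1), n - s))
df = pandas.read_csv(filename, skiprows=skip)  # loads the header plus the s sampled rows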
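
The skip-list step in the snippet above can be factored into a small helper so the sample is repeatable. This is only a sketch: the name `rows_to_skip` and its `percent` and `seed` parameters are hypothetical and not part of the original gist. It uses a private `random.Random` instance so that a fixed seed yields the same rows on every run.

```python
import random


def rows_to_skip(n_rows, percent=10, seed=None):
    """Return sorted row indices to skip so that about `percent`% of n_rows remain.

    Hypothetical helper: index 0 is assumed to be the header line and is never
    returned, so the result can be passed as `skiprows` to pandas.read_csv.
    """
    rng = random.Random(seed)  # private generator; a fixed seed gives a repeatable sample
    keep = n_rows * percent // 100  # integer arithmetic avoids float rounding
    # Only indices 1..n_rows are candidates, which keeps the header row.
    return sorted(rng.sample(range(1, n_rows + 1), n_rows - keep))


skip = rows_to_skip(100, percent=10, seed=42)
print(len(skip))  # 90 rows skipped, 10 kept
```

With `n` computed as in the snippet, `pandas.read_csv(filename, skiprows=rows_to_skip(n, seed=0))` would then load the same 10% sample each time.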