@otaviomguerra
Created December 6, 2018 15:53
Select a random subset of rows from a large dataset
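import random
import pandas as pd
filename = "data.csv"
# Count the data rows in the file (total lines minus the header line)
with open(filename) as f:
    n = sum(1 for _ in f) - 1
s = 10000  # desired sample size (number of dataset rows to keep)
# Line numbers to skip; range(1, n + 1) excludes line 0, so the header is never skipped.
# max(..., 0) avoids a ValueError from random.sample when the file has fewer than s rows.
skip = sorted(random.sample(range(1, n + 1), max(n - s, 0)))
df = pd.read_csv(filename, skiprows=skip)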
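When `s` is much smaller than `n`, the `skip` list holds almost every line number in the file. One possible variation, sketched below under a few assumptions, stores only the `s` line numbers to keep and passes `read_csv` a callable for `skiprows`; pandas calls it with each row index and skips the row when it returns True. The function name `sample_csv_rows` and its `seed` parameter are hypothetical, and like the snippet above it assumes each record occupies exactly one physical line (no quoted fields containing newlines).

```python
import random
from typing import Optional

import pandas as pd


def sample_csv_rows(filename: str, s: int, seed: Optional[int] = None) -> pd.DataFrame:
    """Hypothetical helper: read min(s, n) randomly chosen data rows from a CSV with a header.

    Assumes one record per physical line (no quoted fields with embedded newlines).
    """
    rng = random.Random(seed)  # seeded generator so a sample can be reproduced
    # First pass: count data rows (total lines minus the header line).
    with open(filename) as f:
        n = sum(1 for _ in f) - 1
    # Choose the line numbers to KEEP; line 0 is the header and is handled separately.
    keep = set(rng.sample(range(1, n + 1), min(s, n)))
    # Second pass: skip every line except the header and the chosen ones.
    # Memory is proportional to s rather than to n - s.
    return pd.read_csv(filename, skiprows=lambda i: i != 0 and i not in keep)
```

For example, `df = sample_csv_rows("data.csv", 10000, seed=42)` returns up to 10,000 rows in their original file order, and passing the same `seed` again yields the same rows.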