Created
December 6, 2018 15:53
Select some rows from a large dataset
import random
import pandas as pd
filename = "data.csv"
# Count the data rows in the file (total lines minus the header line)
with open(filename) as f:
    n = sum(1 for _ in f) - 1
s = 10000  # desired sample size (number of rows to keep from the dataset)
# Randomly pick the lines to skip; line 0 (the header) is outside the range, so it is always kept.
# max(n - s, 0) avoids a ValueError when the file has fewer than s rows.
skip = sorted(random.sample(range(1, n + 1), max(n - s, 0)))
df = pd.read_csv(filename, skiprows=skip)
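The skip-list step above could also be pulled into a small helper so the sample is reproducible. This is only a sketch: the function name `build_skip_list` and its `seed` parameter are hypothetical and not part of the original gist, and it uses only the standard library.

```python
import random


def build_skip_list(n_rows, n_samples, seed=None):
    """Return sorted file line numbers to skip so that n_samples data rows remain.

    Line 0 (the header) is never part of the result. The name and the seed
    parameter are illustrative assumptions, not part of the original snippet.
    """
    if n_samples >= n_rows:
        return []  # nothing to skip: keep every row
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    return sorted(rng.sample(range(1, n_rows + 1), n_rows - n_samples))


skip = build_skip_list(n_rows=20, n_samples=5, seed=42)
print(len(skip))  # 15 lines skipped, so 5 data rows remain
```

The returned list can be passed to `pd.read_csv(filename, skiprows=skip)` exactly as in the snippet above; using the same `seed` yields the same sample on every run.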