@Arkadeep-sophoIITG
Last active September 6, 2020 13:41
Accepts an input CSV file and shuffles its rows using a pandas DataFrame.
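'''
Title : Pandas Row Shuffler
Author : Arkadeep
'''
import os
import sys

import numpy as np
import pandas as pd


def shuffler(filename):
    # Read every column as a string and keep empty cells as '' so values round-trip unchanged.
    df = pd.read_csv(filename, header=0, dtype=object, na_filter=False)
    # Return the DataFrame with its rows in a random order.
    return df.reindex(np.random.permutation(df.index))


def main(inputfilename, outputfilename):
    shuffler(inputfilename).to_csv(outputfilename, sep=',', encoding='utf-8', index=False)


if __name__ == '__main__':
    inputfilename = sys.argv[1]
    # os.path.splitext drops only the extension; str.strip('.csv') would also
    # remove any leading or trailing '.', 'c', 's' and 'v' characters from the name.
    base, _ = os.path.splitext(inputfilename)
    main(inputfilename, base + '-shuffled.csv')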
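
The same row shuffle can also be sketched with `DataFrame.sample`, which is handy when the order has to be repeatable. This is only a minimal illustration and not part of the original script: the helper name `shuffle_rows`, the `seed` parameter, and the tiny in-memory CSV are assumptions made for the example.

```python
import io

import pandas as pd


def shuffle_rows(df, seed=None):
    """Return a copy of df with its rows in random order (hypothetical helper)."""
    # frac=1 samples every row exactly once without replacement, i.e. a permutation.
    # random_state makes the order repeatable when a seed is given.
    return df.sample(frac=1, random_state=seed).reset_index(drop=True)


# Small in-memory CSV standing in for a real input file (assumed data).
csv_text = "id,name\n1,a\n2,b\n3,c\n4,d\n"
df = pd.read_csv(io.StringIO(csv_text), header=0, dtype=object, na_filter=False)

shuffled = shuffle_rows(df, seed=42)
print(shuffled.to_csv(index=False))
```

Like the script above, this loads the whole file into memory, so it only suits inputs that fit in RAM.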
@ranajoyviraj

Hi Arkadeep, how would you shuffle a file that is 34 GB when you have only 16 MiB of RAM? Will this work?
