@tim1357
Last active June 28, 2016 17:42
This Python script randomly splits a file into n parts: each input line is written to one of the n output files, chosen uniformly at random.
#!/usr/bin/env python3
'''Tim Sears - 2016
Randomly splits a file into n parts'''
from random import randint
import argparse
import gzip

parser = argparse.ArgumentParser(description='Split file into n random partitions')
parser.add_argument('--file', type=str, default='/dev/stdin',
                    help='the file to read from')
parser.add_argument('--n', type=int, default=10,
                    help='the number of files to split to')
parser.add_argument('--prefix', type=str, default='partition',
                    help='the prefix for the split files')
parser.add_argument('--gzip', default=False, action='store_true',
                    help='gzip the output files')
args = parser.parse_args()

with open(args.file) as source:
    # Open one output file per partition: <prefix>0, <prefix>1, ...
    if not args.gzip:
        files = [open(args.prefix + str(i), 'w') for i in range(args.n)]
    else:
        # 'wt' opens the gzip stream in text mode (Python 3), so the str
        # lines read from the source can be written directly.
        files = [gzip.open(args.prefix + str(i) + '.gz', 'wt') for i in range(args.n)]
    try:
        # Write each line to a partition chosen uniformly at random.
        for line in source:
            files[randint(0, args.n - 1)].write(line)
    finally:
        # Close every output file, even if reading or writing failed.
        for f in files:
            f.close()
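
The same per-line random assignment can be wrapped in a function, which makes a split easier to test and to reproduce. The sketch below is not part of the original script: the name `split_lines`, its `seed` parameter, and the use of `contextlib.ExitStack` to manage the file handles are all assumptions introduced for illustration.

```python
import contextlib
import gzip
import random


def split_lines(source_path, n=10, prefix='partition', use_gzip=False, seed=None):
    """Assign each line of source_path to one of n output files at random.

    Hypothetical helper, not part of the original script. Returns the list
    of output paths. Passing a seed makes the split reproducible.
    """
    rng = random.Random(seed)  # private generator; global random state is untouched
    suffix = '.gz' if use_gzip else ''
    opener = gzip.open if use_gzip else open
    paths = [prefix + str(i) + suffix for i in range(n)]
    with contextlib.ExitStack() as stack:
        source = stack.enter_context(open(source_path))
        # 'wt' selects text mode for both open() and gzip.open()
        outputs = [stack.enter_context(opener(p, 'wt')) for p in paths]
        for line in source:
            outputs[rng.randrange(n)].write(line)
    return paths
```

Called as `split_lines('data.txt', n=4, seed=0)`, this would write `partition0` through `partition3` in the current directory; `data.txt` is a placeholder file name. Because the generator is seeded, repeating the call on the same input should produce the same partitions.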
tim1357 commented Jun 28, 2016

add gzip capability
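
One way to check gzipped output is to read the partitions back and confirm that their line counts add up to the input's. This is only a sketch and does not come from the gist: `count_lines` is a hypothetical helper, and it assumes compressed partitions end in `.gz`, which is how the script names them when `--gzip` is given.

```python
import gzip


def count_lines(paths):
    """Return {path: line count}, reading '.gz' files through gzip.

    Hypothetical helper for checking a split; not part of the original script.
    """
    counts = {}
    for path in paths:
        # Pick the opener from the file extension; 'rt' is text mode for both.
        opener = gzip.open if path.endswith('.gz') else open
        with opener(path, 'rt') as f:
            counts[path] = sum(1 for _ in f)
    return counts
```

With the default prefix and `--n 10 --gzip`, the paths would be `partition0.gz` through `partition9.gz`, and `sum(count_lines(paths).values())` should equal the number of lines in the input.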
