@jbosboom
Created August 30, 2020 21:51
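#!/usr/bin/env python3
# Copy `samples` evenly spaced `sample_size`-byte chunks of INPUT to OUTPUT.
# Usage: <script> INPUT OUTPUT
import os
import sys

samples = 100                    # number of chunks to copy
sample_size = 25 * 1024 * 1024   # bytes per chunk (25 MiB)

if len(sys.argv) != 3:
    sys.exit(f"usage: {sys.argv[0]} INPUT OUTPUT")

with open(sys.argv[1], 'rb') as src, open(sys.argv[2], 'wb') as dst:
    total_size = os.fstat(src.fileno()).st_size
    sampled_size = samples * sample_size
    # Split the skipped bytes evenly across the samples - 1 gaps ("introns")
    # between consecutive samples, so the last sample ends at (or, after
    # rounding down, just before) end of file. Clamp at 0 so an input smaller
    # than sampled_size is copied whole instead of seeking backwards.
    excluded_size = max(0, total_size - sampled_size)
    intron_size = excluded_size // (samples - 1)
    for _ in range(samples):
        dst.write(src.read(sample_size))
        src.seek(intron_size, os.SEEK_CUR)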
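The gap size comes from requiring the samples and gaps to tile the file: samples × sample_size + (samples - 1) × gap = total_size, solved for gap and rounded down. To check that arithmetic on small in-memory inputs, here is a minimal sketch that factors the loop into a function over any seekable binary stream (such as `io.BytesIO`). The name `sample_stream` and its signature are hypothetical, not part of the original gist, and the clamp at 0 for undersized inputs is an assumption about the intended behavior.

```python
import io
import os


def sample_stream(src, dst, total_size, samples, sample_size):
    """Hypothetical helper: copy `samples` evenly spaced chunks of
    `sample_size` bytes from seekable binary stream `src` to `dst`.

    Returns the number of bytes written.
    """
    if samples < 2:
        raise ValueError("need at least two samples to define a gap")
    # total_size = samples * sample_size + (samples - 1) * gap, solved for gap;
    # clamped at 0 (assumption) so a small input is copied whole.
    gap = max(0, total_size - samples * sample_size) // (samples - 1)
    written = 0
    for _ in range(samples):
        chunk = src.read(sample_size)
        dst.write(chunk)
        written += len(chunk)
        src.seek(gap, os.SEEK_CUR)  # skip the gap after each sample
    return written


if __name__ == "__main__":
    data = bytes(range(30))  # 30 bytes: 0, 1, ..., 29
    out = io.BytesIO()
    # 3 samples of 4 bytes leave 18 bytes to skip, i.e. two gaps of 9 bytes.
    sample_stream(io.BytesIO(data), out, len(data), samples=3, sample_size=4)
    print(list(out.getvalue()))  # [0, 1, 2, 3, 13, 14, 15, 16, 26, 27, 28, 29]
```

With these numbers the gaps divide evenly, so the third sample ends exactly on the last byte of the input.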