@jrherr
Last active December 16, 2015 02:09
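I use this quick shell script (Mac OS X shell tools; I don't have the Linux 'shuf' installed) to take a smaller random sample of a FASTA file. It puts each record on a single line, shuffles the lines, keeps the first 50000, and writes each kept record back out as a header line followed by its sequence on one line. This can be modified for FASTQ files too.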
cat name.fasta | # input file name
awk '/^>/ { if(i>0) printf("\n"); i++; printf("%s\t",$0); next;} {printf("%s",$0);} END { printf("\n");}' | # put each record on one line: header, tab, sequence
perl -MList::Util -e 'print List::Util::shuffle <>' | # shuffle the records into random order
head -n 50000 | # keep the first 50000 shuffled records as the random sample
awk -F'\t' '{printf("%s\n%s\n",$1,$2)}' > name_1.fasta # split on the tab only (headers may contain spaces); write header and sequence on separate lines
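For FASTQ, the same linearize, shuffle, and truncate idea might look like the sketch below. It is not part of the original script. It assumes strict four-line FASTQ records (no wrapped sequence or quality lines) and no tab characters inside the header lines. `sample_fastq` is a hypothetical helper name; its arguments are the input file and the number of reads to keep.

```shell
# Sketch only: random subsample of a FASTQ file.
# Assumes every record is exactly 4 lines and headers contain no tabs.
# sample_fastq is a hypothetical name: sample_fastq <input.fastq> <number_of_reads>
sample_fastq() {
  paste - - - - < "$1" |                                  # join each 4-line record into one tab-separated line
  perl -MList::Util -e 'print List::Util::shuffle <>' |   # shuffle the records into random order
  head -n "$2" |                                          # keep the first N shuffled records
  tr '\t' '\n'                                            # turn the tabs back into the 4-line layout
}
```

Usage would be along the lines of `sample_fastq name.fastq 50000 > name_1.fastq`. As with the FASTA version, the whole file is read into memory by the shuffle step, so this suits files that fit in RAM.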