iracooke/README.md

## README.md

      
Split a FASTA file

This method relies on bioawk. First make sure you have bioawk installed, then download the file split_fasta.awk from this repository. The instructions below assume this file is in your working directory.
Installing bioawk (instructions specific to the JCU HPC)


Make a bin directory in your home directory if you haven't already (mkdir -p does nothing if it already exists)

cd ~
mkdir -p bin

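Put this directory on your path (if you haven't already). Use single quotes so that ${PATH} is expanded each time you log in, rather than once when the line is written:

echo 'export PATH=${PATH}:${HOME}/bin' >> ~/.bash_profile

Then log out and back in (or run source ~/.bash_profile) for the change to take effect.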
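Running the export line more than once adds a duplicate ~/bin entry to your PATH each time. A minimal POSIX shell sketch of the "if you haven't already" check; on_path is a hypothetical helper name, not part of this repository or of bioawk:

```shell
#!/bin/sh
# on_path DIR: exit status 0 if DIR is already one of the ':'-separated
# entries in $PATH, 1 otherwise. Wrapping PATH in ':' on both sides lets
# the first and last entries match the same pattern as the middle ones.
on_path() {
  case ":${PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: only append the export line if ~/bin is not on PATH yet
#   on_path "${HOME}/bin" || echo 'export PATH=${PATH}:${HOME}/bin' >> ~/.bash_profile
```

The match is on whole entries, so a directory such as /opt/my will not be mistaken for /opt/mybin.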

Clone bioawk

git clone https://github.com/lh3/bioawk.git

Build bioawk and copy to ~/bin

cd bioawk
make
cp bioawk maketab ../bin/

Cleanup

cd ~
rm -rf bioawk
Usage

To split a file with the default parameters (1000 records per chunk, output files prefixed chunk_):
cat input.fasta | bioawk -c fastx -f split_fasta.awk
To customise the prefix:
cat input.fasta | bioawk -c fastx -v prefix="mycustom_" -f split_fasta.awk
To customise the number of records per chunk as well:
cat input.fasta | bioawk -c fastx -v prefix="mycustom_" -v nrec=5000 -f split_fasta.awk
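After splitting, a quick sanity check is that the chunks together contain as many records as the input. A minimal POSIX shell sketch, assuming FASTA input (so header lines are the only lines starting with '>') and the default chunk_ prefix; count_records is a hypothetical helper, not part of this repository:

```shell
#!/bin/sh
# count_records FILE...: print the total number of FASTA records (lines
# starting with '>') across all FILEs. The files are concatenated first
# because 'grep -c' on several files would print one count per file.
count_records() {
  cat "$@" | grep -c '^>'
}

# Example, run in the directory where the chunks were written:
#   count_records input.fasta
#   count_records chunk_*.fa
# The two numbers should be identical.
```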

  
## split_fasta.awk
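BEGIN{
	# Default output file prefix; override with -v prefix=...
	if( prefix == ""){
		prefix="chunk_";
	}
	# Default number of records per chunk; override with -v nrec=...
	if( nrec == ""){
		nrec=1000
	}
}
{
	# Start a new chunk every nrec records. Each chunk is named after the
	# 0-based index of its first record: chunk_0.fa, chunk_1000.fa and so on
	if( (NR-1)%nrec==0 ){
		if( file != "" ){
			close(file);	# release the previous chunk to stay under the open-file limit
		}
		file=sprintf("%s%d.fa",prefix,(NR-1));
	}

	# Write the record as FASTA (whole sequence on one line) and skip the tab
	# when the header has no comment. ">" truncates leftovers from earlier runs.
	if( $comment != "" ){
		printf(">%s\t%s\n%s\n",$name,$comment,$seq) > file
	} else {
		printf(">%s\n%s\n",$name,$seq) > file
	}
}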
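The output files are numbered by record offset, not by chunk number: with the default nrec=1000, an input of 2500 records produces chunk_0.fa (records 1 to 1000), chunk_1000.fa (records 1001 to 2000) and chunk_2000.fa (records 2001 to 2500). The hypothetical shell helper below reproduces that naming rule so you can predict which file a given record lands in; it mirrors the script's arithmetic and does not call bioawk:

```shell
#!/bin/sh
# chunk_name RECORD NREC PREFIX
# Print the file split_fasta.awk writes record number RECORD (1-based, like
# awk's NR) to. A new file starts whenever (NR-1)%nrec==0 and is named
# PREFIX followed by that NR-1, so every record goes to the file named
# after the first record of its block.
chunk_name() {
  first=$(( ($1 - 1) - ($1 - 1) % $2 ))
  printf '%s%d.fa\n' "$3" "$first"
}

# Example:
#   chunk_name 2500 1000 chunk_    # prints chunk_2000.fa
```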