Skip to content

Instantly share code, notes, and snippets.

@janxkoci
Created January 6, 2023 22:26
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save janxkoci/54d4749e1e18c8d353302302a35208ee to your computer and use it in GitHub Desktop.
Save janxkoci/54d4749e1e18c8d353302302a35208ee to your computer and use it in GitHub Desktop.
a wrapper script for vcf2hetfa,pl from the SGDP project
#!/bin/bash
## bash vcf2hetfa.sh input.vcf.gz
VCF=$1
OUT=$(basename -s .vcf.gz $VCF).hetfa.fa
TMP=${OUT}.tmp
rm $OUT # output build by appending, so remove old versions first
for chr in {1..22}
do
echo ">"${chr} >> $OUT
bcftools view -t $chr $VCF | perl vcf2hetfa.pl --fasta_sample_prefix $TMP
fold $TMP >> $OUT
echo "" >> $OUT
done
rm $TMP
@janxkoci
Copy link
Author

janxkoci commented Jan 6, 2023

vcf2hetfa.sh

A simple wrapper script for vcf2hetfa.pl from the Simons Genome Diversity Project (SGDP). Takes a path/to/file.vcf.gz as argument and produces file.hetfa.fa in current directory (the original script doesn't support stdout). I use symlinks to have input files at reasonable paths. 👍️

Possible improvements:

  • Don't hardcode input format (.vcf.gz is what I need right now, and it is a reasonable default).
  • xargs or GNU parallel version would be awesome, possibly followed by seqkit sort or some such, for faster throughput. 👈️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment