Created
January 6, 2023 22:26
A wrapper script for vcf2hetfa.pl from the SGDP project
#!/bin/bash
## bash vcf2hetfa.sh input.vcf.gz
VCF=$1
OUT=$(basename -s .vcf.gz "$VCF").hetfa.fa
TMP=${OUT}.tmp
rm -f "$OUT" # output is built by appending, so remove old versions first
for chr in {1..22}
do
    echo ">${chr}" >> "$OUT"
    bcftools view -t "$chr" "$VCF" | perl vcf2hetfa.pl --fasta_sample_prefix "$TMP"
    fold "$TMP" >> "$OUT"
    echo "" >> "$OUT"
done
rm "$TMP"
vcf2hetfa.sh
A simple wrapper script for `vcf2hetfa.pl` from the Simons Genome Diversity Project (SGDP). Takes a `path/to/file.vcf.gz` as argument and produces `file.hetfa.fa` in the current directory (the original script doesn't support `stdout`). I use symlinks to have input files at reasonable paths. 👍️

Possible improvements:

- […] `.vcf.gz` is what I need right now, and it is a reasonable default).
- An `xargs` or GNU `parallel` version would be awesome, possibly followed by `seqkit sort` or some such, for faster throughput. 👈️
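
A rough, untested-on-real-data sketch of what the `xargs` variant might look like. It assumes, as the wrapper above does, that `vcf2hetfa.pl` writes its sequence to the path passed via `--fasta_sample_prefix`. The function names `one_chr` and `main`, the `.part` file suffix, and the default of 4 parallel jobs are made up for illustration. Each chromosome is written to its own part file and the parts are concatenated in order afterwards, so no sorting step should be needed here.

```bash
#!/bin/bash
## Sketch only: bash vcf2hetfa_parallel.sh input.vcf.gz [jobs]

# Build the FASTA chunk (header, folded sequence, blank line) for one chromosome.
# Hypothetical helper; each job uses its own temp file so jobs do not collide.
one_chr() {
    local vcf=$1 chr=$2 prefix=$3
    local tmp="${prefix}.${chr}.tmp"
    bcftools view -t "$chr" "$vcf" | perl vcf2hetfa.pl --fasta_sample_prefix "$tmp"
    { echo ">${chr}"; fold "$tmp"; echo ""; } > "${prefix}.${chr}.part"
    rm -f "$tmp"
}
export -f one_chr   # make the function visible to the bash children started by xargs

main() {
    local vcf=$1 jobs=${2:-4}   # number of parallel jobs is an assumed default
    local out chr
    out=$(basename -s .vcf.gz "$vcf").hetfa.fa

    # Run up to $jobs chromosomes at once; {} is replaced by the chromosome name.
    printf '%s\n' {1..22} |
        xargs -P "$jobs" -I{} bash -c 'one_chr "$1" "$2" "$3"' _ "$vcf" {} "$out"

    # Stitch the parts together in chromosome order, then clean up.
    : > "$out"
    for chr in {1..22}; do
        cat "${out}.${chr}.part" >> "$out"
        rm -f "${out}.${chr}.part"
    done
}

# Only run when an input file is given.
if [ "$#" -ge 1 ]; then
    main "$@"
fi
```

Usage would be something like `bash vcf2hetfa_parallel.sh input.vcf.gz 8`, with the file name `vcf2hetfa_parallel.sh` again being just a placeholder.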