@janxkoci
Last active June 29, 2022 08:27
This simple script prepares your VCF dataset for pruning with Plink. It takes the name of the input VCF as an argument and produces a new, annotated VCF.

janxkoci commented Feb 23, 2021

It's possible that the entire script and all its dependencies can be replaced with this awk one-liner:

awk '!/#/ {sub($3, $1"_"$2)}1' input.vcf

However, I have to test it properly with a real VCF file first.

Update: Actually this seems to do the trick:

awk 'BEGIN{FS=OFS="\t"} !/^#/ {sub(/^\.$/, $1"_"$2, $3)}1' input.vcf
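
A quick way to check the one-liner is to run it on a tiny hand-made VCF. The sketch below runs it in a strict form (tab as both input and output separator, header lines matched only at the start of the line, and the ID replaced only when it is exactly "."). The temporary file and its two records are made up purely for illustration.

```shell
# Build a tiny, made-up VCF: one record with a missing ID (".") and one with an rsID.
tmp_vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n' > "$tmp_vcf"
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' >> "$tmp_vcf"
printf 'chr1\t100\t.\tA\tG\t.\tPASS\t.\n' >> "$tmp_vcf"
printf 'chr1\t200\trs42\tC\tT\t.\tPASS\t.\n' >> "$tmp_vcf"

# Replace a missing ID with CHROM_POS; header lines and existing IDs pass through unchanged.
annotated=$(awk 'BEGIN{FS=OFS="\t"} !/^#/ {sub(/^\.$/, $1"_"$2, $3)}1' "$tmp_vcf")
printf '%s\n' "$annotated"
rm -f "$tmp_vcf"
```

Only the third column of the first record should change (to chr1_100). The "." placeholders in QUAL and INFO stay as they are because sub() is applied to $3 only.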

janxkoci commented Feb 18, 2022

Looks like I missed a feature of bcftools when I was preparing the original script (or it was added later). I have now noticed the --set-id option of bcftools annotate, which the documentation illustrates with the following example:

bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' file.vcf

https://samtools.github.io/bcftools/bcftools.html#annotate
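
For comparison, here is a rough awk approximation of what that bcftools command produces, in case bcftools is not at hand. It is not bcftools itself: it assumes that the "+" prefix means "only set IDs that are missing" (as I read the documentation) and that %FIRST_ALT is the first comma-separated allele in the ALT column. The two records are made up for illustration.

```shell
# Two made-up records: a multi-allelic site without an ID and a site with an rsID.
records=$(printf 'chr1\t100\t.\tA\tG,T\t.\tPASS\t.\nchr1\t200\trs42\tC\tT\t.\tPASS\t.\n')

# For records whose ID is ".", build CHROM_POS_REF_FIRSTALT; leave everything else untouched.
with_ids=$(printf '%s\n' "$records" | awk 'BEGIN{FS=OFS="\t"}
  !/^#/ && $3=="." {
    split($5, alt, ",")                  # alt[1] is the first ALT allele
    $3 = $1 "_" $2 "_" $4 "_" alt[1]
  }
  1')
printf '%s\n' "$with_ids"
```

Including REF and the first ALT allele in the ID keeps variants at the same position distinguishable, which CHROM_POS alone does not.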
