@janxkoci
Last active June 29, 2022 08:27
This simple script prepares your VCF dataset for pruning with Plink. It takes the name of the input VCF as an argument and produces a new, annotated VCF.
@janxkoci
Author

janxkoci commented Jun 5, 2018

Pro tip: you can click on the "Raw" button and then wget the resulting URL directly on your HPC ;)

@janxkoci
Author

janxkoci commented Sep 29, 2020

TODO

Figure out how to implement PREFIX without adding a new parameter. Relevant working code:

VCF=$1
PREFIX=$2 # better to take the prefix from the VCF filename

# now you are ready to run plink pruning on the output file, e.g.
# (--allow-extra-chr is needed for nonhuman organisms)
plink --indep 50 5 2 --vcf "$VCF" --out "$PREFIX" --allow-extra-chr
plink --extract "${PREFIX}.prune.in" --vcf "$VCF" --genome --recode --out "$PREFIX" --allow-extra-chr
plink --file "$PREFIX" --read-genome "${PREFIX}.genome" --cluster --mds-plot 2 --out "$PREFIX" --allow-extra-chr
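
One possible way to get PREFIX without a second parameter is to derive it from the VCF filename with shell parameter expansion. This is only a sketch: the helper name vcf_prefix is made up for illustration, and it assumes the input ends in .vcf or .vcf.gz.

```shell
# Hypothetical helper (not part of the original script): derive an output
# prefix from the VCF filename. Assumes names ending in .vcf or .vcf.gz.
vcf_prefix() {
    name=$(basename "$1")   # drop any leading directory
    name=${name%.gz}        # drop a trailing .gz, if present
    name=${name%.vcf}       # drop a trailing .vcf, if present
    printf '%s\n' "$name"
}

VCF=${1:-input.vcf}         # fall back to a placeholder name when no argument is given
PREFIX=$(vcf_prefix "$VCF")
echo "$PREFIX"
```

With this, the plink commands can keep using $PREFIX unchanged, and the script still takes a single argument. Files with other extensions pass through with only the directory stripped.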

@janxkoci
Author

janxkoci commented Dec 2, 2020

See also these plink parameters:

@janxkoci
Author

janxkoci commented Feb 23, 2021

It's possible that the entire script and all its dependencies can be replaced with this awk one-liner:

awk '!/#/ {sub($3, $1"_"$2)}1' input.vcf

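However, I have to test it properly with a real VCF file first.

Update: That first version does not work, because sub($3, ...) treats the ID "." as a regex and replaces the first character of the whole line. This seems to do the trick instead: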

awk 'BEGIN{OFS="\t"} !/#/ {sub(/\./, $1"_"$2, $3)}1' input.vcf
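
A slightly stricter variant may be worth considering, sketched here under a few assumptions: fields are tab-separated, header lines start with #, and only records whose ID is exactly "." should be renamed. The function name set_missing_ids is made up for illustration.

```shell
# Hypothetical wrapper (name is illustrative): fill in missing VCF IDs
# as CHROM_POS, leaving header lines and existing IDs untouched.
set_missing_ids() {
    awk 'BEGIN { FS = OFS = "\t" }                  # split and join on tabs only
         !/^#/ && $3 == "." { $3 = $1 "_" $2 }      # data line with a missing ID
         { print }' "$@"
}

# Tiny demo on a three-column toy input; real VCFs have more columns.
printf '#CHROM\tPOS\tID\nchr1\t100\t.\nchr1\t200\trs5\n' | set_missing_ids
```

Anchoring the header test with ^ avoids skipping data lines that merely contain a # somewhere, and comparing $3 to "." avoids touching IDs that happen to contain a dot.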

@janxkoci
Author

janxkoci commented Feb 18, 2022

Looks like I missed a feature of bcftools when I was preparing the original script (or it was added later). I have now noticed the option bcftools annotate --set-id, which is documented with the following example:

bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' file.vcf

https://samtools.github.io/bcftools/bcftools.html#annotate
