A duplicate variant occurs when multiple records share the same CHROM, POS, REF, and ALT. This script keeps one record from each set of duplicates and discards the rest. The record that is kept is the one that comes first in sort order.
Use remove_VCF_duplicates.sh if you have bcftools installed. If you don't have bcftools, use remove_VCF_duplicates.non_bcftools.sh, but keep in mind that this method is much slower and only accepts uncompressed VCF files.
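To make the duplicate rule concrete, here is a minimal sketch of the idea for an uncompressed VCF. It is not the code from either script: the function name dedup_vcf is hypothetical, and it assumes the input is already sorted, so "first in file order" matches "first in sort order".

```shell
# Hypothetical helper for illustration only (not taken from the scripts above).
# Assumes an uncompressed, tab-delimited VCF that is already sorted.
dedup_vcf() {
    awk -F'\t' '
        /^#/ { print; next }          # pass header lines through unchanged
        !seen[$1, $2, $4, $5]++       # print a record only the first time its CHROM, POS, REF, ALT key appears
    ' "$1"
}
```

Called as dedup_vcf dups.vcf > nodups.vcf, this would keep the first record for each CHROM/POS/REF/ALT combination. The ALT column is compared as a literal string, so records with different multi-allelic ALT lists are treated as distinct.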
Input: vcf, vcf.gz, bcf, bcf.gz
Output: vcf
Example: ./remove_VCF_duplicates.sh dups.vcf.gz > nodups.vcf
Requirements: bcftools 1.3+
does this work?