@sephraim
Last active September 11, 2024 04:41
Remove duplicate variants from VCF file
#!/bin/bash
myvcf="$1"
# Print header
bcftools view -h -O v "$myvcf"
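# Sort and remove duplicates per chromosome.
# Assumes chromosomes are named 1..22, MT, X, Y (no "chr" prefix);
# records on any other contig are not written to the output.
# Note: bcftools view -r looks regions up through an index, so the
# input is generally expected to be indexed.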
for i in {1..22} MT X Y
do
bcftools view -H -r "$i" -O v "$myvcf" | sort -u -k2,2n -k4,4d -k5,5d
done
#!/bin/bash
# PLEASE NOTE: Only use this version if you do not have bcftools available.
# This method is much slower than the other method and only accepts
# uncompressed VCF files. It works, but it's not recommended,
# especially for large VCF files.
myvcf="$1"
cat \
<(grep '^#' "$myvcf") \
<(grep -E '^[[:digit:]]' "$myvcf" | sort -u -k1,1n -k2,2n -k4,4d -k5,5d) \
<(grep -E '^[[:alpha:]]' "$myvcf" | sort -u -k1,1d -k2,2n -k4,4d -k5,5d)

Remove duplicate variants from a VCF file

Variants are duplicates when multiple records share the same CHROM, POS, REF, and ALT. The script keeps one record from each set of duplicates and discards the rest. The record that is kept is the one that comes first in sort order.
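
As a rough illustration of that definition, the sketch below deduplicates a few made-up records on CHROM, POS, REF, and ALT (columns 1, 2, 4, and 5). The `dedup_records` and `sample_records` helpers and the records themselves are hypothetical and not part of the gist's scripts; the sketch also sets an explicit tab separator, which the scripts do not.

```shell
#!/bin/sh
# Hypothetical helper (not part of the gist): reads headerless, tab-separated
# VCF records on stdin and keeps one record per CHROM/POS/REF/ALT combination.
dedup_records() {
  tab=$(printf '\t')
  LC_ALL=C sort -t "$tab" -u -k1,1 -k2,2n -k4,4 -k5,5
}

# Made-up records: the first two share CHROM, POS, REF, and ALT and differ
# only in ID and QUAL, so only one of them should survive.
sample_records() {
  printf '1\t100\trs1\tA\tG\t50\tPASS\t.\n'
  printf '1\t100\trs2\tA\tG\t60\tPASS\t.\n'
  printf '1\t100\trs3\tA\tT\t50\tPASS\t.\n'
}

sample_records | dedup_records
```

Two records should remain: one of the two A>G records and the single A>T record.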

Use remove_VCF_duplicates.sh if you have bcftools installed. If you don't have bcftools, use remove_VCF_duplicates.non_bcftools.sh instead, but keep in mind that it is much slower and only accepts uncompressed VCF files.
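
One way to make that choice automatically is to test whether bcftools is on the PATH. This is only a sketch: `pick_dedup_script` is a hypothetical name, it assumes both scripts sit in the current directory, and it does not check the bcftools version.

```shell
#!/bin/sh
# Hypothetical helper: print the name of the script to run, depending on
# whether a bcftools executable can be found on the PATH.
pick_dedup_script() {
  if command -v bcftools >/dev/null 2>&1; then
    echo "remove_VCF_duplicates.sh"
  else
    echo "remove_VCF_duplicates.non_bcftools.sh"
  fi
}

echo "Would run: ./$(pick_dedup_script)"
```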

Input

vcf, vcf.gz, bcf, bcf.gz

Output

vcf

Example usage

./remove_VCF_duplicates.sh dups.vcf.gz > nodups.vcf
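
To check the result, one option is to count how many CHROM/POS/REF/ALT combinations still occur more than once in the output. The `count_dup_keys` helper below is a hypothetical sketch that assumes an uncompressed, tab-separated VCF such as the nodups.vcf produced by the command above.

```shell
#!/bin/sh
# Hypothetical helper: print how many CHROM/POS/REF/ALT combinations
# appear in more than one record of an uncompressed VCF file.
count_dup_keys() {
  grep -v '^#' "$1" | cut -f 1,2,4,5 | LC_ALL=C sort | uniq -d | wc -l | tr -d ' '
}

# Report on nodups.vcf if it exists in the current directory.
if [ -f nodups.vcf ]; then
  echo "Duplicated keys remaining: $(count_dup_keys nodups.vcf)"
fi
```

A count of 0 would indicate that no combination of the four fields is repeated.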

Requirements

bcftools 1.3+

@vinaydeep26

Does this work?

@vinaydeep26

Sorry, your scripts don't work.

@Milor123

Milor123 commented Nov 2, 2023

Amazing!! Thanks bro ❤️. I had to remake the VCF from my old phone, because it was apparently in Windows format and wasn't being detected, but now it works nicely.
I'm not sure, but maybe Linux users could consider using unix2dos input.vcf output.vcf.
