
@dinovski
Last active April 12, 2023 14:19
compute transition/transversion rate across all sites in a VCF file
#!/bin/bash
## calculate Ts/Tv across all sites (includes AC=0)
## file must be bgzipped or gzipped
## ./tstv.sh file.vcf.gz
VCF=$1
# Each count streams the file once: drop header lines (those starting with '#'),
# keep single-base REF/ALT records, then match one substitution class.
# get count for transitions:
ag=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "A" && $5 == "G" || $4 == "G" && $5 == "A")' | wc -l)
echo "AG:" $ag
ct=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "C" && $5 == "T" || $4 == "T" && $5 == "C")' | wc -l)
echo "CT:" $ct
# transversions:
ac=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "A" && $5 == "C" || $4 == "C" && $5 == "A")' | wc -l)
echo "AC:" $ac
at=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "A" && $5 == "T" || $4 == "T" && $5 == "A")' | wc -l)
echo "AT:" $at
cg=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "G" && $5 == "C" || $4 == "C" && $5 == "G")' | wc -l)
echo "CG:" $cg
gt=$(zcat "${VCF}" | awk '!/^#/' | awk '{if(length($4) == 1 && length($5) == 1) print}' | \
awk '($4 == "G" && $5 == "T" || $4 == "T" && $5 == "G")' | wc -l)
echo "GT:" $gt
sum_ts=$((ag + ct))
sum_tv=$((ac + at + cg + gt))
# avoid a division-by-zero error when the file has no transversions
if [ "$sum_tv" -gt 0 ]; then
ts_tv=$(awk "BEGIN {printf \"%.2f\",${sum_ts}/${sum_tv}}")
else
ts_tv="NA"
fi
echo "Ts = $sum_ts"
echo "Tv = $sum_tv"
echo "Ts/Tv = $ts_tv"
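
The script above decompresses and scans the file six times, once per substitution class. A minimal sketch of a single-pass variant follows, assuming a tab-separated VCF with REF in column 4 and ALT in column 5. The function name `tstv_single_pass` is hypothetical and not part of the original gist, and `gzip -dc` is swapped in for `zcat` as an assumption for portability.

```shell
#!/bin/bash
# Sketch only: tstv_single_pass is a hypothetical helper, not part of the gist.
# Usage: tstv_single_pass file.vcf.gz
tstv_single_pass() {
    # gzip -dc reads both gzip- and bgzip-compressed files
    gzip -dc "$1" | awk -F '\t' '
        /^#/ { next }                            # skip header lines only
        length($4) == 1 && length($5) == 1 {     # single-base REF and ALT
            pair = $4 $5
            if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC")
                ts++                             # purine<->purine or pyrimidine<->pyrimidine
            else if (pair ~ /^[ACGT][ACGT]$/ && $4 != $5)
                tv++                             # any other base change
        }
        END {
            printf "Ts = %d\nTv = %d\n", ts, tv
            if (tv > 0)
                printf "Ts/Tv = %.2f\n", ts / tv
            else
                print "Ts/Tv = NA"               # no transversions: ratio undefined
        }'
}
```

Records whose ALT is `.` or `*` have length 1 but fail the `[ACGT][ACGT]` check, so they are ignored, which matches the behaviour of the per-class filters above.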

Gadji-M commented Aug 25, 2022

Hi Dinovski,
I tried to use your script and adapt it to calculate the transitions and transversions contained in a .csv file, but I couldn't succeed when I ran the command line tstv.sh snsp.csv
Below is the shell script I used for that:
#!/bin/sh

# get count for transitions:

ag=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3)== 1 && length($4) == 1) print}' | \ awk '($3== "A" && $4 == "G" || $3 == "G" && $4 == "A") | wc -l
cat snps.csv | awk '! /#/' | awk '{if(length($3)== 1 && length($4) == 1) print}' | \ awk '($3== "A" && $4 == "G" || $3 == "G" && $4 == "A") | wc -l

ct=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3) == 1 && length($4) == 1) print}' |
awk '($3 == "C" && $4 == "T" || $3 == "T" && $4 == "C")' | wc -l)
echo "CT:" $ct

# transversions:

ac=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3) == 1 && length($4) == 1) print}' |
awk '($3 == "A" && $4 == "C" || $3 == "C" && $4 == "A")' | wc -l)
echo "AC:" $ac

at=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3) == 1 && length($4) == 1) print}' |
awk '($3 == "A" && $4 == "T" || $3 == "T" && $4 == "A")' | wc -l)
echo "AT:" $at

cg=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3) == 1 && length($4) == 1) print}' |
awk '($3 == "G" && $4 == "C" || $3 == "C" && $4 == "G")' | wc -l)
echo "CG:" $cg

gt=$(cat ${CSV} | awk '! /#/' | awk '{if(length($3) == 1 && length($5) == 1) print}' |
awk '($3 == "G" && $4 == "T" || $3 == "T" && $4 == "G")' | wc -l)
echo "GT:" $gt

sum_ts=$((ag + ct))
sum_tv=$((ac + at + cg + gt))
ts_tv=$(awk "BEGIN {printf \"%.2f\",${sum_ts}/${sum_tv}}")
echo "Ts = $sum_ts"
echo "Tv = $sum_tv"
echo "Ts/Tv = $ts_tv"

Please could you help me with this??

Cheers

Gadji_M
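
A few things in the script as posted would likely stop it from working: `CSV` is never assigned (the original sets `VCF=$1`), the first `awk` filter for `ag` is missing its closing quote and parenthesis, the `gt` line checks `length($5)` instead of `length($4)`, and `awk` splits on whitespace by default, so with comma-separated input `$3` and `$4` are not the columns intended. One possible adaptation is sketched below, assuming a comma-separated file with the reference base in column 3 and the alternate base in column 4 (as the `$3`/`$4` tests above suggest). The function name `tstv_csv` is hypothetical.

```shell
#!/bin/bash
# Sketch only: tstv_csv is a hypothetical helper; the column layout is an assumption.
# Usage: tstv_csv snps.csv
tstv_csv() {
    awk -F',' '
        /^#/ { next }                            # skip comment lines, if any
        { sub(/\r$/, "") }                       # tolerate Windows line endings
        length($3) == 1 && length($4) == 1 {     # single-base ref and alt only
            n[$3 $4]++                           # count each ordered pair, e.g. "AG"
        }
        END {
            ag = n["AG"] + n["GA"]; ct = n["CT"] + n["TC"]
            ac = n["AC"] + n["CA"]; at = n["AT"] + n["TA"]
            cg = n["CG"] + n["GC"]; gt = n["GT"] + n["TG"]
            ts = ag + ct
            tv = ac + at + cg + gt
            printf "AG: %d\nCT: %d\nAC: %d\nAT: %d\nCG: %d\nGT: %d\n", ag, ct, ac, at, cg, gt
            printf "Ts = %d\nTv = %d\n", ts, tv
            if (tv > 0)
                printf "Ts/Tv = %.2f\n", ts / tv
            else
                print "Ts/Tv = NA"               # no transversions: ratio undefined
        }' "$1"
}
```

A header row such as `id,pos,ref,alt` is skipped automatically because its third and fourth fields are longer than one character. If the file uses a different delimiter or column order, the `-F` value and the field numbers would need to change accordingly.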
