Skip to content

Instantly share code, notes, and snippets.

View mt1022's full-sized avatar

Hong Zhang mt1022

View GitHub Profile
@mt1022
mt1022 / TajimaD.R
Last active October 14, 2023 04:41
R script to calculate Tajima's D
#' R script to calculate Tajima's D based on site frequency spectrum
#' Reference:
#' Tajima F., 1983 Evolutionary relationship of DNA sequences in finite
#' populations. Genetics 123: 437–460.
#' Tajima F., 1989 Statistical method for testing the neutral mutation
#' hypothesis by DNA polymorphism. Genetics 123: 585–595.
#' Justin C. Fay and Chung-I Wu, 2000 Hitchhiking Under Positive Darwinian
#' Selection. Genetics 155: 1405-1413``
TajimaD <- function(sfs){
@mt1022
mt1022 / sam2tsv.py
Created November 30, 2017 07:58
a python version of sam2tsv
"""
a python version of sam2tsv from jarkit (http://lindenb.github.io/jvarkit/Sam2Tsv.html)
# for testing
sam_file = pysam.AlignmentFile("/home/zh/local/src/samtools-1.3/examples/toy.sam", "r", check_sq=False)
ref_fa = pysam.FastaFile("/home/zh/local/src/samtools-1.3/examples/toy.fa")
=================================
author: mt1022 (zh)
date: 2017-11-28 22:39

histogram of read length distribution

# in fastq file (modified https://www.biostars.org/p/104314/)
awk 'NR%4 == 2 {cnt[length($0)]++} END {n = asorti(cnt, cnt2); for(i = 1; i <= n; i++) print cnt2[i], cnt[cnt2[i]];}' in.fq

# alignment bam (for bowtie output, stranded RNA-seq reads mapped to transcriptome)
samtools view in.bam | awk '$2 == 0 {cnt[$6]++}END{n = asorti(cnt, cnt2); for(i = 1; i <= n; i++) print cnt2[i], cnt[cnt2[i]];}'
``
@mt1022
mt1022 / bed12_slop.py
Last active August 26, 2019 02:31
Extending the `bedtools slop` for bed12 format.
"""
adjust the size of interval stored in bed12 format;
=================================
author: mt1022 (zh)
date: 2017-01-03 19:45
"""
import sys
import argparse