Skip to content

Instantly share code, notes, and snippets.

@williamrowell
Last active February 13, 2024 16:49
Show Gist options
  • Save williamrowell/cceb718298a05426591c7497707d1696 to your computer and use it in GitHub Desktop.
Save williamrowell/cceb718298a05426591c7497707d1696 to your computer and use it in GitHub Desktop.
Calculate the mean and standard deviation of a coverage depth distribution as if it were normally distributed.
#!/usr/bin/env python3
"""Coverage mean and standard deviation of autosomes
Estimate the mean and standard deviation from a mosdepth coverage BED for
positions with coverage in the range (0, 2 * non-zero mode). This estimate
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 and
GRCh38, and may be useful for other situations as well.
$ bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
"""
__author__ = "William Rowell"
__version__ = "0.2.0"
__license__ = "MIT"
import sys
import numpy as np
from scipy.stats import norm, mode
def nonzero_mode(values):
"""Return mode of values > 0"""
return mode(values[values > 0])[0]
def mu_sigma(values):
"""Return mu, sigma of values from 1 to 2*mode"""
return norm.fit(values[(values > 0) & (values < (2*nonzero_mode(values)))])
def main():
"""For a mosdepth bed file provided to stdin, determine the mean and stddev as
if the data were normal, and report these values.
Usage: tabix ${BAM%.*}.median.regions.bed.gz `seq 1 22` | python depth_mean_stddev.py
"""
# pull coverage depth into an array
values = np.loadtxt(sys.stdin, delimiter='\t', usecols=3, dtype=np.float64)
print(mu_sigma(values))
if __name__ == "__main__":
"""$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py
> (mean, stddev)
"""
main()
@williamrowell
Copy link
Author

zcat ${BAM%.*}.median.regions.bed.gz \
    |  awk -v OFS='\t' -v MEAN=$MEAN -v STDEV=$STDEV '{print $1, $2, $3, ($4 - MEAN) / STDEV}' \
    > ${BAM%.*}.median.regions.zscore.bed.gz

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment