Last active
February 13, 2024 16:49
-
-
Save williamrowell/cceb718298a05426591c7497707d1696 to your computer and use it in GitHub Desktop.
Calculate the mean and standard deviation of a coverage depth distribution as if it were normally distributed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#!/usr/bin/env python3 | |
"""Coverage mean and standard deviation of autosomes | |
Estimate the mean and standard deviation from a mosdepth coverage BED for | |
positions with coverage in the range (0, 2 * non-zero mode). This estimate | |
behaves well for PacBio HiFi WGS of human germline aligned to either hs37d5 and | |
GRCh38, and may be useful for other situations as well. | |
$ bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}" | |
$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py | |
""" | |
__author__ = "William Rowell" | |
__version__ = "0.2.0" | |
__license__ = "MIT" | |
import sys | |
import numpy as np | |
from scipy.stats import norm, mode | |
def nonzero_mode(values): | |
"""Return mode of values > 0""" | |
return mode(values[values > 0])[0] | |
def mu_sigma(values): | |
"""Return mu, sigma of values from 1 to 2*mode""" | |
return norm.fit(values[(values > 0) & (values < (2*nonzero_mode(values)))]) | |
def main(): | |
"""For a mosdepth bed file provided to stdin, determine the mean and stddev as | |
if the data were normal, and report these values. | |
Usage: tabix ${BAM%.*}.median.regions.bed.gz `seq 1 22` | python depth_mean_stddev.py | |
""" | |
# pull coverage depth into an array | |
values = np.loadtxt(sys.stdin, delimiter='\t', usecols=3, dtype=np.float64) | |
print(mu_sigma(values)) | |
if __name__ == "__main__": | |
"""$ tabix ${BAM%.*}.median.regions.bed.gz {1..22} | python depth_mean_stddev.py | |
> (mean, stddev) | |
""" | |
main() |
zcat ${BAM%.*}.median.regions.bed.gz \
| awk -v OFS='\t' -v MEAN=$MEAN -v STDEV=$STDEV '{print $1, $2, $3, ($4 - MEAN) / STDEV}' \
> ${BAM%.*}.median.regions.zscore.bed.gz
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
DESCRIPTION
Calculate coverage mean and standard deviation of autosomes from read alignments.
Estimate the mean and standard deviation from a mosdepth coverage BED for positions with coverage in the range (0, 2 * non-zero mode). This estimate behaves well for WGS of human germline aligned to either hs37d5 and GRCh38, and may be useful for other situations as well.
APPROACH
DEPENDENCY INSTALLATION
pip install numpy scipy
STEP-BY-STEP
bash mosdepth --threads 3 --no-per-base --by 500 -m "${BAM%.*}.median" "${BAM}"
tabix ${BAM%.*}.median.regions.bed.gz <space delimited list of autosomes> | python depth_mean_stddev.py
Output:
<mean> <standard deviation>
DISCLAIMER
THIS WEBSITE AND CONTENT AND ALL SITE-RELATED SERVICES, INCLUDING ANY DATA, ARE PROVIDED "AS IS," WITH ALL FAULTS, WITH NO REPRESENTATIONS OR WARRANTIES OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, ANY WARRANTIES OF MERCHANTABILITY, SATISFACTORY QUALITY, NON-INFRINGEMENT OR FITNESS FOR A PARTICULAR PURPOSE. YOU ASSUME TOTAL RESPONSIBILITY AND RISK FOR YOUR USE OF THIS SITE, ALL SITE-RELATED SERVICES, AND ANY THIRD PARTY WEBSITES OR APPLICATIONS. NO ORAL OR WRITTEN INFORMATION OR ADVICE SHALL CREATE A WARRANTY OF ANY KIND. ANY REFERENCES TO SPECIFIC PRODUCTS OR SERVICES ON THE WEBSITES DO NOT CONSTITUTE OR IMPLY A RECOMMENDATION OR ENDORSEMENT BY PACIFIC BIOSCIENCES.