Last active March 3, 2024 20:28
opplatek/cc0601e6777a9f279dd2c4785a2e51ba
#!/bin/bash
#
# Speed up deepTools computeMatrix by splitting the reference file into smaller chunks and then merging the matrices back together
#
set -euo pipefail # stop on errors, unset variables and failed pipes

positions=5000 # number of BED lines (regions) per chunk
threads=12
rnd=$RANDOM # random suffix so temporary files from parallel runs do not collide

# split the reference into chunks by number of lines
split -l "$positions" ref.bed "ref.chunks${rnd}"

for chunk in ref.chunks${rnd}*; do
    # Rename the name column (4) in the BED file to avoid potential problems with deepTools naming,
    # which might happen if the reference position names are not unique
    name=$(basename "$chunk")
    name=${name##*.}
    awk -v name="$name" 'BEGIN {FS = "\t"; OFS = "\t"} {print $1,$2,$3,name,$5,$6}' "$chunk" > "tmp.$rnd" && mv "tmp.$rnd" "$chunk"
done

# calculate the matrix for each chunk
for chunk in ref.chunks${rnd}*; do
    computeMatrix reference-point \
        --referencePoint TSS \
        -R "$chunk" \
        -S input.bw \
        -b 500 -a 500 \
        --skipZeros \
        --missingDataAsZero \
        --binSize 10 \
        --averageTypeBins median \
        --numberOfProcessors "$threads" \
        --outFileName "${chunk}.gz"
done

# merge the chunk matrices back into one file
computeMatrixOperations rbind -m ref.chunks${rnd}*.gz -o ref.matrix.gz && rm ref.chunks${rnd}*.gz

# make heatmaps
plotHeatmap \
    -m ref.matrix.gz \
    --sortUsing mean \
    --averageTypeSummaryPlot mean \
    --missingDataColor "#440154" \
    --colorMap viridis \
    --zMax 100 \
    --linesAtTickMarks \
    --refPointLabel "TSS" \
    --heatmapHeight 20 \
    --heatmapWidth 10 \
    --dpi 300 \
    --outFileName ref.png

# remove the remaining chunk BED files
rm ref.chunks${rnd}*
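The renaming step keeps exactly columns 1-6 of every BED line, and the strand (column 6) determines which end of a region `--referencePoint TSS` uses. A quick pre-flight check on `ref.bed` can therefore catch inputs with fewer than six columns before anything is split. This is a minimal sketch and not part of the original gist; the function name `count_short_bed_lines` is hypothetical, and it assumes a plain tab-separated BED file without header lines.

```shell
#!/bin/bash
# Hypothetical pre-flight check for the input BED file used by the script.
set -euo pipefail

count_short_bed_lines() {
    # Count tab-separated lines with fewer than 6 fields (chrom, start, end,
    # name, score, strand); the renaming awk step would leave these lines
    # with empty score/strand columns.
    awk 'BEGIN {FS = "\t"} NF < 6 {short++} END {print short + 0}' "$1"
}

# Usage: count_short_bed_lines ref.bed   (0 means every line has >= 6 columns)
```

A non-zero count suggests converting the reference to BED6 (with a strand column) before running the chunked workflow. Note that any columns beyond the sixth are dropped by the renaming step regardless.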
Hi @opplatek, thanks for getting back to me! I did try sorting the matrices to see if it was an ordering issue, but they still ended up different. I haven't had time to dig into this more thoroughly yet, but I hope to revisit it this week. I didn't plot the matrices to see if they look the same, so I will try that as well. Thank you!
Hi @opplatek, thanks for this useful post. When I run it, though, I keep getting a list of errors such as:
Skipping chunks9399am_r1505, due to being absent in the computeMatrix output.
Any ideas what this refers to? The plotted output shows nothing, so I'm guessing it is dropping all the data for some reason. Thanks!
Hi @mbassalbioinformatics, I'm not sure why you're getting that error message. This post seems to explain some of the reasons why it happens.
Hey @oligomyeggo, it has been a while since I last used deepTools. When I was working on this, I think I only compared the number of lines and the final plots, not the actual content of the matrices. The final plots looked the same.
I don't have any data (or time 😭) to check it now. But just trying to think of something: is it possible that the matrices (rows) are just ordered differently?
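One way to test the ordering question directly is to compare the chunked `ref.matrix.gz` with a matrix computed from the unsplit `ref.bed`, ignoring row order. The sketch below is a hypothetical helper, not part of the original gist. It assumes both files are gzipped deepTools matrices whose first line is the `@`-prefixed JSON header and whose rows start with the six BED columns. It also blanks column 4, because the script replaces region names with chunk names, so names alone would make otherwise identical rows differ. The names `strip_matrix`, `compare_matrices` and `ref.single.matrix.gz` are illustrative assumptions.

```shell
#!/bin/bash
# Hypothetical helper: compare two deepTools matrix files while ignoring
# row order, the "@" header line and the region name column (4).
set -euo pipefail

strip_matrix() {
    # Decompress, drop the header line, blank the name column, sort the rows
    gzip -dc "$1" \
        | awk 'BEGIN {FS = "\t"; OFS = "\t"} !/^@/ {$4 = ""; print}' \
        | LC_ALL=C sort
}

compare_matrices() {
    # Print "identical" if the stripped rows match exactly, else "different"
    if diff <(strip_matrix "$1") <(strip_matrix "$2") > /dev/null; then
        echo "identical"
    else
        echo "different"
    fi
}

# Usage (ref.single.matrix.gz is a hypothetical matrix computed from the
# unsplit ref.bed with the same computeMatrix options):
# compare_matrices ref.matrix.gz ref.single.matrix.gz
```

This is a strict textual comparison of the values, so any difference in how a value is printed counts as "different". If the result is "different", running `diff` on the two stripped outputs shows which regions disagree.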