@ewels
Last active August 18, 2016 15:24
Quick and dirty script to check for common warning signs in a Bismark BS-seq methylation analysis
#!/bin/bash
# Check bisulfite analysis results for any modes of failure that I can think of.
# Assumes data is processed by Cluster Flow / bismark and is in subdirectories.
# For a basic quick summary, just run:
# bash check_bismark_analysis.sh
# To filter which subdirectories are checked, pass a glob expression. For example:
# bash check_bismark_analysis.sh run2_*
# Get all subdirectories if no pattern specified
if [ $# -ge 1 ]
then
    directories="$*"
else
    directories=$(ls -d */)
fi
num_align_logs=0
num_dedup_logs=0
num_meth_logs=0
num_eol_mentions=0
num_no_fn=0
num_temp_files=0
table="Directory\tAlign\tDedup\tMeth\tEOL\tNo fn\t*.temp\n"
for d in $directories
do
    # Count the report lines that indicate each step ran to completion
    align_logs=$(grep 'Final Cytosine Methylation Report' "$d"/*PE_report.txt 2> /dev/null | wc -l)
    dedup_logs=$(grep 'Total count of deduplicated leftover sequences' "$d"/*deduplication_report.txt 2> /dev/null | wc -l)
    meth_logs=$(grep 'C methylated in CHH context' "$d"/*splitting_report.txt 2> /dev/null | wc -l)
    # Count possible warning signs: EOL mentions, missing inputs and leftover temp files
    eol_mentions=$(grep -i 'eol' "$d"/*.txt 2> /dev/null | wc -l)
    no_fn=$(grep 'No file names found' "$d"/*.txt 2> /dev/null | wc -l)
    temp_files=$(ls -1 "$d"/*temp 2> /dev/null | wc -l)
    # Add this directory's counts to the running totals
    (( num_align_logs += align_logs ))
    (( num_dedup_logs += dedup_logs ))
    (( num_meth_logs += meth_logs ))
    (( num_eol_mentions += eol_mentions ))
    (( num_no_fn += no_fn ))
    (( num_temp_files += temp_files ))
    # Append one row per directory to the results table
    table+="$d\t$align_logs\t$dedup_logs\t$meth_logs\t$eol_mentions\t$no_fn\t$temp_files\n"
done
# Quote $table so the '*.temp' header is never glob-expanded and spaces are kept as-is
echo -e "$table" | column -ts $'\t'
echo -e "\n\n========="
echo " SUMMARY "
echo "---------"
echo "Complete align log files: $num_align_logs"
echo "Complete deduplication log files: $num_dedup_logs"
echo "Complete methylation log files: $num_meth_logs"
echo "Mentions of EOL: $num_eol_mentions"
echo "Mentions of 'No file names found': $num_no_fn"
echo "*.temp files: $num_temp_files"
echo "============================================"
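
The five `grep ... | wc -l` lines in the loop above all follow the same pattern, so they could be factored into one helper. This is only a minimal sketch: the function name `count_matches` is hypothetical and not part of the original script, and the mock report files exist purely for the demo.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): count the lines that
# match a pattern across a set of files, printing 0 if no file exists.
count_matches() {
    local pattern=$1
    shift
    # -h drops file name prefixes; errors about missing files are silenced.
    # tr strips the padding that some wc implementations add.
    grep -h -- "$pattern" "$@" 2> /dev/null | wc -l | tr -d ' '
}

# Demo on a throwaway directory containing three mock alignment reports
demo_dir=$(mktemp -d)
echo 'Final Cytosine Methylation Report' > "$demo_dir/a_PE_report.txt"
echo 'Final Cytosine Methylation Report' > "$demo_dir/b_PE_report.txt"
echo 'something else' > "$demo_dir/c_PE_report.txt"

count_matches 'Final Cytosine Methylation Report' "$demo_dir"/*PE_report.txt   # prints 2
count_matches 'No file names found' "$demo_dir"/*.txt                          # prints 0
rm -r "$demo_dir"
```

Inside the loop, a call such as `align_logs=$(count_matches 'Final Cytosine Methylation Report' "$d"/*PE_report.txt)` would then stand in for the longer pipeline.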