Marcel Martin marcelm
@marcelm
marcelm / qualtrim.py
Created March 7, 2023 09:42
Quality trimming experiment
#!/usr/bin/env python3
"""Quality trimming using a running sum from the 5' to 3' end"""
import sys
from argparse import ArgumentParser
import dnaio
def qual_trim_index(qualities_ascii, threshold):
    qualities = [ord(c) - 33 for c in qualities_ascii]
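The preview above is cut off after the first line of `qual_trim_index`. As a minimal sketch of how a running sum from the 5' to the 3' end could choose a 3' trimming point (an assumption about the approach, not necessarily what the gist does): subtract the threshold from each quality value, accumulate from the 5' end, and keep the prefix at which the sum peaks. The function name, the `base` parameter and the tie-breaking rule are illustrative choices.

```python
def qual_trim_index_sketch(qualities_ascii, threshold, base=33):
    """Return an index such that read[:index] is the part to keep.

    Hypothetical sketch: keeps the prefix that maximizes the running
    sum of (quality - threshold), scanning from the 5' to the 3' end.
    """
    running_sum = 0
    best_sum = 0
    best_index = 0  # an empty prefix is the fallback if no prefix scores >= 0
    for i, char in enumerate(qualities_ascii):
        running_sum += ord(char) - base - threshold
        # ">=" prefers the longer prefix on ties, so bases whose quality
        # equals the threshold are kept rather than trimmed
        if running_sum >= best_sum:
            best_sum = running_sum
            best_index = i + 1
    return best_index
```

For example, `qual_trim_index_sketch("IIII##", 20)` returns 4: the four bases with quality 40 are kept and the two trailing bases with quality 2 are cut.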
marcelm / split.awk
Created September 28, 2022 09:50
Split reads at adapter occurrences
# Split reads in a FASTQ file at adapter occurrences
#
# Run:
# cutadapt -O 100 --times=1000 -g MYADAPTERSEQ --info-file=info.txt -o /dev/null reads.fastq.gz
#
# Then:
# awk -F "\t" -f split.awk info.txt | gzip > split.fastq.gz
# Relevant info file fields:
marcelm / fasta2fastq
Last active January 29, 2021 14:11
FASTA to FASTQ conversion
#!/usr/bin/env python3
"""
Run with:
fasta2fastq < in.fasta > out.fastq
"""
import dnaio
import sys
with dnaio.open(sys.stdin.buffer) as inf:
    with dnaio.open(sys.stdout.buffer, mode="w", fileformat="fastq") as outf:
        for record in inf:
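The preview stops before the loop body, so the quality values the gist assigns are not visible here. As a rough standard-library sketch of the same conversion (without `dnaio`), the code below reads FASTA records and writes FASTQ records with a constant placeholder quality. The function names and the `quality_char` default are assumptions made for illustration.

```python
import io


def write_fastq_record(handle, name, sequence, quality_char):
    """Write one FASTQ record with a constant placeholder quality string."""
    handle.write(f"@{name}\n{sequence}\n+\n{quality_char * len(sequence)}\n")


def fasta_to_fastq(fasta_handle, fastq_handle, quality_char="I"):
    """Convert FASTA text to FASTQ text (hypothetical helper).

    Multi-line sequences are joined; every base gets the same quality
    character because FASTA carries no quality information.
    """
    name = None
    chunks = []
    for line in fasta_handle:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new header ends the previous record, if there was one
            if name is not None:
                write_fastq_record(fastq_handle, name, "".join(chunks), quality_char)
            name = line[1:]
            chunks = []
        elif line:
            chunks.append(line)
    # Write the final record
    if name is not None:
        write_fastq_record(fastq_handle, name, "".join(chunks), quality_char)


if __name__ == "__main__":
    demo_out = io.StringIO()
    fasta_to_fastq(io.StringIO(">r1\nACGT\nAC\n"), demo_out)
    print(demo_out.getvalue(), end="")
```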
marcelm / condalock.sh
Created September 17, 2019 09:32
Create a Conda environment.lock.yml for macOS while running on Linux
#!/bin/bash
# This script creates both
# - environment.osx.lock.yml and
# - environment.linux.lock.yml
# regardless of the operating system it is running on. The trick is
# temporarily setting the subdir and subdirs keys in .condarc to
# what would be appropriate for the other operating system.
#
# It assumes that there exists a (manually managed) environment.yml file
marcelm / kill-zombie.sh
Created October 4, 2018 09:14
Hanging Nextflow job workaround
#!/bin/bash
# A workaround for an issue with Nextflow (which may actually be a bash bug),
# see <https://github.com/SciLifeLab/Sarek/issues/420>
#
# The problem is that Nextflow does not notice that a job has finished and
# hangs indefinitely.
#
# This script looks for zombie processes that are children of a script named
# .command.stub, and kills that script. This seems to let the pipeline continue
#!/usr/bin/env python3
"""
Mask low-quality bases in a FASTQ file with 'N'.
Adjust cutoff_front and cutoff_back below to use
different thresholds (currently: 20 at 5' end,
0 at 3' end).
Usage:
python3 qualmask.py input.fastq.gz > output.fastq
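The rest of this script is not shown in the preview. The following is a sketch of one plausible reading of the description, not the gist's actual code: find the low-quality stretches at both ends with a running sum of (cutoff - quality), then replace those bases with 'N' instead of removing them. The names `cutoff_front` and `cutoff_back` come from the docstring above; the function names and the Phred offset `base=33` are assumptions.

```python
def quality_trim_bounds(qualities, cutoff_front, cutoff_back):
    """Return (start, stop) so that qualities[start:stop] passes the cutoffs."""
    # 5' end: accumulate (cutoff - quality) and cut where the sum peaks
    start = 0
    running_sum = 0
    max_sum = 0
    for i, quality in enumerate(qualities):
        running_sum += cutoff_front - quality
        if running_sum < 0:
            break
        if running_sum > max_sum:
            max_sum = running_sum
            start = i + 1
    # 3' end: the same scan, walking backwards from the last base
    stop = len(qualities)
    running_sum = 0
    max_sum = 0
    for i in range(len(qualities) - 1, -1, -1):
        running_sum += cutoff_back - qualities[i]
        if running_sum < 0:
            break
        if running_sum > max_sum:
            max_sum = running_sum
            stop = i
    if start >= stop:
        # Nothing survives both scans
        start, stop = 0, 0
    return start, stop


def mask_low_quality(sequence, qualities_ascii, cutoff_front=20, cutoff_back=0, base=33):
    """Replace low-quality bases at either end of the read with 'N'."""
    qualities = [ord(char) - base for char in qualities_ascii]
    start, stop = quality_trim_bounds(qualities, cutoff_front, cutoff_back)
    return "N" * start + sequence[start:stop] + "N" * (len(sequence) - stop)
```

With the defaults (20 at the 5' end, 0 at the 3' end), `mask_low_quality("ACGTACGT", "##IIII##")` returns `"NNGTACGT"`: only the two leading low-quality bases are masked, and the read length is unchanged.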
marcelm / bambai
Created December 8, 2015 13:20
Index a BAM file while sorting it
#!/bin/bash
set -euo pipefail
# [[ ]] short-circuits, so $1 is not expanded (and cannot trip "set -u") when no argument is given
if [[ $# -ne 1 || $1 == -h || $1 == --help ]]; then
  echo \
"Usage:
samtools sort -O bam -T prefix ... | bambai BAMPATH
Read a sorted BAM file from standard input, write it to BAMPATH and
index it at the same time (creating BAMPATH.bai)."
marcelm / mismatches.py
Created September 16, 2015 09:06
Use pysam and pyfaidx to find mismatches in an interval
from pysam import AlignmentFile
from pyfaidx import Fasta
def has_mismatch_in_interval(reference, bamfile, chrom, start, end):
    """
    Return whether there is a mismatch in the interval (start, end) in any read mapping to the given chromosome.

    reference -- a pyfaidx.Fasta object or something that behaves similarly
    """
    for column in bamfile.pileup(chrom, start, end):
marcelm / pdfpages_oo.py
Created June 24, 2015 14:23
Plot multiple figures into a single PDF with matplotlib, using the object-oriented interface
"""
Plot multiple figures into a single PDF with matplotlib, using the
object-oriented interface.
"""
from matplotlib.backends.backend_pdf import FigureCanvasPdf, PdfPages
from matplotlib.figure import Figure
import numpy as np
with PdfPages('multi.pdf') as pages:
    for i in range(10):
marcelm / snakemake-pure-python.py
Last active November 29, 2023 00:45
Pure Python module that uses Snakemake to construct and run a workflow
#!/usr/bin/env python3
"""
Running this script is (intended to be) equivalent to running the following Snakefile:
include: "pipeline.conf" # Should be an empty file
shell.prefix("set -euo pipefail;")
rule all:
    input: