Bede Constantinides bede

@bede
bede / bioinformatics.patch
Created January 10, 2024 14:08
Necessary modifications to oup-authoring-template.tex for Oxford Bioinformatics submission
22,23c22,23
< \documentclass[unnumsec,webpdf,contemporary,large]{oup-authoring-template}%
< %\documentclass[unnumsec,webpdf,contemporary,large,namedate]{oup-authoring-template}% uncomment this line for author year citations and comment the above
---
> % \documentclass[unnumsec,webpdf,contemporary,large]{oup-authoring-template}%
> \documentclass[unnumsec,webpdf,contemporary,large,namedate]{oup-authoring-template}% uncomment this line for author year citations and comment the above
957,958c957,958
< %\bibliographystyle{abbrvnat}
< %\bibliography{reference}
---
bede / concat_by_barcode.py
Last active October 19, 2023 16:50
Concatenate demultiplexed ONT FASTQs by barcode (for one or more runs)
"""
Purpose: Concatenate demultiplexed FASTQs by barcode for one or more ONT runs
Usage: python concat_by_barcode.py run1/fastq_pass run2/fastq_pass
Author: Bede Constantinides
"""
import subprocess
import sys
from collections import defaultdict
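The concatenation described above can be sketched with the standard library alone. This is a minimal illustration, not the gist's own code (which is truncated here and imports `subprocess`). It assumes each `fastq_pass` directory contains one subdirectory per barcode, and that all files for a given barcode share the same compression. The function names and the `out_dir` parameter are hypothetical.

```python
import shutil
from collections import defaultdict
from pathlib import Path


def group_fastqs_by_barcode(run_dirs):
    """Map each barcode directory name to the FASTQ files found under it across runs."""
    groups = defaultdict(list)
    for run_dir in run_dirs:
        # Assumed layout: <run_dir>/<barcode>/<reads>.fastq or <reads>.fastq.gz
        for path in sorted(Path(run_dir).glob("*/*.fastq*")):
            groups[path.parent.name].append(path)
    return groups


def concat_by_barcode(run_dirs, out_dir):
    """Write one concatenated file per barcode and return {barcode: output path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    outputs = {}
    for barcode, paths in sorted(group_fastqs_by_barcode(run_dirs).items()):
        # gzip members can be appended byte for byte, so nothing is decompressed.
        suffix = ".fastq.gz" if paths[0].name.endswith(".gz") else ".fastq"
        out_path = out_dir / f"{barcode}{suffix}"
        with open(out_path, "wb") as out_fh:
            for path in paths:
                with open(path, "rb") as in_fh:
                    shutil.copyfileobj(in_fh, out_fh)
        outputs[barcode] = out_path
    return outputs
```

Calling `concat_by_barcode(["run1/fastq_pass", "run2/fastq_pass"], "combined")` would, under these assumptions, write files such as `combined/barcode01.fastq.gz`.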
bede / custom_check.py
Last active May 26, 2023 15:49
Pandera MWE – I want a single failure case when region_is_valid fails indicating the sample_name of the row that failed (cDNA-VOC-1-v4-1)
from io import StringIO
import pandas as pd
import pandera as pa
import pandera.extensions as extensions
from pandera.typing import Index, Series
csv_string = """
sample_name,country,region
cDNA-VOC-1-v4-1,USA,Bretagne
bede / split_summary_by_barcode.py
Created February 11, 2021 10:25
Split Guppy sequencing summaries by barcode
def split_summary_by_barcode(summary_path, out_dir, run_name):
    '''Given a sequencing summary file path, write per barcode summaries to an output directory'''
    dtypes = {
        'filename_fastq': 'object',
        'filename_fast5': 'object',
        'read_id': 'object',
        'run_id': 'category',
        'channel': 'int64',
        'mux': 'int64',
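The preview above is truncated, so the following is only a sketch of the per-barcode split the docstring describes. It uses the standard-library `csv` module rather than the pandas approach the gist appears to take. The `barcode_column` default of `barcode_arrangement`, the tab delimiter, and the `<run_name>_<barcode>.tsv` output naming are all assumptions, not details taken from the original.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_summary_by_barcode(summary_path, out_dir, run_name,
                             barcode_column="barcode_arrangement"):
    """Write one tab-separated summary per barcode; return {barcode: output path}."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Group rows in memory by the (assumed) barcode column.
    rows_by_barcode = defaultdict(list)
    with open(summary_path, newline="") as fh:
        reader = csv.DictReader(fh, delimiter="\t")
        fieldnames = reader.fieldnames
        for row in reader:
            rows_by_barcode[row[barcode_column]].append(row)

    out_paths = {}
    for barcode, rows in sorted(rows_by_barcode.items()):
        # Hypothetical naming scheme: <run_name>_<barcode>.tsv
        out_path = out_dir / f"{run_name}_{barcode}.tsv"
        with open(out_path, "w", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=fieldnames,
                                    delimiter="\t", lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
        out_paths[barcode] = out_path
    return out_paths
```

This version holds the whole summary in memory, which may be unsuitable for very large runs; a streaming variant would keep one open writer per barcode instead.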
import pandas as pd
from bokeh.models.widgets import Select
from bokeh.layouts import widgetbox
from bokeh.models import ColumnDataSource, DataTable, TableColumn, CustomJS
from bokeh.io import show, output_file, output_notebook, reset_output
from bokeh.layouts import row, column, layout
raw_data = {'ORG': ['APPLE', 'ORANGE', 'MELON'],
            'APPROVED': [5, 10, 15],
bede / cluster_df.py
Last active May 8, 2019 12:53
Distance matrix clustering
import pandas as pd
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import fcluster, linkage
def cluster_df(df, method='single', threshold=100):
    '''
    Accepts a square distance matrix as an indexed DataFrame and returns a dict of index keyed flat clusters
    Performs single linkage clustering by default, see scipy.cluster.hierarchy.linkage docs for others
    '''
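The function body is not shown in this preview. As a rough illustration of what flat single linkage clustering at a distance threshold produces, here is a dependency-free stand-in that uses union-find instead of the `linkage` and `fcluster` calls the gist imports. It covers the single linkage case only, takes a label list plus a nested list rather than a DataFrame, and its cluster numbering may differ from SciPy's. The function name and signature are hypothetical.

```python
def single_linkage_clusters(labels, matrix, threshold=100):
    """Return {label: cluster_id}, joining items whose distance is <= threshold.

    Flat single linkage clusters at a distance cutoff are the connected
    components of the graph that links every pair within that cutoff.
    """
    parent = list(range(len(labels)))

    def find(i):
        # Follow parent pointers to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(labels)):
        for j in range(i + 1, len(labels)):
            if matrix[i][j] <= threshold:
                parent[find(j)] = find(i)

    # Number clusters from 1 in order of first appearance.
    cluster_ids = {}
    result = {}
    for i, label in enumerate(labels):
        root = find(i)
        result[label] = cluster_ids.setdefault(root, len(cluster_ids) + 1)
    return result
```

Because single linkage chains through intermediate items, two samples further apart than the threshold can still share a cluster when a third sample lies within the threshold of both.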
bede / gist:77f37fed5857d852ac69
Last active January 21, 2019 15:03
Bioinformatics flavoured Mojave
# Resilio, Dropbox, JupyterLab, Firefox, Sublime, Atom, Zotero, VPNs,
# Slack, Bitwarden, Office, Sketch, Typora
# Brew, Miniconda, JupyterLab, RStudio, Docker
# Install custom JupyterLab envs https://ipython.readthedocs.io/en/latest/install/kernel_install.html#kernels-for-different-environments
# Create /opt/ and /opt/bin, place nextflow inside latter.
# Put toggle-able symlinks to brewed GCC inside here, prepend to path in .bashrc (`gcc_links_on`, `gcc_links_off`)
# Nextflow
# .ssh
# Generate keys
# Create /opt and /opt/bin, add to path
bede / Dockerfile
Created July 27, 2017 21:27
IVA Ubuntu Dockerfile
FROM ubuntu:16.04
RUN apt-get update && \
    apt-get --yes install \
    kmc smalt python3-pip zlib1g-dev libncurses5-dev libncursesw5-dev mummer samtools
RUN pip3 install iva
ENTRYPOINT ["iva"]
bede / sam_to_consensus_fa.sh
Last active July 17, 2017 17:15
Bioinformatics code golf: SAM to consensus FASTA
# SAM to consensus FASTA code golf, inspired by http://lab.loman.net/2015/07/28/calling-haploid-consensus-sequence/
# Starting with a SAM:
samtools view -b seqs.sam | samtools sort -o seqs.bam - # Generate and sort BAM (samtools >= 1.3 syntax; older releases used `samtools sort - seqs`)
samtools index seqs.bam # Index BAM
# Starting with an indexed BAM:
samtools mpileup -ud 1000 -f seqs_ref.fasta seqs.bam | bcftools call -c | vcfutils.pl vcf2fq | seqtk seq -a - > seqs.consensus.fa # Generate pileup, call variants, convert to fq, convert to fa
# Who can do better? The bar is set low...
bede / simple_python3_parallelism.ipynb
Last active January 23, 2017 13:17
Simple Python3 parallelism