@rchikhi
rchikhi / nthash.py
Last active December 26, 2022 15:50
ntHash rolling nucleotide (k-mer) hashing. ntHash1 version: a pure Python implementation of https://academic.oup.com/bioinformatics/article/32/22/3492/2525588
# nthash1, pure python implementation
#
# (not nthash2, so only for k values < 64)
# ported from https://github.com/luizirber/nthash/
h = {'A': 0x3c8b_fbb3_95c6_0474,
     'C': 0x3193_c185_62a0_2b4c,
     'G': 0x2032_3ed0_8257_2324,
     'T': 0x2955_49f5_4be2_4456,
     'N': 0}
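The preview above stops at the seed table. As a minimal sketch of how such a table is typically used in an ntHash1-style rolling hash, the block below computes the forward-strand hash of a k-mer and then rolls it along a sequence. It is not the gist's own code: the function names `rol`, `forward_hash`, and `rolling_forward_hashes` are hypothetical, only the forward strand is handled (no canonical/reverse-complement hash), and rotations are taken modulo 64.

```python
# Seed table copied from the gist preview above.
h = {'A': 0x3c8b_fbb3_95c6_0474,
     'C': 0x3193_c185_62a0_2b4c,
     'G': 0x2032_3ed0_8257_2324,
     'T': 0x2955_49f5_4be2_4456,
     'N': 0}

MASK64 = (1 << 64) - 1  # Python ints are unbounded, so keep results in 64 bits


def rol(x, r):
    """Rotate the 64-bit value x left by r bits (r taken modulo 64)."""
    r %= 64
    return ((x << r) | (x >> (64 - r))) & MASK64


def forward_hash(kmer):
    """Hash one k-mer from scratch: XOR of seeds, each rotated by its distance from the end."""
    k = len(kmer)
    value = 0
    for i, base in enumerate(kmer):
        value ^= rol(h[base], k - 1 - i)
    return value


def rolling_forward_hashes(seq, k):
    """Yield the forward hash of every k-mer of seq, updating in O(1) per step."""
    if k <= 0 or len(seq) < k:
        return
    value = forward_hash(seq[:k])
    yield value
    for i in range(k, len(seq)):
        # Rotate the whole hash by one, cancel the outgoing base, add the incoming base.
        value = rol(value, 1) ^ rol(h[seq[i - k]], k) ^ h[seq[i]]
        yield value
```

The rolling update relies on rotation distributing over XOR: rotating the previous hash by one bit moves every base one position further from the end, so only the outgoing base (now rotated by k) has to be cancelled before the new base is XORed in.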
rchikhi / estimate-insert-sizes
Last active October 25, 2022 06:16
Quickly estimates insert sizes of read datasets, given some sequence(s) they can be mapped to. Requires BWA. Short usage: <reference> <*.fastq>
#!/usr/bin/env python
doc = """
Quickly estimates insert sizes of read datasets, given some sequence(s) they can be mapped to.
Author: Rayan Chikhi
short usage: <reference> <*.fastq>
example:
estimate-insert-sizes contigs.fa readsA_1.fq readsA_2.fq readsB_1.fq readsB_2.fq
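The script itself is cut off in this preview, so the following is only a sketch of the general idea and not the author's implementation. It assumes the paired reads have already been mapped to the reference (for example with BWA) and that the alignments are available as SAM text lines; it then collects the template length (SAM column 9, TLEN) of properly paired records and summarizes it. The function names `insert_sizes_from_sam` and `summarize` are hypothetical.

```python
import statistics


def insert_sizes_from_sam(lines):
    """Collect positive template lengths (TLEN) from properly paired SAM records."""
    sizes = []
    for line in lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])
        tlen = int(fields[8])
        # Flag bit 0x2 marks a properly paired read. Keeping only TLEN > 0
        # counts each pair once (the leftmost mate carries the positive value).
        if flag & 0x2 and tlen > 0:
            sizes.append(tlen)
    return sizes


def summarize(sizes):
    """Return (median, population standard deviation) of the insert sizes."""
    return statistics.median(sizes), statistics.pstdev(sizes)
```

The median is used here rather than the mean because a few chimeric or mis-mapped pairs can produce very large template lengths; that is a design choice of this sketch, not something stated in the source.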
Wheeler graphs
Gagie, Manzini, Sirén
Theoretical Computer Science, 2017
https://www.sciencedirect.com/science/article/pii/S0304397517305285
Notes from a whiteboard presentation given to the Bonsai team in Lille.
These notes largely follow the paper.
Rayan Chikhi, 2019
rchikhi / alg1-ropebwt2.txt
Last active August 29, 2015 14:04
Illustration of Algorithm 1 in RopeBWT2 article
This document is a partial presentation of the RopeBWT2 pre-print
http://arxiv.org/abs/1406.0426
It is the transcript of a presentation given to the Medvedev group at Penn State
in July 2014. It illustrates some notions from the methods, then
illustrates and proves Algorithm 1. Although this document does not cover the main
contribution of the RopeBWT2 paper, I hope it helps in understanding
the theoretical foundations that led to Algorithms 2 and 3.