Daniel Nicorici ndaniel
@ndaniel
ndaniel / 0.00README.md
Created May 24, 2017 10:27 — forked from lh3/0.00README.md
Mapping short reads with a ~50bp INDEL

This is a small experiment on the alignment of ~50bp INDELs. The query sequences are shown in 0.01.fq below, where seq_ori is a 204bp sequence extracted from the human reference genome, seq_del54 contains a 54bp deletion in the middle, seq_del84 contains an 84bp deletion in a 120bp read, and seq_ins40 contains a 40bp insertion in a 140bp read. These four short sequences were mapped to the human reference genome with Bowtie2, BWA-MEM, LAST, Novoalign, SNAP and Stampy using default settings. Non-default scoring functions were also tested for Bowtie2 (--rdg 5,1 --rfg 5,1), BWA-MEM (-A2 -E1) and LAST (-r2 -q4). The output of the various mappers and settings can be found in this gist. The following table gives my summary:

| Mapper | Setting | -84bp | -54bp | +40bp |
|---|---|---|---|---|
| BBMAP | default | Yes | Yes | Yes |
| Bowtie2 | default | No | No | No |
| Bowtie2 | --rdg 5,1 --rfg 5,1 | as insertion | as insertion | Yes |
| BWA-MEM | default | as split | Yes | Yes |
| BWA-MEM | -A2 -E1 | Yes | Yes | Yes |
| LAST | default | as split | as split | |

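
The read construction described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the reference here is a random 204bp string standing in for seq_ori (the real sequence lives in 0.01.fq), the helper names `make_deletion_read`, `make_insertion_read` and `to_fastq` are hypothetical, and placing the INDEL exactly in the middle of the read is assumed for seq_del84 and seq_ins40.

```python
import random


def make_deletion_read(ref, read_len, del_len):
    """Build a read of read_len bases that skips del_len reference bases in its middle."""
    half = read_len // 2
    left = ref[:half]
    # Resume copying the reference right after the deleted stretch.
    right = ref[half + del_len: half + del_len + (read_len - half)]
    return left + right


def make_insertion_read(ref, read_len, ins_seq):
    """Build a read of read_len bases with ins_seq inserted in its middle."""
    flank = read_len - len(ins_seq)  # reference bases kept in the read
    half = flank // 2
    return ref[:half] + ins_seq + ref[half:flank]


def to_fastq(name, seq):
    """Format one FASTQ record with a constant placeholder quality string."""
    return f"@{name}\n{seq}\n+\n{'I' * len(seq)}\n"


rng = random.Random(0)
# Assumption: random stand-ins, not the sequences used in the experiment.
ref = "".join(rng.choice("ACGT") for _ in range(204))
ins = "".join(rng.choice("ACGT") for _ in range(40))

seq_del84 = make_deletion_read(ref, 120, 84)   # 120bp read, 84bp deletion
seq_ins40 = make_insertion_read(ref, 140, ins)  # 140bp read, 40bp insertion

print(to_fastq("seq_del84", seq_del84), end="")
print(to_fastq("seq_ins40", seq_ins40), end="")
```

A 120bp read with an 84bp deletion spans exactly 204 reference bases, which matches the length of seq_ori given above.
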
ndaniel / do_hyper.R
Created January 30, 2019 10:59 — forked from slowkow/do_hyper.R
Compute a hypergeometric p-value for a gene set of interest.
# Try this with:
# - https://github.com/jefworks/genesets
# - https://github.com/slowkow/tftargets
#' Compute a hypergeometric p-value for your gene set of interest relative to
#' a universe of genes that you have defined.
#'
#' @param ids A vector with genes of interest.
#' @param universe A vector with all genes, including the genes of interest.
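
The preview of do_hyper.R stops at the parameter documentation, so the body of the R function is not shown here. As a rough illustration of the idea only, the following Python sketch computes an upper-tail hypergeometric p-value, P(X >= k), where k is the overlap between the genes of interest and a gene set. The function name `hypergeom_pvalue`, the `gene_set` argument, and the choice of the upper tail are assumptions made for this example; they are not taken from the gist.

```python
from math import comb


def hypergeom_pvalue(ids, universe, gene_set):
    """Upper-tail hypergeometric p-value for the overlap of ids and gene_set.

    ids      -- genes of interest
    universe -- all genes, including the genes of interest
    gene_set -- hypothetical third argument: the gene set being tested
    """
    universe = set(universe)
    ids = set(ids) & universe            # ignore genes outside the universe
    gene_set = set(gene_set) & universe
    N = len(universe)                    # population size
    K = len(gene_set)                    # "successes" in the population
    n = len(ids)                         # number of draws
    k = len(ids & gene_set)              # observed successes
    # Sum P(X = i) for i = k .. min(K, n); comb() returns 0 for impossible draws.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy data: 20 genes in the universe, 5 in the gene set, 4 genes of interest.
universe = [f"g{i}" for i in range(20)]
gene_set = ["g0", "g1", "g2", "g3", "g4"]
ids = ["g0", "g1", "g2", "g10"]

p = hypergeom_pvalue(ids, universe, gene_set)
print(p)
```

Exact integer arithmetic via `math.comb` keeps the sketch free of third-party dependencies; for large universes a statistics library would be the more practical choice.
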
ndaniel / gist:a4d7a65ce80421aa63706fb44bc156b8
Created November 23, 2023 08:03 — forked from why-not/gist:4582705
Pandas recipe. I find pandas indexing counterintuitive; perhaps my intuitions were shaped by many years in the imperative world. I am collecting some recipes to do things quickly in pandas and to jog my memory.
import numpy as np
import pandas as pd

"""making a dataframe"""
df = pd.DataFrame([[1, 2], [3, 4]], columns=list('AB'))
"""quick way to create an interesting data frame to try things out"""
df = pd.DataFrame(np.random.randn(5, 4), columns=['a', 'b', 'c', 'd'])
"""convert a dictionary into a DataFrame"""
"""make the keys into columns"""
dic = {'a': 1, 'b': 2}  # illustrative dictionary; not part of the original recipe
df = pd.DataFrame(dic, index=[0])  # scalar values need an explicit index