Collin Tokheim ctokheim

  • Dana-Farber Cancer Institute
  • Boston, MA
@ctokheim
ctokheim / profile_python.md
Created November 11, 2014 18:54
Profiling Python Code

Profiling Python Code

Line-by-line profiling

A line-by-line timing breakdown is often necessary to understand where the slowdowns in code are. Using %timeit in IPython tells you how long a function takes, but not how long its individual lines take. A line-by-line profiler, line_profiler, has been developed on GitHub.

Install by:

$ pip install line_profiler
ctokheim / pytables_tricks.md
Last active March 9, 2016 22:44
Pytables: persistent matrices using HDF5

Pytables

Pytables allows out-of-core operations on large tables/matrices. Pytables utilizes HDF5 to store arrays. Conveniently, numpy arrays can be saved directly to HDF5 and then directly retrieved without need for expensive conversion operations.

Advantages

The advantage of pytables and HDF5 is that arrays can be stored in a compressed binary format and then retrieved by indexed access. This means operations on a matrix can be performed even if it does not fit into memory, by accessing the data in "chunks". Pytables already provides an automatic "chunking" size for performing operations. Element-wise expressions are evaluated by numexpr under the hood. Matrix multiplication, however, requires a thin wrapper that grabs blocks of the matrices and multiplies them using numpy. Each resulting block is then saved back to disk.
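
The block-wise multiplication described above can be sketched with numpy alone. This is an illustration under stated assumptions, not the gist's own wrapper: the function name `blocked_matmul` and the `block` parameter are hypothetical, and in-memory arrays stand in for on-disk arrays, which would be sliced the same way.

```python
import numpy as np


def blocked_matmul(a, b, out, block=256):
    """Compute out = a @ b while touching only one block of each array at a time."""
    n, k = a.shape
    k2, m = b.shape
    if k != k2:
        raise ValueError("inner dimensions do not match")
    for i in range(0, n, block):
        for j in range(0, m, block):
            # Accumulator for one output block; edge blocks may be smaller.
            acc = np.zeros((min(block, n - i), min(block, m - j)), dtype=out.dtype)
            for p in range(0, k, block):
                # Only these two blocks need to be in memory at once.
                acc += a[i:i + block, p:p + block] @ b[p:p + block, j:j + block]
            # With an on-disk array this assignment would write the block back to disk.
            out[i:i + block, j:j + block] = acc
    return out


rng = np.random.default_rng(0)
a = rng.random((300, 500))
b = rng.random((500, 200))
out = np.empty((300, 200))
blocked_matmul(a, b, out, block=128)
```

Peak memory is bounded by three blocks rather than by the full matrices, at the cost of re-reading input blocks several times.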

Installation

ctokheim / matplotlib_barplot.md
Last active May 18, 2024 11:45
Matplotlib: Stacked and Grouped Bar Plot

Stacked and Grouped Bar Plot

Oddly enough, ggplot2 has no support for a bar plot that is both stacked and grouped (position="dodge"). The seaborn python package, although excellent, also does not provide an alternative. However, I knew it was surely possible to make such a plot in regular matplotlib. Matplotlib, although sometimes clunky, gives you enough flexibility to place plotting elements precisely, which is what a stacked and grouped bar plot needs.

Below is a working example of making a stacked and grouped bar plot.

import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
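
The listing above is cut off in this preview. Independently of it, the placement idea can be sketched as follows: each condition gets a horizontal offset within a group, and each layer is drawn on top of the previous one through the `bottom` argument of `ax.bar`. All data and names here (`groups`, `conditions`, `layers`, `values`) are made up for illustration and are not the gist's own example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

groups = ["A", "B", "C"]          # hypothetical x-axis categories
conditions = ["ctrl", "treat"]    # bars placed side by side within a group
layers = ["low", "high"]          # segments stacked within a bar

# values[condition, layer, group], made-up numbers
values = np.array([[[1, 2, 3], [2, 1, 2]],
                   [[2, 3, 1], [1, 2, 2]]], dtype=float)

width = 0.8 / len(conditions)     # bar width so one group spans 0.8 units
x = np.arange(len(groups))

fig, ax = plt.subplots()
for c, cond in enumerate(conditions):
    # Shift each condition left or right of the group centre.
    offset = (c - (len(conditions) - 1) / 2) * width
    bottom = np.zeros(len(groups))
    for l, layer in enumerate(layers):
        ax.bar(x + offset, values[c, l], width, bottom=bottom,
               label=f"{cond}/{layer}")
        bottom += values[c, l]    # the next layer starts where this one ends

ax.set_xticks(x)
ax.set_xticklabels(groups)
ax.legend()
```

Calling `fig.savefig(...)` with a file name of your choice writes the figure to disk.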
ctokheim / cython_tricks.md
Last active March 4, 2024 23:27
cython tricks

Cython

Cython has two major benefits:

  1. Making python code faster, particularly things that can't be done in scipy/numpy
  2. Wrapping/interfacing with C/C++ code

Cython gains most of its benefit from statically typing arguments. However, static typing is not required; in fact, regular python code is valid cython (but don't expect much of a speedup). By incrementally adding more type information, the code can be sped up severalfold. This gist covers only very basic usage of cython.

ctokheim / sqlite_tricks.sql
Created July 2, 2014 13:45
Sqlite3 tricks
-- output query into file
sqlite3 -header -separator $'\t' data/2020plus.db "select * from mutation;" > ../2020p/data/mutations.txt
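
A comparable tab-separated export with a header row can be sketched with Python's standard `sqlite3` and `csv` modules. The in-memory database, the columns of the `mutation` table and the sample rows below are assumptions made so that the example runs on its own; they are not the contents of the database above.

```python
import csv
import io
import sqlite3

# Hypothetical stand-in for the real database file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mutation (gene TEXT, position INTEGER)")
conn.executemany("INSERT INTO mutation VALUES (?, ?)",
                 [("TP53", 175), ("KRAS", 12)])

cursor = conn.execute("SELECT * FROM mutation")
header = [col[0] for col in cursor.description]  # column names, like -header

buf = io.StringIO()  # swap for open("some_file.txt", "w", newline="") to write a file
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(header)
writer.writerows(cursor)  # one tab-separated line per result row
conn.close()

tsv_text = buf.getvalue()
print(tsv_text, end="")
```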
ctokheim / longest_snvbox_tx.sql
Last active August 29, 2015 14:02
Retrieves the longest transcript for each gene in SNVBox
/* Retrieves the longest transcript for each gene in SNVBox.
The longest RefSeq transcript is selected first. However, if there are no
RefSeq transcripts for a gene, then the longest Ensembl transcript is selected.
To output results to a file use mysql from the command line.
$ mysql [options] < longest_snvbox_tx.sql > output.txt
Please change the database name your_snvbox_db to your actual SNVBox db name.

Background

Last week, when trying to generate an awesome summary graphic, I came across the painful ordeal of trying to combine ggplot elements with a number of regular R graphic elements.

First things first, I had used the grid.arrange function from the gridExtra package to juxtapose ggplot elements in a panel figure. Furthermore, I had combined regular R (read: non-ggplot) figures through the good old par functionality.

Brief summary:

We will be making calls to the viewport function (defined in the grid package, on which gridBase builds) to generate, guess what, viewports into which ggplot objects are plotted. Interestingly, a ggplot element need not be a single simple ggplot object; it can also be an arrangement of multiple objects bundled together using the arrangeGrob function from gridExtra.

ctokheim / numpy_scipy_tricks.py
Last active August 29, 2015 14:02
Numpy/Scipy tricks
import numpy as np
import scipy as sp
# print what numpy is compiled against (e.g. BLAS)
np.show_config()
# count number of nan's
np.isnan(A).sum()
# basic info about arrays
ctokheim / ggplot2_tricks.R
Last active June 1, 2018 14:35
ggplot2 tricks
# The aes function describes how features are mapped to visual attributes
myTextTheme <- theme(axis.text.x = element_text(size=16, angle=-90),
                     axis.text.y = element_text(size=16),
                     title = element_text(size=24),
                     axis.title.x = element_text(size=20),
                     axis.title.y = element_text(size=20)) # change text size
# plot confusion matrix
tile <- ggplot() + geom_tile(aes(x=Actual, y=Predicted, fill=Percent),

tmux shortcuts & cheatsheet

start new:

tmux

start new with session name:

tmux new -s myname