@twiecki
twiecki / dask_sparse_corr.py
Created August 17, 2018 11:26
Compute large, sparse correlation matrices in parallel using dask.
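The idea in the description can be illustrated on a small scale. The following is a minimal sketch using only the standard library, not the gist's own code: the names `standardize` and `chunked_corr_pairs` are hypothetical, and it assumes that a dot product between two rows equals their Pearson correlation only once each row has been centered and scaled to unit length, so that step is done explicitly here. Only index pairs above the threshold are stored, which is what keeps the result sparse.

```python
import math


def standardize(row):
    """Center a row and scale it to unit length, so that the dot product
    of two standardized rows equals their Pearson correlation."""
    mean = sum(row) / len(row)
    centered = [x - mean for x in row]
    norm = math.sqrt(sum(x * x for x in centered))
    if norm == 0:
        # A constant row has no defined correlation; treat it as all zeros.
        return [0.0] * len(row)
    return [x / norm for x in centered]


def chunked_corr_pairs(rows, chunksize=2, corr_thresh=0.9):
    """Return the set of (i, j) pairs whose correlation exceeds corr_thresh,
    visiting the full matrix one (chunk, chunk) block at a time."""
    z = [standardize(r) for r in rows]
    n = len(z)
    hits = set()
    for start1 in range(0, n, chunksize):
        for start2 in range(0, n, chunksize):
            # Each (start1, start2) block is independent of the others,
            # which is what makes this step easy to run in parallel.
            for i in range(start1, min(start1 + chunksize, n)):
                for j in range(start2, min(start2 + chunksize, n)):
                    corr = sum(a * b for a, b in zip(z[i], z[j]))
                    if corr > corr_thresh:
                        hits.add((i, j))
    return hits


data = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
]
pairs = chunked_corr_pairs(data)
```

Rows 0 and 1 rise together and row 2 falls, so `pairs` holds the diagonal plus `(0, 1)` and `(1, 0)`. Because every block is computed independently, each one could be handed to a separate task, which is presumably the role `dask.delayed` plays in the code below.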
import dask
import dask.array as da
import dask.dataframe as dd
import numpy as np
import sparse


@dask.delayed(pure=True)
def corr_on_chunked(chunk1, chunk2, corr_thresh=0.9):
    # Threshold the chunk-by-chunk product and store the boolean result sparsely.
    return sparse.COO.from_numpy(np.dot(chunk1, chunk2.T) > corr_thresh)


def chunked_corr_sparse_dask(data, chunksize=5000, corr_thresh=0.9):
# Inspired by the following sentence that I ran across this morning:
#
# "f_lineno is the current line number of the frame - writing to
# this from within a trace function jumps to the given line
# (only for the bottom-most frame). A debugger can implement a
# Jump command (aka Set Next Statement) by writing to f_lineno."
#
# https://docs.python.org/2/reference/datamodel.html
#
# There is an older implementation of a similar idea:
@brantfaircloth
brantfaircloth / get_protein.py
Created April 3, 2011 23:50
Get protein sequences from Genbank given a genomic accession number and a gene name
import sys
import time
from Bio import Entrez
Entrez.email = "your.email@domain.tld"
if not Entrez.email:
    print("you must add your email address")
    sys.exit(2)
# create an empty list we will fill with the gene names
@saketkc
saketkc / TEST.rb
Created July 14, 2011 06:37
CodeChef(SPOJ) Problem1 Ruby Solution
a=[]
while STDIN.readline.chomp != "42"
  a.push($_)
end
a.each { |s| puts s }