Vince Buffalo vsbuffalo

@vsbuffalo
vsbuffalo / pairwise_cov.r
Created March 1, 2019 19:06
comparing two implementations of covariance with pairwise complete cases
library(tidyverse)
library(MASS)
pcov <- function(x) {
xs <- scale(x, scale=FALSE)
dd <- as.integer(!is.na(x))
dim(dd) <- dim(x)
denom <- (t(dd) %*% dd) - 1L
no_obs <- denom == 0L
xs[is.na(xs)] <- 0
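The preview above stops before the covariance itself is computed. As a rough illustration only, the visible steps (center each column, build an observed/missing indicator, count pairwise-complete rows, zero out the missing entries) can be sketched in plain Python. The name `pairwise_cov`, the use of NaN for missing values, the final division by the pairwise count minus one, and the NaN result for pairs with fewer than two shared observations are all assumptions, not the gist's actual code.

```python
import math


def pairwise_cov(rows):
    """Covariance matrix from rows that may contain NaN for missing values.

    Hypothetical sketch: follows the visible steps of the preview and then
    assumes the cross-products are divided by (pairwise count - 1).
    """
    n_cols = len(rows[0])
    # True where a value is observed, False where it is missing.
    observed = [[not math.isnan(v) for v in row] for row in rows]

    # Column means over the observed values only.
    means = []
    for j in range(n_cols):
        vals = [row[j] for row in rows if not math.isnan(row[j])]
        means.append(sum(vals) / len(vals) if vals else math.nan)

    # Centered data; missing entries become 0 so they add nothing
    # to the cross-products.
    centered = [
        [row[j] - means[j] if obs[j] else 0.0 for j in range(n_cols)]
        for row, obs in zip(rows, observed)
    ]

    cov = [[math.nan] * n_cols for _ in range(n_cols)]
    for i in range(n_cols):
        for j in range(n_cols):
            # Number of rows where both column i and column j are observed.
            n_ij = sum(1 for obs in observed if obs[i] and obs[j])
            if n_ij < 2:
                continue  # assumption: too few shared rows, leave as NaN
            cross = sum(r[i] * r[j] for r in centered)
            cov[i][j] = cross / (n_ij - 1)
    return cov
```

Note that this centers each column by the mean of all its observed values, so the result can differ from an approach that recomputes the means separately for every pair of columns.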
vsbuffalo / foo.R
Created April 13, 2016 18:42
23andme and bioc blog post
Title: Using Bioconductor to Analyze your 23andme Data
Bioconductor is one of the open source projects of which I am most
fond. The documentation is excellent, the community wonderful, the
development fast-paced, and the software *very* well written.
There's a new package in the development branch (due to be released as
2.10 very soon) called `gwascat`. `gwascat` is a package that serves
as an interface to the [NHGRI's](http://www.genome.gov/) database of
genome-wide association studies.
vsbuffalo / Makefile
Last active November 28, 2018 14:04
finally, a LaTeX makefile that captures your anger and frustration
# Thanks https://github.com/EBI-predocs/latex-thesis/blob/master/Makefile for
# some tips
LATEXMK = latexmk -xelatex
# CONFIG
target = manuscript
references = bib.bib
# SETUP
includes := $(shell ls *.tex) ${references}
library(purrr)
foo <- function(x) {
return(function(y) {
y + x
})
}
args <- list(1, 2)
foos_map <- map(args, foo)
vsbuffalo / .ycm_extra_conf.py
Created March 2, 2016 21:10
example YouCompleteMe file
# This file is NOT licensed under the GPLv3, which is the license for the rest
# of YouCompleteMe.
#
# Here's the license text for this file:
#
# This is free and unencumbered software released into the public domain.
#
# Anyone is free to copy, modify, publish, use, compile, sell, or
# distribute this software, either in source code form or as a compiled
# binary, for any purpose, commercial or non-commercial, and by any
vsbuffalo / draw_lineage.js
Created September 5, 2015 16:53
example standalone d3 svg image generator
var fs = require('fs');
var d3 = require('d3');
var jsdom = require('node-jsdom');
var xmlserializer = require('xmlserializer');
var margin = {top: 2, right: 4, bottom: 2, left: 4};
var width = 200 - margin.left - margin.right,
height = 14 - margin.top - margin.bottom;
vsbuffalo / summarizeByTile.R
Created November 8, 2013 22:54
Example of GenomicRanges's tileGenome, which I think demonstrates its power. This might be a bit faster as a custom script in Python or C, but (1) that would take longer to write, (2) this is much more interactive, and (3) on real data, it's actually pretty fast. Stuff like this is why Bioconductor should be in every bioinformatician's toolkit.
library(GenomicRanges)
summarizeByTile <-
# given a GRanges (or some sort of ranged data) object `x`, and a
# *corresponding* vector of values to summarize `y` (these *must*
# correspond), calculate the summary per tile with the function `fun`.
# Note: this is still beta; wider tests coming, use with caution.
function(x, y, tiles, fun, mcol_name="y") {
stopifnot(length(x) == length(y))
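The preview ends after the length check, so the tiling logic is not shown. As a language-neutral illustration of the idea described in the comments (one value per range, summarized per tile with a user-supplied function), here is a minimal Python sketch. It does not use or imitate the GenomicRanges API: it assumes point positions on a single sequence and fixed-width tiles, and the names `summarize_by_tile`, `positions`, and `tile_width` are hypothetical.

```python
from collections import defaultdict


def summarize_by_tile(positions, values, tile_width, fun):
    """Group values by fixed-width tile and apply fun to each group.

    Assumes 1-based positions on one sequence; tile k covers
    [k * tile_width + 1, (k + 1) * tile_width]. Empty tiles are omitted.
    """
    # Same guard as the stopifnot() in the preview: one value per position.
    if len(positions) != len(values):
        raise ValueError("positions and values must have the same length")

    groups = defaultdict(list)
    for pos, val in zip(positions, values):
        groups[(pos - 1) // tile_width].append(val)

    # Key each summary by the (start, end) coordinates of its tile.
    return {
        (k * tile_width + 1, (k + 1) * tile_width): fun(vals)
        for k, vals in sorted(groups.items())
    }
```

For example, calling it with a mean function and `tile_width=10` returns one average per occupied 10-base tile.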
vsbuffalo / entropy_class.py
Created September 26, 2013 18:16
Version of entropy function we wrote in class
from __future__ import division
from collections import Counter
from math import log
def entropy(seq, unit="bit"):
"""
Returns entropy of DNA sequence.
The entropy formula is:
entropy = -sum_i (log(p_i) * p_i)
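The docstring formula above, entropy = -sum_i (log(p_i) * p_i), can be sketched as follows. This is a minimal illustration and not the function written in class: the name `shannon_entropy` and the `base` parameter (standing in for the preview's `unit` argument, whose handling is not shown) are assumptions.

```python
from collections import Counter
from math import log


def shannon_entropy(seq, base=2):
    """Entropy of the symbols in seq; base 2 gives the result in bits."""
    counts = Counter(seq)
    total = len(seq)
    # p_i is the observed frequency of each distinct symbol.
    return -sum((c / total) * log(c / total, base) for c in counts.values())
```

A sequence made of a single repeated base has entropy 0, and a DNA sequence with all four bases at equal frequency has the maximum of 2 bits.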
vsbuffalo / entropy_vince.py
Created September 26, 2013 17:57
Vince's version of entropy in Python
"""
entropy.py
Calculate entropy of a given list.
"""
from math import log, log10
from collections import Counter
import pdb
def entropy(x, logfun=lambda x: log(x, 2)):
vsbuffalo / naive_nshared.py
Created September 5, 2013 23:33
Calculate number of minor alleles (not in consensus sequence).
import sys
from readfq import readfq
from itertools import combinations
from datetime import datetime
def num_shared(seq_a, seq_b, consensus_seq):
"""
Given two alignment sequences in multiple alignment FASTA format,
calculate the number of shared SNPs (for minor alleles only, not
in consensus).
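The function body is cut off in this preview. One possible reading of the docstring, sketched below, counts alignment columns where both sequences carry the same base and that base differs from the consensus. This interpretation, the name `count_shared_minor`, and the treatment of `-` as a gap to be ignored are all assumptions rather than the gist's actual logic.

```python
def count_shared_minor(seq_a, seq_b, consensus_seq, gap="-"):
    """Count columns where seq_a and seq_b share a non-consensus base."""
    if not (len(seq_a) == len(seq_b) == len(consensus_seq)):
        raise ValueError("aligned sequences must have equal length")

    shared = 0
    for a, b, c in zip(seq_a, seq_b, consensus_seq):
        # Shared minor allele: same base in both, different from the
        # consensus, and not a gap character.
        if a == b and a != c and a != gap:
            shared += 1
    return shared
```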