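FROM ubuntu:12.04
# Divert initctl so package scripts that try to talk to upstart succeed inside the container
RUN dpkg-divert --local --rename --add /sbin/initctl
RUN ln -s /bin/true /sbin/initctl
RUN echo "deb http://archive.ubuntu.com/ubuntu precise main universe" > /etc/apt/sources.list
# Update and install in one layer so a cached, stale package index is never reused;
# the noninteractive frontend stops mysql-server from prompting for a root password
RUN apt-get update && apt-get upgrade -y && \
    DEBIAN_FRONTEND=noninteractive apt-get -y install mysql-client mysql-server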
#!/bin/sh
# Converts a mysqldump file into an SQLite 3 compatible file. It also extracts the MySQL `KEY xxxxx` definitions from the
# CREATE block and creates them as separate statements _after_ all the INSERTs.
# Awk is chosen because it's fast and portable. You can use gawk, the original awk or even the lightning-fast mawk.
# The mysqldump file is traversed only once.
# Usage: $ ./mysql2sqlite mysqldump-opts db-name | sqlite3 database.sqlite
# Example: $ ./mysql2sqlite --no-data -u root -pMySecretPassWord myDbase | sqlite3 database.sqlite
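The core idea in the header above, pulling `KEY` lines out of a `CREATE TABLE` and replaying them as `CREATE INDEX` statements once the rows are loaded, can be sketched in Python with the standard `sqlite3` module. This is not the awk script's implementation: `split_keys` and its regex are hypothetical helpers that assume one KEY per line, simple column lists and no MySQL table options such as `ENGINE=`.

```python
import re
import sqlite3

# Hypothetical pattern for MySQL secondary-index lines, e.g.  KEY `idx` (`a`,`b`),
KEY_RE = re.compile(r"^\s*(UNIQUE\s+)?KEY\s+`(\w+)`\s*\(([^)]*)\),?\s*$")


def split_keys(create_sql, table):
    """Split one CREATE TABLE into (create_without_keys, create_index_statements)."""
    body, indexes = [], []
    for line in create_sql.splitlines():
        m = KEY_RE.match(line)
        if m:
            unique = "UNIQUE " if m.group(1) else ""
            cols = m.group(3).replace("`", '"')
            indexes.append(f'CREATE {unique}INDEX "{m.group(2)}" ON "{table}" ({cols});')
        else:
            body.append(line)
    # Removing the KEY lines can leave a trailing comma before the closing paren.
    create = re.sub(r",\s*\)", "\n)", "\n".join(body))
    # Convert MySQL backticks to standard SQL double-quoted identifiers.
    return create.replace("`", '"'), indexes


dump = """CREATE TABLE `users` (
  `id` int NOT NULL,
  `email` varchar(255) NOT NULL,
  `city` varchar(64),
  PRIMARY KEY (`id`),
  UNIQUE KEY `uq_email` (`email`),
  KEY `idx_city` (`city`)
);"""

create, indexes = split_keys(dump, "users")
con = sqlite3.connect(":memory:")
con.execute(create)
con.executemany('INSERT INTO "users" VALUES (?, ?, ?)',
                [(1, "a@example.org", "Oxford"), (2, "b@example.org", "Leeds")])
for stmt in indexes:  # build the indexes only after every row is in
    con.execute(stmt)
```

Building indexes after the INSERTs means SQLite does not have to update them row by row during the load, which is the usual reason for this ordering.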
hardingnj / apply.function.colwise
Last active August 29, 2015 14:00
A simple function that takes a dataframe and compares named columns to one another
apply.function.colwise <- function(FUNC, x, x.columns = rownames(x), y.columns = colnames(x), ignore.diag = identical(x.columns, y.columns), ...) {
  checks <- c(is.numeric, is.character)
  # column selectors must be numeric indices or character names: a factor is
  # silently indexed by its integer codes and so selects the wrong columns
  stopifnot(
    any(sapply(checks, function(FUN) FUN(y.columns))),
    any(sapply(checks, function(FUN) FUN(x.columns)))
  )
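The gist preview stops after the argument checks. A rough Python sketch of the pairwise idea, assuming the function applies `FUNC` to every (x column, y column) pair and leaves the diagonal empty when both sides use the same columns. `apply_colwise`, `n_equal` and the dict-of-lists stand-in for a data frame are hypothetical, and unlike the R default both sides here index columns.

```python
from itertools import product


def apply_colwise(func, table, x_columns=None, y_columns=None, ignore_diag=None):
    """Apply func(table[a], table[b]) to every pair (a, b) in x_columns x y_columns.

    `table` maps column name -> list of values (a stand-in for a data frame).
    Returns a nested dict result[a][b]; cells where a == b are None when
    ignore_diag is true.
    """
    x_columns = list(table) if x_columns is None else list(x_columns)
    y_columns = list(table) if y_columns is None else list(y_columns)
    if ignore_diag is None:
        # Mirror the R default: skip the diagonal only when both sides are identical.
        ignore_diag = x_columns == y_columns
    result = {a: {} for a in x_columns}
    for a, b in product(x_columns, y_columns):
        skip = ignore_diag and a == b
        result[a][b] = None if skip else func(table[a], table[b])
    return result


def n_equal(u, v):
    """Example comparison: number of positions where two columns agree."""
    return sum(p == q for p, q in zip(u, v))


df = {"s1": [0, 1, 2, 1], "s2": [0, 1, 1, 1], "s3": [2, 1, 2, 0]}
print(apply_colwise(n_equal, df)["s1"])  # {'s1': None, 's2': 3, 's3': 2}
```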
hardingnj / maskfasta
Last active August 29, 2015 14:00
Mask regions of a FASTA file that have no coverage in a BAM file. Useful for calculating null trinucleotide (and similar) distributions.
#!/bin/bash
BAM=this.bam
FASTA=that.fa
OUT=theother.fa
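# -bga reports every interval, including zero-depth ones; keep only the
# uncovered (depth 0) intervals so that maskfasta replaces them with N
bedtools genomecov -ibam "$BAM" -bga | awk '$4 == 0' | bedtools maskfasta -fi "$FASTA" -bed - -fo "$OUT"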
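An in-memory Python sketch of what the pipeline is meant to compute, assuming bedGraph rows of `(chrom, start, end, depth)` with 0-based, half-open coordinates as `genomecov -bga` writes them. Only zero-depth intervals are replaced with `N` (the `bedtools maskfasta` default mask character), and the hypothetical `trinucleotide_counts` shows how a null 3-mer distribution can then skip masked bases.

```python
from collections import Counter


def mask_uncovered(seq, bedgraph_rows, chrom, mask_char="N"):
    """Replace every zero-depth interval on `chrom` with mask_char.

    bedgraph_rows are (chrom, start, end, depth) tuples using 0-based,
    half-open coordinates, as in `bedtools genomecov -bga` output.
    """
    out = list(seq)
    for c, start, end, depth in bedgraph_rows:
        if c == chrom and depth == 0:
            out[start:end] = mask_char * (end - start)
    return "".join(out)


def trinucleotide_counts(seq):
    """Count 3-mers, skipping any window that overlaps an N."""
    windows = (seq[i:i + 3] for i in range(len(seq) - 2))
    return Counter(w for w in windows if "N" not in w)


rows = [("chr1", 0, 3, 5), ("chr1", 3, 6, 0), ("chr1", 6, 10, 2)]
masked = mask_uncovered("ACGTACGTAC", rows, "chr1")
print(masked)                        # ACGNNNGTAC
print(trinucleotide_counts(masked))  # Counter({'ACG': 1, 'GTA': 1, 'TAC': 1})
```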
hardingnj / speedtest.R
Last active August 29, 2015 14:01
Trivial speed example in R
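method.list <- list(
  # grows `var` with c() on every iteration: each call copies the whole
  # vector, so the total cost is quadratic in iter
  slow = function(iter) {
    var <- NULL
    for (i in seq_len(iter)) {  # seq_len(0) is empty, unlike 1:0
      var <- c(var, sqrt(i))
    }
    var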
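The slow method is slow because `c(var, sqrt(i))` allocates a new vector and copies everything on every pass, so the loop does quadratic work. The same effect can be reproduced in Python with list concatenation. The usual fixes in R are preallocating (`numeric(iter)`) or the vectorised `sqrt(seq_len(iter))`, which may or may not be what the rest of the gist compares. Timings vary by machine, so none are quoted.

```python
import math
import timeit


def slow(n):
    """Grow a list by concatenation: every step copies the list, O(n^2) overall."""
    out = []
    for i in range(1, n + 1):
        out = out + [math.sqrt(i)]
    return out


def fast(n):
    """Build the list in one pass, O(n) overall."""
    return [math.sqrt(i) for i in range(1, n + 1)]


assert slow(1000) == fast(1000)  # same values, different cost
for name, f in (("slow", slow), ("fast", fast)):
    seconds = timeit.timeit(lambda: f(2000), number=3)
    print(f"{name}: {seconds:.4f} s")
```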
hardingnj / hdf5_compression_test.py
Last active August 29, 2015 14:03
Python compression test
#!/usr/bin/env python
#
# This example creates GZIP- and LZF-compressed HDF5 datasets.
#
import h5py
import numpy as np
#
# Create files
file_gzip = h5py.File('gzip.h5', 'w')
file_lzf = h5py.File('lzf.h5', 'w')
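h5py's `gzip` filter is DEFLATE (zlib), applied chunk by chunk; it is requested with `create_dataset(..., compression="gzip", compression_opts=level)`, while `compression="lzf"` uses a filter that ships with h5py. LZF has no standard-library binding, so this stdlib-only sketch looks at the DEFLATE side alone, comparing zlib levels on synthetic low-entropy integers (an assumption, not the gist's data). Ratios on real chunked HDF5 data will differ.

```python
import random
import zlib
from array import array

random.seed(1)
# Synthetic C-int values, mostly zeros; made up for illustration only.
values = array("i", (random.choice((0, 0, 0, 1, 2)) for _ in range(100_000)))
raw = values.tobytes()

sizes = {}
for level in (1, 4, 9):  # 4 is h5py's default gzip level
    packed = zlib.compress(raw, level)
    assert zlib.decompress(packed) == raw  # lossless round trip
    sizes[level] = len(packed)
    print(f"level {level}: {len(raw)} -> {len(packed)} bytes")
```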
hardingnj / hdf5_compression_test.R
Created July 8, 2014 10:42
HDF5 compression test: R (rhdf5) fails to read LZF-compressed data
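#!/usr/bin/env Rscript
library(rhdf5)
# Succeeds: gzip (deflate) is a standard HDF5 filter
dat_gzip <- tryCatch(
  h5read('gzip.h5', "/"),
  error = function(e) stop(e)  # re-raise so any read error is reported
)
print(summary(dat_gzip))
# Fails: LZF is a filter bundled with h5py, not with the HDF5 library itself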
#!/usr/bin/env python
#
# This example creates and writes a GZIP-compressed dataset.
#
import h5py
import numpy as np
import random
import string
#
nrow = 1000000
#!/usr/bin/env Rscript
library(rhdf5)
print(h5ls('gzip.h5'))
# read the first 20 entries of dimension 1 and a growing slice of dimension 2
test_limits <- seq(1e3, 1e6, 1e3)
for (limit in test_limits) {
  print(limit)
  dat_gzip <- h5read('gzip.h5', "DS1/", index = list(1:20, 1:limit))
hardingnj / shapeit_bug_report
Last active August 29, 2015 14:08
Bash script to reproduce a potential bug in SHAPEIT
#!/bin/bash
# Download and unpack the SHAPEIT example data
curl -O https://mathgen.stats.ox.ac.uk/genetics_software/shapeit/files/example.tar.gz
tar -xvzf example.tar.gz
FILE=example/GLs.vcf
gunzip "${FILE}.gz"
# Phase the uncompressed VCF
shapeit --input-vcf "$FILE" --output-max unzipped.haps unzipped.sample
# Write a gzip-compressed copy of the VCF
gzip -c "${FILE}" > "${FILE}.gzip.vcf.gz"
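The script preview ends before the second run, so the exact bug is not shown. To check whether phasing the gzipped and the uncompressed VCF gives the same result, the two `.haps` outputs can be compared line by line, assuming both runs use the same random seed since SHAPEIT's phasing is stochastic. A minimal Python sketch; the second output name (`zipped.haps`) is hypothetical.

```python
from itertools import zip_longest


def first_difference(lines_a, lines_b):
    """Return (line_number, line_a, line_b) for the first mismatch, or None.

    A missing line on either side shows up as None, so truncated output is caught.
    """
    for n, (a, b) in enumerate(zip_longest(lines_a, lines_b), start=1):
        if a != b:
            return n, a, b
    return None


def compare_haps(path_a, path_b):
    """Compare two SHAPEIT .haps files line by line."""
    with open(path_a) as fa, open(path_b) as fb:
        return first_difference(fa, fb)


# Hypothetical usage once both runs exist:
# print(compare_haps("unzipped.haps", "zipped.haps"))
```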