Skip to content

Instantly share code, notes, and snippets.

View lh3's full-sized avatar

Heng Li lh3

View GitHub Profile
@lh3
lh3 / occ1.c
Last active August 29, 2015 14:14
Get a k-mer count from the fermi2 index
/* To compile this program, you need rld0.{h,c} from ropebwt2 or fermi2:
*
* gcc -g -O2 -Wall -o occ1 rld0.c occ1.c
*/
#include <string.h>
#include <stdint.h>
#include <stdio.h>
#include "rld0.h"
@lh3
lh3 / vqsr-b37.mak
Last active August 29, 2015 14:19
Makefile for VQSR
# VQSR for single human sample mapped to b37. Usage:
#
# make -f vqsr-b37.mak prefix=my-vcf-prefix
#
# For this to work, the input VCF must be named as "my-vcf-prefix.vcf" and
# there must be "GenomeAnalysisTK.jar" and "b37" directory from the GATK
# resource bundle in the working directory.
#
# References:
#

Querying Genotypes with Standard SQL

This document demonstrates that we can use three conceptual SQL tables for variant annotations, sample phenotypes and genotypes, respectively, to achieve flexible query of genotype data. The tables are conceptual in that they are primarily used for constructing queries but not for storing real data as they are not efficient enough for large-scale data.

We will use an example to show the structure of the three tables and how to query data in standard SQL. You can find all the SQL statements in

@lh3
lh3 / mastermind.c
Last active August 29, 2015 14:23
MasterMind game, with the computer as the code breaker. Dedicated to my daughter.
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <time.h>
const char *mm_colors = "RGBWOPYSE";
const char *mm_numbers = "123456789";
long *mm_enumerate(int m, int n, int rep, long *_z) // m colors, n columns
var max_leftover = 0.1, max_diff = 0.05, min_mapq = 20;
var file = arguments.length == 0? new File() : new File(arguments[0]);
var buf = new Bytes();
var re = /(\d+)([MIDSH])/g;
var decoy = {};
while (file.readline(buf) >= 0) {
var m, line = buf.toString();
if ((m = /^@SQ.*\tSN:(\S+).*\tLN:(\d+)/.exec(line)) != null) {
@lh3
lh3 / calD.c
Last active October 11, 2015 12:28
Compute the D-statistic from 4 consensus FASTQ
#include <zlib.h>
#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <math.h>
#include "kseq.h"
@lh3
lh3 / max.js
Created July 19, 2013 18:00
Given a discrete one-dimention function f(x), fit it with Bernstein polynomial and the find the max. k8 is required.
var getopt = function(args, ostr) {
var oli; // option letter list index
if (typeof(getopt.place) == 'undefined')
getopt.ind = 0, getopt.arg = null, getopt.place = -1;
if (getopt.place == -1) { // update scanning pointer
if (getopt.ind >= args.length || args[getopt.ind].charAt(getopt.place = 0) != '-') {
getopt.place = -1;
return null;
}
if (getopt.place + 1 < args[getopt.ind].length && args[getopt.ind].charAt(++getopt.place) == '-') { // found "--"
@lh3
lh3 / sam_uniq.pl
Created July 19, 2013 18:07
Given a SAM containing multiple hits and the AS tag, choose one with the maximum AS.
#!/usr/bin/env perl
use strict;
use warnings;
&main;
sub main {
my $last = '';
my @lines;
@lh3
lh3 / 00README.md
Last active February 9, 2016 18:38

This gist implements a proof-of-concept web server to provide remote access to multiple BAMs. You can launch the server with:

go run hts-server.go -e /path/to/samtools bam-dir1 bam-dir2

and test it with:

curl -s http://127.0.0.1:8000/

The server regards bam-dir1 and bam-dir2 as study accessions. For a BAM file bam-dir1/myfile1.bam, the file accession is myfile1. It is required that every file has a unique file name across all input directories (i.e. even in two directories, two files must be named differently).

@lh3
lh3 / 00_fsi.md
Last active March 6, 2016 04:32
Recoding-free fast BAM slicing for BAMs generated by samtools-0.1.8+

This gist demonstrates fast BAM slicing without re-compressing the alignment data. It only works for BAMs generated by samtools-0.1.8+ and htslib, as it requires the BAM to be block aligned.

To use this gist, you need to first acquire and compile htslib. After that:

cd htslib; gcc -g -O2 -Wall -o fsi -I. /path/to/fsi.c libhts.a -lz
./fsi aln.bam > aln.fsi
perl fsi-get.pl aln.bam aln.fsi chr11:1000000-1200000 > slice.bam