Skip to content

Instantly share code, notes, and snippets.

View lh3's full-sized avatar

Heng Li lh3

View GitHub Profile
@lh3
lh3 / calD.c
Last active October 11, 2015 12:28
Compute the D-statistic from 4 consensus FASTQ
#include <zlib.h>
#include <stdint.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <ctype.h>
#include <string.h>
#include <math.h>
#include "kseq.h"
@lh3
lh3 / gem2sam.pl
Last active May 1, 2017 16:53
Convert GEM alignment format to SAM. Mate positions are not computed in this version.
#!/usr/bin/env perl
use strict;
use warnings;
die("Usage: gem2sam.pl <gem.map>\n") if (@ARGV == 0 && -t STDIN);
while (<>) {
chomp;
my @t = split("\t");
@lh3
lh3 / max.js
Created July 19, 2013 18:00
Given a discrete one-dimention function f(x), fit it with Bernstein polynomial and the find the max. k8 is required.
var getopt = function(args, ostr) {
var oli; // option letter list index
if (typeof(getopt.place) == 'undefined')
getopt.ind = 0, getopt.arg = null, getopt.place = -1;
if (getopt.place == -1) { // update scanning pointer
if (getopt.ind >= args.length || args[getopt.ind].charAt(getopt.place = 0) != '-') {
getopt.place = -1;
return null;
}
if (getopt.place + 1 < args[getopt.ind].length && args[getopt.ind].charAt(++getopt.place) == '-') { // found "--"
@lh3
lh3 / sam_uniq.pl
Created July 19, 2013 18:07
Given a SAM containing multiple hits and the AS tag, choose one with the maximum AS.
#!/usr/bin/env perl
use strict;
use warnings;
&main;
sub main {
my $last = '';
my @lines;
@lh3
lh3 / human_missing.txt
Created July 3, 2014 17:47
Supposed human sequences having no hits to GRCh38+ALT+hs37d5+EBV
>670363126:1315126374:97-3576:131-3480:1-3350 no hit?!
TTGAACAAGTTCAATTTAAAATATTTCATAAGCTTGTATGATATCATTGGGGCTAGAGTAGACGTGATACTAAATGTAGGAAATAAATGGAATAAACACTCAGAGAGGAACTATATGTTTTGTCTTAAAACTTCATTAAGTCTAAGAATTGATATTTTGTTGCTGGATGAATTGGTGGTTTGTGAGTTATAGTGAATATATCTCCCTGTTACATTCCTGTCGTTGTCCTATCCATTGTCCTATTGTTGGATGTTTCACTTGTGTATCCCATGTTACATAATTTTATCATAGAGAGCAAAAGAATTTGACTACAGTAAACTTAATGCTTTGCCATGTACAGATGTATGACAAAGAACATGTTATTTTGAAAGTTTTTGCTCTTATATTTTCTTTGCAGTCTGACTTCTTAATATTTGCTTCATTTTCATGTATGTGTTTTTACAAAGCATTCTCTTGGTATCACTAACCAAATACCTGTTTAGAGAAACAAAAGGCCAACTGGGGATGAAAACAGTTTATTTACTTACTGCACAGTATGTTTCTAGCTCTCTGTTATAAGAACAACAGTGTATTTGGTGTGTGTGTAGAACATACTTATTAAAAGTTTGATTCCTCTTTCCCCCGCAGATTCCATGATGAAGCCTTGTGATTTTGTGTCTTTACAAACTGTTCACTGATCTGGCAATGAAAATTGCCTCGATGTTAATAAATATTAGTTCATTAGCTATGCTTAATTAACACAAATGAAACTGATGAACCCAGTGTTGTGTGTTATTGCAAGCTTTCTATTTATATCAGCAAACCTCAGCGTTTATGATTTATTAAATAATTTATTCAATAGGCATTAATTGAGTATCAAGTATCAGTCAGAGACTTTGCTAAATGTTGCAAACCCAAATATGTCTTAAAGGAGTATACTGTCTTTCAAAAAATAGATGGCTAAAAAATAGTTATAATATAGATGATATA
// gcc -g -O2 -Wall fmr-rld.c mrope.c rle.c rope.c rld0.c -o fmr-rld -pthread
#include <stdio.h>
#include <stdint.h>
#include "rle.h"
#include "rld0.h"
#include "mrope.h"
int main(int argc, char *argv[])
{
@lh3
lh3 / kfreq.c
Last active August 29, 2015 14:05
#include <zlib.h>
#include <stdio.h>
#include <string.h>
#include <assert.h>
#include "kseq.h"
KSEQ_INIT(gzFile, gzread)
unsigned char seq_nt6_table[256] = {
0, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5,
@lh3
lh3 / BSdownload.md
Last active August 23, 2023 19:44
Download files from Illumina's BaseSpace

References:

Steps:

  1. Follow steps 1-5 in the first link above to acquire access_token. This will take a while, but you only need to do this once. Never share this token!!
  2. Find the file you want to download. Copy the link, which looks something like: https://basespace.illumina.com/sample/9804795/files/tree/NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503. The "id" is the unique file identifier.
  3. Download the file with: wget -O filename 'https://api.basespace.illumina.com/v1pre3/files/{id}/content?access_token={token}', where {token} is from step 1 and {id} from step 2.
@lh3
lh3 / 00_cmd.sh
Last active August 29, 2015 14:06
# generate the list of novel segments for each sample
seqtk seq -L200 001*.mag.gz | ./fermi2 match -dk101 hs38a.fmd /dev/stdin | k8 hs38d.js fm2seg -p 001 | gzip > 001.novel.gz
k8 hs38d.js tab2fa 001.novel.gz | bwa-r868 mem s38a.fa - | k8 hs38d.js dropsim | gzip -1 > 001.dropped.gz
tabtk isct -c 001.dropped.gz 001.novel.gz | gzip -1 > 001.div1.gz
#
zcat div1/*.div1.gz | cut -f1-5 | sort -k5,5 -k1,1 -S60G | k8 hs38d.js fixname | gzip -1 > 01ori.txt.gz
# tabtk isct -c gbk.novel.dropped gbk.novel.gz | sort -k5,5 -k1,1 -S60G | grep -v NNNNNNNNNN | gzip -1 > 01ori.txt.gz
tabtk isct -d_ -c contaminated.txt 01ori.txt.gz | gzip -1 > 02contam.txt.gz
# zcat 01ori.txt.gz|grep -vFf DoNotUseList_May2011.list | gzip -1 > 02contam.txt.gz
@lh3
lh3 / duprate.tex
Last active August 29, 2015 14:07
\documentclass[pdftex,10pt]{article}
\usepackage{amsmath}
\usepackage{amssymb}
\title{PCR and mapping duplicate rate in NGS}
\author{Heng Li}
\begin{document}