Skip to content

Instantly share code, notes, and snippets.

View lh3's full-sized avatar

Heng Li lh3

View GitHub Profile
#include <unistd.h>
#include <assert.h>
#include <stdlib.h>
#include <stdio.h>
#include "kstring.h"
#include "sam.h"
#include "kbtree.h"
typedef struct {
int pos, n, sum;
@lh3
lh3 / gem2sam.pl
Last active May 1, 2017 16:53
Convert GEM alignment format to SAM. Mate positions are not computed in this version.
#!/usr/bin/env perl
use strict;
use warnings;
die("Usage: gem2sam.pl <gem.map>\n") if (@ARGV == 0 && -t STDIN);
while (<>) {
chomp;
my @t = split("\t");
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts = (t=>8, k=>17, m=>0.6, s=>200, p=>'wtasm');
getopts('t:p:k:m:s:', \%opts);
die (qq/Usage: smartdenovo.pl [options] <reads.fa>
Options:
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts = ();
getopts('fd:', \%opts);
die("Usage: pexe.pl [options] <exe1> [exe2 [...]]
# OUTSIDE VM: launch an instance
gcloud compute instances create lh3test --machine-type n1-standard-1 --image-project ubuntu-os-cloud --image-family ubuntu-1604-lts
#
# INSIDE VM!!!
#
# install dependencies
sudo apt-get update
sudo apt-get install libboost-all-dev cmake libstdc++6 clang-3.8 cmake git-all python3 zlib1g-dev

Writing a program for efficient matrix multiplication is tricky. A naive implementation usually looks like this (I will use a square matrix for simplicity):

for (i = 0; i < n; ++i)
  for (j = 0; j < n; ++i)
    for (k = 0, m[i][j] = 0.; k < n; ++i)
      m[i][j] += a[i][k] * b[k][j];

The problem is the inner loop a[i][k] * b[k][j] where b[k][j] and b[k+1][j] are not adjacent in memory. This leads to frequent cache misses for large matrices. The better implementation is to transpose matrix b. The implementation will look like:

for (i = 0; i &lt; n; ++i) // transpose
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Std;
my %opts = (t=>'ct', m=>50);
getopts('t:m:3', \%opts);
my $type;
#!/usr/bin/env perl
use strict;
use warnings;
use Getopt::Std;
my %opts = (g=>37, l=>2, L=>100);
getopts('g:l:L:o:Huc', \%opts);
# check path
@lh3
lh3 / 00_fsi.md
Last active March 6, 2016 04:32
Recoding-free fast BAM slicing for BAMs generated by samtools-0.1.8+

This gist demonstrates fast BAM slicing without re-compressing the alignment data. It only works for BAMs generated by samtools-0.1.8+ and htslib, as it requires the BAM to be block aligned.

To use this gist, you need to first acquire and compile htslib. After that:

cd htslib; gcc -g -O2 -Wall -o fsi -I. /path/to/fsi.c libhts.a -lz
./fsi aln.bam > aln.fsi
perl fsi-get.pl aln.bam aln.fsi chr11:1000000-1200000 > slice.bam