Heng Li (lh3)
@lh3
lh3 / Dockerfile
Created March 18, 2020 20:26
Tmp dockerfile
# Create a base image from Miniconda
FROM continuumio/miniconda3
SHELL ["/bin/bash", "-c"]
COPY environment.yml .
# Create the conda environment from environment.yml (currently disabled)
#RUN conda env create -f environment.yml
RUN conda init bash
// To compile:
// gcc -g -O2 example.c libminimap2.a -lz
#include <stdlib.h>
#include <assert.h>
#include <stdio.h>
#include <zlib.h>
#include "minimap.h"
#include "kseq.h"
KSEQ_INIT(gzFile, gzread)
lh3 / fast-sqrtf.c
Created August 24, 2019 00:58
Fast square root
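// Approximates sqrt(x) for x > 0 by combining the fast inverse square root (see Wikipedia)
// with a fast reciprocal (inversion) trick: https://bits.stephan-brumme.com/inverse.html
#include <stdint.h>

static inline float mg_sqrtf(float x)
{
	union { float f; uint32_t i; } z = { x };
	z.i = 0x5f3759df - (z.i >> 1);          // initial guess for 1/sqrt(x)
	z.f *= (1.5f - (x * 0.5f * z.f * z.f)); // one Newton-Raphson step refines 1/sqrt(x)
	z.i = 0x7EEEEEEE - z.i;                 // approximate reciprocal: 1/(1/sqrt(x)) = sqrt(x)
	return z.f;
}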
lh3 / paf2sam.js
Created August 23, 2016 17:23
Preliminary and incomplete PAF->SAM converter (for wgsim_eval.pl)
var getopt = function(args, ostr) {
var oli; // option letter list index
if (typeof(getopt.place) == 'undefined')
getopt.ind = 0, getopt.arg = null, getopt.place = -1;
if (getopt.place == -1) { // update scanning pointer
if (getopt.ind >= args.length || args[getopt.ind].charAt(getopt.place = 0) != '-') {
getopt.place = -1;
return null;
}
if (getopt.place + 1 < args[getopt.ind].length && args[getopt.ind].charAt(++getopt.place) == '-') { // found "--"
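The converter's preview stops inside its option parser, so the conversion itself is not shown. As a hedged sketch in C (not the gist's JavaScript), the code below parses the 12 mandatory tab-separated PAF columns and prints a minimal SAM record. `paf_rec_t`, `paf_parse` and `paf_to_sam` are hypothetical names. Only the fields a position-based evaluator such as wgsim_eval.pl most likely needs (read name, strand flag, reference name, position, MAPQ) are filled in. CIGAR, SEQ and QUAL are written as `*`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical record holding the 12 mandatory PAF columns. */
typedef struct {
	char qname[256], tname[256];
	long qlen, qs, qe, tlen, ts, te, mlen, blen;
	char strand;
	int mapq;
} paf_rec_t;

/* Parses one tab-delimited PAF line; optional tags after column 12 are ignored.
 * Returns 0 on success, -1 if fewer than 12 columns could be read. */
static int paf_parse(const char *line, paf_rec_t *r)
{
	int n;
	assert(line && r);
	memset(r, 0, sizeof *r);
	n = sscanf(line, "%255[^\t]\t%ld\t%ld\t%ld\t%c\t%255[^\t]\t%ld\t%ld\t%ld\t%ld\t%ld\t%d",
	           r->qname, &r->qlen, &r->qs, &r->qe, &r->strand,
	           r->tname, &r->tlen, &r->ts, &r->te, &r->mlen, &r->blen, &r->mapq);
	return n == 12 ? 0 : -1;
}

/* Formats a minimal 11-column SAM line. PAF target start is 0-based while SAM POS
 * is 1-based; FLAG 16 marks a reverse-strand alignment. */
static int paf_to_sam(const paf_rec_t *r, char *buf, size_t size)
{
	return snprintf(buf, size, "%s\t%d\t%s\t%ld\t%d\t*\t*\t0\t0\t*\t*",
	                r->qname, r->strand == '-' ? 16 : 0, r->tname, r->ts + 1, r->mapq);
}
```

A production converter would also rebuild the CIGAR (for example from the `cg:Z:` tag minimap2 writes with `-c`) and copy the read sequence; both are left out here to keep the sketch short.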
===> Where do you most often perform bioinformatics analysis? <===
1 Personal computer (laptop/desktop)
2 Lab server
3 Departmental server
4 University server/cluster
5 Cloud
6 Other
===> Type of problems <===
lh3 / Makefile
Last active October 30, 2018 03:20
Position-specific gap open penalty
gg: ksw2_ggd.c cli.c ksw2.h
	$(CC) -Wall -g -O2 -o $@ ksw2_ggd.c cli.c

clean:
	rm -fr *.o *.dSYM gg
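The Makefile builds a tool from `ksw2_ggd.c`, but the alignment code itself is not shown. As a rough sketch of what a position-specific gap open penalty means, the function below computes a Gotoh (affine-gap) global alignment score in which opening a gap at target position `j` costs `go[j]` instead of a single constant. This is not ksw2's API or algorithm. `gg_global_score`, the `go[0..tlen]` indexing and the simple match/mismatch scoring are all assumptions made for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define GG_NEG_INF (INT_MIN / 2) /* "minus infinity" with headroom so subtracting penalties cannot overflow */

static inline int gg_max(int a, int b) { return a > b ? a : b; }

/* Hypothetical sketch: global alignment score of q vs t with affine gaps, where a gap
 * opened at target position j (0..tlen) costs go[j] and each gapped base costs ext,
 * i.e. a gap of length L opened at j costs go[j] + L * ext. Matches score +match,
 * mismatches -mismatch. Keeps one DP row, so memory is O(tlen). */
static int gg_global_score(const char *q, const char *t, const int *go,
                           int ext, int match, int mismatch)
{
	int qlen = (int)strlen(q), tlen = (int)strlen(t), i, j, e, score;
	int *H = malloc((size_t)(tlen + 1) * sizeof(int)); /* H[j]: best score of q[0..i) vs t[0..j) */
	int *F = malloc((size_t)(tlen + 1) * sizeof(int)); /* F[j]: best score ending in an insertion */
	assert(H && F);
	H[0] = 0, F[0] = GG_NEG_INF, e = GG_NEG_INF;
	for (j = 1; j <= tlen; ++j) { /* row 0: leading deletion of t[0..j) */
		e = gg_max(H[j - 1] - go[j] - ext, e - ext);
		H[j] = e, F[j] = GG_NEG_INF;
	}
	for (i = 1; i <= qlen; ++i) {
		int diag = H[0]; /* H[i-1][j-1] for j = 1 */
		F[0] = gg_max(H[0] - go[0] - ext, F[0] - ext); /* leading insertion of q[0..i) */
		H[0] = F[0], e = GG_NEG_INF;
		for (j = 1; j <= tlen; ++j) {
			int s = q[i - 1] == t[j - 1] ? match : -mismatch, h;
			e = gg_max(H[j - 1] - go[j] - ext, e - ext);    /* deletion: H[j-1] is already row i */
			F[j] = gg_max(H[j] - go[j] - ext, F[j] - ext); /* insertion: H[j] is still row i-1 */
			h = gg_max(diag + s, gg_max(e, F[j]));
			diag = H[j], H[j] = h; /* save H[i-1][j] as the next diagonal */
		}
	}
	score = H[tlen];
	free(H), free(F);
	return score;
}
```

With `q = "ACGT"`, `t = "ACGAT"`, `ext = 1`, match and mismatch both 1 and a flat `go[j] = 2`, the best alignment deletes the `A` at target position 4 and scores 1. Raising only `go[4]` to 10 makes that gap too expensive, so the score drops to -1 as the gap moves elsewhere.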
lh3 / x86-simd.c
Created September 5, 2017 17:06
Get supported SIMD instruction sets (x86 only)
#include <stdio.h>
#define SIMD_SSE 0x1
#define SIMD_SSE2 0x2
#define SIMD_SSE3 0x4
#define SIMD_SSSE3 0x8
#define SIMD_SSE4_1 0x10
#define SIMD_SSE4_2 0x20
#define SIMD_AVX 0x40
#define SIMD_AVX2 0x80
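The preview ends after the flag definitions, so the detection code is not shown. As a hedged alternative to querying `cpuid` by hand, the sketch below fills the same `SIMD_*` bitmask using the GCC/Clang builtins `__builtin_cpu_init()` and `__builtin_cpu_supports()`. This is a swapped-in technique, not necessarily what the gist does. `x86_simd_flags` and `x86_simd_has` are hypothetical names. On non-x86 targets or other compilers the function simply returns 0.

```c
#include <assert.h>

#define SIMD_SSE     0x1
#define SIMD_SSE2    0x2
#define SIMD_SSE3    0x4
#define SIMD_SSSE3   0x8
#define SIMD_SSE4_1  0x10
#define SIMD_SSE4_2  0x20
#define SIMD_AVX     0x40
#define SIMD_AVX2    0x80

/* Returns a bitmask of SIMD_* flags supported by the running CPU (0 if unknown). */
static int x86_simd_flags(void)
{
	int flags = 0;
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init(); /* needed if called before constructors have run */
	if (__builtin_cpu_supports("sse"))    flags |= SIMD_SSE;
	if (__builtin_cpu_supports("sse2"))   flags |= SIMD_SSE2;
	if (__builtin_cpu_supports("sse3"))   flags |= SIMD_SSE3;
	if (__builtin_cpu_supports("ssse3"))  flags |= SIMD_SSSE3;
	if (__builtin_cpu_supports("sse4.1")) flags |= SIMD_SSE4_1;
	if (__builtin_cpu_supports("sse4.2")) flags |= SIMD_SSE4_2;
	if (__builtin_cpu_supports("avx"))    flags |= SIMD_AVX;
	if (__builtin_cpu_supports("avx2"))   flags |= SIMD_AVX2;
#endif
	return flags;
}

/* Hypothetical convenience wrapper: is one specific SIMD_* feature available? */
static int x86_simd_has(int flag)
{
	assert(flag > 0 && (flag & (flag - 1)) == 0); /* exactly one SIMD_* bit */
	return (x86_simd_flags() & flag) != 0;
}
```

A typical use is choosing a kernel at run time, e.g. taking an AVX2 path only when `x86_simd_has(SIMD_AVX2)` is true and falling back to SSE2 otherwise.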
lh3 / fq_download.md
Last active December 30, 2017 22:22
Downloading gzip'd fastq
| Source | Destination file type | Protocol | Time (s) | Command line |
|---|---|---|---|---|
| NCBI | .sra | ftp | 296 | `wget` |
| NCBI | .fastq.gz | SRA Toolkit | ~23000 | `fastq-dump -Z --gzip --split-spot` |
| local file | sra=>fastq.gz | SRA Toolkit | ~15000 | `fastq-dump --gzip --split-spot --split-3` |
| EBI | .fastq.gz | aspera | 513+492 | `aspera -QT -l 300m` |
| EBI | .fastq.gz | ftp | 1876+1946 | `wget` |

Notes:

CREATE TABLE seq (
  checksum TEXT,
  ac TEXT,      -- INSDC sequence accession, when available
  len INTEGER,  -- could be of type "TEXT"; no need to implement "less than"
  seq TEXT,
  PRIMARY KEY (checksum) -- what about collisions?
);
CREATE INDEX seq_len ON seq (len);
CREATE INDEX seq_ac ON seq (ac); -- different checksums may have the same AC
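The schema does not say how `checksum` is computed. The sketch below shows one possible way to derive it, assuming the key is a case-insensitive digest of the sequence stored as hex `TEXT`. It uses 64-bit FNV-1a only because it fits in a few lines. FNV-1a is not collision resistant, which bears directly on the "what about collisions?" comment, so a real table would more likely use a cryptographic digest. `seq_fnv1a64` and `seq_checksum_hex` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical checksum: 64-bit FNV-1a over the upper-cased sequence,
 * so "acgt" and "ACGT" map to the same key. */
static uint64_t seq_fnv1a64(const char *seq)
{
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */
	for (; *seq; ++seq) {
		h ^= (uint64_t)(unsigned char)toupper((unsigned char)*seq);
		h *= 0x100000001b3ULL; /* FNV 64-bit prime */
	}
	return h;
}

/* Writes the checksum as 16 lowercase hex digits plus NUL, ready for the TEXT column. */
static void seq_checksum_hex(const char *seq, char out[17])
{
	assert(seq && out);
	snprintf(out, 17, "%016llx", (unsigned long long)seq_fnv1a64(seq));
}
```

Normalizing case before hashing means soft-masked (lowercase) and unmasked copies of the same sequence share one row.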

NA12878 DirectRNA reads were obtained [here][raw-data] (passed reads only) and aligned with [minimap2][minimap2] v2.5 against the no_alt_analysis_set of GRCh38 plus the SIRV contigs. Alignment took less than one wall-clock hour on 16 CPU cores with the options `-cx splice -k14 --cs -uf -N20 -t16`. Alignments were converted to BED with the misc/splice2bed.js script from minimap2 and then to BigBed. Ribosome-related genes (RPL*, RPS*, EEF* and RPSA) were excluded to reduce the file size. The final BigBed is hosted [at OSF][osf-prj].
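The key step in turning a spliced alignment into BED is splitting it into exon blocks at each intron. The sketch below does that from a CIGAR string (such as the `cg:Z:` tag minimap2 writes with `-c`), filling BED12-style `blockSizes` and `blockStarts`, which BED12 stores relative to `chromStart`. This is an illustrative C sketch, not the code of misc/splice2bed.js. `cigar_to_blocks` and its argument layout are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: split a CIGAR aligned at 0-based target position `start` into
 * exon blocks; every 'N' (intron) closes a block. On success returns the number of
 * blocks, fills sizes[] and starts[] (relative to `start`, like BED12 blockStarts)
 * and sets *end to the 0-based exclusive end. Returns -1 on malformed input or
 * if more than max_blocks blocks would be needed. */
static int cigar_to_blocks(const char *cigar, long start, int max_blocks,
                           long *sizes, long *starts, long *end)
{
	long pos = start, block_st = start;
	int n = 0;
	assert(cigar && sizes && starts && end && max_blocks > 0);
	while (*cigar) {
		char *op;
		long len = strtol(cigar, &op, 10);
		if (op == cigar || len <= 0 || *op == '\0') return -1; /* expect "<len><op>" */
		switch (*op) {
		case 'M': case '=': case 'X': case 'D': /* consume target inside the current exon */
			pos += len;
			break;
		case 'N': /* intron: close the current block, then jump over the intron */
			if (n == max_blocks) return -1;
			sizes[n] = pos - block_st, starts[n] = block_st - start, ++n;
			pos += len, block_st = pos;
			break;
		case 'I': case 'S': case 'H': /* query-only operations do not move on the target */
			break;
		default:
			return -1;
		}
		cigar = op + 1;
	}
	if (n == max_blocks) return -1;
	sizes[n] = pos - block_st, starts[n] = block_st - start, ++n; /* final block */
	*end = pos;
	return n;
}
```

For example, `10M100N20M` at position 1000 gives two blocks of sizes 10 and 20 with relative starts 0 and 110, and a chromEnd of 1130. Deletions stay inside a block because, unlike introns, they are not treated as splice junctions here.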

A UCSC custom track is configured with:

track type=bigBed name=NA12878-DirectRNA.minimap2-2.5 useScore=1 visibility=4 itemRgb="On" bigDataUrl=https://files.osf.io/v1/resources/b5nm2/providers/osfstorage/5a2347599ad5a10272ed5739?action=download&version=1&direct

You can access this track with the [following link][direct-link]. A GMAP alignment track is temporarily available [here][gmap]. It contains only 1/4 of the reads; GMAP is still running. It will take 4–5 wall-clock