Heng Li lh3

lh3 / eigen.c
Created November 8, 2014 04:53
Eigenvalues/vectors for symmetric and asymmetric matrices
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define RADIX 2.0
/*************************
* balance a real matrix *
*************************/
lh3 / inthash.md
Last active September 5, 2023 21:10 — forked from badboy/inthash.md
lh3 / BSdownload.md
Last active August 23, 2023 19:44
Download files from Illumina's BaseSpace

References:

Steps:

  1. Follow steps 1-5 in the first link above to acquire an access_token. This takes a while, but you only need to do it once. Never share this token!
  2. Find the file you want to download. Copy the link, which looks something like: https://basespace.illumina.com/sample/9804795/files/tree/NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503. The "id" is the unique file identifier.
  3. Download the file with: wget -O filename 'https://api.basespace.illumina.com/v1pre3/files/{id}/content?access_token={token}', where {token} is from step 1 and {id} from step 2.
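Steps 2 and 3 amount to pulling the `id` out of the copied link and substituting it, together with the token, into the API URL. A minimal C sketch of that substitution follows. The function name `basespace_content_url` is hypothetical, and the code assumes the id is a run of digits in the query string, as in the example link; it only builds the URL and leaves the download itself to wget.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build the download URL from a copied BaseSpace file
 * link and an access token. Returns 0 on success, -1 if no numeric "id"
 * query parameter is found or the output buffer is too small. */
int basespace_content_url(const char *link, const char *token,
                          char *out, size_t out_len)
{
    const char *p = strchr(link, '?');
    size_t n = 0;
    int written;

    if (p == NULL) return -1;
    ++p; /* first query parameter */
    /* walk the '&'-separated parameters until one starts with "id=" */
    while (p != NULL && strncmp(p, "id=", 3) != 0) {
        p = strchr(p, '&');
        if (p != NULL) ++p;
    }
    if (p == NULL) return -1;
    p += 3;
    /* assumption: the id is purely numeric, as in the example link */
    while (isdigit((unsigned char)p[n])) ++n;
    if (n == 0) return -1;

    written = snprintf(out, out_len,
        "https://api.basespace.illumina.com/v1pre3/files/%.*s/content?access_token=%s",
        (int)n, p, token);
    return (written < 0 || (size_t)written >= out_len) ? -1 : 0;
}
```

The resulting string can be passed to wget in single quotes, as in step 3. Because it contains the token, it should not be logged or shared.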
lh3 / inthash.c
Last active December 11, 2022 07:27
Invertible integer hash functions
/*
For any 1<k<=64, let mask=(1<<k)-1. hash_64() is a bijection on [0,1<<k), which means
hash_64(x, mask)==hash_64(y, mask) if and only if x==y. hash_64i() is the inversion of
hash_64(): hash_64i(hash_64(x, mask), mask) == hash_64(hash_64i(x, mask), mask) == x.
*/
// Thomas Wang's integer hash functions. See <https://gist.github.com/lh3/59882d6b96166dfc3d8d> for a snapshot.
uint64_t hash_64(uint64_t key, uint64_t mask)
{
key = (~key + (key << 21)) & mask; // key = (key << 21) - key - 1;
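The header comment states that hash_64() is a bijection on [0,1<<k). Only the first mixing step is visible in this listing, so the sketch below checks that one step by brute force for small k. It should pass because the step equals key*(2^21-1)-1 modulo 2^k and the multiplier is odd. The names `first_step` and `first_step_is_bijection` are hypothetical, and the k<=24 cap only keeps the lookup array small.

```c
#include <stdint.h>
#include <stdlib.h>

/* The first mixing step quoted above, isolated for testing. */
static uint64_t first_step(uint64_t key, uint64_t mask)
{
    return (~key + (key << 21)) & mask;
}

/* Hypothetical check: returns 1 if first_step() maps [0, 1<<k) onto itself
 * without collisions, 0 if a collision is found, and -1 if k is out of the
 * supported range or the allocation fails. */
int first_step_is_bijection(int k)
{
    uint64_t n, mask, x;
    unsigned char *seen;
    int ok = 1;

    if (k < 2 || k > 24) return -1;
    n = (uint64_t)1 << k;
    mask = n - 1;
    seen = calloc((size_t)n, 1);
    if (seen == NULL) return -1;
    for (x = 0; x < n; ++x) {
        uint64_t h = first_step(x, mask); /* always < n because of the mask */
        if (seen[h]) { ok = 0; break; }
        seen[h] = 1;
    }
    free(seen);
    return ok;
}
```

For example, first_step_is_bijection(22) exercises the case where the 21-bit shift actually contributes a bit. The same brute-force approach could be applied to a full hash function when k is small.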
lh3 / getopt.c
Last active November 12, 2022 17:23
Portable getopt/getopt_long from musl
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include "getopt.h"
char *optarg;
int optind=1, opterr=1, optopt, __optpos, optreset=0;
#define optpos __optpos
lh3 / 0.00README.md
Last active April 28, 2022 21:04
Mapping short reads with a ~50bp INDEL

This is a small experiment on the alignment of ~50bp INDELs. The query sequences are shown in 0.01.fq below, where seq_ori is a 204bp sequence extracted from the human reference genome, seq_del54 contains a 54bp deletion in the middle, seq_del84 contains an 84bp deletion in a 120bp read, and seq_ins40 contains a 40bp insertion in a 140bp read. These four short sequences were mapped to the human reference genome with Bowtie2, BWA-MEM, LAST, Novoalign, SNAP and Stampy using default settings. Non-default scoring functions were also tested for Bowtie2 (--rdg 5,1 --rfg 5,1), BWA-MEM (-A2 -E1) and LAST (-r2 -q4). The output of the various mappers and settings can be found in this gist. The following table summarizes the results:

| Mapper | Setting | -84bp | -54bp | +40bp |
|---|---|---|---|---|
| BBMAP | default | Yes | Yes | Yes |
| Bowtie2 | default | No | No | No |
| Bowtie2 | --rdg 5,1 --rfg 5,1 | as insertion | as insertion | Yes |
| BWA-MEM | default | as split | Yes | Yes |
| BWA-MEM | -A2 -E1 | Yes | Yes | Yes |
| LAST | default | as split | as split | |
lh3 / 00_cmd.sh
Last active December 16, 2021 13:02
Scripts and command lines to create universal mask for hs37d5
#
# generate 01compositional.bed.gz
#
# low-complexity by mDUST
mdust hs37d5.fa -c -w7 -v28 \
| tee hs37d5.mdust-w7-v28.txt \
| cut -f1,3,4 \
| gawk -vOFS="\t" '{--$2;print}' \
| bgzip > 01compositional/hs37d5.mdust-w7-v28.bed.gz
lh3 / crc32.c
Last active January 21, 2021 17:30
Source code for computing CRC32 and SHA1 checksum
#include <zlib.h>
#include <fcntl.h>
#include <stdio.h>
#define BUF_LEN 0x100000
int main(int argc, char *argv[])
{
uLong crc = crc32(0L, Z_NULL, 0);
unsigned char buf[BUF_LEN];
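The listing above seeds the checksum with zlib's crc32(0L, Z_NULL, 0) and allocates a read buffer; the rest of the program is not shown here. As a rough illustration of what a running CRC-32 update computes, the sketch below uses a bitwise loop with no zlib dependency. The function name `crc32_update` is hypothetical, and this is a stand-in for illustration, not the gist's code or zlib's table-driven implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of zlib: bitwise CRC-32 with the reflected
 * polynomial 0xEDB88320. Start with crc = 0 and feed successive chunks;
 * the return value of one call is the input crc of the next. */
uint32_t crc32_update(uint32_t crc, const unsigned char *buf, size_t len)
{
    size_t i;
    int bit;

    crc = ~crc; /* undo the final inversion of the previous call */
    for (i = 0; i < len; ++i) {
        crc ^= buf[i];
        for (bit = 0; bit < 8; ++bit)
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

Feeding a file in BUF_LEN-sized chunks and passing each result into the next call gives the same value as one call over the whole input. The standard check value for the ASCII string "123456789" is 0xCBF43926.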
lh3 / Dockerfile
Created March 18, 2020 20:26
Tmp dockerfile
#creates a base image from conda
FROM continuumio/miniconda3
SHELL ["/bin/bash", "-c"]
COPY environment.yml .
#run environment
#RUN conda env create -f environment.yml
RUN conda init bash