Original link: http://www.concentric.net/~Ttwang/tech/inthash.htm
Taken from: http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm
Reformatted using pandoc
Thomas Wang, Jan 1997
last update Mar 2007
#include <math.h>
#include <stdio.h>
#include <stdlib.h>
#define RADIX 2.0
/*************************
 * balance a real matrix *
 *************************/
References:
Steps:
https://basespace.illumina.com/sample/9804795/files/tree/NA12878-L1_S1_L001_R1_001.fastq.gz?id=515013503
The "id" is the unique file identifier.
wget -O filename 'https://api.basespace.illumina.com/v1pre3/files/{id}/content?access_token={token}'
where {token} is from step 1 and {id} from step 2.
/*
For any 1<k<=64, let mask=(1<<k)-1. hash_64() is a bijection on [0,1<<k), which means
hash_64(x, mask)==hash_64(y, mask) if and only if x==y. hash_64i() is the inversion of
hash_64(): hash_64i(hash_64(x, mask), mask) == hash_64(hash_64i(x, mask), mask) == x.
*/
// Thomas Wang's integer hash functions. See <https://gist.github.com/lh3/59882d6b96166dfc3d8d> for a snapshot.
uint64_t hash_64(uint64_t key, uint64_t mask)
{
	key = (~key + (key << 21)) & mask; // key = (key << 21) - key - 1;
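The bijection property stated in the comment above can be checked by brute force for small k. The sketch below is only an illustration. Its first mixing step is the line quoted above; the remaining steps follow Thomas Wang's published 64-bit mix with the mask applied after each addition, which is an assumption about how the truncated function continues and should be checked against the snapshot. `count_collisions` is a hypothetical helper, not part of the original code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Masked variant of Thomas Wang's 64-bit integer mix. Only the first step is
 * quoted in the text above; the later steps are assumed from Wang's mix. */
uint64_t hash_64(uint64_t key, uint64_t mask)
{
	key = (~key + (key << 21)) & mask;              /* (key << 21) - key - 1 */
	key = key ^ key >> 24;
	key = ((key + (key << 3)) + (key << 8)) & mask; /* key * 265 */
	key = key ^ key >> 14;
	key = ((key + (key << 2)) + (key << 4)) & mask; /* key * 21 */
	key = key ^ key >> 28;
	key = (key + (key << 31)) & mask;
	return key;
}

/* Hypothetical helper: hash every x in [0, 1<<k) and count how many hash
 * values were already produced by a smaller x. A bijection gives 0.
 * Returns -1 if k is outside [2, 24] or if allocation fails. */
long count_collisions(int k)
{
	uint64_t n, mask, x;
	unsigned char *seen;
	long collisions = 0;

	if (k < 2 || k > 24) return -1;
	n = 1ULL << k;
	mask = n - 1;
	seen = calloc(n, 1);
	if (seen == NULL) return -1;
	for (x = 0; x < n; ++x) {
		uint64_t h = hash_64(x, mask);
		assert(h <= mask); /* the final step masks the result */
		if (seen[h]) ++collisions;
		else seen[h] = 1;
	}
	free(seen);
	return collisions;
}
```

Calling `count_collisions(16)` from a `main` returns 0, because every step is either a multiplication by an odd constant modulo 2^k or an xor with a right-shifted copy of the key, and both are invertible on k-bit values. The upper limit of 24 on k only keeps the lookup table small.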
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include "getopt.h"
char *optarg;
int optind=1, opterr=1, optopt, __optpos, optreset=0;
#define optpos __optpos
This is a small experiment on the alignment of ~50bp INDELs. The query sequences are shown in 0.01.fq below, where seq_ori is a 204bp sequence extracted from the human reference genome, seq_del54 contains a 54bp deletion in the middle, seq_del84 contains an 84bp deletion in a 120bp read, and seq_ins40 contains a 40bp insertion in a 140bp read. These four short sequences were mapped to the human reference genome with Bowtie2, BWA-MEM, LAST, Novoalign, SNAP and Stampy using default settings. Non-default scoring functions were also tested for Bowtie2 (--rdg 5,1 --rfg 5,1), BWA-MEM (-A2 -E1) and LAST (-r2 -q4). The output of the various mappers and settings can be found in this gist. The following table summarizes the results:
Mapper | Setting | -84bp | -54bp | +40bp |
---|---|---|---|---|
BBMAP | default | Yes | Yes | Yes |
Bowtie2 | default | No | No | No |
Bowtie2 | --rdg 5,1 --rfg 5,1 | as insertion | as insertion | Yes |
BWA-MEM | default | as split | Yes | Yes |
BWA-MEM | -A2 -E1 | Yes | Yes | Yes |
LAST | default | as split | as split |
#
# generate 01compositional.bed.gz
#
# low-complexity by mDUST
mdust hs37d5.fa -c -w7 -v28 \
| tee hs37d5.mdust-w7-v28.txt \
| cut -f1,3,4 \
| gawk -vOFS="\t" '{--$2;print}' \
| bgzip > 01compositional/hs37d5.mdust-w7-v28.bed.gz
#include <zlib.h>
#include <fcntl.h>
#include <stdio.h>
#define BUF_LEN 0x100000
int main(int argc, char *argv[])
{
	uLong crc = crc32(0L, Z_NULL, 0);
	unsigned char buf[BUF_LEN];
# creates a base image from conda
FROM continuumio/miniconda3
SHELL ["/bin/bash", "-c"]
COPY environment.yml .
# run environment
#RUN conda env create -f environment.yml
RUN conda init bash