Evan Benn EvanTheB

  • Sydney, Australia
@EvanTheB
EvanTheB / likes.sh
Created October 1, 2019 02:47
confluence most liked pages
#!/usr/bin/env bash
set -euo pipefail
# set -x
rm -f likes
curl -sS -n -G -H 'Content-Type: application/json' https://intranet.gimr.garvan.org.au/rest/api/content/ --data-urlencode limit=500 --data-urlencode type=page > tmpdata
for i in $(seq 500); do
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##reference=ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/phase2_reference_assembly_sequence/hs37d5.fa.gz
##contig=<ID=1,length=249250621,md5=1b22b98cdeb4a9304cb5d48026a85128>
##contig=<ID=2,length=243199373,md5=a0d9851da00400dec1098a9255ac712e>
EvanTheB / gist:5b64eafb84eeaf51c289295ac06e1b0b
Created July 31, 2019 01:46
Why does one awk go faster than the other?
The file is ~100 million lines, with ~30 categories in the first column, so sort type 1 should spawn ~30 sort subprocesses.
The run takes ~180 seconds, so subprocess startup overhead should be negligible.
Type 1, multiple sub-sorts:
$ time env time -v -o sort1.time awk -v cmd="sort -k2,2n" '$1 != prev {close(cmd); prev=$1} {print | cmd}' vcf.bed > vcf.sort1.bed
real 2m57.867s
user 0m46.381s
sys 4m14.309s
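For reference, the logic of the type 1 command can be sketched in Python. This only illustrates what the awk program computes (each run of lines sharing a first column is sorted numerically on the second column); it says nothing about the timing difference being asked about. The function name `sort_within_groups` is hypothetical, and the sketch assumes whitespace-separated fields, an integer second column, and no blank lines.

```python
import itertools


def sort_within_groups(lines):
    """Yield lines so that each run of equal first-column values is sorted by column 2.

    Mirrors: awk '$1 != prev {close(cmd); prev=$1} {print | cmd}' with cmd="sort -k2,2n".
    Assumption: column 2 is an integer (sort -n would also accept other numbers).
    """
    def first_col(line):
        return line.split()[0]

    def second_col(line):
        return int(line.split()[1])

    # groupby only merges *consecutive* equal keys, like the awk `prev` check,
    # so a category that reappears later starts a new, separately sorted run.
    for _, group in itertools.groupby(lines, key=first_col):
        yield from sorted(group, key=second_col)


if __name__ == "__main__":
    demo = ["1\t30\t31", "1\t10\t11", "2\t5\t6", "2\t1\t2"]
    for line in sort_within_groups(demo):
        print(line)
```

One difference from the shell version: Python's `sorted` is stable, so ties on column 2 keep their input order, whereas GNU `sort` without `-s` breaks ties by comparing whole lines.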
function autolike () {
[].forEach.call(
document.getElementById("likes-and-labels-container")
.getElementsByClassName("like-button-text"),
function (e) {
console.error(e);
if (e.innerText === "Like") {
e.parentNode.click();
// [].forEach.call(
// document.getElementsByClassName("like-button"),
#!/usr/bin/env bash
set -euo pipefail
CACHE_FILE="$HOME"/.cache/qstat.map
# a 1 minute timed cache
# does not actually handle failures properly. todo
if ! test -e "$CACHE_FILE" || (( $(find "$CACHE_FILE" -mmin +1 | wc -l) > 0)); then
CACHE_FILE_TMP="$CACHE_FILE.$$.$RANDOM"
trap 'rm -f "$CACHE_FILE_TMP"' EXIT
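The timed-cache idea above (regenerate when the file is missing or older than a minute, writing to a temporary file first) can be sketched in Python. This is a minimal sketch and not the rest of the script, which is cut off here: the names `read_through_cache` and `produce` are hypothetical, and the rename-into-place step is an assumption about how the temporary file is meant to be used. It also shows one way to handle the failure case the comment marks as todo, by leaving the old cache untouched if regeneration fails.

```python
import os
import tempfile
import time


def read_through_cache(cache_path, produce, max_age_seconds=60.0):
    """Return the cached text, regenerating it with produce() when stale or missing."""
    try:
        fresh = time.time() - os.path.getmtime(cache_path) <= max_age_seconds
    except OSError:
        fresh = False  # a missing or unreadable cache file counts as stale

    if not fresh:
        directory = os.path.dirname(cache_path) or "."
        os.makedirs(directory, exist_ok=True)
        # Write to a temp file in the same directory so the final rename is atomic.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        try:
            with os.fdopen(fd, "w") as tmp:
                tmp.write(produce())
            os.replace(tmp_path, cache_path)
        except BaseException:
            # On failure, drop the partial temp file and keep the old cache as it was.
            try:
                os.unlink(tmp_path)
            except OSError:
                pass
            raise

    with open(cache_path) as f:
        return f.read()
```

Because the cache file is only ever replaced by a complete temporary file, a concurrent reader sees either the old contents or the new ones, never a partial write.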
import random
import collections
def num_cards():
    cures = [0] * 48 + [1] * 4
    random.shuffle(cures)
    def split(x, n):
        return [x[i:len(x):n] for i in range(n)]
    cures = [c + [2] for c in split(cures, 5)]
    for c in cures:
        random.shuffle(c)
#!/bin/bash
set -xeuo pipefail
IFS=$'\n\t'
# git clone https://github.com/cloudflare/zlib &
[ -d "zlib-1.2.8" ] || curl -L https://github.com/cloudflare/zlib/archive/v1.2.8.tar.gz | tar zx &
# git clone https://github.com/ebiggers/libdeflate.git &
[ -d "libdeflate-1.0" ] || curl -L https://github.com/ebiggers/libdeflate/archive/v1.0.tar.gz | tar zx &
# git clone https://github.com/samtools/htslib.git &
[ -d "htslib-1.8" ] || curl -L https://github.com/samtools/htslib/releases/download/1.8/htslib-1.8.tar.bz2 | tar jx &
task bwa_mem_tool {
Int threads
Int min_seed_length
Int min_std_max_min
command {
echo ${threads} ${min_seed_length} ${sep=',' min_std_max_min+} > output.sam
}
output {
File sam = "output.sam"
EvanTheB / c timeit
Created January 29, 2018 01:47
helper macro for timing a function in C - beware of compiler optimisation
#define TIMEIT(F, N, REPS) \
{\
clock_t start = clock();\
for (int i = 0; i < N; ++i)\
{\
}\
clock_t diff_loop = clock() - start;\
double results[REPS];\
for (int i = 0; i < REPS; ++i)\
{\