Mike Lin (mlin)

mlin / Dockerfile
Last active February 16, 2024 04:11
PhyloCSF Dockerfile (basic)
# Basic Dockerfile for PhyloCSF.
# Example usage:
# docker build -t mlin:PhyloCSF https://gist.githubusercontent.com/mlin/31c0a7623f99d3bf3222/raw/Dockerfile
# docker run -v /path/to/host/data:/data mlin:PhyloCSF 29mammals /data/input.fa
# PhyloCSF homepage: https://github.com/mlin/PhyloCSF/wiki
FROM ubuntu:trusty
LABEL maintainer="Mike Lin <mlin@mlin.net>"
RUN apt-get update && \
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-suggests --no-install-recommends \
    ca-certificates software-properties-common time git build-essential gfortran
RUN add-apt-repository ppa:avsm/ppa
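The example usage in the Dockerfile comments mounts a host directory at `/data` inside the container, so the FASTA path passed to PhyloCSF must be the container-side path. A minimal sketch, assuming the image was tagged `mlin:PhyloCSF` as in that example; the helper `phylocsf_docker_argv` is hypothetical and not part of PhyloCSF:

```python
import os
import shlex


def phylocsf_docker_argv(host_fasta, image="mlin:PhyloCSF", params="29mammals"):
    """Build a `docker run` argument list for one FASTA file (hypothetical helper).

    The file's directory is mounted at /data, and the FASTA is referred to by its
    path inside the container, mirroring the example usage in the comments.
    """
    host_dir, name = os.path.split(os.path.abspath(host_fasta))
    return ["docker", "run", "-v", f"{host_dir}:/data", image, params, f"/data/{name}"]


# Print the equivalent shell command line, safely quoted.
print(shlex.join(phylocsf_docker_argv("/path/to/host/data/input.fa")))
```

Passing the list itself to `subprocess.run(argv, check=True)` avoids shell quoting problems when the host path contains spaces.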
mlin / README.md
Last active December 1, 2023 13:56
static.wiki database compression

Context: static.wiki and Show HN post

We downloaded static.wiki's 40.3 GiB SQLite database of English Wikipedia and created a compressed version of it with sqlite_zstd_vfs, our read/write Zstandard compression layer for SQLite3. The compressed version is 10.4 GiB (26%), and the VFS supports HTTP random access in the spirit of the original (although we don't yet have a WebAssembly build; it's a library for CLI & desktop apps for now). You can try it out on Linux or macOS x86-64:

pip3 install genomicsqlite
genomicsqlite https://f000.backblazeb2.com/file/mlin-public/static.wiki/en.zstd.db \
    "select text from wiki_articles where title = 'SQLite'"
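The 26% figure is the compressed size divided by the original size. A quick check of that arithmetic, using only the two sizes quoted above:

```python
original_gib = 40.3    # static.wiki's SQLite database of English Wikipedia
compressed_gib = 10.4  # the same database compressed with sqlite_zstd_vfs

ratio = compressed_gib / original_gib
print(f"compressed size: {ratio:.1%} of original")  # 25.8%, i.e. the ~26% quoted
print(f"space saved: {original_gib - compressed_gib:.1f} GiB")
print(f"compression factor: {original_gib / compressed_gib:.1f}x")
```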
mlin / wga_dotplot.R
Last active July 26, 2023 06:42
whole-genome alignment dotplots
require(ggplot2)
require(data.table)
require(compiler)
# First run LASTZ to produce MAF output. Options for matching near-identical assemblies: --notransition --seed=match15 --step=10 --hspthresh=100000 --gapped --ydrop=3400 --gappedthresh=400000 --ambiguous=iupac --allocate:traceback=200M
# Then we postprocess the MAF with: cat alignments.maf | awk 'NF' | tr -s " " | cut -d ' ' -f1-6 | paste - - - | tr " " "\t" | tr "=" "\t" | cut -f 3,5-9,11-15 | gzip -c > alignments.gz
# read and filter hits
fn <- "alignments.gz"
all.hits <- read.delim(gzfile(fn),header=FALSE)
colnames(all.hits) <- c("score","target","tpos","thitlen","tstrand","tlen","query","qpos","qhitlen","qstrand","qlen")
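The shell pipeline in the comment assumes each MAF alignment block is exactly one `a score=...` line followed by two `s` lines (target, then query); it keeps the score plus the first five fields after `s` on each sequence line, dropping the alignment text. A minimal Python sketch of the same transformation under those assumptions; it additionally skips lines beginning with `#` as a precaution, and the function name `maf_to_rows` and script name `maf_to_tsv.py` are hypothetical:

```python
import gzip
import sys


def maf_to_rows(lines):
    """Yield one 11-field row per alignment block.

    Fields: score, then src/start/size/strand/srcSize for the target (first "s"
    line) and for the query (second "s" line). Assumes blocks of exactly one
    "a score=N" line followed by two "s" lines.
    """
    block = []
    for line in lines:
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines (like awk 'NF') and comment/header lines
        block.append(fields)
        if len(block) == 3:
            a, t, q = block
            if a[0] != "a" or t[0] != "s" or q[0] != "s":
                raise ValueError(f"unexpected MAF block layout: {block!r}")
            score = a[1].split("=", 1)[1]  # "score=12345" -> "12345"
            # [1:6] drops the leading "s" and the trailing alignment text.
            yield [score] + t[1:6] + q[1:6]
            block = []


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 maf_to_tsv.py alignments.maf alignments.gz
    with open(sys.argv[1]) as maf, gzip.open(sys.argv[2], "wt") as out:
        for row in maf_to_rows(maf):
            out.write("\t".join(row) + "\n")
```

The column order matches the names assigned by `colnames(all.hits)` above, so the gzipped output can be read with the same `read.delim(gzfile(fn), header=FALSE)` call.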
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta name="generator" content="pandoc" />
<meta http-equiv="X-UA-Compatible" content="IE=EDGE" />
mlin / Dockerfile
Last active December 20, 2021 09:42
miniwdl udocker-in-docker PoC
FROM ubuntu:20.04
RUN apt-get -qq update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
wget curl python3-pip python-is-python3
RUN pip3 install --system miniwdl==1.4.2
ENV UDOCKER_VERSION=1.3.1
WORKDIR /usr/local
RUN wget -nv https://github.com/indigo-dc/udocker/releases/download/v${UDOCKER_VERSION}/udocker-${UDOCKER_VERSION}.tar.gz \
&& tar zxf udocker-${UDOCKER_VERSION}.tar.gz \
&& rm udocker-${UDOCKER_VERSION}.tar.gz
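The `wget`/`tar`/`rm` step above downloads the udocker release tarball for `UDOCKER_VERSION` and unpacks it in `/usr/local`. A rough standard-library equivalent, as a sketch only: the URL pattern is copied from the Dockerfile, while the function names `udocker_tarball_url` and `fetch_udocker` are hypothetical:

```python
import os
import tarfile
import tempfile
import urllib.request


def udocker_tarball_url(version):
    """Release tarball URL, following the pattern used in the Dockerfile."""
    return (
        "https://github.com/indigo-dc/udocker/releases/download/"
        f"v{version}/udocker-{version}.tar.gz"
    )


def fetch_udocker(version, dest="/usr/local"):
    """Download the tarball, extract it under dest, then delete the download."""
    fd, tmp = tempfile.mkstemp(suffix=".tar.gz")
    os.close(fd)
    try:
        urllib.request.urlretrieve(udocker_tarball_url(version), tmp)
        # extractall trusts member paths; only use it on archives you trust.
        with tarfile.open(tmp, "r:gz") as tar:
            tar.extractall(dest)
    finally:
        os.remove(tmp)


print(udocker_tarball_url("1.3.1"))
```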
mlin / sqlite_seekscan_regression.py
Created June 1, 2021 06:06
SQLite OP_SeekScan regression
#!/usr/bin/env python3
# Run this script with LD_LIBRARY_PATH set to choose which SQLite3 library version it loads
import os
import random
import time
import sqlite3
N = 100000
random.seed(42)
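Because the result depends on which SQLite library the `sqlite3` module actually loads, it helps to print `sqlite3.sqlite_version` (the runtime library version) before timing anything. A minimal sketch of that check plus a timing helper; the table, index, and query below are hypothetical stand-ins, not the workload from the gist:

```python
import random
import sqlite3
import time

# Version of the SQLite library loaded at runtime; compare it across runs to
# confirm that an LD_LIBRARY_PATH override took effect.
print("SQLite library version:", sqlite3.sqlite_version)


def time_query(conn, sql, params=()):
    """Run sql to completion and return (row_count, elapsed_seconds)."""
    t0 = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - t0


# Hypothetical workload: a two-column index queried with an IN list on the
# second column. Substitute the query you actually want to compare.
N = 100000
random.seed(42)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(a INTEGER, b INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    ((random.randrange(100), random.randrange(N)) for _ in range(N)),
)
conn.execute("CREATE INDEX t_ab ON t(a, b)")
conn.execute("ANALYZE")
n, secs = time_query(conn, "SELECT a, b FROM t WHERE a = ? AND b IN (1, 2, 3, 4, 5)", (7,))
print(f"{n} rows in {secs:.6f} s")
```

On Linux builds where Python's `_sqlite3` module links against a shared `libsqlite3`, running it as `LD_LIBRARY_PATH=/path/to/sqlite/lib python3 script.py` with different library builds lets you compare both the printed version and the timings.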
mlin / htsget-openapi-docs.html
Created May 12, 2021 05:06
htsget-openapi-docs.html
<!DOCTYPE html>
<html>
<head>
<meta charset="utf8" />
<title>htsget</title>
<!-- needed for adaptive design -->
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
body {
mlin / dx-ci-init.py
Last active October 14, 2020 06:25
Suggested best practices for DNAnexus workflow development & continuous integration
#!/usr/bin/env python
#
# Initializes a git repository with some suggested best practices for DNAnexus
# workflow development & continuous integration. Run dx-ci-init.py in the root
# of your git repository; it creates the following, which you should then
# customize as you like:
#
# applets/hello-world
# Trivial applet template which you can build on or copy.
#
mlin / paste_wdl_imports.py
Last active March 28, 2020 03:50
paste_wdl_imports.py
#!/usr/bin/env python3
"""
Generate a standalone WDL document from a given workflow using imported tasks. Requires: miniwdl
python3 paste_wdl_imports.py [-o STANDALONE.wdl] WORKFLOW.wdl
For each "call imported_namespace.task_name [as alias]" in the workflow, appends the task's source
code with the task name changed to "imported_namespace__task_name", and rewrites the call to refer
to this new name (keeping the original alias). Also blanks out the import statements.
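The renaming rule described in the docstring can be illustrated with a small text-based sketch. This is a simplification for illustration only: it uses regular expressions rather than miniwdl, so it ignores comments, string literals, and nested namespaces, and the function names `rewrite_workflow` and `rename_task` are hypothetical:

```python
import re

# "call ns.task" (optionally followed by "as alias"); only the dotted name is
# rewritten, so any alias after it is left untouched.
CALL_RE = re.compile(r"\bcall\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)")
# A whole import line; [ \t]* (not \s*) avoids swallowing preceding blank lines.
IMPORT_RE = re.compile(r"^[ \t]*import\b[^\n]*$", re.MULTILINE)
TASK_RE = re.compile(r"\btask\s+([A-Za-z_]\w*)")


def rewrite_workflow(wdl_source):
    """Rewrite namespaced calls to ns__task and blank out import lines.

    Returns the new source and the set of (namespace, task) pairs called.
    """
    called = set()

    def repl(m):
        called.add((m.group(1), m.group(2)))
        return f"call {m.group(1)}__{m.group(2)}"

    rewritten = CALL_RE.sub(repl, wdl_source)
    rewritten = IMPORT_RE.sub("", rewritten)  # keeps the (now empty) line
    return rewritten, called


def rename_task(task_source, namespace):
    """Rename the first task declared in task_source to namespace__name."""
    return TASK_RE.sub(lambda m: f"task {namespace}__{m.group(1)}", task_source, count=1)
```

The set of `(namespace, task)` pairs tells you which task definitions to append, each passed through `rename_task`, to produce the standalone document.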