
Michael L Heuer heuermh

@heuermh
heuermh / polymath-m1-install.md
Last active April 22, 2023 11:41
samim23/polymath M1 installation walkthrough

Clone samim23/polymath from GitHub

$ git clone https://github.com/samim23/polymath
$ cd polymath/

Install conda and activate

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-arm64.sh
$ bash Miniconda3-latest-MacOSX-arm64.sh -b -p "$(pwd)/miniconda"
heuermh / duckdb-to-parquet.sh
Created January 11, 2023 17:39
Convert fastq to Parquet with zstd compression via duckdb
#!/bin/bash
echo "converting FASTQ to tab-delimited text format, one read per line..."
dsh-bio fastq-to-text -i seqkit-benchmark-data/dataset_C.fq -o seqkit-benchmark-data/dataset_C.txt
echo "dataset_C.txt:"
head -n 2 seqkit-benchmark-data/dataset_C.txt
echo "CREATE TABLE reads(description VARCHAR, sequence VARCHAR, quality VARCHAR);" > convert.sql
echo "COPY reads FROM 'seqkit-benchmark-data/dataset_C.txt' (AUTO_DETECT TRUE);" >> convert.sql
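The preview stops before the export step. A minimal sketch of how the Parquet export might be appended to the same SQL file, assuming DuckDB's `COPY ... TO` statement with Parquet format and zstd compression; the output path `dataset_C.parquet` is a hypothetical name, not taken from the original script.

```shell
#!/bin/bash
# Sketch only: append an export statement to the SQL file.
# The output file name is an assumption, not from the original gist.
OUTPUT="seqkit-benchmark-data/dataset_C.parquet"
echo "COPY reads TO '$OUTPUT' (FORMAT PARQUET, COMPRESSION ZSTD);" >> convert.sql

# Show the generated SQL for inspection before running it.
cat convert.sql
```

The resulting `convert.sql` could then be fed to the DuckDB command-line client, for example with `duckdb < convert.sql`, assuming `duckdb` is on the PATH.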
heuermh / xz-zstd-compression.sh
Created March 31, 2022 21:30
Benchmark xz and zstd performance
#!/bin/bash
set -x -e
SAMPLE="dataset_C"
# Compress using `xz`
time xz --compress --stdout "$SAMPLE.fq" > "$SAMPLE.default.fq.xz"
time xz --compress --stdout -0 "$SAMPLE.fq" > "$SAMPLE.0.fq.xz"
time xz --compress --stdout -9 "$SAMPLE.fq" > "$SAMPLE.9.fq.xz"
time xz --compress --stdout --extreme "$SAMPLE.fq" > "$SAMPLE.extreme.fq.xz"
heuermh / prepare_fastq_benchmark.nf
Created May 17, 2021 18:29
Spark shell in Nextflow process example
#!/usr/bin/env nextflow
/*
* The authors of this file license it to you under the
* Apache License, Version 2.0 (the "License"); you may not
* use this file except in compliance with the License. You
* may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*

freebayes performance notes

Small dataset, GIAB whole exome, chr 21 and 22 only

BAM → VCF

On a laptop with 8 cores and 32 GB of RAM

$ time freebayes --fasta-reference /data/Homo_sapiens_assembly19.fasta --strict-vcf /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.bam > /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.vcf
heuermh / README.md
Created March 3, 2020 00:01
GFA 1.0 notes

GFA 1.0 notes

Compress with dsh-bio compress-gfa1, which parses and validates GFA 1.0 records before writing, to block gzipped (BGZF, .bgz), bzip2 (.bz2), gzip (.gz), or Zstandard (.zst) format

gunzip human__pan.AF0__18.gfa.gz
dsh-bio compress-gfa1 -i human__pan.AF0__18.gfa -o human__pan.AF0__18.gfa.bgz
dsh-bio compress-gfa1 -i human__pan.AF0__18.gfa -o human__pan.AF0__18.gfa.bz2
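The commands above follow one pattern, in which the output file extension appears to select the compression format. A minimal sketch that prints the command for each of the four extensions listed in the notes; `compress_cmd` is a hypothetical helper, and the sketch only prints the commands rather than running dsh-bio.

```shell
#!/bin/bash
# Hypothetical helper: print the dsh-bio command for one output extension.
compress_cmd() {
  local input="$1" ext="$2"
  echo "dsh-bio compress-gfa1 -i $input -o $input.$ext"
}

INPUT="human__pan.AF0__18.gfa"

# Extensions taken from the formats listed in the notes.
for ext in bgz bz2 gz zst; do
  compress_cmd "$INPUT" "$ext"
done
```

Printing first makes it easy to check the commands; piping the output to `bash` would execute them, provided dsh-bio is installed.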
heuermh / Gfa1Example.java
Last active December 10, 2019 21:27
Example for dishevelled-bio assembly gfa1 package
import static org.dishevelled.compress.Readers.reader;
import java.io.*;
import org.dishevelled.bio.assembly.gfa1.*;
void read(@Nullable File file) {
// this Readers.reader method has some magic in it:
// if file is null, use stdin
heuermh / .bash_profile
Created March 22, 2019 15:59
Bash prompt
# Generated 2015-09-14 by http://www.gilesorr.com/Code/bpb/ ,
# The Bash Prompt Builder written by Giles Orr.
case ${TERM} in
xterm*|rxvt*)
TITLEBAR='\[\033]0;\u@\h:\w\007\]';;
*)
TITLEBAR='';;
esac;
# Elegant code courtesy of nitrous.io:
[INFO] --- exec-maven-plugin:1.5.0:exec (doc-r) @ adam-r-spark2_2.11 ---
R version 3.1.1 (2014-07-10) -- "Sock it to Me"
Copyright (C) 2014 The R Foundation for Statistical Computing
Platform: x86_64-redhat-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
#!/bin/bash
cannoli-submit \
interleaveFastq \
sample_1.fq \
sample_2.fq \
sample.ifq
cannoli-submit \
bwa \