Skip to content

Instantly share code, notes, and snippets.

View standage's full-sized avatar

Daniel Standage standage

View GitHub Profile
@standage
standage / noble-procedure.sh
Created December 28, 2017 23:19
This has been replaced by a Snakemake workflow, but thought I should save it for posterity.
#!/usr/bin/env bash
set -eo pipefail
echo -n "Simulate random genome sequences..."
starttime="$(date +%s)"
nuclmm simulate --out helium-refr.fa --order 6 --numseqs 1 --seqlen 2500000 --seed 8013785666 human.order6.mm
nuclmm simulate --out neon-refr.fa --order 6 --numseqs 1 --seqlen 25000000 --seed 7066509186 human.order6.mm
nuclmm simulate --out argon-refr.fa --order 6 --numseqs 1 --seqlen 250000000 --seed 3015700578 human.order6.mm
gzip -f {helium,neon,argon}-refr.fa
elapsed="$(($(date +%s)-starttime))"
#!/usr/bin/env python
from __future__ import print_function
import glob
import os
m4a_files = glob.glob('*/*/*.m4a')
mp3_files = glob.glob('*/*/*.mp3')
to_delete = list()
for m4a in m4a_files:
import argparse
import khmer
import pandas
parser = argparse.ArgumentParser()
parser.add_argument('--counts', nargs='+')
parser.add_argument('--samples', nargs='+')
parser.add_argument('window', nargs='+')
args = parser.parse_args()
from __future__ import print_function
from collections import defaultdict
import argparse
import khmer
import statistics
import sys
import time
allocators = {
'ct': khmer.Counttable,
@standage
standage / bwa-mem-dockstore.cwl
Last active July 14, 2017 20:35
Playing around with wrapping "bwa mem" in Common Workflow Language
class: CommandLineTool
doc: BwaMem
id: bwa-mem-0.7.15
label: "bwa mem v0.7.5"
cwlVersion: v1.0
dct:creator:
"@id": "http://orcid.org/0000-0003-0342-8531"
foaf:name: Daniel Standage
template<typename HasherType, typename StorageType>
class Sketch
{
protected:
StorageType storage;
public:
Sketch(size_t Wordsize, size_t table_size, size_t num_tables = 4);
add(std::string& kmer);
get(std::string& kmer);
# 1. Dump reads that match reference genome perfecly
kevlar dump GRCh38_full_analysis_set_plus_decoy_hla.fa NA91238.bam | gzip -c > NA19238.dump.fq.gz
kevlar dump GRCh38_full_analysis_set_plus_decoy_hla.fa NA91239.bam | gzip -c > NA19239.dump.fq.gz
kevlar dump GRCh38_full_analysis_set_plus_decoy_hla.fa NA91240.bam | gzip -c > NA19240.dump.fq.gz
# 2. Count k-mers, find interesting k-mers, print out corresponding reads
## Option a: the old way
kevlar novel \
@standage
standage / id-replace.py
Created March 30, 2017 05:12
Script for replacing gene IDs with protein IDs for Laurynne's citrus psyllid RNASeq analysis.
#!/usr/bin/env python
from __future__ import print_function
import argparse
import tag
#------------------------------------------------------------------------------
# Declare the command-line interface:
# how the user specifies input files and adjusts settings
#------------------------------------------------------------------------------
>ERR899711.82875865/2
CGGGGTTTCCCCGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCATATATGTTTTTAAGA
>ERR899711.228335376/1
CGTGTTAGCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCATATATGTTTTTAAGAAAACTTTTTTT
>ERR894724.61176304/2
GCCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCATATATGTTTTTAAGAAAACTTTTTTTGGATGCC
>ERR899711.203563977/2
CCAGGATGGTCTCGATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCATATATGTTTTTAAGAAAACTTTCTTTGGATGCCC
>ERR899709.43552857/1
GATCTCCTGACCTCGTGATCCGCCCGCCTCGGCCTCCCAAAGTGCTGGGATTACAGGCGTGAGCCACCGCGCCCGGCCATATATGTTTTTAAGAAAACTTTTTTTGGATGCCCAGGCCGACAGATC
for i in 2 7 8
do
fastq-dump --split-files --defline-seq '@$ac.$si.$sg/$ri' --defline-qual '+' -Z --gzip SRR03025${i} \
> SRR03025${i}.fq.gz
done
for i in 2 7 8
do
trim-low-abund.py -k 17 -M 250M --variable-coverage -o SRR03025${i}.trim.fq.gz --gzip SRR03025${i}.fq.gz \
2> SRR03025{}.trim.log &