Colin Davenport colindaven

  • Germany
@colindaven
colindaven / gist:27f49a8e79fa384499c5f81b2fa888de
Last active February 20, 2024 10:20
Using Nextflow in Galaxy
# Using Nextflow in Galaxy
* Nextflow is a workflow management tool widely used in bioinformatics
* Galaxy is a server-based GUI tool with a nice interface that is also used to start workflows
## Can it be done?
* There are few reports online, but it is possible, e.g. https://galaxyproject.org/blog/2022-08-15-making-nextflow-work-with-galaxy-at-cfsan-fda/
* The main requirements:
* a) Install Nextflow using a direct JAR download, not via conda. This may be because a conda-installed Nextflow collides with the internal Galaxy conda environments
@vals
vals / Exploratory analysis with scVI.ipynb
Last active April 13, 2024 18:23
Exploratory analysis with scVI

freebayes performance notes

Small dataset, GIAB whole exome, chr 21 and 22 only

BAM → VCF

On a laptop with 8 cores and 32 GB RAM

$ time freebayes --fasta-reference /data/Homo_sapiens_assembly19.fasta --strict-vcf /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.bam > /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.vcf
@elowy01
elowy01 / BCFtools cheat sheet
Last active June 29, 2024 05:24
BCFtools cheat sheet
*bcftools filter
*Filter variants per region (in this example, print out only variants mapped to chr1 and chr2)
bcftools filter -r 1,2 ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.hg38.vcf.gz
*printing out info for only 2 samples:
bcftools view -s NA20818,NA20819 filename.vcf.gz
*printing out only variants passing the filter (FILTER is PASS):
bcftools view -f PASS filename.vcf.gz
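
The three options above (`-r`, `-s`, `-f`) can also be combined in a single `bcftools view` call. The following is a minimal sketch, not part of the original cheat sheet: `bcftools_view_cmd` is a hypothetical helper that only prints the command (a dry run) so it can be checked before being run. It assumes the input is bgzip-compressed and indexed, which `-r` requires.

```shell
#!/bin/sh
# Hypothetical helper (not part of bcftools): print, but do not run, a
# bcftools view command that keeps PASS variants in the given regions
# for the given samples.
bcftools_view_cmd() {
    regions=$1   # comma-separated regions, e.g. 1,2
    samples=$2   # comma-separated sample names
    vcf=$3       # bgzipped, indexed VCF file
    printf 'bcftools view -f PASS -r %s -s %s %s\n' "$regions" "$samples" "$vcf"
}

# Dry run with the sample names and file name used in the examples above.
bcftools_view_cmd 1,2 NA20818,NA20819 filename.vcf.gz
```

Printing the command first makes it easy to review the region and sample lists before spending time on a large file; piping the printed line to `sh` would then execute it.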
@tubaterry
tubaterry / parallelize.bash
Last active August 12, 2023 09:20
bash job queue with GNU Parallel
#!/bin/bash
#Tail a temp file, pipe it into GNU Parallel.
#Generally FIFOish but don't count on it, no guarantees about execution order between setup and tear-down
#Tips:
#I threw this together for a network based load -
# for disk-bound (esp magnetic hard drives) loads, lower -j to 1 or switch to --semaphore
# for most loads you can probably just omit -j and run with the defaults (one concurrent job per CPU core)
#todo:
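
The idea described in the comments above (append commands to a temp file, follow it with `tail`, and feed the lines to GNU Parallel) can be sketched roughly as below. This is an illustration under assumptions, not the original script: `QUEUE_FILE`, `enqueue`, and `start_worker` are hypothetical names, and the default of 2 concurrent jobs is an arbitrary choice.

```shell
#!/bin/bash
# Sketch of a file-backed job queue. QUEUE_FILE, enqueue and start_worker
# are illustrative names, not taken from the original script.
QUEUE_FILE=${QUEUE_FILE:-$(mktemp)}

# Append one command line to the queue file.
enqueue() {
    printf '%s\n' "$*" >> "$QUEUE_FILE"
}

# Follow the queue from its first line and run each line as a job.
# Blocks until interrupted; requires GNU Parallel to be installed.
start_worker() {
    tail -n +1 -f "$QUEUE_FILE" | parallel -j "${1:-2}"
}

# Queue two jobs; nothing runs until a worker is started.
enqueue echo "first job"
enqueue echo "second job"
# start_worker 2   # uncomment to start consuming the queue
```

Because the queue is just a file, any process that can append a line to it can submit work, which is also why execution order is only roughly first-in, first-out.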
@darencard
darencard / maker_genome_annotation.md
Last active March 7, 2024 08:50
In-depth description of running MAKER for genome annotation.

Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.

Genome Annotation using MAKER

MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are quite simplified examples, and it took a bit of effort to wrap my head completely around everything. Here I will describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and so that it can serve as a guide for annotating any genome.

Software & Data

Software prerequisites:

  1. RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (ver
@caseywdunn
caseywdunn / docker.md
Last active August 10, 2023 18:13
Docker cheat sheet

Docker cheat sheet

Introduction

Docker is a tool for bundling together applications and their dependencies into images that can then be run as containers on many different types of computers.

Docker and other containerization tools are useful to scientists because:

  • It greatly simplifies distribution and installation of complex work flows that