$ time freebayes --fasta-reference /data/Homo_sapiens_assembly19.fasta --strict-vcf /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.bam > /data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.vcf
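The output path in the command above is the input BAM path with its extension changed to `.vcf`. The following is a minimal sketch of deriving it with shell parameter expansion, so the long file name is typed only once. The variable names `ref`, `bam`, `vcf` and the `RUN` switch are illustrative assumptions, not part of the original command. It prints the command by default and only runs `freebayes` when `RUN=1` is set.

```shell
# Paths copied from the command above; variable names are illustrative.
ref=/data/Homo_sapiens_assembly19.fasta
bam=/data/151002_7001448_0359_AC7F6GANXX_Sample_HG002-EEogPU_v02-KIT-Av5_AGATGTAC_L008.posiSrt.markDup.21.22.bam

# Strip the trailing ".bam" and append ".vcf".
vcf="${bam%.bam}.vcf"

# RUN is a hypothetical switch for this sketch: dry run unless RUN=1.
if [ "${RUN:-0}" = 1 ]; then
  freebayes --fasta-reference "$ref" --strict-vcf "$bam" > "$vcf"
else
  echo "freebayes --fasta-reference $ref --strict-vcf $bam > $vcf"
fi
```

Keeping the input and output names tied together this way avoids a mismatch when the command is reused for another sample.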
# Using Nextflow in Galaxy
* Nextflow is a workflow management tool widely used in bioinformatics
* Galaxy is a server-based GUI tool with a nice interface that is also used to start workflows
## Can it be done?
* There are few reports online, but it is possible, e.g. https://galaxyproject.org/blog/2022-08-15-making-nextflow-work-with-galaxy-at-cfsan-fda/
* The main requirements:
* a) install Nextflow using a direct JAR download, not via conda. This may be because the conda install collides with the internal Galaxy conda environments
*bcftools filter
*Filter variants per region (in this example, print out only variants mapped to chr1 and chr2)
bcftools filter -r1,2 ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.hg38.vcf.gz
*printing out info for only 2 samples:
bcftools view -s NA20818,NA20819 filename.vcf.gz
*printing only variants passing the filter:
bcftools view -f PASS filename.vcf.gz
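The commands above can be tried end to end on a toy file. This is a minimal sketch, assuming `bcftools` is installed. The file name `demo.vcf` and its three records are made up for illustration. Region queries with `-r` need a compressed, indexed file, so the sketch compresses and indexes before filtering.

```shell
# Write a tiny two-sample VCF (made-up records, for illustration only).
printf '%s\n' \
  '##fileformat=VCFv4.2' \
  '##contig=<ID=1>' \
  '##contig=<ID=2>' \
  '##contig=<ID=3>' \
  '##FILTER=<ID=PASS,Description="All filters passed">' \
  '##FILTER=<ID=q10,Description="Low quality">' \
  '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">' \
  > demo.vcf
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA20818\tNA20819\n' >> demo.vcf
printf '1\t100\t.\tA\tG\t50\tPASS\t.\tGT\t0/1\t1/1\n' >> demo.vcf
printf '2\t150\t.\tG\tA\t40\tPASS\t.\tGT\t0/1\t0/0\n' >> demo.vcf
printf '3\t200\t.\tC\tT\t5\tq10\t.\tGT\t0/0\t0/1\n' >> demo.vcf

if command -v bcftools >/dev/null 2>&1; then
  bcftools view -Oz -o demo.vcf.gz demo.vcf      # bgzip-compress the VCF
  bcftools index -t demo.vcf.gz                  # tabix index, required for -r
  bcftools filter -r 1,2 demo.vcf.gz             # records on chromosomes 1 and 2
  bcftools view -s NA20818,NA20819 demo.vcf.gz   # keep only the named samples
  bcftools view -f PASS demo.vcf.gz              # keep only PASS records
else
  echo "bcftools not found; demo.vcf was written but not queried" >&2
fi
```

Note that chromosome names passed to `-r` must match the file's naming (`1` versus `chr1`).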
#!/bin/bash
#Tail a temp file, pipe it into GNU Parallel.
#Generally FIFOish but don't count on it, no guarantees about execution order between setup and tear-down
#Tips:
#I threw this together for a network based load -
# for disk-bound (esp magnetic hard drives) loads, lower -j to 1 or switch to --semaphore
# for most loads you can probably just omit -j and run with defaults. (One concurrent job per CPU core)
#todo:
Please see the most up-to-date version of this protocol on my blog at https://darencard.net/blog/.
MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these are quite simplified examples, and it took a bit of effort to wrap my head completely around everything. Here I will describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and so that it is easy to use this as a guide to annotate any genome.
- RepeatModeler and RepeatMasker with all dependencies (I used NCBI BLAST) and RepBase (ver
Docker is a tool for bundling applications and their dependencies into images that can then be run as containers on many different types of computers.
Docker and other containerization tools are useful to scientists because:
- It greatly simplifies distribution and installation of complex workflows that