John Blischak 2014-05-14
Multiple users have observed that submitting jobs via Snakemake requires much more memory than is necessary to run the command (e.g. mailing list post, [Bitbucket issue][issue]).
# turn find or cut output (cut: comma delimiter, first column) into a single space-separated list
find . -name "*bash*" | xargs    # run from /etc
cut -d, -f1 file.csv | xargs
# find files and grep for a word in them
find . -name "*.java" | xargs grep "Stock"
# handling filenames that contain WHITESPACE
ls *txt | xargs -d '\n' grep "cost"
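
To illustrate why the last command passes `-d '\n'`, here is a minimal Python sketch of the two splitting behaviours. It is an approximation only: real xargs also interprets quotes and backslashes in its default mode, which this sketch ignores, and the function names are hypothetical.

```python
def split_default(text):
    """Approximate xargs' default mode: split input on any whitespace."""
    return text.split()


def split_on_newlines(text):
    """Approximate `xargs -d '\\n'`: one argument per input line."""
    return text.splitlines()


if __name__ == "__main__":
    listing = "my notes.txt\ncosts.txt\n"
    # The filename containing a space is broken into two arguments.
    print(split_default(listing))
    # Each line stays intact as a single argument.
    print(split_on_newlines(listing))
```

With whitespace splitting, `my notes.txt` turns into two separate arguments, so grep would look for files that do not exist. Newline splitting keeps each filename whole.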
*bcftools filter
*Filter variants per region (in this example, print only variants mapped to chr1 and chr2)
bcftools filter -r 1,2 ALL.chip.omni_broad_sanger_combined.20140818.snps.genotypes.hg38.vcf.gz
*printing info for only 2 samples:
bcftools view -s NA20818,NA20819 filename.vcf.gz
*printing only variants passing the filter (FILTER column is PASS):
bcftools view -f PASS filename.vcf.gz
Identify directories that do not contain a particular file
find base_dir -mindepth 2 -maxdepth 2 -type d '!' -exec test -e "{}/cover.jpg" ';' -print
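
The same check can be sketched in Python with `pathlib`, which may be easier to extend than the `find` one-liner. This is a minimal sketch, not a drop-in replacement: the function name `dirs_missing_file` is hypothetical, and unlike `find -type d`, `is_dir()` also follows symlinks to directories.

```python
import tempfile
from pathlib import Path


def dirs_missing_file(base_dir, filename="cover.jpg"):
    """Return directories exactly two levels below base_dir that lack `filename`."""
    base = Path(base_dir)
    return sorted(
        d for d in base.glob("*/*")
        if d.is_dir() and not (d / filename).exists()
    )


if __name__ == "__main__":
    # Build a throwaway artist/album tree to try the function on.
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "artist" / "album1").mkdir(parents=True)
        (Path(tmp) / "artist" / "album2").mkdir(parents=True)
        (Path(tmp) / "artist" / "album1" / "cover.jpg").touch()
        for d in dirs_missing_file(tmp):
            print(d)
```

The `"*/*"` pattern plays the role of `-mindepth 2 -maxdepth 2`, and the `exists()` test mirrors `'!' -exec test -e`.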
Links:
http://quinlanlab.org/tutorials/bedtools/bedtools.html
Use Case 1: Given a.bam and b.regions.bed, how do you get the parts of b.regions.bed that are not covered by a.bam?
Answer:
bedtools genomecov -ibam a.bam -bga \
  | awk '$4==0' \
  | bedtools intersect -a b.regions.bed -b - > foo
Option -bga: Report depth in BedGraph format, as above (i.e., -bg). However with this option, regions with zero coverage are also reported. This allows one to quickly extract all regions of a genome with 0 coverage by applying: “grep -w 0$” to the output.
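
The logic of the pipeline above can be sketched in plain Python to make the two steps explicit: keep BedGraph intervals whose depth is 0, then clip them to the regions of interest. This is an illustrative sketch on in-memory data, not a substitute for bedtools. The function names are hypothetical, and coordinates are assumed to be 0-based, half-open as in BED.

```python
def zero_coverage(bedgraph_lines):
    """Keep intervals whose depth (4th column) is 0, like awk '$4==0'."""
    intervals = []
    for line in bedgraph_lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, start, end, depth = line.split()[:4]
        if float(depth) == 0:
            intervals.append((chrom, int(start), int(end)))
    return intervals


def intersect(regions, uncovered):
    """Return the parts of each region that overlap an uncovered interval."""
    out = []
    for chrom, r_start, r_end in regions:
        for u_chrom, u_start, u_end in uncovered:
            if chrom != u_chrom:
                continue
            start, end = max(r_start, u_start), min(r_end, u_end)
            if start < end:  # non-empty overlap
                out.append((chrom, start, end))
    return out


if __name__ == "__main__":
    bedgraph = ["chr1\t0\t100\t0", "chr1\t100\t200\t5", "chr1\t200\t300\t0"]
    regions = [("chr1", 50, 250)]
    for interval in intersect(regions, zero_coverage(bedgraph)):
        print(*interval, sep="\t")
```

The nested loop is quadratic, which is fine for a toy example. bedtools does the same job efficiently on real, genome-scale files.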
#!/usr/bin/env python
# Script to identify gap regions in an assembly
# input : fasta
# output : bed
# usage : get_gap_postions.py fasta bed
# Import necessary packages
import argparse
import re
from Bio import SeqIO
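
The body of the script is not shown here. As a rough idea of the core step, the following minimal sketch locates gaps with a regular expression. It assumes a gap is any run of `N`/`n` characters and works on plain strings instead of Biopython records, so it runs without extra dependencies. The function names are hypothetical and are not taken from the original script.

```python
import re

# Assumption: a "gap" is any run of one or more N/n characters.
GAP_PATTERN = re.compile(r"[Nn]+")


def find_gaps(sequence):
    """Return (start, end) pairs for each run of N, 0-based half-open as in BED."""
    return [(m.start(), m.end()) for m in GAP_PATTERN.finditer(sequence)]


def gaps_to_bed(name, sequence):
    """Format the gaps of one sequence as tab-separated BED lines."""
    return [f"{name}\t{start}\t{end}" for start, end in find_gaps(sequence)]


if __name__ == "__main__":
    for line in gaps_to_bed("contig1", "ACGTNNNNACGTnnAC"):
        print(line)
```

In a full script, each record returned by `SeqIO.parse(fasta, "fasta")` could be passed in as `gaps_to_bed(record.id, str(record.seq))`.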
MAKER is a great tool for annotating a reference genome using empirical and ab initio gene predictions. GMOD, the umbrella organization that includes MAKER, has some nice tutorials online for running MAKER. However, these were quite simplified examples and it took a bit of effort to wrap my head completely around everything. Here I will describe a de novo genome annotation for Boa constrictor in detail, so that there is a record and that it is easy to use this as a guide to annotate any genome.