Joseph Elsherbini elsherbini

@elsherbini
elsherbini / Snakefile
Created September 19, 2016 20:12
dada2_snakefile
configfile: "config.yaml"
from snakemake.utils import R
import glob
import os
import numpy as np
SETS = ["02_frac", "5_frac", "1_and_63_frac", "unfrac"]
SET_TO_SAMPLES = {s: [os.path.splitext(os.path.basename(fn))[0].split(".")[0] for fn in os.listdir("data/{}/1.cat".format(s))] for s in SETS}
Name t0N t4.5N t7.5N t10.5N t13.5N t16.5N t28.5N t0B t4.5B t7.5B t10.5B t13.5B t16.5B t28.5B t0C t4.5C t7.5C t10.5C t13.5C t16.5C t28.5C t0M t4.5M t7.5M t10.5M t13.5M t16.5M t28.5M t0P t4.5P t7.5P t10.5P t13.5P t16.5P t28.5P
10N.286.45.E6 93 46 793 1662 51 1435 331 93 5 0 2366 97 1965 489 93 175 582 0 1673 0 212 93 547 623 7 1499 10 2364 93 8 1536 742 120 4682 98
10N.286.46.F8 57 37 2 1 4 6 18 57 2 0 1503 83 418 360 57 194 564 0 1451 0 284 57 531 472 2 1577 5 3801 57 3 875 239 86 767 72
10N.286.51.B1 19 22 32 48 3 22 18 19 2 0 771 21 174 115 19 99 217 0 345 0 55 19 257 229 0 420 10 536 19 1 742 142 80 406 37
10N.286.48.E2 70 29 508 961 70 1312 199 70 0 0 2410 140 977 563 70 83 345 0 1632 0 282 70 319 315 0 1341 3 3820 70 7 805 618 567 3521 84
10N.261.52.E5 67 112 544 567 46 982 142 67 325 0 1535 74 813 443 67 198 491 21 1407 147 269 67 573 366 3 952 19 1940 67 254 879 386 216 3266 280
FF-50 21 234 15 13 0 25 6 21 742 0 584 13 257 66 21 107 69 2 282 18 27 21 218 73 1 191 16 464 21 493 84 44 10 156 51
TI-B 334
elsherbini / dada2_bioclite_install_output.txt
Last active July 31, 2016 18:23
Output from install binary with bioclite
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
Natural language support but running in an English locale
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
elsherbini / dada2_bioclite_install_output.txt
Created July 31, 2016 18:23
Output from install binary with bioclite
elsherbini / dada2_devtools_install_output.txt
Created July 31, 2016 17:56
Output from trying to install dada2 on cluster with old gcc
> library("devtools")
> devtools::install_github("benjjneb/dada2")
Downloading GitHub repo benjjneb/dada2@master
from URL https://api.github.com/repos/benjjneb/dada2/zipball/master
Installing dada2
trying URL 'https://cran.rstudio.com/src/contrib/RcppParallel_4.3.19.tar.gz'
Content type 'application/x-gzip' length 1560737 bytes (1.5 MB)
==================================================
downloaded 1.5 MB
Installing RcppParallel
elsherbini / stefan_data_munging.ipynb
Last active July 6, 2016 16:25
Showing some patterns for reshaping data in R
elsherbini / README.md
Last active May 11, 2016 17:52
interproscan snakemake files for SLURM cluster

The three files I used to run interproscan 5.17 on a SLURM cluster.

The pipeline takes FASTA files of bacterial genome contigs as input, calls ORFs with prodigal, and then annotates the predicted proteins with interproscan.

To run it, update the config.yaml file and then submit the snakemake job:

sbatch snakemake.sbatch
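
For orientation, the two steps described above can be sketched in plain Python as a pair of command lines. This is not the Snakefile from this gist, only a rough illustration of the data flow. The function name `build_commands`, the output file naming, and the example paths are hypothetical. The flags shown (`-i`, `-a`, `-o` for prodigal; `-i`, `-f`, `-o` for `interproscan.sh`) are the commonly used ones, but check them against the versions installed on your cluster.

```python
import shlex
from pathlib import Path


def build_commands(contigs_fasta, out_dir):
    """Return the two command lines for one genome (hypothetical helper).

    Step 1 calls ORFs on the contigs and writes protein translations.
    Step 2 annotates those proteins.
    """
    contigs = Path(contigs_fasta)
    out = Path(out_dir)
    # Assumed naming scheme: one set of outputs per input FASTA, keyed by file stem.
    proteins = out / f"{contigs.stem}.faa"
    gene_calls = out / f"{contigs.stem}.gbk"
    annotations = out / f"{contigs.stem}.tsv"

    prodigal_cmd = [
        "prodigal",
        "-i", str(contigs),     # input contigs
        "-a", str(proteins),    # protein translations of the called ORFs
        "-o", str(gene_calls),  # gene coordinates
    ]
    interproscan_cmd = [
        "interproscan.sh",
        "-i", str(proteins),    # proteins from step 1
        "-f", "tsv",            # tab-separated output
        "-o", str(annotations),
    ]
    return [prodigal_cmd, interproscan_cmd]


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try anywhere.
    for cmd in build_commands("genomes/strainA.fasta", "results"):
        print(shlex.join(cmd))
```

The point of the sketch is the dependency: the protein file written by the first command is the input of the second, which is the kind of relationship a workflow manager such as snakemake tracks for you.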

Resources for learning

Linux / Command Line

  • Command Line Bootcamp - A great interactive tutorial for learning the basics of the command line

  • Art of the Command Line - Not interactive, but more exhaustive than the bootcamp.

  • explainshell - Give it a shell command, and it'll tell you what all the parts mean. Never wonder what cat ./in | cut -f 2 | sort | uniq -c | sort -n -k1 | sed -e 's/^[ \t]*//' > ./out means again!
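
For readers more at home in Python than in shell, here is a rough equivalent of the pipeline quoted in the explainshell entry: take the second tab-separated field of each line, count how often each value occurs, and list the counts from least to most frequent. It is a sketch, not an exact reimplementation. The function name `count_second_field` is made up for illustration, and ties are broken by plain string order, which may differ from what `sort` does under your locale.

```python
from collections import Counter


def count_second_field(lines):
    """Approximate: cut -f 2 | sort | uniq -c | sort -n -k1 | sed 's/^[ \\t]*//'"""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Like `cut -f 2`, a line without any tab is passed through whole.
        value = fields[1] if len(fields) > 1 else fields[0]
        counts[value] += 1
    # Ascending by count, then by value; the sed step only strips the
    # leading padding that `uniq -c` adds, so there is nothing to strip here.
    ordered = sorted(counts.items(), key=lambda item: (item[1], item[0]))
    return [f"{n} {value}" for value, n in ordered]


if __name__ == "__main__":
    example = ["a\tx\n", "b\ty\n", "c\tx\n", "d\tz\n", "e\tx\n", "f\ty\n"]
    for row in count_second_field(example):
        print(row)
```

Reading the shell version stage by stage alongside this function is a quick way to check your understanding of what each command contributes.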