Greg Caporaso gregcaporaso

@gregcaporaso
gregcaporaso / README.md
Created Aug 23, 2013
Example files used while developing pyqi's Getting Started tutorials.
View README.md

These files were used while developing pyqi's Getting Started tutorials. See those documents for usage examples.

@gregcaporaso
gregcaporaso / README.md
Last active Dec 21, 2015
Code and analysis notes for determining the taxonomic specificity of a set of sequences with associated taxonomy strings. This has been tested with the Greengenes 13_5 database. See README.md for usage instructions and some analysis notes.
View README.md

Taxonomic specificity of sequences in Greengenes

Here I'm creating a hash of expected 515F/806R amplicons from the Greengenes OTUs (for a couple of different sizes of OTUs), and comparing the uniqueness of sequences with the number of different taxonomic identities at each level.

There are basically three categories of sequences:

  1. those that are unique, and therefore can only map to a single taxon
  2. those that are not unique, but still only map to a single taxon
  3. those that are not unique, and map to multiple taxa.
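
The three categories above can be sketched in a few lines of Python. This is only an illustration, not the code from this gist: the function name `categorize_sequences`, the `(sequence, taxonomy)` input pairs, and the example records are all assumptions made up for the sketch.

```python
from collections import defaultdict


def categorize_sequences(records):
    """Label each distinct sequence with category 1, 2, or 3.

    records: iterable of (sequence, taxonomy_string) pairs (hypothetical input).
    Returns a dict mapping sequence -> category number.
    """
    counts = defaultdict(int)  # how many records share each sequence
    taxa = defaultdict(set)    # distinct taxonomy strings seen per sequence
    for seq, tax in records:
        counts[seq] += 1
        taxa[seq].add(tax)

    categories = {}
    for seq, n in counts.items():
        if n == 1:
            categories[seq] = 1  # unique, so it maps to a single taxon
        elif len(taxa[seq]) == 1:
            categories[seq] = 2  # not unique, but still a single taxon
        else:
            categories[seq] = 3  # not unique, and maps to multiple taxa
    return categories


# Made-up records for illustration only.
example = [
    ("ACGT", "k__Bacteria; p__Firmicutes"),
    ("GGCC", "k__Bacteria; p__Proteobacteria"),
    ("GGCC", "k__Bacteria; p__Proteobacteria"),
    ("TTAA", "k__Bacteria; p__Firmicutes"),
    ("TTAA", "k__Bacteria; p__Bacteroidetes"),
]
print(categorize_sequences(example))
```

To compare specificity at each taxonomic level, the same function could be run after truncating every taxonomy string to the level of interest.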
@gregcaporaso
gregcaporaso / reorg-dir-structure.py
Created Aug 9, 2013
Quick-and-dirty code to generate a shell script to re-organize the directory structure in the short-read-tax-assignment repo.
View reorg-dir-structure.py
#!/usr/bin/env python
# Author: Greg Caporaso
from os.path import join, isdir
from glob import glob
base_in_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/multiple_assign_taxonomy_output/"
base_out_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/eval-pre-computed/"
@gregcaporaso
gregcaporaso / generate_usearch_cmds.py
Last active Dec 20, 2015
code for converting blast "bl6" file to assignments (e.g., functional, taxonomic, etc).
View generate_usearch_cmds.py
#!/usr/bin/env python
from os.path import join
query_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/rep_set.fna"
reference_seqs_fp = "/data/gg_13_5_otus/rep_set/97_otus.fasta"
reference_tax_fp = "/data/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt"
input_biom_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/otu_table_mc2_no_pynast_failures.biom"
output_biom_fn = "otu_table_mc2_no_pynast_failures_w_taxa.biom"
output_dir = "/home/caporaso/analysis/short-read-tax-assignment/demo/eval-demo/usearch_v_97/"
@gregcaporaso
gregcaporaso / uc_fast_params.txt
Created Jul 8, 2013
uclust-fast parameter settings (this is a valid QIIME parameters file, and is used in the [Illumina overview tutorial](http://qiime.org/tutorials/illumina_overview_tutorial.html)).
View uc_fast_params.txt
pick_otus:enable_rev_strand_match True
pick_otus:max_accepts 1
pick_otus:max_rejects 8
pick_otus:stepwords 8
pick_otus:word_length 8
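
For readers unfamiliar with the parameters file layout shown above, each line has the form `script_name:parameter_name value`. The following is a minimal sketch of reading such lines into nested dictionaries. It is not QIIME's own parser, and the function name `parse_parameters` is an assumption made for illustration.

```python
def parse_parameters(lines):
    """Parse 'script:option value' lines into {script: {option: value}}.

    Blank lines and lines starting with '#' are skipped. Values are kept
    as strings; an option with no value is stored as None.
    """
    params = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)  # split key from value on whitespace
        key = fields[0]
        value = fields[1].strip() if len(fields) > 1 else None
        script, _, option = key.partition(":")
        params.setdefault(script, {})[option] = value
    return params


# The five lines from the preview above.
uc_fast_lines = [
    "pick_otus:enable_rev_strand_match True",
    "pick_otus:max_accepts 1",
    "pick_otus:max_rejects 8",
    "pick_otus:stepwords 8",
    "pick_otus:word_length 8",
]
uc_fast_params = parse_parameters(uc_fast_lines)
print(uc_fast_params["pick_otus"])
```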
@gregcaporaso
gregcaporaso / README.md
Last active Dec 19, 2015
Code for demultiplexing fastq data where index reads and barcodes are included in the beginning of sequences. This code depends on QIIME 1.7.0.
View README.md

Code for demultiplexing fastq data where index reads and barcodes are included in the beginning of sequences. This code depends on QIIME 1.7.0.

To run this code and pass the results into split_libraries_fastq.py:

prep_sl_fastq.py -b AmpF_25k.fastq.gz -m mapping.txt -o prepped_fastq
cd prepped_fastq
split_libraries_fastq.py -i AmpF_25k.fastq.amplicon.fastq -b AmpF_25k.fastq.barcode.fastq -m ../mapping.txt -o slout/ --barcode_type 12
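
The core idea, separating a leading barcode from the rest of each read, can be sketched as follows. This is not the code of prep_sl_fastq.py; it is a simplified illustration that assumes four-line FASTQ records and a 12-base barcode at the start of every read (as suggested by `--barcode_type 12` above). The function name `split_barcode` and the example read are made up.

```python
def split_barcode(fastq_lines, barcode_length=12):
    """Yield (header, barcode, barcode_qual, amplicon, amplicon_qual).

    Assumes each record is exactly four lines: header, sequence,
    '+' separator, and quality string.
    """
    lines = [line.rstrip("\n") for line in fastq_lines]
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = lines[i:i + 4]
        yield (
            header,
            seq[:barcode_length],   # leading bases treated as the barcode
            qual[:barcode_length],
            seq[barcode_length:],   # remainder is the amplicon read
            qual[barcode_length:],
        )


# One made-up record: 12 barcode bases followed by 4 amplicon bases.
example_fastq = [
    "@read1\n",
    "ACGTACGTACGTTTGG\n",
    "+\n",
    "IIIIIIIIIIIIHHHH\n",
]
records = list(split_barcode(example_fastq))
print(records[0])
```

Writing the barcode and amplicon parts to two separate FASTQ files would give inputs of the kind passed to `-b` and `-i` in the command above.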
@gregcaporaso
gregcaporaso / README.md
Last active Dec 18, 2015
Hack for converting TGEN SNP pipeline output into files that can be converted to BIOM format for analysis with QIIME.
View README.md

This is all UNTESTED CODE!!

Description

This software allows users to convert TGEN SNP tables to legacy-formatted QIIME OTU tables, which can then be converted into BIOM tables with convert_biom.py (from the biom-format package). This was quickly hacked together, so it is untested, but it is intended to help determine whether making these tables available in BIOM format will support useful analyses.

Install

@gregcaporaso
gregcaporaso / generate_sample_ids.py
Created May 29, 2013
Generate sample ids for the office surface succession project.
View generate_sample_ids.py
#!/usr/bin/env python
# File created on 29 May 2013
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.6.0-dev"
__maintainer__ = "Greg Caporaso"
@gregcaporaso
gregcaporaso / clean_nifH_tax.py
Created May 7, 2013
A single-use script for cleaning up taxonomy strings from a nifH database generated by Gaby and Buckley (2012) and exported by the authors from ARB. This is being used to hack together a 13_5 release of a nifH database for use with PrimerProspector and QIIME. The database reference is: PLoS One. 2012;7(7):e42149. doi: 10.1371/journal.pone.0042149…
View clean_nifH_tax.py
#!/usr/bin/env python
# File created on 07 May 2013
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.6.0-dev"
__maintainer__ = "Greg Caporaso"
@gregcaporaso
gregcaporaso / README.md
Last active Dec 15, 2015
Experimental wrappers for usearch 6.1. None of this is tested - just playing.
View README.md

Closed-reference OTU picking:

usearch61 -usearch_global seqs.fna -db refseqs.fasta -id 0.97 -uc usearch_global.uc -strand both
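
The `-uc` output from the command above can be turned into a reference-to-queries mapping. The sketch below is not part of these wrappers. It assumes the usual ten-column, tab-separated UC layout, in which the first column is the record type (`H` for a hit, `N` for no hit), the ninth is the query label, and the tenth is the target label. The function name `parse_uc_hits` and the example lines are made up.

```python
def parse_uc_hits(uc_lines):
    """Return (hits, failures) from UC-formatted lines.

    hits: dict mapping reference (target) label -> list of query labels.
    failures: list of query labels that matched no reference sequence.
    """
    hits = {}
    failures = []
    for line in uc_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        record_type, query, target = fields[0], fields[8], fields[9]
        if record_type == "H":
            hits.setdefault(target, []).append(query)
        elif record_type == "N":
            failures.append(query)
    return hits, failures


# Made-up UC lines for illustration only.
example_uc = [
    "H\t0\t253\t98.4\t+\t0\t0\t253M\tseq1\tref42",
    "H\t0\t253\t97.2\t-\t0\t0\t253M\tseq2\tref42",
    "N\t*\t*\t*\t*\t*\t*\t*\tseq3\t*",
]
hits, failures = parse_uc_hits(example_uc)
print(hits, failures)
```

In closed-reference OTU picking, each key of `hits` would act as an OTU identifier, and the queries in `failures` would be discarded.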

De novo OTU picking:
