Skip to content

Instantly share code, notes, and snippets.

View gregcaporaso's full-sized avatar
🌱

Greg Caporaso gregcaporaso

🌱
View GitHub Profile
@gregcaporaso
gregcaporaso / variability_v_diversity.py
Last active August 29, 2015 13:56
first pass script for comparing median diversity versus median variability for a given category
#!/usr/bin/env python
# File created on 26 Feb 2014
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2014, The QIIME Project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.8.0-dev"
__maintainer__ = "Greg Caporaso"
@gregcaporaso
gregcaporaso / check_illumina_barcodes.py
Created December 3, 2013 16:32
Script for performing some basic testing on Illumina amplicon sequencing primers as described in: http://www.nature.com/ismej/journal/v6/n8/full/ismej20128a.html
#!/usr/bin/env python
# File created on 01 Dec 2011
from __future__ import division
__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.3.0-dev"
__maintainer__ = "Greg Caporaso"
#!/usr/bin/env python
from sys import argv
from random import random
from cogent.parse.fastq import MinimalFastqParser
from cogent.draw.distribution_plots import generate_box_plots
from qiime.quality import ascii_to_phred33
from qiime.util import qiime_open
def fastq_quality_plots(fastq_records,
@gregcaporaso
gregcaporaso / compare_pre_post_distances.py
Last active December 27, 2015 01:09
Code for comparing groups (e.g., treatment and control) of pre/post UniFrac distances to determine if one group's microbiomes are more stable than the other.
#!/usr/bin/env python
# Authors: Greg Caporaso, John Chase
# Questions: Contact gregcaporaso@gmail.com
# Step 1: Generate lists of pre/post sample ids on a per-individual basis
# qiime.group.extract_per_individual_states_from_sample_metadata
# will let you generate a dict of individual id to (pre sample-id, post sample-id)
# Step 2: Extract distances for pre/post sample ids
# qiime.parse.parse_distmat_to_dict
@gregcaporaso
gregcaporaso / README.md
Created August 23, 2013 14:54
Example files used while developing pyqi's Getting Started tutorials.

These files were used while developing pyqi's Getting Started tutorials. See those documents for usage examples.

@gregcaporaso
gregcaporaso / README.md
Last active December 21, 2015 02:39
Code and analysis notes for determine the taxonomic-specificity of a set of sequences with associated taxonomy strings. This has been tested with the Greengenes 13_5 database. See README.md for usage instructions and some analysis notes.

Taxonomic specificity of sequences in Greengenes

Here I'm creating a hash of expected 515F/806R amplicons from the Greengenes OTUs (for a couple of different sizes of OTUs), and comparing the uniqueness of sequences with the number of different taxonomic identities at each level.

There are basically three categories of sequences:

  1. those that are unique, and therefore can only map to a single taxa
  2. those that are not unique, but still only map to a single taxa
  3. those that are not unique, and map to multiple taxa.
@gregcaporaso
gregcaporaso / reorg-dir-structure.py
Created August 9, 2013 20:40
Quick-and-dirty code to generate a shell script to re-organize the directory structure in the short-read-tax-assignment repo.
41,9 All
#!/usr/bin/env python
# Author: Greg Caporaso
from os.path import join, isdir
from glob import glob
base_in_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/multiple_assign_taxonomy_output/"
base_out_dir = "/home/caporaso/analysis/short-read-tax-assignment/data/eval-pre-computed/"
@gregcaporaso
gregcaporaso / generate_usearch_cmds.py
Last active December 20, 2015 06:09
code for converting blast "bl6" file to assignments (e.g., functional, taxonomic, etc).
#!/usr/bin/env python
from os.path import join
query_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/rep_set.fna"
reference_seqs_fp = "/data/gg_13_5_otus/rep_set/97_otus.fasta"
reference_tax_fp = "/data/gg_13_5_otus/taxonomy/97_otu_taxonomy.txt"
input_biom_fp = "/home/caporaso/analysis/short-read-tax-assignment/data/qiime-mock-community/S16S-2/otu_table_mc2_no_pynast_failures.biom"
output_biom_fn = "otu_table_mc2_no_pynast_failures_w_taxa.biom"
output_dir = "/home/caporaso/analysis/short-read-tax-assignment/demo/eval-demo/usearch_v_97/"
@gregcaporaso
gregcaporaso / uc_fast_params.txt
Created July 8, 2013 21:49
uclust-fast parameter settings (this is a valid QIIME parameters file, and is used in the [Illumina overview tutorial](http://qiime.org/tutorials/illumina_overview_tutorial.html)).
pick_otus:enable_rev_strand_match True
pick_otus:max_accepts 1
pick_otus:max_rejects 8
pick_otus:stepwords 8
pick_otus:word_length 8
@gregcaporaso
gregcaporaso / README.md
Last active December 19, 2015 05:18
Code for demultiplexing fastq data where index reads and barcodes are included in the beginning of sequences. This code depends on QIIME 1.7.0.

Code for demultiplexing fastq data where index reads and barcodes are included in the beginning of sequences. This code depends on QIIME 1.7.0.

To run this code and pass the results into split_libraries_fastq.py:

prep_sl_fastq.py -b AmpF_25k.fastq.gz -m mapping.txt -o prepped_fastq
cd prepped_fastq
split_libraries_fastq.py -i AmpF_25k.fastq.amplicon.fastq -b AmpF_25k.fastq.barcode.fastq -m ../mapping.txt -o slout/ --barcode_type 12