gregcaporaso/README.md

## README.md

      
    Raw
  

              README.md
            
          
    Notes by Greg Caporaso (gregcaporaso@gmail.com)
Analysis goals

From email to picrust-developers on 4 Oct 2012:


Filter the HMP (not HMP-mock) data set to ~50-100k sequences at random to form a filtered dataset (for decreased run time).


Select ~12 of the most abundant IMG-defined OTUs from the HMP, and slice the reference sequence to the amplified region in that dataset.  "IMG-defined" here means that we have an IMG genome attached to the OTU, opposed to the Greengenes-defined OTUs where we don't have a genome for that specific OTU. "most abundant" will be somewhat arbitrary - I'm thinking something like a random 12 IMG-defined from the 25% most abundant OTUs in the dataset.


Spike a combined total of 50k of the sliced reads from Step 2 into the filtered dataset from Step 1.


Pick OTUs with available methods (uclust-fast, uclust-strict, BWA-short, BLAT, usearch, bowtie2 (when available)).


Share OTU tables for PICRUST prediction accuracy analysis "positive control" (i.e., we expect these to achieve higher accuracy as we have enriched for genomes that we suspect are very likely to be present in the samples).


Analysis notes

Run on compy.colorado.edu on 26 Oct 2012 by Greg Caporaso (gregcaporaso@gmail.com)
python ../spike_control/picrust_spike_control.py -f /scratch/caporaso/picrust/otus/mammals/seqs.fna -i /scratch/caporaso/picrust/otus/mammals/ucrC88.strict/uclust_ref_picked_otus/otu_table.biom -o /scratch/caporaso/picrust/otus/mammals/spike_control/ -p 0.3 -g /scratch/caporaso/gg_otus_4feb2011/taxonomies/greengenes_tax.txt -r /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta -c /scratch/caporaso/gg_otus_4feb2011/rep_set/gg_97_otus_4feb2011_aligned.fasta

# w spiked sequences
echo "pick_reference_otus_through_otu_table.py -i /scratch/caporaso/picrust/otus/mammals/spike_control/seqs.subsampled0.300000.spiked.fna -o /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000_plus_spike/ -p /scratch/caporaso/picrust/otus/otu_picking_params_97.txt -r /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta" | qsub -keo -N picrustspk -l pvmem=4gb -q memroute

echo "beta_diversity_through_plots.py -i /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000_plus_spike/uclust_ref_picked_otus/otu_table.biom -o /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000_plus_spike/bdiv_even488 -e 488 -t /Users/caporaso/data/img_gg_otus_18may2012/trees/97_otus_img_gg_18may2012.tre -m /scratch/caporaso/picrust/otus/mammals/map.txt" | qsub -keo -N spkbdiv -l pvmem=4gb -q memroute

# w/o spiked sequences
echo "pick_reference_otus_through_otu_table.py -i /scratch/caporaso/picrust/otus/mammals/spike_control/seqs.subsampled0.300000.fna -o /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000/ -p /scratch/caporaso/picrust/otus/otu_picking_params_97.txt -r /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta" | qsub -keo -N picrustspk -l pvmem=4gb -q memroute

echo "beta_diversity_through_plots.py -i /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000/uclust_ref_picked_otus/otu_table.biom -o /scratch/caporaso/picrust/otus/mammals/spike_control/ucrC97_subsampled0.300000/bdiv_even298 -e 298 -t /Users/caporaso/data/img_gg_otus_18may2012/trees/97_otus_img_gg_18may2012.tre -m /scratch/caporaso/picrust/otus/mammals/map.txt" | qsub -keo -N spkbdiv -l pvmem=4gb -q memroute


Input md5 sums
7eae6b90a80419c402930e16ba790f48  /scratch/caporaso/picrust/otus/mammals/ucrC88.strict/uclust_ref_picked_otus/otu_table.biom
a8f087f46afcfc0ec48dd9a69d4faffa  /scratch/caporaso/gg_otus_4feb2011/taxonomies/greengenes_tax.txt
81b2a0624bac27eee6a59ab3e3d55bbf  /scratch/caporaso/picrust/otus/mammals/seqs.fna
7efa9b27e4eeb3985ca5614269cb7cb8  /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta
afea7f81e67b77d3eb4917ede7b7f914  /scratch/caporaso/gg_otus_4feb2011/rep_set/gg_97_otus_4feb2011_aligned.fasta
4c6fa9441fac31f4df448469e3b44506  /Users/caporaso/data/img_gg_otus_18may2012/trees/97_otus_img_gg_18may2012.tre
f52d162d44591c9cd8d81379a89ad240  /scratch/caporaso/picrust/otus/mammals/map.txt

Initial investigations

Initial investigation was performed using HMP, but that data was problematic for several reasons. Those notes, which formed the basis of this script, are below.
Step 1.
Will work with the HMP v35 mock community data as we seem to be getting a higher fraction of read assignments with that data relative to the v13 data.
compy:/scratch/caporaso/picrust/otus/hmp/seqs_v35.fna : 710e2aa798629d2c3e591a83dd3db009

count_seqs.py -i seqs_v35.fna
30336944  : seqs_v35.fna (Sequence lengths (mean +/- std): 528.0545 +/- 16.3430)

echo "subsample_fasta.py -i /scratch/caporaso/picrust/otus/hmp/seqs_v35.fna -o /scratch/caporaso/picrust/otus/hmp/spiked_control/seqs_v35.subsample003.fna -p 0.003" | qsub -keo -N subsamplehmp

count_seqs.py -i seqs_v35.subsample003.fna 
91006  : seqs_v35.subsample003.fna (Sequence lengths (mean +/- std): 528.1065 +/- 16.4102)

Step 2. Select only the IMG OTUs
echo "filter_otus_from_otu_table.py -i /scratch/caporaso/picrust/otus/hmp/ucrC88_v35_new/uclust_ref_picked_otus/otu_table.biom -o /scratch/caporaso/picrust/otus/hmp/spiked_control/otu_table_img_only.biom -e /scratch/caporaso/gg_otus_4feb2011/taxonomies/greengenes_tax.txt" | qsub -keo -N filter_hmp -l pvmem=32gb -q memroute

#Filter the OTU table to include samples with at least 3000 observations (we’ll drop the rest from the final table).

filter_samples_from_otu_table.py -i otu_table_img_only.biom -o otu_table_img_only.mc3000.biom -n 3000

#Identify the most abundant IMG OTUs on a per-sample basis.

In [1]: from biom.parse import parse_biom_table

In [2]: t = parse_biom_table(open('./otu_table_img_only.mc3000.biom'))

In [16]: def get_abundant_oids(v):
   ....:     x = [(e,i) for i,e in enumerate(v)]
   ....:     x.sort()
   ....:     x.reverse()
   ....:     result = [(t.ObservationIds[i],e) for e,i in x if e > 0]
   ....:     return result

In [22]: abundant_otus = {}

In [23]: for sv, si, sm in t.iterSamples():
    abundant_otus[si] = get_abundant_oids(sv)

Write the set of identifiers of abundant OTUs to file:
In [26]: most_abundant = set()

In [27]: for sid, a in abundant_otus.items():
   ....:     most_abundant.update(set([e[0] for e in a]))
   ....:     

In [28]: len(most_abundant)
Out[28]: 332

In [30]: f = open('most_abundant_seq_ids.txt','w')

In [31]: f.write('\n'.join(most_abundant))

In [32]: f.close()

#Filter the abundant OTUs from the reference set: 
img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta 
(md5: 7efa9b27e4eeb3985ca5614269cb7cb8)

filter_fasta.py -f /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta -o /scratch/caporaso/picrust/otus/hmp/spiked_control/abundant_img_sequences.fasta -s /scratch/caporaso/picrust/otus/hmp/spiked_control/most_abundant_seq_ids.txt

#Align those sequences
align_seqs.py -i /scratch/caporaso/picrust/otus/hmp/spiked_control/abundant_img_sequences.fasta -t /scratch/caporaso/gg_otus_4feb2011/rep_set/gg_97_otus_4feb2011_aligned.fasta 

#Trim to the v35 region using the HMP primers (slice_aligned_region.py is a PrimerProspector script). The primers appear to be 357F (CCTACGGGAGGCAGCAG) and 926R (CCGTCAATTCMTTTRAGT) (found these here (see page 11))
slice_aligned_region.py -f abundant_img_sequences_aligned.fasta -x CTGCTGCCTCCCGTAGG -y CCGTCAATTCMTTTRAG

Step 3. Generate fasta of sequences to spike into the subsample
In [1]: from cogent.parse.fasta import MinimalFastaParser

In [2]: sequences_to_spike = {}

In [3]: for seq_id, seq in MinimalFastaParser(open('pynast_aligned/abundant_img_sequences_aligned_alignment_region.fasta','U')):
    sequences_to_spike[seq_id.split()[0]] = seq

In [5]: f = open('spike.fasta','w')

In [6]: i = 30336945

In [12]: for sid, o in abundant_otus.items():                                                                                    
    for e in o[:2]:
        for j in range(50):
            try:
                f.write('>%s_%d %s\n%s\n' % (sid,i,e[0],sequences_to_spike[e[0]]))
            except KeyError:
                pass
            i += 1
   ....:             

In [13]: f.close()

In [14]: !count_seqs.py -i spike.fasta

97750  : spike.fasta (Sequence lengths (mean +/- std): 545.1565 +/- 8.8109)
97750  : Total

Step 4: Spike the sequences into the subsample
cat seqs_v35.subsample003.fna spike.fasta >> seqs_v35.subsample003_plus_spike.fna
count_seqs.py -i seqs_v35.subsample003_plus_spike.fna

188756  : seqs_v35.subsample003_plus_spike.fna (Sequence lengths (mean +/- std): 536.9361 +/- 15.5764)
188756  : Total

Step 5: Pick OTUs and generate OTU table on spiked data
Ran this in serial as the queue was full at the time
echo "pick_reference_otus_through_otu_table.py -i /scratch/caporaso/picrust/otus/hmp/spiked_control/seqs_v35.subsample003_plus_spike.fna -o /scratch/caporaso/picrust/otus/hmp/spiked_control/ucrC97_v35.subsample003_plus_spike/ -p /scratch/caporaso/picrust/otus/otu_picking_params_97.txt -r /Users/caporaso/data/img_gg_otus_18may2012/rep_set/97_otus_img_gg_18may2012.fasta" | qsub -keo -N picrustspk -l pvmem=4gb -q memroute

Currently have too many samples to reasonably work with this many sequences. Need to filter the sequence collection on a per sample basis. Need the mapping file! If no mapping file, work with a different data set (which might be easier regardless, given the first issue noted here).

  
## picrust_spike_control.py
#!/usr/bin/env python
# File created on 26 Oct 2012
from __future__ import division

__author__ = "Greg Caporaso"
__copyright__ = "Copyright 2011, The QIIME project"
__credits__ = ["Greg Caporaso"]
__license__ = "GPL"
__version__ = "1.5.0-dev"
__maintainer__ = "Greg Caporaso"
__email__ = "gregcaporaso@gmail.com"
__status__ = "Development"

from os.path import split, splitext
from cogent.parse.fasta import MinimalFastaParser
from qiime.util import parse_command_line_parameters, make_option
from qiime.util import qiime_system_call, create_dir
from qiime.filter import filter_fasta
from biom.parse import parse_biom_table

script_info = {}
script_info['brief_description'] = ""
script_info['script_description'] = ""
script_info['script_usage'] = [("","","")]
script_info['output_description']= ""
script_info['required_options'] = [
 make_option('-f','--input_fasta_fp',type="existing_filepath",help='the input filepath'),
 make_option('-i','--input_otu_table_fp',type="existing_filepath",help='the input filepath'),
 make_option('-o','--output_dir',type="new_dirpath",help='the output directory'),
 make_option('-p','--percent_subsample',action='store',type='float',
        help='Specify the percentage of sequences to subsample'),
 make_option('-g','--greengenes_tax_fp',type="existing_filepath",
        help='Path to Greengenes OTU taxonomy file'),
 make_option('-r','--picrust_reference_seqs_fp',type="existing_filepath",
        help='Path to PICRUST reference sequences file'),
 make_option('-c','--reference_alignment_fp',type="existing_filepath",
        help='Path to reference alignment file'),
]
script_info['optional_options'] = [\
 # Example optional option
 #make_option('-o','--output_dir',type="new_dirpath",help='the output directory [default: %default]'),\
]
script_info['version'] = __version__

def command_call(cmd):
    stdout, stderr, return_code = qiime_system_call(cmd)
    if return_code != 0:
        raise ValueError, "STDOUT:\n%s\nSTDERR:\n%s\n" % (stdout, stderr)

def get_abundant_oids(t,v):
    x = [(e,i) for i,e in enumerate(v)]
    x.sort()
    x.reverse()
    result = [(t.ObservationIds[i],e) for e,i in x if e > 0]
    return result

def picrust_spike_control(input_fasta_fp,
                          input_otu_table_fp,
                          output_dir,
                          percent_subsample,
                          greengenes_tax_fp,
                          picrust_reference_seqs_fp,
                          reference_alignment_fp):

    # subsample the full fasta file for reduced run time
    input_fasta_basename, input_fasta_ext = splitext(split(input_fasta_fp)[1])
    subsampled_fasta_fp = "%s/%s.subsampled%f%s" % (output_dir,
                                                     input_fasta_basename,
                                                     percent_subsample,
                                                     input_fasta_ext
                                                    )
    subsample_cmd = "subsample_fasta.py -i %s -o %s -p %f" % \
     (input_fasta_fp,subsampled_fasta_fp,percent_subsample)
    command_call(subsample_cmd)

    # Select only the IMG OTUs
    img_otu_table_fp = '%s/otu_table_img_only.biom' % output_dir
    filter_otus_from_otu_table_cmd =\
     "filter_otus_from_otu_table.py -i %s -o %s -e %s" % (input_otu_table_fp,
                                                          img_otu_table_fp,
                                                          greengenes_tax_fp)

    # Identify the most abundant IMG OTUs on a per-sample basis.
    command_call(filter_otus_from_otu_table_cmd)
    t = parse_biom_table(open(img_otu_table_fp))
    abundant_otus = {}
    for sv, si, sm in t.iterSamples():
        abundant_otus[si] = get_abundant_oids(t,sv)

    # compile the set of most abundant OTU ids
    most_abundant = set()
    for sid, a in abundant_otus.items():
        most_abundant.update(set([e[0] for e in a]))

    # Filter the abundant OTUs from the reference set
    abundant_img_sequences_fp = '%s/abundant_img_sequences.fasta' % output_dir
    filter_fasta(MinimalFastaParser(open(picrust_reference_seqs_fp)),
                 open(abundant_img_sequences_fp,'w'),
                 most_abundant)

    # Align those sequences
    pynast_output_dir = "%s/pynast_aligned/" % output_dir
    align_seqs_cmd = \
     "align_seqs.py -i %s -t %s -o %s" % (abundant_img_sequences_fp,
                                          reference_alignment_fp,
                                          pynast_output_dir)
    command_call(align_seqs_cmd)

    # Trim to the amplified region (these primers are 8F/338R, as listed in
    # Muegge et al supplementary material:
    # http://www.sciencemag.org/content/suppl/2011/05/18/332.6032.970.DC1/Muegge.SOM.pdf
    # with the forward primer reverse complemented
    abundant_img_sequences_aligned_fp =\
     "%s/pynast_aligned/abundant_img_sequences_aligned.fasta" % output_dir
    slice_aligned_region_cmd = "slice_aligned_region.py -f %s -x CTGAGCCAGGATCAAACTCT -y TGCTGCCTCCCGTAGGAGT -o %s" % (abundant_img_sequences_aligned_fp,output_dir)
    abundant_img_sequences_aligned_trimmed_fp = \
     "%s/abundant_img_sequences_aligned_alignment_region.fasta" % output_dir
    command_call(slice_aligned_region_cmd)

    # Generate fasta of sequences to spike into the subsample
    sequences_to_spike = {}
    for seq_id, seq in MinimalFastaParser(open(abundant_img_sequences_aligned_trimmed_fp,'U')):
        sequences_to_spike[seq_id.split()[0]] = seq
    i = 2000000
    spike_fasta_fp = '%s/spike.%s' % (output_dir,input_fasta_ext)
    f = open(spike_fasta_fp,'w')
    for sid, o in abundant_otus.items():
        for e in o[:2]:
            for j in range(200):
                try:
                    f.write('>%s_%d %s\n%s\n' % (sid,i,e[0],sequences_to_spike[e[0]]))
                except KeyError:
                    pass
                i += 1
    f.close()

    # Spike the sequences into the subsample
    spiked_output_fasta_fp =\
     "%s/%s.subsampled%f.spiked%s" % (output_dir,
                                input_fasta_basename,
                                percent_subsample,
                                input_fasta_ext
                               )
    cat_cmd = "cat %s %s >> %s" % (subsampled_fasta_fp,
                                  spike_fasta_fp,
                                  spiked_output_fasta_fp)
    command_call(cat_cmd)


def main():
    option_parser, opts, args =\
       parse_command_line_parameters(**script_info)
    create_dir(opts.output_dir,fail_on_exist=True)
    picrust_spike_control(opts.input_fasta_fp,
                          opts.input_otu_table_fp,
                          opts.output_dir,
                          opts.percent_subsample,
                          opts.greengenes_tax_fp,
                          opts.picrust_reference_seqs_fp,
                          opts.reference_alignment_fp)

if __name__ == "__main__":
    main()
	#!/usr/bin/env python
	# File created on 26 Oct 2012
	from __future__ import division

	__author__ = "Greg Caporaso"
	__copyright__ = "Copyright 2011, The QIIME project"
	__credits__ = ["Greg Caporaso"]
	__license__ = "GPL"
	__version__ = "1.5.0-dev"
	__maintainer__ = "Greg Caporaso"
	__email__ = "gregcaporaso@gmail.com"
	__status__ = "Development"

	from os.path import split, splitext
	from cogent.parse.fasta import MinimalFastaParser
	from qiime.util import parse_command_line_parameters, make_option
	from qiime.util import qiime_system_call, create_dir
	from qiime.filter import filter_fasta
	from biom.parse import parse_biom_table

	script_info = {}
	script_info['brief_description'] = ""
	script_info['script_description'] = ""
	script_info['script_usage'] = [("","","")]
	script_info['output_description']= ""
	script_info['required_options'] = [
	make_option('-f','--input_fasta_fp',type="existing_filepath",help='the input filepath'),
	make_option('-i','--input_otu_table_fp',type="existing_filepath",help='the input filepath'),
	make_option('-o','--output_dir',type="new_dirpath",help='the output directory'),
	make_option('-p','--percent_subsample',action='store',type='float',
	help='Specify the percentage of sequences to subsample'),
	make_option('-g','--greengenes_tax_fp',type="existing_filepath",
	help='Path to Greengenes OTU taxonomy file'),
	make_option('-r','--picrust_reference_seqs_fp',type="existing_filepath",
	help='Path to PICRUST reference sequences file'),
	make_option('-c','--reference_alignment_fp',type="existing_filepath",
	help='Path to reference alignment file'),
	]
	script_info['optional_options'] = [\
	# Example optional option
	#make_option('-o','--output_dir',type="new_dirpath",help='the output directory [default: %default]'),\
	]
	script_info['version'] = __version__

	def command_call(cmd):
	stdout, stderr, return_code = qiime_system_call(cmd)
	if return_code != 0:
	raise ValueError, "STDOUT:\n%s\nSTDERR:\n%s\n" % (stdout, stderr)

	def get_abundant_oids(t,v):
	x = [(e,i) for i,e in enumerate(v)]
	x.sort()
	x.reverse()
	result = [(t.ObservationIds[i],e) for e,i in x if e > 0]
	return result

	def picrust_spike_control(input_fasta_fp,
	input_otu_table_fp,
	output_dir,
	percent_subsample,
	greengenes_tax_fp,
	picrust_reference_seqs_fp,
	reference_alignment_fp):

	# subsample the full fasta file for reduced run time
	input_fasta_basename, input_fasta_ext = splitext(split(input_fasta_fp)[1])
	subsampled_fasta_fp = "%s/%s.subsampled%f%s" % (output_dir,
	input_fasta_basename,
	percent_subsample,
	input_fasta_ext
	)
	subsample_cmd = "subsample_fasta.py -i %s -o %s -p %f" % \
	(input_fasta_fp,subsampled_fasta_fp,percent_subsample)
	command_call(subsample_cmd)

	# Select only the IMG OTUs
	img_otu_table_fp = '%s/otu_table_img_only.biom' % output_dir
	filter_otus_from_otu_table_cmd =\
	"filter_otus_from_otu_table.py -i %s -o %s -e %s" % (input_otu_table_fp,
	img_otu_table_fp,
	greengenes_tax_fp)

	# Identify the most abundant IMG OTUs on a per-sample basis.
	command_call(filter_otus_from_otu_table_cmd)
	t = parse_biom_table(open(img_otu_table_fp))
	abundant_otus = {}
	for sv, si, sm in t.iterSamples():
	abundant_otus[si] = get_abundant_oids(t,sv)

	# compile the set of most abundant OTU ids
	most_abundant = set()
	for sid, a in abundant_otus.items():
	most_abundant.update(set([e[0] for e in a]))

	# Filter the abundant OTUs from the reference set
	abundant_img_sequences_fp = '%s/abundant_img_sequences.fasta' % output_dir
	filter_fasta(MinimalFastaParser(open(picrust_reference_seqs_fp)),
	open(abundant_img_sequences_fp,'w'),
	most_abundant)

	# Align those sequences
	pynast_output_dir = "%s/pynast_aligned/" % output_dir
	align_seqs_cmd = \
	"align_seqs.py -i %s -t %s -o %s" % (abundant_img_sequences_fp,
	reference_alignment_fp,
	pynast_output_dir)
	command_call(align_seqs_cmd)

	# Trim to the amplified region (these primers are 8F/338R, as listed in
	# Muegge et al supplementary material:
	# http://www.sciencemag.org/content/suppl/2011/05/18/332.6032.970.DC1/Muegge.SOM.pdf
	# with the forward primer reverse complemented
	abundant_img_sequences_aligned_fp =\
	"%s/pynast_aligned/abundant_img_sequences_aligned.fasta" % output_dir
	slice_aligned_region_cmd = "slice_aligned_region.py -f %s -x CTGAGCCAGGATCAAACTCT -y TGCTGCCTCCCGTAGGAGT -o %s" % (abundant_img_sequences_aligned_fp,output_dir)
	abundant_img_sequences_aligned_trimmed_fp = \
	"%s/abundant_img_sequences_aligned_alignment_region.fasta" % output_dir
	command_call(slice_aligned_region_cmd)

	# Generate fasta of sequences to spike into the subsample
	sequences_to_spike = {}
	for seq_id, seq in MinimalFastaParser(open(abundant_img_sequences_aligned_trimmed_fp,'U')):
	sequences_to_spike[seq_id.split()[0]] = seq
	i = 2000000
	spike_fasta_fp = '%s/spike.%s' % (output_dir,input_fasta_ext)
	f = open(spike_fasta_fp,'w')
	for sid, o in abundant_otus.items():
	for e in o[:2]:
	for j in range(200):
	try:
	f.write('>%s_%d %s\n%s\n' % (sid,i,e[0],sequences_to_spike[e[0]]))
	except KeyError:
	pass
	i += 1
	f.close()

	# Spike the sequences into the subsample
	spiked_output_fasta_fp =\
	"%s/%s.subsampled%f.spiked%s" % (output_dir,
	input_fasta_basename,
	percent_subsample,
	input_fasta_ext
	)
	cat_cmd = "cat %s %s >> %s" % (subsampled_fasta_fp,
	spike_fasta_fp,
	spiked_output_fasta_fp)
	command_call(cat_cmd)


	def main():
	option_parser, opts, args =\
	parse_command_line_parameters(**script_info)
	create_dir(opts.output_dir,fail_on_exist=True)
	picrust_spike_control(opts.input_fasta_fp,
	opts.input_otu_table_fp,
	opts.output_dir,
	opts.percent_subsample,
	opts.greengenes_tax_fp,
	opts.picrust_reference_seqs_fp,
	opts.reference_alignment_fp)

	if __name__ == "__main__":
	main()