@johnsolk
Created March 5, 2018 20:35
(agalma) ljcohen@js-169-78:~/tmp$ agalma test
Created temp directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7'
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-transcriptome-VE7/agalma.sqlite
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7' already exists
SRX288285 [2018-03-05 20:03:47]
/home/ljcohen/tmp/agalma-test-transcriptome-VE7/SRX288285_1.fq.gz (3.6 MB)
/home/ljcohen/tmp/agalma-test-transcriptome-VE7/SRX288285_2.fq.gz (3.7 MB)
species: Agalma elegans
ncbi_id: 316166
itis_id: None
extraction_id: None
library_id: SRR871526
library_type: TRANSCRIPTOMIC
individual: None
treatment: None
sequencer: Illumina HiSeq 2000
seq_center: Dunnlab
note: None
sample_prep: Trizol | Illumina TruSeq RNA Sample Prep Kit RNA Purification Beads ; 2 rounds | Illumina TruSeq RNA Sample Prep Kit
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/qc-1'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / qc.setup_data / 0.458s / 116.4MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / qc.fastqc / 0.460s / 116.4MB
Generate FastQC reports for each FASTQ file
biolite.pipeline.run:
STAGE 2 / qc.parse / 9.453s / 545.8MB
Parse FastQC reports into the database
biolite.pipeline.run:
FINISHED / 9.470s / 549.7MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/transcriptome-2'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / insert_size.setup_data / 0.408s / 124.0MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / insert_size.assemble_subset / 0.409s / 124.2MB
Assemble a subset of high quality reads
biolite.pipeline.run:
STAGE 2 / insert_size.estimate_insert / 42.766s / 1511.6MB
Estimate insert size by mapping the subset against the assembly
biolite.pipeline.run:
STAGE 3 / rrna.assemble_subsets / 48.671s / 1511.9MB
Assemble subsets of increasing numbers of reads
biolite.pipeline.run:
STAGE 4 / rrna.blast_transcripts / 182.307s / 1515.3MB
Blast transcripts against known rRNA database
biolite.pipeline.run:
STAGE 5 / rrna.find_exemplars / 182.704s / 1515.3MB
Parse blast output for exemplar rRNA sequences
agalma.rrna.find_exemplars: selecting an exemplar for gene target large-mito-rRNA
agalma.rrna.find_exemplars: large-mito-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target large-nuclear-rRNA
agalma.rrna.find_exemplars: large-nuclear-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target small-mito-rRNA
agalma.rrna.find_exemplars: small-mito-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target small-nuclear-rRNA
agalma.rrna.find_exemplars: small-nuclear-rRNA not found in the assembly, skipping
biolite.pipeline.run:
STAGE 6 / rrna.map_reads / 182.742s / 1515.6MB
Map reads against rRNA exemplars
agalma.rrna.map_reads: no rRNA exemplars were found... skipping
biolite.pipeline.run:
STAGE 7 / rrna.exclude_reads / 182.745s / 1515.6MB
Exclude pairs where either read maps to an rRNA exemplar
agalma.rrna.exclude_reads: no rRNA exemplars were found... skipping
biolite.pipeline.run:
STAGE 8 / transcriptome.assemble_connector / 182.747s / 1515.6MB
[connector between "rrna" and "assemble"]
biolite.pipeline.run:
STAGE 9 / assemble.setup_rrna / 182.748s / 1515.6MB
Retrieve the rRNA exemplars from the database
agalma.assemble.setup_rrna: no previous rrna run found for id SRX288285
biolite.pipeline.run:
STAGE 10 / assemble.filter_data / 182.752s / 1515.6MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 11 / assemble.assemble / 185.695s / 1515.7MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 12 / assemble.parse_assembly / 247.397s / 1540.3MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 13 / assemble.remove_vectors / 247.415s / 1541.1MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/transcriptome-2/univec' already exists
agalma.assemble.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 14 / assemble.remove_rrna / 248.018s / 1541.3MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/transcriptome-2/rrna' already exists
agalma.assemble.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 15 / assemble.estimate_confidence / 248.619s / 1541.3MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 16 / assemble.parse_confidence / 259.236s / 1541.3MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
STAGE 17 / transcriptome.write_sequences / 259.240s / 1541.3MB
Write assembled sequences to FASTA
biolite.pipeline.run:
STAGE 18 / translate.identify_orfs / 259.245s / 1541.3MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 19 / translate.annotate_orfs / 259.660s / 1541.3MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/transcriptome-2/blastp'
biolite.pipeline.run:
STAGE 20 / translate.select_orfs / 344.512s / 1541.3MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 344.526s / 1541.3MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/report'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/report/css'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/report/img'
agalma.agalma_report.report_runs: 1 has pipelines: qc
agalma.agalma_report.report_runs: added qc report for 1
agalma.agalma_report.report_runs: 2 has pipelines: assemble,translate,rrna,transcriptome,insert_size
agalma.agalma_report.report_runs: added insert_size report for 2
agalma.agalma_report.report_runs: added rrna report for 2
agalma.agalma_report.report_runs: added assemble report for 2
agalma.agalma_report.report_runs: added translate report for 2
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: directory 'report' already exists
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/matplotlib/font_manager.py:1297: UserWarning: findfont: Font family [u'Arial'] not found. Falling back to DejaVu Sans
(prop.get_family(), self.defaultFamily[fontext]))
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/report/css' already exists
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/report/img' already exists
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/tabular-report'
SRX288285 [1, 2]
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/tabular-report/css'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/tabular-report/img'
Generating report for catalog ID 'SRX288285'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-transcriptome-VE7/tabular-report/SRX288285'
agalma.agalma_report.report_runs: 1 has pipelines: qc
agalma.agalma_report.report_runs: added qc report for 1
agalma.agalma_report.report_runs: 2 has pipelines: assemble,translate,rrna,transcriptome,insert_size
agalma.agalma_report.report_runs: added insert_size report for 2
agalma.agalma_report.report_runs: added rrna report for 2
agalma.agalma_report.report_runs: added assemble report for 2
agalma.agalma_report.report_runs: added translate report for 2
Created temp directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7'
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7' already exists
SRX288285 [2018-03-05 20:10:26]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288432_1.fq (2.5 MB)
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288432_2.fq (2.5 MB)
species: Agalma elegans
ncbi_id: 316166
itis_id: None
extraction_id: None
library_id: SRR871526
library_type: TRANSCRIPTOMIC
individual: None
treatment: None
sequencer: Illumina HiSeq 2000
seq_center: Dunnlab
note: None
sample_prep: Trizol | Illumina TruSeq RNA Sample Prep Kit RNA Purification Beads ; 2 rounds | Illumina TruSeq RNA Sample Prep Kit
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
SRX288432 [2018-03-05 20:10:26]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288432_1.fq (2.5 MB)
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288432_2.fq (2.5 MB)
species: Craseoa lathetica
ncbi_id: 316205
itis_id: None
extraction_id: None
library_id: SRR871529
library_type: TRANSCRIPTOMIC
individual: None
treatment: None
sequencer: Illumina HiSeq 2000
seq_center: Dunnlab
note: None
sample_prep: Invitrogen Dynabeads mRNA TESTDATAECT kit ; 1 round | Illumina TruSeq RNA Sample Prep Kit
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
SRX288431 [2018-03-05 20:10:28]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288431_1.fq (252 KB)
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288431_2.fq (252 KB)
species: Physalia physalis
ncbi_id: 168775
itis_id: None
extraction_id: None
library_id: SRR871528
library_type: TRANSCRIPTOMIC
individual: None
treatment: None
sequencer: Illumina HiSeq 2000
seq_center: Dunnlab
note: None
sample_prep: Trizol | Illumina TruSeq RNA Sample Prep Kit RNA Purification Beads ; 2 rounds | Illumina TruSeq RNA Sample Prep Kit
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
SRX288430 [2018-03-05 20:10:30]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288430_1.fq (757 KB)
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288430_2.fq (757 KB)
species: Nanomia bijuga
ncbi_id: 168759
itis_id: None
extraction_id: None
library_id: SRR871527
library_type: TRANSCRIPTOMIC
individual: None
treatment: None
sequencer: Illumina HiSeq 2000
seq_center: Dunnlab
note: None
sample_prep: Trizol | Invitrogen Dynabeads mRNA Purification Kit ; 2 rounds | Illumina TruSeq RNA Sample Prep Kit
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
JGI_NEMVEC [2018-03-05 20:10:39]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/JGI_NEMVEC.fa (22 KB)
species: Nematostella vectensis
ncbi_id: 45351
itis_id: 52498
extraction_id: None
library_id: None
library_type: genome
individual: None
treatment: None
sequencer: None
seq_center: None
note: Gene predictions from genome sequencing
sample_prep: None
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-phylogeny-3B7/agalma.sqlite
NCBI_HYDMAG [2018-03-05 20:10:41]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/NCBI_HYDMAG.pfa (7 KB)
species: Hydra magnipapillata
ncbi_id: 6085
itis_id: 50845
extraction_id: None
library_id: None
library_type: genome
individual: None
treatment: None
sequencer: None
seq_center: None
note: Gene predictions from genome sequencing
sample_prep: None
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-1'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / assemble.setup_data / 0.227s / 125.0MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / assemble.setup_rrna / 0.229s / 125.0MB
Retrieve the rRNA exemplars from the database
__main__.setup_rrna: no previous rrna run found for id SRX288285
biolite.pipeline.run:
STAGE 2 / assemble.filter_data / 0.231s / 125.0MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 3 / assemble.assemble / 0.663s / 158.1MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 4 / assemble.parse_assembly / 26.303s / 976.3MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 5 / assemble.remove_vectors / 26.314s / 976.4MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-1/univec' already exists
__main__.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 6 / assemble.remove_rrna / 26.858s / 976.5MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-1/rrna' already exists
__main__.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 7 / assemble.estimate_confidence / 27.374s / 976.5MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 8 / assemble.parse_confidence / 28.951s / 976.6MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
FINISHED / 28.953s / 976.6MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-2'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / translate.setup_sequences / 2.085s / 127.3MB
Locate a previous assemble or import run
__main__.setup_sequences: using previous 'assemble' run id 1
biolite.pipeline.run:
STAGE 1 / translate.identify_orfs / 2.089s / 127.3MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 2 / translate.annotate_orfs / 2.421s / 154.4MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-2/blastp'
biolite.pipeline.run:
STAGE 3 / translate.select_orfs / 96.076s / 337.2MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 96.090s / 337.5MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-3'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / assemble.setup_data / 0.082s / 123.0MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / assemble.setup_rrna / 0.085s / 123.2MB
Retrieve the rRNA exemplars from the database
__main__.setup_rrna: no previous rrna run found for id SRX288430
biolite.pipeline.run:
STAGE 2 / assemble.filter_data / 0.088s / 123.2MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 3 / assemble.assemble / 0.283s / 154.4MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 4 / assemble.parse_assembly / 18.601s / 719.0MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 5 / assemble.remove_vectors / 18.612s / 719.9MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-3/univec' already exists
__main__.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 6 / assemble.remove_rrna / 19.146s / 720.1MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-3/rrna' already exists
__main__.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 7 / assemble.estimate_confidence / 19.704s / 720.1MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 8 / assemble.parse_confidence / 20.931s / 720.2MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
FINISHED / 20.935s / 720.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-4'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / translate.setup_sequences / 0.111s / 127.4MB
Locate a previous assemble or import run
__main__.setup_sequences: using previous 'assemble' run id 3
biolite.pipeline.run:
STAGE 1 / translate.identify_orfs / 0.114s / 127.7MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 2 / translate.annotate_orfs / 0.367s / 154.5MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-4/blastp'
biolite.pipeline.run:
STAGE 3 / translate.select_orfs / 31.670s / 306.5MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 31.677s / 306.8MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-5'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / assemble.setup_data / 0.085s / 122.9MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / assemble.setup_rrna / 0.087s / 123.2MB
Retrieve the rRNA exemplars from the database
__main__.setup_rrna: no previous rrna run found for id SRX288431
biolite.pipeline.run:
STAGE 2 / assemble.filter_data / 0.089s / 123.2MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 3 / assemble.assemble / 0.204s / 154.3MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 4 / assemble.parse_assembly / 9.285s / 719.0MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 5 / assemble.remove_vectors / 9.294s / 719.9MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-5/univec' already exists
__main__.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 6 / assemble.remove_rrna / 9.738s / 720.2MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-5/rrna' already exists
__main__.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 7 / assemble.estimate_confidence / 10.318s / 720.2MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 8 / assemble.parse_confidence / 11.090s / 720.2MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
FINISHED / 11.092s / 720.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-6'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / translate.setup_sequences / 0.108s / 127.4MB
Locate a previous assemble or import run
__main__.setup_sequences: using previous 'assemble' run id 5
biolite.pipeline.run:
STAGE 1 / translate.identify_orfs / 0.111s / 127.6MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 2 / translate.annotate_orfs / 0.297s / 154.7MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-6/blastp'
biolite.pipeline.run:
STAGE 3 / translate.select_orfs / 17.430s / 308.4MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 17.439s / 308.6MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-7'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / assemble.setup_data / 0.040s / 122.8MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / assemble.setup_rrna / 0.042s / 123.1MB
Retrieve the rRNA exemplars from the database
__main__.setup_rrna: no previous rrna run found for id SRX288432
biolite.pipeline.run:
STAGE 2 / assemble.filter_data / 0.043s / 123.1MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 3 / assemble.assemble / 0.487s / 154.2MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 4 / assemble.parse_assembly / 24.038s / 815.3MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 5 / assemble.remove_vectors / 24.050s / 816.4MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-7/univec' already exists
__main__.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 6 / assemble.remove_rrna / 24.536s / 816.5MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/assemble-7/rrna' already exists
__main__.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 7 / assemble.estimate_confidence / 25.136s / 816.6MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 8 / assemble.parse_confidence / 27.133s / 816.6MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
FINISHED / 27.137s / 816.6MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-8'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / translate.setup_sequences / 0.169s / 127.5MB
Locate a previous assemble or import run
__main__.setup_sequences: using previous 'assemble' run id 7
biolite.pipeline.run:
STAGE 1 / translate.identify_orfs / 0.172s / 127.7MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 2 / translate.annotate_orfs / 0.503s / 154.9MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-8/blastp'
biolite.pipeline.run:
STAGE 3 / translate.select_orfs / 107.290s / 347.9MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 107.302s / 348.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/import-9'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / import.setup_paths / 0.229s / 115.4MB
Determine the paths to the FASTA files
__main__.setup_paths: found paths [u'/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/JGI_NEMVEC.fa']
biolite.pipeline.run:
STAGE 1 / import.parse_sequences / 0.231s / 115.7MB
Parse the sequences from the FASTA files
biolite.pipeline.run:
FINISHED / 0.241s / 116.8MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-10'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / translate.setup_sequences / 0.056s / 127.4MB
Locate a previous assemble or import run
__main__.setup_sequences: using previous 'import' run id 9
biolite.pipeline.run:
STAGE 1 / translate.identify_orfs / 0.059s / 127.7MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 2 / translate.annotate_orfs / 0.229s / 155.0MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/translate-10/blastp'
biolite.pipeline.run:
STAGE 3 / translate.select_orfs / 56.689s / 310.1MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 56.700s / 310.4MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/annotate-11'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / annotate.setup_sequences / 0.116s / 123.8MB
Locate a previous import run
__main__.setup_sequences: using previous 'import' run id 9
biolite.pipeline.run:
STAGE 1 / annotate.annotate / 0.119s / 124.2MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/annotate-11/blastp'
biolite.pipeline.run:
STAGE 2 / annotate.parse / 43.388s / 311.9MB
Parse the annotations into the sequences table
biolite.pipeline.run:
FINISHED / 43.412s / 311.9MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/import-12'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / import.setup_paths / 0.124s / 115.2MB
Determine the paths to the FASTA files
__main__.setup_paths: found paths [u'/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/NCBI_HYDMAG.pfa']
biolite.pipeline.run:
STAGE 1 / import.parse_sequences / 0.126s / 115.4MB
Parse the sequences from the FASTA files
biolite.pipeline.run:
FINISHED / 0.137s / 116.6MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/annotate-13'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / annotate.setup_sequences / 0.139s / 123.3MB
Locate a previous import run
__main__.setup_sequences: using previous 'import' run id 12
biolite.pipeline.run:
STAGE 1 / annotate.annotate / 0.141s / 123.8MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/annotate-13/blastp'
biolite.pipeline.run:
STAGE 2 / annotate.parse / 56.069s / 309.5MB
Parse the annotations into the sequences table
biolite.pipeline.run:
FINISHED / 56.076s / 309.5MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/homologize-14'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / homologize.init / 0.110s / 124.1MB
Determine the version of gene entries to use and lookup species data
agalma.database.latest_genes_version: using default genes version 0
biolite.pipeline.run:
STAGE 1 / homologize.write_fasta / 0.114s / 124.6MB
Write sequences from the Agalma database to a FASTA file
biolite.utils.safe_mkdir: creating directory 'blastp'
biolite.pipeline.run:
STAGE 2 / homologize.prepare_blast / 0.118s / 125.1MB
Prepare all-by-all BLAST database and command list
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/homologize-14/blastp' already exists
biolite.pipeline.run:
STAGE 3 / homologize.run_blast / 0.231s / 156.9MB
Run all-by-all BLAST
biolite.pipeline.run:
STAGE 4 / homologize.parse_edges / 1.125s / 157.4MB
Parse BLAST hits into edges weighted by bitscore
biolite.pipeline.run:
STAGE 5 / homologize.mcl_cluster / 1.168s / 157.8MB
Run mcl on all-by-all graph to form gene clusters
biolite.pipeline.run:
STAGE 6 / homologize.load_mcl_cluster / 1.214s / 157.8MB
Load cluster file from mcl into homology database
__main__.load_mcl_cluster: histogram of gene cluster sizes:
2 : 2
3 : 3
4 : 2
6 : 1
7 : 1
9 : 1
13 : 1
biolite.pipeline.run:
FINISHED / 1.223s / 158.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/multalign-15'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / multalign.init / 0.285s / 115.5MB
Locate a previous homology or treeprune run
__main__.init: using previous 'homologize' run id 14
biolite.pipeline.run:
STAGE 1 / multalign.select_clusters / 0.287s / 115.5MB
Select a cluster for each homologize component that meets size, sequence
length, and composition requirements
biolite.utils.safe_mkdir: creating directory 'clusters'
agalma.database.select_homology_models: found the following taxa for homology id 14:
Agalma_elegans (SRX288285)
Nematostella_vectensis (JGI_NEMVEC)
Craseoa_lathetica (SRX288432)
Hydra_magnipapillata (NCBI_HYDMAG)
Nanomia_bijuga (SRX288430)
Physalia_physalis (SRX288431)
biolite.pipeline.run:
STAGE 2 / multalign.align_sequences / 0.293s / 116.5MB
Align sequences within each component
biolite.utils.safe_mkdir: creating directory 'alignments'
biolite.pipeline.run:
STAGE 3 / multalign.cleanup_alignments / 5.517s / 317.6MB
Clean up aligned sequences with Gblocks
biolite.pipeline.run:
STAGE 4 / multalign.parse_alignments / 6.014s / 317.7MB
Parse the cleaned sequences into the database
__main__.parse_alignments: dropping sequence Agalma_elegans@23 in cluster 1
__main__.parse_alignments: dropping sequence Physalia_physalis@45 in cluster 1
__main__.parse_alignments: dropping sequence Physalia_physalis@51 in cluster 1
biolite.pipeline.run:
FINISHED / 6.043s / 318.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/genetree-16'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / genetree.init / 0.184s / 108.1MB
Find alignments in database
biolite.pipeline.run:
STAGE 1 / genetree.genetrees / 0.186s / 108.3MB
Build gene trees from alignments
biolite.utils.safe_mkdir: creating directory 'alignments'
biolite.utils.safe_mkdir: creating directory 'trees'
biolite.pipeline.run:
STAGE 2 / genetree.parse / 6.311s / 127.0MB
Parse the trees into the database. Check for jobs that timed out.
biolite.pipeline.run:
FINISHED / 6.314s / 127.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/treeinform-17'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / treeinform.init / 0.094s / 123.4MB
Determine path to input trees
__main__.init: found genetree run 16
biolite.pipeline.run:
STAGE 1 / treeinform.identify_candidate_variants / 0.096s / 123.4MB
Identify candidates variants
biolite.pipeline.run:
STAGE 2 / treeinform.reassign_genes / 0.104s / 123.5MB
Reassign candidate variants to the same gene
agalma.database.validate_genes: Validating model IDs:
unique model_id: 113
= all model_id: 113
agalma.database.validate_genes: Validating number of transcripts:
original assembly: 113
= revised assembly: 113
agalma.database.validate_genes: Validating number of genes:
original assembly: 64
- reassigned: 2
+ newly created: 1
= revised assembly: 63
biolite.pipeline.run:
FINISHED / 0.109s / 124.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/homologize-18'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / homologize.init / 0.141s / 123.6MB
Determine the version of gene entries to use and lookup species data
agalma.database.latest_genes_version: using genes version 17
biolite.pipeline.run:
STAGE 1 / homologize.write_fasta / 0.145s / 124.1MB
Write sequences from the Agalma database to a FASTA file
biolite.utils.safe_mkdir: creating directory 'blastp'
biolite.pipeline.run:
STAGE 2 / homologize.prepare_blast / 0.148s / 124.6MB
Prepare all-by-all BLAST database and command list
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/homologize-18/blastp' already exists
biolite.pipeline.run:
STAGE 3 / homologize.run_blast / 0.264s / 156.4MB
Run all-by-all BLAST
biolite.pipeline.run:
STAGE 4 / homologize.parse_edges / 1.127s / 156.9MB
Parse BLAST hits into edges weighted by bitscore
biolite.pipeline.run:
STAGE 5 / homologize.mcl_cluster / 1.172s / 157.2MB
Run mcl on all-by-all graph to form gene clusters
biolite.pipeline.run:
STAGE 6 / homologize.load_mcl_cluster / 1.215s / 157.3MB
Load cluster file from mcl into homology database
__main__.load_mcl_cluster: histogram of gene cluster sizes:
2 : 2
3 : 3
4 : 2
5 : 1
7 : 1
9 : 1
13 : 1
biolite.pipeline.run:
FINISHED / 1.222s / 157.6MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/multalign-19'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / multalign.init / 0.082s / 115.9MB
Locate a previous homology or treeprune run
__main__.init: using previous 'homologize' run id 18
biolite.pipeline.run:
STAGE 1 / multalign.select_clusters / 0.083s / 115.9MB
Select a cluster for each homologize component that meets size, sequence
length, and composition requirements
biolite.utils.safe_mkdir: creating directory 'clusters'
agalma.database.select_homology_models: found the following taxa for homology id 18:
Agalma_elegans (SRX288285)
Nematostella_vectensis (JGI_NEMVEC)
Craseoa_lathetica (SRX288432)
Hydra_magnipapillata (NCBI_HYDMAG)
Nanomia_bijuga (SRX288430)
Physalia_physalis (SRX288431)
biolite.pipeline.run:
STAGE 2 / multalign.align_sequences / 0.089s / 116.9MB
Align sequences within each component
biolite.utils.safe_mkdir: creating directory 'alignments'
biolite.pipeline.run:
STAGE 3 / multalign.cleanup_alignments / 4.832s / 325.8MB
Clean up aligned sequences with Gblocks
biolite.pipeline.run:
STAGE 4 / multalign.parse_alignments / 5.344s / 325.9MB
Parse the cleaned sequences into the database
__main__.parse_alignments: dropping sequence Agalma_elegans@23 in cluster 7
__main__.parse_alignments: dropping sequence Physalia_physalis@45 in cluster 7
__main__.parse_alignments: dropping sequence Physalia_physalis@51 in cluster 7
__main__.parse_alignments: dropping sequence Agalma_elegans@8 in cluster 10
biolite.pipeline.run:
FINISHED / 5.375s / 326.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/genetree-20'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / genetree.init / 0.098s / 108.4MB
Find alignments in database
biolite.pipeline.run:
STAGE 1 / genetree.genetrees / 0.101s / 108.6MB
Build gene trees from alignments
biolite.utils.safe_mkdir: creating directory 'alignments'
biolite.utils.safe_mkdir: creating directory 'trees'
biolite.pipeline.run:
STAGE 2 / genetree.parse / 6.192s / 127.2MB
Parse the trees into the database. Check for jobs that timed out.
biolite.pipeline.run:
FINISHED / 6.197s / 127.3MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/treeprune-21'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / treeprune.init / 0.351s / 124.1MB
Determine path to input trees
biolite.pipeline.run:
STAGE 1 / treeprune.prune_trees / 0.352s / 124.1MB
Prune each tree using monophyly masking and paralogy pruning
biolite.pipeline.run:
STAGE 2 / treeprune.parse_trees / 0.382s / 124.6MB
Parse the tips of each tree to create a cluster in the database
__main__.parse_trees: histogram of gene cluster sizes:
4 : 4
5 : 1
6 : 1
biolite.pipeline.run:
FINISHED / 0.396s / 125.2MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/multalign-22'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / multalign.init / 0.125s / 115.5MB
Locate a previous homology or treeprune run
__main__.init: using previous 'treeprune' run id 21
biolite.pipeline.run:
STAGE 1 / multalign.select_clusters / 0.126s / 115.5MB
Select a cluster for each homologize component that meets size, sequence
length, and composition requirements
biolite.utils.safe_mkdir: creating directory 'clusters'
agalma.database.select_homology_models: found the following taxa for homology id 21:
Agalma_elegans (SRX288285)
Nematostella_vectensis (JGI_NEMVEC)
Craseoa_lathetica (SRX288432)
Hydra_magnipapillata (NCBI_HYDMAG)
Nanomia_bijuga (SRX288430)
Physalia_physalis (SRX288431)
biolite.pipeline.run:
STAGE 2 / multalign.align_sequences / 0.134s / 116.5MB
Align sequences within each component
biolite.utils.safe_mkdir: creating directory 'alignments'
biolite.pipeline.run:
STAGE 3 / multalign.cleanup_alignments / 3.872s / 219.0MB
Clean up aligned sequences with Gblocks
biolite.pipeline.run:
STAGE 4 / multalign.parse_alignments / 4.390s / 219.1MB
Parse the cleaned sequences into the database
biolite.pipeline.run:
FINISHED / 4.412s / 219.5MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/supermatrix-23'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / supermatrix.init / 0.323s / 127.4MB
Find alignments in database
biolite.pipeline.run:
STAGE 1 / supermatrix.supermatrix / 0.331s / 128.2MB
Concatenate multiple alignments into a supermatrix
biolite.pipeline.run:
STAGE 2 / supermatrix.trim / 0.335s / 128.2MB
Trim the supermatrix to the specified proportion of occupancy
__main__.trim: no proportion specified... skipping
biolite.pipeline.run:
STAGE 3 / supermatrix.parse / 0.338s / 128.2MB
Store the supermatrix in the database
biolite.pipeline.run:
FINISHED / 0.344s / 128.5MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/speciestree-24'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / speciestree.init / 0.180s / 123.8MB
Find supermatrix in database
biolite.pipeline.run:
STAGE 1 / speciestree.speciestree / 0.182s / 124.2MB
Build species tree with bootstraps
biolite.pipeline.run:
STAGE 2 / speciestree.parse / 2.682s / 154.8MB
Parse the tree into the database
__main__.parse: species tree:
/--------------------------------------------------------------------------------------------------------------- Physalia physalis
/-------------------------------------------------------@
| | /------------------------------------------------------- Hydra magnipapillata
| \-------------------------------------------------------@
| \------------------------------------------------------- Nematostella vectensis
@
| /------------------------------------------------------- Agalma elegans
|---------------------------------------------------------------------------------------------------------------@
| \------------------------------------------------------- Craseoa lathetica
|
\----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nanomia bijuga
biolite.pipeline.run:
FINISHED / 2.687s / 154.8MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/speciestree-25'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / speciestree.init / 0.116s / 123.8MB
Find supermatrix in database
biolite.pipeline.run:
STAGE 1 / speciestree.speciestree / 0.119s / 124.2MB
Build species tree with bootstraps
biolite.pipeline.run:
STAGE 2 / speciestree.parse / 7.666s / 154.8MB
Parse the tree into the database
__main__.parse: species tree:
/--------------------------------- Agalma elegans
/---------------------------------100
/--------------------------------100 \--------------------------------- Craseoa lathetica
| |
/---------------------------------100 \------------------------------------------------------------------- Nanomia bijuga
| |
/--------------------------------@ \---------------------------------------------------------------------------------------------------- Physalia physalis
| |
@ \-------------------------------------------------------------------------------------------------------------------------------------- Hydra magnipapillata
|
\----------------------------------------------------------------------------------------------------------------------------------------------------------------------- Nematostella vectensis
biolite.pipeline.run:
FINISHED / 7.674s / 154.9MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report'
agalma.agalma_report.report_runs: no catalog entry found for id 'AllByAllTest'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/css'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/img'
agalma.agalma_report.report_runs: 14 has pipelines: homologize
agalma.agalma_report.report_runs: added homologize report for 14
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/matplotlib/axes/_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
warnings.warn("No labelled objects found. "
agalma.agalma_report.report_runs: 15 has pipelines: multalign
agalma.agalma_report.report_runs: added multalign report for 15
agalma.agalma_report.report_runs: 16 has pipelines: genetree
agalma.agalma_report.report_runs: added genetree report for 16
agalma.agalma_report.report_runs: 17 has pipelines: treeinform
agalma.agalma_report.report_runs: 18 has pipelines: homologize
agalma.agalma_report.report_runs: added homologize report for 18
agalma.agalma_report.report_runs: 19 has pipelines: multalign
agalma.agalma_report.report_runs: added multalign report for 19
agalma.agalma_report.report_runs: 20 has pipelines: genetree
agalma.agalma_report.report_runs: added genetree report for 20
agalma.agalma_report.report_runs: 21 has pipelines: treeprune
agalma.agalma_report.report_runs: added treeprune report for 21
agalma.agalma_report.report_runs: 22 has pipelines: multalign
agalma.agalma_report.report_runs: added multalign report for 22
agalma.agalma_report.report_runs: 23 has pipelines: supermatrix
agalma.agalma_report.report_runs: added supermatrix report for 23
agalma.agalma_report.report_runs: 24 has pipelines: speciestree
agalma.agalma_report.report_runs: added speciestree report for 24
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js' already exists
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js' already exists
agalma.agalma_report.report_runs: 25 has pipelines: speciestree
agalma.agalma_report.report_runs: added speciestree report for 25
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js' already exists
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js' already exists
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/js' already exists
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: directory 'report' already exists
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/matplotlib/font_manager.py:1297: UserWarning: findfont: Font family [u'Arial'] not found. Falling back to DejaVu Sans
(prop.get_family(), self.defaultFamily[fontext]))
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/css' already exists
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/img' already exists
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: directory 'report' already exists
Saved figure to '/home/ljcohen/tmp/agalma-test-phylogeny-3B7/report/AllByAllTest.pdf'
Created temp directory '/home/ljcohen/tmp/agalma-test-expression-Lnu'
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-expression-Lnu/agalma.sqlite
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-expression-Lnu' already exists
SRX033366 [2018-03-05 20:20:20]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX033366.fq (1.7 MB)
species: Nanomia bijuga
ncbi_id: 168759
itis_id: None
extraction_id: None
library_id: SRR081276
library_type: TRANSCRIPTOMIC
individual: specimen-1
treatment: gastrozooids
sequencer: Illumina Genome Analyzer IIx
seq_center: Dunnlab
note: mRNA directly extracted from youngest gastrozooids from Nanomia bijuga specimen #1
sample_prep: Illumina mRNA-seq sample kit (#RS-930-1001, Illumina Inc.)
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-expression-Lnu/agalma.sqlite
SRX036876 [2018-03-05 20:20:21]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX036876.fq (1.9 MB)
species: Nanomia bijuga
ncbi_id: 168759
itis_id: None
extraction_id: None
library_id: SRR089297
library_type: TRANSCRIPTOMIC
individual: specimen-2
treatment: gastrozooids
sequencer: Illumina Genome Analyzer IIx
seq_center: Dunnlab
note: mRNA directly extracted from youngest gastrozooids from Nanomia bijuga specimen #2
sample_prep: Illumina mRNA-seq sample kit (#RS-930-1001, Illumina Inc.)
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-expression-Lnu/agalma.sqlite
SRX288430 [2018-03-05 20:20:22]
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288430_1.fq (757 KB)
/home/ljcohen/miniconda2/envs/agalma/lib/python2.7/site-packages/agalma/testdata/SRX288430_2.fq (757 KB)
species: Nanomia bijuga
ncbi_id: 168759
itis_id: None
extraction_id: None
library_id: None
library_type: None
individual: None
treatment: None
sequencer: None
seq_center: None
note: None
sample_prep: None
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/qc-1'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / qc.setup_data / 0.164s / 110.3MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / qc.fastqc / 0.165s / 110.5MB
Generate FastQC reports for each FASTQ file
biolite.pipeline.run:
STAGE 2 / qc.parse / 7.227s / 361.6MB
Parse FastQC reports into the database
biolite.pipeline.run:
FINISHED / 7.241s / 365.0MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/transcriptome-2'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / insert_size.setup_data / 0.145s / 123.8MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / insert_size.assemble_subset / 0.146s / 124.0MB
Assemble a subset of high quality reads
biolite.pipeline.run:
STAGE 2 / insert_size.estimate_insert / 13.774s / 718.9MB
Estimate insert size by mapping the subset against the assembly
biolite.pipeline.run:
STAGE 3 / rrna.assemble_subsets / 14.643s / 719.2MB
Assemble subsets of increasing numbers of reads
biolite.pipeline.run:
STAGE 4 / rrna.blast_transcripts / 114.140s / 719.8MB
Blast transcripts against known rRNA database
biolite.pipeline.run:
STAGE 5 / rrna.find_exemplars / 114.563s / 719.8MB
Parse blast output for exemplar rRNA sequences
agalma.rrna.find_exemplars: selecting an exemplar for gene target large-mito-rRNA
agalma.rrna.find_exemplars: large-mito-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target large-nuclear-rRNA
agalma.rrna.find_exemplars: large-nuclear-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target small-mito-rRNA
agalma.rrna.find_exemplars: small-mito-rRNA not found in the assembly, skipping
agalma.rrna.find_exemplars: selecting an exemplar for gene target small-nuclear-rRNA
agalma.rrna.find_exemplars: small-nuclear-rRNA not found in the assembly, skipping
biolite.pipeline.run:
STAGE 6 / rrna.map_reads / 114.602s / 720.2MB
Map reads against rRNA exemplars
agalma.rrna.map_reads: no rRNA exemplars were found... skipping
biolite.pipeline.run:
STAGE 7 / rrna.exclude_reads / 114.604s / 720.2MB
Exclude pairs where either read maps to an rRNA exemplar
agalma.rrna.exclude_reads: no rRNA exemplars were found... skipping
biolite.pipeline.run:
STAGE 8 / transcriptome.assemble_connector / 114.607s / 720.2MB
[connector between "rrna" and "assemble"]
biolite.pipeline.run:
STAGE 9 / assemble.setup_rrna / 114.608s / 720.2MB
Retrieve the rRNA exemplars from the database
agalma.assemble.setup_rrna: no previous rrna run found for id SRX288430
biolite.pipeline.run:
STAGE 10 / assemble.filter_data / 114.611s / 720.2MB
Filter out low-quality reads
biolite.pipeline.run:
STAGE 11 / assemble.assemble / 114.755s / 720.3MB
Assemble the filtered reads with Trinity
biolite.pipeline.run:
STAGE 12 / assemble.parse_assembly / 127.005s / 720.3MB
Parse the assembly into the sequences table
biolite.pipeline.run:
STAGE 13 / assemble.remove_vectors / 127.012s / 721.0MB
Remove vector contaminants with UniVec
biolite.utils.safe_mkdir: creating directory 'univec'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/transcriptome-2/univec' already exists
agalma.assemble.remove_vectors: found 0 vector contaminants
biolite.pipeline.run:
STAGE 14 / assemble.remove_rrna / 127.509s / 721.1MB
Remove rRNA using curated and exemplar sequences
biolite.utils.safe_mkdir: creating directory 'rrna'
biolite.utils.safe_mkdir: directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/transcriptome-2/rrna' already exists
agalma.assemble.remove_rrna: found 0 ribosomal RNAs
biolite.pipeline.run:
STAGE 15 / assemble.estimate_confidence / 128.085s / 721.1MB
Estimate coverage and confidence values for each transcript
biolite.pipeline.run:
STAGE 16 / assemble.parse_confidence / 129.101s / 721.1MB
Parse estimated confidence scores and update database
biolite.pipeline.run:
STAGE 17 / transcriptome.write_sequences / 129.104s / 721.1MB
Write assembled sequences to FASTA
biolite.pipeline.run:
STAGE 18 / translate.identify_orfs / 129.106s / 721.1MB
Identify long open reading frames
biolite.pipeline.run:
STAGE 19 / translate.annotate_orfs / 129.306s / 721.1MB
Blastp protein sequences against SwissProt
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/transcriptome-2/blastp'
biolite.pipeline.run:
STAGE 20 / translate.select_orfs / 159.621s / 721.1MB
Select the open reading frame with the best evalue
biolite.pipeline.run:
FINISHED / 159.630s / 721.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/qc-3'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / qc.setup_data / 0.373s / 108.7MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / qc.fastqc / 0.375s / 108.7MB
Generate FastQC reports for each FASTQ file
biolite.pipeline.run:
STAGE 2 / qc.parse / 3.853s / 381.3MB
Parse FastQC reports into the database
biolite.pipeline.run:
FINISHED / 3.862s / 383.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/expression-4'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / expression.setup_data / 0.373s / 123.2MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / expression.setup_reference / 0.375s / 123.2MB
Locate reference sequences in the Agalma database
__main__.setup_reference: using previous 'transcriptome' run id 2
biolite.pipeline.run:
STAGE 2 / expression.calculate / 0.385s / 124.0MB
Calculate gene and isoform expression with RSEM
biolite.pipeline.run:
STAGE 3 / expression.parse_counts / 1.363s / 186.9MB
Parse gene-level counts into Agalma database
biolite.pipeline.run:
FINISHED / 1.365s / 187.1MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX033366'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX033366/css'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX033366/img'
agalma.agalma_report.report_runs: 3 has pipelines: qc
agalma.agalma_report.report_runs: added qc report for 3
agalma.agalma_report.report_runs: 4 has pipelines: expression
agalma.agalma_report.report_runs: added expression report for 4
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/qc-5'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / qc.setup_data / 0.089s / 108.6MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / qc.fastqc / 0.090s / 108.6MB
Generate FastQC reports for each FASTQ file
biolite.pipeline.run:
STAGE 2 / qc.parse / 3.610s / 361.6MB
Parse FastQC reports into the database
biolite.pipeline.run:
FINISHED / 3.620s / 363.4MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/expression-6'
biolite.pipeline.run: Starting at stage 0
biolite.pipeline.run:
STAGE 0 / expression.setup_data / 0.046s / 123.1MB
Setup paths to the FASTQ input sequence data
biolite.pipeline.setup_data: reading data from paths in catalog
biolite.pipeline.run:
STAGE 1 / expression.setup_reference / 0.048s / 123.1MB
Locate reference sequences in the Agalma database
__main__.setup_reference: using previous 'transcriptome' run id 2
biolite.pipeline.run:
STAGE 2 / expression.calculate / 0.054s / 123.9MB
Calculate gene and isoform expression with RSEM
biolite.pipeline.run:
STAGE 3 / expression.parse_counts / 1.120s / 186.7MB
Parse gene-level counts into Agalma database
biolite.pipeline.run:
FINISHED / 1.124s / 186.9MB
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX036876'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX036876/css'
biolite.utils.safe_mkdir: creating directory '/home/ljcohen/tmp/agalma-test-expression-Lnu/report-SRX036876/img'
agalma.agalma_report.report_runs: 5 has pipelines: qc
agalma.agalma_report.report_runs: added qc report for 5
agalma.agalma_report.report_runs: 6 has pipelines: expression
agalma.agalma_report.report_runs: added expression report for 6
biolite.config.parse_env_resources: database=/home/ljcohen/tmp/agalma-test-expression-Lnu/agalma.sqlite
biolite.config.parse_env_resources: threads=6
biolite.config.parse_env_resources: memory=14441M
__main__.expression: looking up sequence run 2
__main__.expression: found catalog id 2 (Nanomia bijuga)
__main__.expression: looking up expression runs
__main__.expression: found expression runs:
SRX288430 (Nanomia bijuga): [4,6]
__main__.counts_table: looking up read counts for expression runs [4, 6]
__main__.counts_table: looking up sequence types for sequence run 2
agalma.database.latest_genes_version: using default genes version 0
__main__.counts_table: looking up expected counts for expression runs [4, 6]
/home/ljcohen/miniconda2/envs/agalma/bin/agalma-test-tutorial: skipping
Test ran successfully.