@pvanheus
Created May 1, 2022 16:57
Unlocking working directory.
Building DAG of jobs...
Using shell: /bin/bash
Provided cluster nodes: 32
Conda environments: ignored
Job stats:
job                                  count    min threads    max threads
---------------------------------  -------  -------------  -------------
add_branch_labels                        4              1              1
adjust_metadata_regions                  4              1              1
align                                    1              8              8
all                                      1              1              1
ancestral                                4              1              1
annotate_metadata_with_index             4              1              1
build_align                              4              8              8
calculate_epiweeks                       4              1              1
clades                                   4              1              1
colors                                   4              1              1
combine_input_metadata                   1              1              1
combine_samples                          4              1              1
combine_sequences_for_subsampling        1              1              1
diagnostic                               4              1              1
distances                                4              1              1
emerging_lineages                        4              1              1
export                                   4              1              1
filter                                   4              1              1
finalize                                 4              1              1
include_hcov19_prefix                    4              1              1
index                                    4              1              1
join_metadata_and_nextclade_qc           4              1              1
logistic_growth                          4              1              1
mask                                     4              1              1
mutational_fitness                       4              1              1
recency                                  4              1              1
refine                                   4              1              1
rename_emerging_lineages                 4              1              1
sanitize_metadata                        1              1              1
subsample                                6              1              1
tip_frequencies                          4              1              1
traits                                   4              1              1
translate                                4              1              1
tree                                     4              8              8
total                                  123              1              8
Select jobs to execute...
[Sun May 1 12:03:50 2022]
Job 12:
Aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real
python3 scripts/sanitize_sequences.py --sequences data/africa_from_september.fasta --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout 2> logs/sanitize_sequences_africa_recent.txt | nextalign --jobs=8 --reference defaults/reference_seq.fasta --genemap defaults/annotation.gff --genes ORF1a,ORF1b,S,ORF3a,E,M,ORF6,ORF7a,ORF7b,ORF8,N,ORF9b --sequences /dev/stdin --output-dir results/translations --output-basename seqs_africa_recent --output-fasta results/aligned_africa_recent.fasta --output-insertions results/insertions_africa_recent.tsv > logs/align_africa_recent.txt 2>&1;
xz -2 -T 8 results/aligned_africa_recent.fasta;
xz -2 -T 8 results/translations/seqs_africa_recent*.fasta
Submitted job 12 with external jobid 'Submitted batch job 359401'.
[Sun May 1 12:03:50 2022]
rule sanitize_metadata:
input: data/africa_from_september.tsv
output: results/sanitized_metadata_africa_recent.tsv.xz
log: logs/sanitize_metadata_africa_recent.txt
jobid: 15
benchmark: benchmarks/sanitize_metadata_africa_recent.txt
wildcards: origin=africa_recent
resources: tmpdir=/tmp, mem_mb=2000
python3 scripts/sanitize_metadata.py --metadata data/africa_from_september.tsv --metadata-id-columns strain name 'Virus name' --database-id-columns 'Accession ID' gisaid_epi_isl genbank_accession --parse-location-field Location --rename-fields 'Virus name=strain' Type=type 'Accession ID=gisaid_epi_isl' 'Collection date=date' 'Additional location information=additional_location_information' 'Sequence length=length' Host=host 'Patient age=patient_age' Gender=sex Clade=GISAID_clade 'Pango lineage=pango_lineage' pangolin_lineage=pango_lineage Lineage=pango_lineage 'Pangolin version=pangolin_version' Variant=variant 'AA Substitutions=aaSubstitutions' 'Submission date=date_submitted' 'Is reference?=is_reference' 'Is complete?=is_complete' 'Is high coverage?=is_high_coverage' 'Is low coverage?=is_low_coverage' N-Content=n_content GC-Content=gc_content --strip-prefixes hCoV-19/ SARS-CoV-2/ --output results/sanitized_metadata_africa_recent.tsv.xz 2>&1 | tee logs/sanitize_metadata_africa_recent.txt
Submitted job 15 with external jobid 'Submitted batch job 359402'.
[Sun May 1 12:04:49 2022]
Finished job 15.
1 of 123 steps (1%) done
Select jobs to execute...
[Sun May 1 12:04:49 2022]
Job 13:
Combining metadata files results/sanitized_metadata_global-open.tsv.xz results/sanitized_metadata_africa_recent.tsv.xz -> results/combined_metadata.tsv.xz and adding columns to represent origin
python3 scripts/combine_metadata.py --metadata results/sanitized_metadata_global-open.tsv.xz results/sanitized_metadata_africa_recent.tsv.xz --origins global-open africa_recent --output results/combined_metadata.tsv.xz 2>&1 | tee logs/combine_input_metadata.txt
Submitted job 13 with external jobid 'Submitted batch job 359403'.
[Sun May 1 12:05:19 2022]
Finished job 13.
2 of 123 steps (2%) done
Select jobs to execute...
[Sun May 1 12:05:19 2022]
Job 80:
Subsample all sequences by 'focal' scheme for build 'southern_region_recent' with the following parameters:
- group by: --group-by country year month
- sequences per group: --sequences-per-group 800
- subsample max sequences:
- min-date: --min-date 2021-08-01
- max-date:
-
- exclude:
- include:
- query: --query "country.isin(['Angola', 'Union of the Comoros', 'Lesotho', 'Mozambique', 'Seychelles', 'Botswana', 'Eswatini', 'Madagascar', 'Malawi', 'Mauritius', 'Namibia', 'South Africa', 'Zambia', 'Zimbabwe'])"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --min-date 2021-08-01 --query "country.isin(['Angola', 'Union of the Comoros', 'Lesotho', 'Mozambique', 'Seychelles', 'Botswana', 'Eswatini', 'Madagascar', 'Malawi', 'Mauritius', 'Namibia', 'South Africa', 'Zambia', 'Zimbabwe'])" --group-by country year month --sequences-per-group 800 --probabilistic-sampling --output-strains results/southern_region_recent/sample-focal.txt 2>&1 | tee logs/subsample_southern_region_recent_focal.txt
Submitted job 80 with external jobid 'Submitted batch job 359404'.
[Sun May 1 12:05:19 2022]
Job 112:
Subsample all sequences by 'focal' scheme for build 'southern_region_only_recent' with the following parameters:
- group by: --group-by country year month
- sequences per group: --sequences-per-group 800
- subsample max sequences:
- min-date: --min-date 2021-08-01
- max-date:
-
- exclude:
- include:
- query: --query "country.isin(['Angola', 'Union of the Comoros', 'Lesotho', 'Mozambique', 'Seychelles', 'Botswana', 'Eswatini', 'Madagascar', 'Malawi', 'Mauritius', 'Namibia', 'South Africa', 'Zambia', 'Zimbabwe'])"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --min-date 2021-08-01 --query "country.isin(['Angola', 'Union of the Comoros', 'Lesotho', 'Mozambique', 'Seychelles', 'Botswana', 'Eswatini', 'Madagascar', 'Malawi', 'Mauritius', 'Namibia', 'South Africa', 'Zambia', 'Zimbabwe'])" --group-by country year month --sequences-per-group 800 --probabilistic-sampling --output-strains results/southern_region_only_recent/sample-focal.txt 2>&1 | tee logs/subsample_southern_region_only_recent_focal.txt
Submitted job 112 with external jobid 'Submitted batch job 359405'.
[Sun May 1 12:05:19 2022]
Job 16:
Subsample all sequences by 'focal' scheme for build 'africa_recent' with the following parameters:
- group by: --group-by country year month
- sequences per group: --sequences-per-group 800
- subsample max sequences:
- min-date: --min-date 2021-08-01
- max-date:
-
- exclude:
- include:
- query: --query "region == 'Africa'"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --min-date 2021-08-01 --query "region == 'Africa'" --group-by country year month --sequences-per-group 800 --probabilistic-sampling --output-strains results/africa_recent/sample-focal.txt 2>&1 | tee logs/subsample_africa_recent_focal.txt
Submitted job 16 with external jobid 'Submitted batch job 359406'.
[Sun May 1 12:05:20 2022]
Job 17:
Subsample all sequences by 'contextual' scheme for build 'africa_recent' with the following parameters:
- group by:
- sequences per group:
- subsample max sequences:
- min-date:
- max-date:
-
- exclude:
- include:
- query: --query "region != 'Africa'"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --query "region != 'Africa'" --output-strains results/africa_recent/sample-contextual.txt 2>&1 | tee logs/subsample_africa_recent_contextual.txt
Submitted job 17 with external jobid 'Submitted batch job 359407'.
[Sun May 1 12:05:20 2022]
Job 49:
Subsample all sequences by 'focal' scheme for build 'africa_only_recent' with the following parameters:
- group by: --group-by country year month
- sequences per group: --sequences-per-group 800
- subsample max sequences:
- min-date: --min-date 2021-08-01
- max-date:
-
- exclude:
- include:
- query: --query "region == 'Africa'"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --min-date 2021-08-01 --query "region == 'Africa'" --group-by country year month --sequences-per-group 800 --probabilistic-sampling --output-strains results/africa_only_recent/sample-focal.txt 2>&1 | tee logs/subsample_africa_only_recent_focal.txt
Submitted job 49 with external jobid 'Submitted batch job 359408'.
[Sun May 1 12:05:20 2022]
Job 81:
Subsample all sequences by 'contextual' scheme for build 'southern_region_recent' with the following parameters:
- group by:
- sequences per group:
- subsample max sequences:
- min-date:
- max-date:
-
- exclude:
- include:
- query: --query "region != 'Africa'"
- priority:
augur filter --metadata results/combined_metadata.tsv.xz --include defaults/include.txt --exclude defaults/exclude.txt --query "region != 'Africa'" --output-strains results/southern_region_recent/sample-contextual.txt 2>&1 | tee logs/subsample_southern_region_recent_contextual.txt
Submitted job 81 with external jobid 'Submitted batch job 359409'.
[Sun May 1 12:05:40 2022]
Finished job 17.
3 of 123 steps (2%) done
[Sun May 1 12:05:40 2022]
Finished job 81.
4 of 123 steps (3%) done
[Sun May 1 12:05:50 2022]
Finished job 80.
5 of 123 steps (4%) done
[Sun May 1 12:05:50 2022]
Finished job 49.
6 of 123 steps (5%) done
[Sun May 1 12:06:00 2022]
Finished job 16.
7 of 123 steps (6%) done
[Sun May 1 12:06:10 2022]
Finished job 112.
8 of 123 steps (7%) done
[Sun May 1 12:13:54 2022]
Finished job 12.
9 of 123 steps (7%) done
Select jobs to execute...
[Sun May 1 12:13:54 2022]
Job 11:
Combine and deduplicate aligned FASTAs from multiple origins in preparation for subsampling.
python3 scripts/sanitize_sequences.py --sequences data/global_aligned.fasta.xz results/aligned_africa_recent.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout | xz -c -2 > results/combined_sequences_for_subsampling.fasta.xz
Submitted job 11 with external jobid 'Submitted batch job 359410'.
[Sun May 1 12:14:44 2022]
Finished job 11.
10 of 123 steps (8%) done
Select jobs to execute...
[Sun May 1 12:14:44 2022]
Job 48:
Combine and deduplicate FASTAs
augur filter --sequences results/combined_sequences_for_subsampling.fasta.xz --metadata results/combined_metadata.tsv.xz --exclude-all --include results/africa_only_recent/sample-focal.txt --output-sequences results/africa_only_recent/africa_only_recent_subsampled_sequences.fasta.xz --output-metadata results/africa_only_recent/africa_only_recent_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_africa_only_recent.txt
Submitted job 48 with external jobid 'Submitted batch job 359411'.
[Sun May 1 12:14:44 2022]
Job 111:
Combine and deduplicate FASTAs
augur filter --sequences results/combined_sequences_for_subsampling.fasta.xz --metadata results/combined_metadata.tsv.xz --exclude-all --include results/southern_region_only_recent/sample-focal.txt --output-sequences results/southern_region_only_recent/southern_region_only_recent_subsampled_sequences.fasta.xz --output-metadata results/southern_region_only_recent/southern_region_only_recent_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_southern_region_only_recent.txt
Submitted job 111 with external jobid 'Submitted batch job 359412'.
[Sun May 1 12:14:44 2022]
Job 79:
Combine and deduplicate FASTAs
augur filter --sequences results/combined_sequences_for_subsampling.fasta.xz --metadata results/combined_metadata.tsv.xz --exclude-all --include results/southern_region_recent/sample-focal.txt results/southern_region_recent/sample-contextual.txt --output-sequences results/southern_region_recent/southern_region_recent_subsampled_sequences.fasta.xz --output-metadata results/southern_region_recent/southern_region_recent_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_southern_region_recent.txt
Submitted job 79 with external jobid 'Submitted batch job 359413'.
[Sun May 1 12:14:45 2022]
Job 10:
Combine and deduplicate FASTAs
augur filter --sequences results/combined_sequences_for_subsampling.fasta.xz --metadata results/combined_metadata.tsv.xz --exclude-all --include results/africa_recent/sample-focal.txt results/africa_recent/sample-contextual.txt --output-sequences results/africa_recent/africa_recent_subsampled_sequences.fasta.xz --output-metadata results/africa_recent/africa_recent_subsampled_metadata.tsv.xz 2>&1 | tee logs/subsample_regions_africa_recent.txt
Submitted job 10 with external jobid 'Submitted batch job 359414'.
[Sun May 1 12:16:14 2022]
Finished job 111.
11 of 123 steps (9%) done
Select jobs to execute...
[Sun May 1 12:16:14 2022]
Finished job 79.
12 of 123 steps (10%) done
[Sun May 1 12:16:14 2022]
Job 110:
Running nextclade QC and aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real
python3 scripts/sanitize_sequences.py --sequences results/southern_region_only_recent/southern_region_only_recent_subsampled_sequences.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout 2> logs/sanitize_sequences_before_nextclade_southern_region_only_recent.txt | nextclade run --jobs 8 --input-fasta /dev/stdin --reference defaults/reference_seq.fasta --input-dataset data/sars-cov-2-nextclade-defaults --output-tsv results/southern_region_only_recent/nextclade_qc.tsv --output-dir results/southern_region_only_recent/translations --output-basename aligned --output-fasta results/southern_region_only_recent/aligned.fasta --output-insertions results/southern_region_only_recent/insertions.tsv 2>&1 | tee logs/align_southern_region_only_recent.txt
Submitted job 110 with external jobid 'Submitted batch job 359415'.
Select jobs to execute...
[Sun May 1 12:16:16 2022]
Job 78:
Running nextclade QC and aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real
python3 scripts/sanitize_sequences.py --sequences results/southern_region_recent/southern_region_recent_subsampled_sequences.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout 2> logs/sanitize_sequences_before_nextclade_southern_region_recent.txt | nextclade run --jobs 8 --input-fasta /dev/stdin --reference defaults/reference_seq.fasta --input-dataset data/sars-cov-2-nextclade-defaults --output-tsv results/southern_region_recent/nextclade_qc.tsv --output-dir results/southern_region_recent/translations --output-basename aligned --output-fasta results/southern_region_recent/aligned.fasta --output-insertions results/southern_region_recent/insertions.tsv 2>&1 | tee logs/align_southern_region_recent.txt
Submitted job 78 with external jobid 'Submitted batch job 359416'.
[Sun May 1 12:16:55 2022]
Finished job 10.
13 of 123 steps (11%) done
Select jobs to execute...
[Sun May 1 12:16:55 2022]
Job 9:
Running nextclade QC and aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real
python3 scripts/sanitize_sequences.py --sequences results/africa_recent/africa_recent_subsampled_sequences.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout 2> logs/sanitize_sequences_before_nextclade_africa_recent.txt | nextclade run --jobs 8 --input-fasta /dev/stdin --reference defaults/reference_seq.fasta --input-dataset data/sars-cov-2-nextclade-defaults --output-tsv results/africa_recent/nextclade_qc.tsv --output-dir results/africa_recent/translations --output-basename aligned --output-fasta results/africa_recent/aligned.fasta --output-insertions results/africa_recent/insertions.tsv 2>&1 | tee logs/align_africa_recent.txt
Submitted job 9 with external jobid 'Submitted batch job 359417'.
[Sun May 1 12:17:15 2022]
Finished job 48.
14 of 123 steps (11%) done
Select jobs to execute...
[Sun May 1 12:17:15 2022]
Job 47:
Running nextclade QC and aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real
python3 scripts/sanitize_sequences.py --sequences results/africa_only_recent/africa_only_recent_subsampled_sequences.fasta.xz --strip-prefixes hCoV-19/ SARS-CoV-2/ --output /dev/stdout 2> logs/sanitize_sequences_before_nextclade_africa_only_recent.txt | nextclade run --jobs 8 --input-fasta /dev/stdin --reference defaults/reference_seq.fasta --input-dataset data/sars-cov-2-nextclade-defaults --output-tsv results/africa_only_recent/nextclade_qc.tsv --output-dir results/africa_only_recent/translations --output-basename aligned --output-fasta results/africa_only_recent/aligned.fasta --output-insertions results/africa_only_recent/insertions.tsv 2>&1 | tee logs/align_africa_only_recent.txt
Submitted job 47 with external jobid 'Submitted batch job 359418'.
[Sun May 1 13:01:59 2022]
Finished job 78.
15 of 123 steps (12%) done
Select jobs to execute...
[Sun May 1 13:01:59 2022]
rule join_metadata_and_nextclade_qc:
input: results/southern_region_recent/southern_region_recent_subsampled_metadata.tsv.xz, results/southern_region_recent/nextclade_qc.tsv
output: results/southern_region_recent/metadata_with_nextclade_qc.tsv
log: logs/join_metadata_and_nextclade_qc_southern_region_recent.txt
jobid: 83
benchmark: benchmarks/join_metadata_and_nextclade_qc_southern_region_recent.txt
wildcards: build_name=southern_region_recent
resources: tmpdir=/tmp
python3 scripts/join-metadata-and-clades.py results/southern_region_recent/southern_region_recent_subsampled_metadata.tsv.xz results/southern_region_recent/nextclade_qc.tsv -o results/southern_region_recent/metadata_with_nextclade_qc.tsv 2>&1 | tee logs/join_metadata_and_nextclade_qc_southern_region_recent.txt
Submitted job 83 with external jobid 'Submitted batch job 359420'.
[Sun May 1 13:01:59 2022]
Job 77:
Mask bases in alignment results/southern_region_recent/aligned.fasta
- masking 100 from beginning
- masking 200 from end
- masking other sites: 21987 21846
python3 scripts/mask-alignment.py --alignment results/southern_region_recent/aligned.fasta --mask-from-beginning 100 --mask-from-end 200 --mask-sites 21987 21846 --mask-terminal-gaps --output results/southern_region_recent/masked.fasta 2> logs/mask_southern_region_recent.txt
Submitted job 77 with external jobid 'Submitted batch job 359421'.
[Sun May 1 13:02:29 2022]
Error in rule join_metadata_and_nextclade_qc:
    jobid: 83
    output: results/southern_region_recent/metadata_with_nextclade_qc.tsv
    log: logs/join_metadata_and_nextclade_qc_southern_region_recent.txt (check log file(s) for error message)
    shell:
        python3 scripts/join-metadata-and-clades.py results/southern_region_recent/southern_region_recent_subsampled_metadata.tsv.xz results/southern_region_recent/nextclade_qc.tsv -o results/southern_region_recent/metadata_with_nextclade_qc.tsv 2>&1 | tee logs/join_metadata_and_nextclade_qc_southern_region_recent.txt
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)
    cluster_jobid: Submitted batch job 359420
Logfile logs/join_metadata_and_nextclade_qc_southern_region_recent.txt:
Traceback (most recent call last):
  File "/usr/people/pvh/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3621, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Nextclade_pango'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "scripts/join-metadata-and-clades.py", line 150, in <module>
    main()
  File "scripts/join-metadata-and-clades.py", line 140, in main
    result[col] = result[col].fillna(VALUE_MISSING_DATA)
  File "/usr/people/pvh/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/frame.py", line 3505, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/people/pvh/miniconda3/envs/nextstrain/lib/python3.8/site-packages/pandas/core/indexes/base.py", line 3623, in get_loc
    raise KeyError(key) from err
KeyError: 'Nextclade_pango'
Error executing rule join_metadata_and_nextclade_qc on cluster (jobid: 83, external: Submitted batch job 359420, jobscript: /usr/people/pvh/ncov/.snakemake/tmp.8yp8bknw/join_metadata_and_nextclade_qc.83.sh). For error details see the cluster log and the log files of the involved rule(s).
[Sun May 1 13:02:54 2022]
Finished job 77.
16 of 123 steps (13%) done
[Sun May 1 13:11:26 2022]
Finished job 110.
17 of 123 steps (14%) done
[Sun May 1 13:35:17 2022]
Finished job 9.
18 of 123 steps (15%) done
[Sun May 1 13:55:29 2022]
Finished job 47.
19 of 123 steps (15%) done
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /usr/people/pvh/ncov/.snakemake/log/2022-05-01T120348.527600.snakemake.log