@andreas-wilm
Created July 4, 2018 13:57
Nextflow: Proof of concept example for repeated function call with expected static output
#!/usr/bin/env nextflow
/* This is a minimal working example for a bug we recently ran into.
 * In a nutshell: we call a function (gen_sample_map_str) in every
 * invocation of the process GenomicsDBImport. The function has static
 * input and should produce identical output each time, but it doesn't.
 * The question is why.
 *
 * Yes, we could do this differently, e.g. construct the string once, and
 * yes, some things don't make sense in this minimalistic form.
 * But the code should in theory work nevertheless, yet it doesn't.
 */
params.publishdir = 'out'
// a sample map akin to what's used in GATK's GenomicsDB
params.sample_name_map = [
    'sample1': 's1.g.vcf.gz',
    'sample2': 's2.g.vcf.gz',
    'sample3': 's3.g.vcf.gz',
    'sample4': 's4.g.vcf.gz',
    'sample5': 's5.g.vcf.gz',
    'sample6': 's6.g.vcf.gz',
    'sample7': 's7.g.vcf.gz',
    'sample8': 's8.g.vcf.gz',
    'sample9': 's9.g.vcf.gz',
    'sample10': 's10.g.vcf.gz'
]
// fake regions
region_ch = Channel.from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
// constructs a sample map string from params.sample_name_map. the
// question is: why do repeated calls give different results?
def gen_sample_map_str() {
    str = ""
    params.sample_name_map.each{ k, v ->
        str += "${k}\t${v}\n"
    }
    return str;
}
// just prints the output of gen_sample_map_str() to a file. in theory
// all files should at least have an identical number of lines, but
// they don't (see the output generated in Validate)
process GenomicsDBImport {
    publishDir params.publishdir

    input:
    val reg from region_ch

    output:
    file("${reg}.txt") into final_ch

    script:
    sample_name_map_str = gen_sample_map_str()
    """
    echo "${sample_name_map_str}" > ${reg}.txt;
    """
}
process Validate {
    input:
    file(all) from final_ch.collect()

    output:
    stdout wc_ch

    script:
    """
    wc -l ${all}
    """
}
wc_ch.subscribe { print "Line numbers should be identical but are not (forget about order):\n $it" }
@andreas-wilm
Author

Adding -process.cpus=1 or -qs 1 as arguments doesn't fix the behaviour
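
One plausible cause, offered as an assumption rather than a confirmed diagnosis: in gen_sample_map_str() the accumulator `str` is assigned without `def`, so Groovy stores it in the script binding instead of a local variable. If Nextflow evaluates the script: blocks of several tasks on different threads, all calls would then append to (and reset) the same shared `str`, which could explain files with differing line counts. The plain-Groovy sketch below shows two variants that keep the state local. The names gen_sample_map_str_local, gen_sample_map_str_joined and demo_map are hypothetical and not part of the gist; the map is passed as a parameter only so that the sketch runs without Nextflow.

```groovy
// Variant 1: `def` makes `str` local to each call. Without `def`, Groovy
// puts `str` into the script binding, which all concurrent callers share.
def gen_sample_map_str_local(Map sample_name_map) {
    def str = ""
    sample_name_map.each { k, v ->
        str += "${k}\t${v}\n"
    }
    return str
}

// Variant 2: no mutable accumulator at all.
def gen_sample_map_str_joined(Map sample_name_map) {
    return sample_name_map.collect { k, v -> "${k}\t${v}\n" }.join('')
}

// Hypothetical three-entry map standing in for params.sample_name_map.
def demo_map = [
    'sample1': 's1.g.vcf.gz',
    'sample2': 's2.g.vcf.gz',
    'sample3': 's3.g.vcf.gz'
]

// Reference result from a single-threaded call.
def expected = gen_sample_map_str_local(demo_map)

// Call the function from ten threads at once, loosely mimicking
// concurrent task setup, and collect every result.
def results = Collections.synchronizedList([])
def threads = (1..10).collect {
    Thread.start { results << gen_sample_map_str_local(demo_map) }
}
threads*.join()

println "all identical: ${results.every { it == expected }}"
```

If the assumption holds, changing `str = ""` to `def str = ""` in the original function should be enough; the second variant avoids the problem class entirely by not mutating anything.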
