@sreichl
Last active September 12, 2024 09:25
Snakemake rules that ensure documentation and reproducibility by automatically exporting all used conda environments, the configuration, and the annotation file into the report.
# location: config/config.yaml
##### GENERAL #####
project_name: myProject
result_path: /path/to/results
annotation: /path/to/annotation.csv

Configuration and annotation files for documentation and to ensure reproducibility.
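The export rules read three keys from the configuration: `project_name`, `result_path`, and `annotation` (`mem` and `threads` are optional and fall back to defaults via `config.get`). As a minimal sketch, assuming the configuration is already loaded into a plain dictionary, a check like the following could flag missing entries before the workflow runs. The helper name `missing_config_keys` and the constant `REQUIRED_KEYS` are hypothetical and not part of the gist.

```python
# Hypothetical helper, not part of the original gist.
# Keys that the export rules access directly via config["..."].
REQUIRED_KEYS = ("project_name", "result_path", "annotation")


def missing_config_keys(config):
    """Return the required keys that are absent from the config dictionary."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Example mirroring config/config.yaml from above.
example_config = {
    "project_name": "myProject",
    "result_path": "/path/to/results",
    "annotation": "/path/to/annotation.csv",
}

print(missing_config_keys(example_config))                 # []
print(missing_config_keys({"project_name": "myProject"}))  # ['result_path', 'annotation']
```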

# location: workflow/rules/export.smk
# export and add all used conda environment specifications (including exact versions and builds) to report
rule env_export:
    output:
        report(os.path.join(result_path,'envs','{env}.yaml'),
            caption="../report/software.rst",
            category="Software",
            subcategory="{}_{}".format(config["project_name"], module_name),
            labels={
                "name": config["project_name"],
                "module": module_name,
                "env": "{env}",
            }
        ),
    conda:
        "../envs/{env}.yaml"
    resources:
        mem_mb=config.get("mem", "1000"),
    threads: config.get("threads", 1)
    log:
        os.path.join("logs","rules","env_{env}.log"),
    shell:
        """
        conda env export > {output}
        """
# export and add configuration file to report
rule config_export:
    output:
        configs = report(os.path.join(result_path,'configs','{}_config.yaml'.format(config["project_name"])),
            caption="../report/configs.rst",
            category="Configuration",
            subcategory="{}_{}".format(config["project_name"], module_name),
            labels={
                "name": config["project_name"],
                "module": module_name,
                "type": "config"
            }
        )
    resources:
        mem_mb=config.get("mem", "1000"),
    threads: config.get("threads", 1)
    log:
        os.path.join("logs","rules","config_export.log"),
    run:
        with open(output["configs"], 'w') as outfile:
            yaml.dump(config, outfile, sort_keys=False, width=1000, indent=2)
# export and add used annotation file(s) to report
rule annot_export:
    input:
        config["annotation"],
    output:
        annot = report(os.path.join(result_path,'configs','{}_annot.csv'.format(config["project_name"])),
            caption="../report/configs.rst",
            category="Configuration",
            subcategory="{}_{}".format(config["project_name"], module_name),
            labels={
                "name": config["project_name"],
                "module": module_name,
                "type": "annotation",
            }
        )
    resources:
        mem_mb=1000, #config.get("mem", "16000"),
    threads: config.get("threads", 1)
    log:
        os.path.join("logs","rules","annot_export.log"),
    shell:
        """
        cp {input} {output}
        """
# location: workflow/Snakefile
# required entries in the Snakefile
##### libraries #####
import os
import yaml # used by rule config_export to dump the configuration
##### module name #####
module_name = "myModule"
##### setup report #####
report: "report/workflow.rst"
##### load config #####
configfile: os.path.join("config", "config.yaml")
##### set global variables #####
result_path = os.path.join(config["result_path"],module_name)
# list of names of the used environment specifications in workflow/envs/{env_name}.yaml
envs = ["sklearn","ggplot"]
# target rule needs to require export
rule all:
    input:
        # export environments and configurations
        envs = expand(os.path.join(result_path,'envs','{env}.yaml'),env=envs),
        configs = os.path.join(result_path,'configs','{}_config.yaml'.format(config["project_name"])),
        annotations = os.path.join(result_path,'configs','{}_annot.csv'.format(config["project_name"])),
    threads: config.get("threads", 1)
    resources:
        mem_mb=config.get("mem", "16000"),
    log:
        os.path.join("logs","rules","all.log")
##### load rules #####
include: os.path.join("rules", "export.smk")
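To make explicit which files the target rule requests, the path construction can be reproduced in plain Python. This is only a sketch: the function `export_targets` is a hypothetical stand-in that mimics what `expand` and the `os.path.join` calls in `rule all` produce for the example values used in this gist. It does not call Snakemake itself.

```python
import os


def export_targets(config, module_name, envs):
    """Hypothetical helper: list the export files that rule all would request.

    Order: one YAML file per environment, then the config copy, then the annotation copy.
    """
    result_path = os.path.join(config["result_path"], module_name)
    project = config["project_name"]
    env_files = [os.path.join(result_path, "envs", f"{env}.yaml") for env in envs]
    config_file = os.path.join(result_path, "configs", f"{project}_config.yaml")
    annot_file = os.path.join(result_path, "configs", f"{project}_annot.csv")
    return env_files + [config_file, annot_file]


# Example values taken from config/config.yaml and the Snakefile above.
config = {
    "project_name": "myProject",
    "result_path": "/path/to/results",
    "annotation": "/path/to/annotation.csv",
}
targets = export_targets(config, "myModule", ["sklearn", "ggplot"])
for path in targets:
    print(path)
```

Each environment file is matched by `env_export` through the `{env}` wildcard, while the two files under `configs` are produced by `config_export` and `annot_export`.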

Exported conda environment specifications to document the installed software, including versions and build.
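The files written by `conda env export` pin each conda package as `name=version=build`. As a rough illustration of how such an export could be inspected later, the following sketch extracts those triples using only the standard library. The function `pinned_packages` is hypothetical, the sample text is made up for illustration (it is not output from the author's environments), and the line-based parsing assumes the usual layout of an exported file. A real YAML parser would be more robust.

```python
def pinned_packages(env_yaml_text):
    """Hypothetical helper: return (name, version, build) for each conda dependency."""
    packages = []
    in_dependencies = False
    for line in env_yaml_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line.startswith((" ", "-")):
            # A top-level key such as "name:", "channels:", "dependencies:", or "prefix:".
            in_dependencies = stripped == "dependencies:"
            continue
        if not in_dependencies or not stripped.startswith("- "):
            continue
        parts = stripped[2:].split("=")
        # Conda pins look like name=version=build; pip pins (name==version) are skipped.
        if len(parts) == 3 and all(parts):
            packages.append(tuple(parts))
    return packages


# Made-up sample in the general shape of an exported environment file.
SAMPLE = """\
name: sklearn
channels:
  - conda-forge
dependencies:
  - python=3.10.4=h12debd9_0
  - scikit-learn=1.1.1=py310hffb9edd_0
  - pip:
    - example-pkg==0.1
prefix: /path/to/envs/sklearn
"""

for name, version, build in pinned_packages(SAMPLE):
    print(name, version, build)
```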
