@Phlya
Last active March 28, 2022 08:49
Pairtools dedup benchmarking
walkpolicies = ["all", "mask"]
backends = ["cython", "scipy", "sklearn"]
cores = [1, 2, 4, 8]
chunksizes = [1000, 10_000, 100_000, 1_000_000, 10_000_000]
mismatches = [0, 3]
carryovers = [10, 100, 1000]
extensions = ["nodups.pairs", "dups.pairs", "unmapped.pairs", "stats"]
# cython targets: the chunksize, cores and carryover fields are fixed at 0, 1 and 0.
cython = expand(
    "output/test_pairs.{walkpolicy}.cython.0.1.{mismatch}.0.{ext}",
    walkpolicy=walkpolicies,
    mismatch=mismatches,
    ext=extensions,
)
# sklearn targets: full grid, including the number of cores.
sklearn = expand(
    "output/test_pairs.{walkpolicy}.sklearn.{chunksize}.{cores}.{mismatch}.{carryover}.{ext}",
    walkpolicy=walkpolicies,
    chunksize=chunksizes,
    cores=cores,
    mismatch=mismatches,
    carryover=carryovers,
    ext=extensions,
)
# scipy targets: same grid as sklearn, with cores fixed at 1.
scipy = expand(
    "output/test_pairs.{walkpolicy}.scipy.{chunksize}.1.{mismatch}.{carryover}.{ext}",
    walkpolicy=walkpolicies,
    chunksize=chunksizes,
    mismatch=mismatches,
    carryover=carryovers,
    ext=extensions,
)


rule all:
    input:
        lambda wildcards: cython + sklearn + scipy,


rule dedup:
    input:
        pairsfile="test_pairs.wp-{walkpolicy}.pairs",
    threads: lambda wildcards: int(wildcards.cores)
    output:
        nodups="output/test_pairs.{walkpolicy}.{backend}.{chunksize}.{cores}.{mismatch}.{carryover}.nodups.pairs",
        dups="output/test_pairs.{walkpolicy}.{backend}.{chunksize}.{cores}.{mismatch}.{carryover}.dups.pairs",
        stats="output/test_pairs.{walkpolicy}.{backend}.{chunksize}.{cores}.{mismatch}.{carryover}.stats",
        unmapped="output/test_pairs.{walkpolicy}.{backend}.{chunksize}.{cores}.{mismatch}.{carryover}.unmapped.pairs",
    benchmark:
        repeat(
            "benchmarks/test_pairs.{walkpolicy}.{backend}.{chunksize}.{cores}.{mismatch}.{carryover}.dedup.benchmark",
            3,
        )
    shell:
        "pairtools dedup {input.pairsfile} --backend {wildcards.backend} "
        "--chunksize {wildcards.chunksize} --max-mismatch {wildcards.mismatch} "
        "--carryover {wildcards.carryover} "
        "--mark-dups --keep-parent-id "
        "-p {wildcards.cores} "
        "-o {output.nodups} --output-stats {output.stats} --output-dups {output.dups} "
        "--output-unmapped {output.unmapped} "
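
To get a feel for the size of the benchmark grid, the three `expand()` calls can be mirrored in plain Python. This is a minimal sketch that does not use Snakemake itself. `FIELDS`, `jobs` and `parse_benchmark_name` are hypothetical names introduced here for illustration. The parser assumes benchmark files are named exactly as in the `benchmark:` directive above, which may be handy when collecting the timings afterwards.

```python
from collections import Counter
from itertools import product

# Parameter lists copied from the workflow above.
walkpolicies = ["all", "mask"]
cores = [1, 2, 4, 8]
chunksizes = [1000, 10_000, 100_000, 1_000_000, 10_000_000]
mismatches = [0, 3]
carryovers = [10, 100, 1000]

# Order of the dot-separated wildcard fields in the file names.
FIELDS = ("walkpolicy", "backend", "chunksize", "cores", "mismatch", "carryover")


def jobs():
    """Yield one dict of wildcard values per dedup job (hypothetical helper)."""
    # cython: chunksize, cores and carryover are fixed at 0, 1 and 0.
    for wp, mm in product(walkpolicies, mismatches):
        yield dict(zip(FIELDS, (wp, "cython", 0, 1, mm, 0)))
    # sklearn: full grid, including the number of cores.
    for wp, cs, nc, mm, co in product(
        walkpolicies, chunksizes, cores, mismatches, carryovers
    ):
        yield dict(zip(FIELDS, (wp, "sklearn", cs, nc, mm, co)))
    # scipy: same grid with cores fixed at 1.
    for wp, cs, mm, co in product(walkpolicies, chunksizes, mismatches, carryovers):
        yield dict(zip(FIELDS, (wp, "scipy", cs, 1, mm, co)))


def parse_benchmark_name(path):
    """Recover the wildcard values from a benchmark file path (hypothetical helper)."""
    name = path.rsplit("/", 1)[-1]
    parts = name.split(".")
    # Expected: "test_pairs", six wildcard fields, then "dedup" and "benchmark".
    if (
        len(parts) != 9
        or parts[0] != "test_pairs"
        or parts[-2:] != ["dedup", "benchmark"]
    ):
        raise ValueError(f"unexpected benchmark file name: {path}")
    values = dict(zip(FIELDS, parts[1:7]))
    for key in ("chunksize", "cores", "mismatch", "carryover"):
        values[key] = int(values[key])
    return values


per_backend = Counter(job["backend"] for job in jobs())
print(dict(per_backend))          # {'cython': 4, 'sklearn': 240, 'scipy': 60}
print(sum(per_backend.values()))  # 304 jobs, each timed 3 times by repeat()
```

Each job writes four output files, so the 304 jobs correspond to the 1216 targets requested by `rule all`.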