@ctb
Last active March 20, 2017 16:59

Downsample signature --scaled values

Usage:

./sourmash compute --scaled 5000 data/GCF_000005845.2_ASM584v2_genomic.fna.gz -f

python e7d910326792554e1fbf826fe12da83f/subscaled.py GCF_000005845.2_ASM584v2_genomic.fna.gz.sig foo.sig 10000

foo.sig will now contain the signature downsampled to a scaled value of 10000.

#! /usr/bin/env python3
"""Downsample the signatures in a sourmash .sig file to a larger --scaled value."""
import sys
import sourmash_lib.signature

# usage: subscaled.py <input.sig> <output.sig> <new_scaled>
with open(sys.argv[1]) as in_fp:
    sigs = list(sourmash_lib.signature.load_signatures(in_fp))

new_scaled = float(sys.argv[3])
max_hash = 2**64 / new_scaled

# for each signature, extract state, change max_hash, and reinstantiate.
for sig in sigs:
    print('processing {}'.format(sig.name()))
    state = list(sig.estimator.__getstate__())
    # this script treats index 6 of the estimator state as max_hash
    assert max_hash < state[6], \
        "'scaled' can only increase from original sigs."
    state[6] = max_hash
    sig.estimator.__setstate__(state)

# save!
with open(sys.argv[2], 'wt') as out_fp:
    print(sourmash_lib.signature.save_signatures(sigs), file=out_fp)
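The script above derives max_hash as 2**64 divided by the scaled value, so a larger scaled value means a smaller max_hash and fewer retained hashes. The following is a minimal standalone sketch of that arithmetic. It does not use sourmash_lib; the function names and the plain list of integer hashes are hypothetical and for illustration only, and whether the real estimator treats the cutoff as inclusive or exclusive is not something this sketch claims to reproduce.

```python
# Illustrative sketch only: these helpers are hypothetical and are not part
# of sourmash_lib. Hashes are modeled as plain integers in [0, 2**64).
HASH_SPACE = 2**64


def max_hash_for_scaled(scaled):
    """Return the hash cutoff for a scaled value, as computed in the script."""
    return HASH_SPACE / float(scaled)


def downsample_hashes(hashes, old_scaled, new_scaled):
    """Keep only the hashes that fall below the cutoff for new_scaled.

    Downsampling can only discard hashes, so new_scaled must be larger
    than old_scaled (the script enforces the same rule with an assert).
    """
    if new_scaled <= old_scaled:
        raise ValueError("'scaled' can only increase from original sigs.")
    cutoff = max_hash_for_scaled(new_scaled)
    # Assumption: the cutoff is treated as exclusive here.
    return sorted(h for h in hashes if h < cutoff)


if __name__ == '__main__':
    hashes = [1, 2**50, 2**51]
    print(downsample_hashes(hashes, 5000, 10000))
```

Going from scaled 5000 to 10000 halves max_hash, so on average about half of the hashes in the original signature are expected to survive.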