In the spring of 2024, while brute-forcing mnemonics, I took a look at improving the performance of embit.bip39.mnemonic_to_bytes(), which is:
- well peer-reviewed
- tested... arguably enough
- stable and in use by a number of projects
I ended up re-implementing this function using a big-integer accumulator. I call the branch "bip39_via_accumulator".
But this function was originally copied from Jimmy Song, a well-respected developer with a real name (NOT anon like me).
My implementation had 3 primary changes for performance (I'll argue that it's also easier to read and understand):
- negligible improvement (2%-8%) with fewer branches and bytes conversions,
- noticeable improvement (~2x-3x) using try/except instead of an "if word not in wordlist:" check,
- noticeable (and memory-expensive) improvement (2x-25x) using a word-->index dictionary.
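To make the first two changes concrete, here is a minimal sketch of the accumulator idea combined with try/except. It is NOT the code from either branch: the function name mnemonic_to_bytes_sketch and the placeholder WORDLIST (2048 made-up words standing in for the real BIP39 list) are assumptions for illustration only.

```python
import hashlib

# Stand-in wordlist: 2048 unique placeholder words (NOT the real BIP39 list).
WORDLIST = tuple("w%04d" % i for i in range(2048))


def mnemonic_to_bytes_sketch(mnemonic, wordlist=WORDLIST):
    """Hypothetical sketch: decode a mnemonic with a big-integer accumulator."""
    words = mnemonic.strip().split()
    if len(words) not in (12, 15, 18, 21, 24):
        raise ValueError("Invalid recovery phrase")

    # Each word contributes 11 bits; shift them into one big integer.
    acc = 0
    for word in words:
        try:
            # One lookup per word: index() already raises ValueError when
            # the word is missing, so no separate "in wordlist" scan is needed.
            acc = (acc << 11) | wordlist.index(word)
        except ValueError:
            raise ValueError("Word '%s' is not in the dictionary" % word)

    # BIP39: one checksum bit per 3 words, stored in the lowest bits.
    checksum_bits = len(words) // 3
    checksum = acc & ((1 << checksum_bits) - 1)
    # Entropy is 32 bits (4 bytes) per checksum bit.
    entropy = (acc >> checksum_bits).to_bytes(checksum_bits * 4, "big")

    # The checksum is the leading bits of SHA-256(entropy).
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    if checksum != expected:
        raise ValueError("Checksum verification failed")
    return entropy
```

With the placeholder list, eleven copies of "w0000" followed by "w0003" mirrors the familiar all-zero test vector (index 0 eleven times, then index 3) and decodes to 16 zero bytes.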
Because embit is geared towards resource-limited micro-controllers, I have since removed the last enhancement, which accounted for the greatest performance boost on a few of the devices I tested.
This leaves the above branch as a 2x-3x performance improvement and a SCANDALOUS/HERETICAL total-rewrite of a perfectly functioning and highly sensitive library used by a few projects to protect only-God-knows how much family treasure.
With due respect for this existing stable library function (for my peers, and for fellow bitcoiners), I have re-imagined what I believe is a less-controversial branch that I'm calling "mnemonic_to_bytes_speedup". I plan to submit a pull-request for this, and that's why I've invited you here.
It has a single code commit aimed at the ~2x-3x speedup via a try/except block, with no functional changes.
It also has added tests to illustrate how a 3rd party app might choose to implement the word-->index dictionary speedup via the wordlist parameters of mnemonic_is_valid() and mnemonic_to_bytes(). This may well belong in embit's "examples" instead, since it is NOT truly implemented within embit, as well as in the test suites of 3rd party apps that use this memory-hungry trick.
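A minimal sketch of the idea, not the code from the branch's tests: a sequence that still looks like a wordlist to the caller but answers index() from a dictionary. The class name IndexedWordlist and the stand-in function word_indices are hypothetical, and the sketch assumes the library only ever calls wordlist.index(word) (and possibly "word in wordlist") on whatever it is given.

```python
class IndexedWordlist(tuple):
    """Hypothetical tuple subclass with dict-backed index() and `in`."""

    def __new__(cls, words):
        self = super().__new__(cls, words)
        # Costs extra memory: one dict entry per word.
        self._lookup = {word: i for i, word in enumerate(self)}
        return self

    def index(self, word):
        # Same contract as tuple.index(): ValueError when the word is missing.
        # (The optional start/stop arguments are deliberately not supported.)
        try:
            return self._lookup[word]
        except KeyError:
            raise ValueError("%r is not in wordlist" % (word,))

    def __contains__(self, word):
        return word in self._lookup


def word_indices(mnemonic, wordlist):
    """Stand-in for a library function that accepts a `wordlist` argument."""
    return [wordlist.index(word) for word in mnemonic.split()]


plain = ("alpha", "bravo", "charlie")   # placeholder words, not BIP39
fast = IndexedWordlist(plain)
# Both wordlists give the same answer; only the lookup cost differs.
assert word_indices("charlie alpha", fast) == word_indices("charlie alpha", plain)
```

The trade-off is the one described above: a linear scan per word becomes a hash lookup, paid for with a dictionary that a memory-constrained device may not be able to afford.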
I've chosen to add it to embit's tests because:
- it works really well for speeding up mnemonic_to_bytes() when memory is available and when speed is wanted,
- I'd want future developers to be aware of how wordlist is being used in the event they re-implement and break it,
- I'm a believer that unit-tests are a great place to "document" intention (but embit devs might argue strongly that THIS WAS NEVER INTENDED.)
Along the way, I bumped into 2 edge cases that seemed "incorrect" to me, related to input validation of the mnemonic length and, likewise, of the entropy length. I submitted embit pull-request #63 for those on August 15th, 2024.
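For context, BIP39 only defines mnemonics of 12, 15, 18, 21, or 24 words, which correspond to 16, 20, 24, 28, or 32 bytes of entropy. A minimal sketch of that kind of length check follows; the helper names are hypothetical and this is not the code from the pull request.

```python
# Lengths defined by BIP39 (128 to 256 bits of entropy, in 32-bit steps).
VALID_WORD_COUNTS = (12, 15, 18, 21, 24)
VALID_ENTROPY_BYTES = (16, 20, 24, 28, 32)


def check_word_count(mnemonic):
    """Hypothetical helper: reject mnemonics with an undefined word count."""
    count = len(mnemonic.strip().split())
    if count not in VALID_WORD_COUNTS:
        raise ValueError("Invalid mnemonic length: %d words" % count)
    return count


def check_entropy_length(entropy):
    """Hypothetical helper: reject entropy with an undefined byte length."""
    if len(entropy) not in VALID_ENTROPY_BYTES:
        raise ValueError("Invalid entropy length: %d bytes" % len(entropy))
    return len(entropy)
```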
I'm using the following to test different devices: