jdlcdl/mnemonic_to_bytes_speedup.md

## mnemonic_to_bytes_speedup.md

      
diybitcoinhardware/embit.bip39.mnemonic_to_bytes() speedup

Back Story (a branch I do not intend to PR)

In the spring of 2024, while brute-forcing mnemonics, I took a look at improving the performance of embit.bip39.mnemonic_to_bytes(), which is:

well peer-reviewed
tested... arguably enough
stable and in use by a number of projects

I ended up re-implementing this function using a big-integer accumulator.  I call the branch "bip39_via_accumulator".
But this function was originally copied from Jimmy Song, a well-respected developer with a real name (NOT anon like me).
My implementation had 3 primary changes for performance (I'll argue that it's also easier to read and understand):

negligible improvement (2%-8%) with fewer branches and bytes conversions,
noticeable improvement (~2x-3x) using try/except instead of if word not in wordlist:,
noticeable (and memory expensive) improvement (2x-25x) using a word-->index dictionary.
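
As a rough illustration of the accumulator idea (this is NOT the code from the "bip39_via_accumulator" branch), each word's 11-bit index can be shifted into one big integer, and the checksum bits split off at the end. This is a minimal sketch under stated assumptions: the function name, the error messages, and the stand-in WORDLIST of fake words are all hypothetical, chosen so the example runs on its own. A real caller would use the BIP39 English wordlist.

```python
import hashlib

# Stand-in for the 2048-word BIP39 list (hypothetical fake words, so the
# sketch is self-contained). Only the word indexes matter here.
WORDLIST = ["w%04d" % i for i in range(2048)]


def mnemonic_to_bytes_sketch(mnemonic, wordlist=WORDLIST):
    """Hypothetical accumulator-style decoder; returns the entropy bytes."""
    words = mnemonic.strip().split()
    if len(words) % 3 != 0 or not 12 <= len(words) <= 24:
        raise ValueError("Invalid recovery phrase")

    # Shift each 11-bit word index into a single big integer.
    acc = 0
    for word in words:
        try:
            acc = (acc << 11) | wordlist.index(word)
        except ValueError:
            raise ValueError("Word '%s' is not in the dictionary" % word)

    # BIP39: one checksum bit per 3 words, taken from the leading bits
    # of SHA-256(entropy).
    checksum_bits = len(words) // 3
    checksum = acc & ((1 << checksum_bits) - 1)
    entropy = (acc >> checksum_bits).to_bytes(len(words) * 4 // 3, "big")
    expected = hashlib.sha256(entropy).digest()[0] >> (8 - checksum_bits)
    if checksum != expected:
        raise ValueError("Checksum verification failed")
    return entropy
```

With the stand-in list, eleven copies of "w0000" followed by "w0003" decode to 16 zero bytes, mirroring the familiar all-zero 12-word test vector whose last word has index 3.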

Because embit is geared towards resource-limited micro-controllers, I have since removed the last enhancement, which accounted for the greatest performance boost on a few of the devices I tested.
This leaves the above branch as a 2x-3x performance improvement and a SCANDALOUS/HERETICAL total-rewrite of a perfectly functioning and highly sensitive library used by a few projects to protect only-God-knows how much family treasure.

New Plan (a branch I intend to PR)

With due respect for this existing stable library function (for my peers, and for fellow bitcoiners), I have re-imagined what I believe is a less-controversial branch that I'm calling "mnemonic_to_bytes_speedup".  I plan to submit a pull-request for this, and that's why I've invited you here.
It has a single code commit aimed at the ~2x-3x speedup via a try/except block, with no functional changes.
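
To show why a try/except block can help, here is a minimal sketch of the two lookup styles side by side. The helper names and the tiny WORDS list are hypothetical and are not taken from embit. With a plain list, the membership test and the .index() call each scan the list, while the try/except form scans it once and relies on .index() raising ValueError on a miss.

```python
# Tiny stand-in wordlist (hypothetical; the real list has 2048 words).
WORDS = ["abandon", "ability", "able", "about"]


def index_with_membership_check(word, wordlist):
    # Two linear scans of a list: one for `in`, one for .index().
    if word not in wordlist:
        raise ValueError("Word '%s' is not in the dictionary" % word)
    return wordlist.index(word)


def index_with_try_except(word, wordlist):
    # One linear scan: .index() already raises ValueError on a miss,
    # so we only translate the error message.
    try:
        return wordlist.index(word)
    except ValueError:
        raise ValueError("Word '%s' is not in the dictionary" % word)
```

Both helpers return the same index for a known word and raise the same ValueError for an unknown one, which is what "no functional changes" means here.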
It also adds tests to illustrate how a 3rd party app might choose to implement the word-->index dictionary speedup via the wordlist parameters of mnemonic_is_valid() and mnemonic_to_bytes().  This may well belong in embit's "examples" instead, since it is NOT truly implemented within embit, and also in the test-suites of 3rd party apps that use this memory-hungry trick.
I've chosen to add it to embit's tests because:

it works really well for speeding up mnemonic_to_bytes() when memory is available and when speed is wanted,
I'd want future developers to be aware of how wordlist is being used in the event they re-implement and break it,
I'm a believer that unit-tests are a great place to "document" intention (but embit devs might argue strongly that THIS WAS NEVER INTENDED.)
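
One way a 3rd party app might build such a wordlist object is sketched below. The class name IndexedWordlist and its layout are assumptions for illustration, not the code in the branch's tests: it keeps the words in order and adds a word-->index dictionary, so that both `word in wordlist` and `wordlist.index(word)` become dictionary lookups, at the cost of the extra memory for the dictionary.

```python
class IndexedWordlist:
    """Hypothetical list-like wordlist with dictionary-backed lookups."""

    def __init__(self, words):
        self._words = list(words)
        # The memory-hungry part: one dictionary entry per word.
        self._lookup = {word: i for i, word in enumerate(self._words)}

    def __len__(self):
        return len(self._words)

    def __getitem__(self, i):
        return self._words[i]

    def __contains__(self, word):
        return word in self._lookup

    def index(self, word):
        # Mimic list.index(): raise ValueError when the word is missing.
        try:
            return self._lookup[word]
        except KeyError:
            raise ValueError("'%s' is not in wordlist" % word)


# Tiny stand-in list (hypothetical; an app would wrap the full BIP39 list).
FAST_WORDS = IndexedWordlist(["abandon", "ability", "able", "about"])
```

An app would then pass an object like this as the wordlist argument of mnemonic_is_valid() or mnemonic_to_bytes(); that it keeps working is only guaranteed as long as those functions limit themselves to the operations the object supports.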


Past News (I've already issued this PR)

Along the way, I bumped into 2 edge cases that seemed "incorrect" to me: one related to input validation of the mnemonic length, and the other to the same for entropy. I submitted embit pull-request #63 for these on August 15th, 2024.