Skip to content

Instantly share code, notes, and snippets.

@sipa
Last active December 23, 2021 16:54
Show Gist options
  • Star 2 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save sipa/8c04d331a98dd1a7b92861170f398cb9 to your computer and use it in GitHub Desktop.
Save sipa/8c04d331a98dd1a7b92861170f398cb9 to your computer and use it in GitHub Desktop.
Bech32 checksum design

Background

Characters as field elements

The characters of Bech32 strings are interpreted as elements of F = GF(25), a finite field of size 32. Elements of a field have associated addition and multiplication operations defined for them, which satisfy the normal properties those operations have.

Power of α Previous polynomial * α mod α5 + α3 + 1 Bit pattern Decimal Bech32 character
- - 0 000002 {0} q
α0 - 1 000012 {1} p
α1 α α 000102 {2} z
α2 α2 α2 001002 {4} y
α3 α3 α3 010002 {8} g
α4 α4 α4 100002 {16} s
α5 α5 α3 + 1 010012 {9} f
α6 α4 + α α4 + α 100102 {18} j
α7 α5 + α2 α3 + α2 + 1 011012 {13} d
α8 α4 + α3 + α α4 + α3 + α 110102 {26} 6
α9 α5 + α4 + α2 α4 + α3 + α2 + 1 111012 {29} a
α10 α5 + α4 + α3 + α α4 + α + 1 100112 {19} n
α11 α5 + α2 + α α3 + α2 + α + 1 011112 {15} 0
α12 α4 + α3 + α2 + α α4 + α3 + α2 + α 111102 {30} 7
α13 α5 + α4 + α3 + α2 α4 + α2 + 1 101012 {21} 4
α14 α5 + α3 + α α + 1 000112 {3} r
α15 α2 + α α2 + α 001102 {6} x
α16 α3 + α2 α3 + α2 011002 {12} v
α17 α4 + α3 α4 + α3 110002 {24} c
α18 α5 + α4 α4 + α3 + 1 110012 {25} e
α19 α5 + α4 + α α4 + α3 + α + 1 110112 {27} m
α20 α5 + α4 + α2 + α α4 + α3 + α2 + α + 1 111112 {31} l
α21 α5 + α4 + α3 + α2 + α α4 + α2 + α + 1 101112 {23} h
α22 α5 + α3 + α2 + α α2 + α + 1 001112 {7} 8
α23 α3 + α2 + α α3 + α2 + α 011102 {14} w
α24 α4 + α3 + α2 α4 + α3 + α2 111002 {28} u
α25 α5 + α4 + α3 α4 + 1 100012 {17} 3
α26 α5 + α α3 + α + 1 010112 {11} t
α27 α4 + α2 + α α4 + α2 + α 101102 {22} k
α28 α5 + α3 + α2 α2 + 1 001012 {5} 9
α29 α3 + α α3 + α 010102 {10} 2
α30 α4 + α2 α4 + α2 101002 {20} 5
α31 = α0 α5 + α3 1 000012 {1} p

we'll denote as numbers between {} symbols. For example, n is interpreted as {19}. To add two elements, XOR their bits together. For example, m + d = {27} + {13} = {110112} + {011012} = {101102} = {22} = k. To multiply two elements, treat their bits as coefficients of polynomials over GF(2), multiply those polynomials, and find the remainder modulo a5 + a3 + 1 (or {41}). The variable a is used to avoid confusion with the polynomials in future sections (which are unrelated). For example, 9 * 6 = {5} * {26} = (a2 + 1) * (a4 + a3 + a) = (a4 + a3 + a) * a2 + (a4 + a3 + a) = a6 + a5 + a4 + a = a3 + 1 mod a5 + a3 + 1 = {9} = f.

Strings as polynomial coefficients

Strings are seen as coefficients of polynomials over F, so the coefficients themselves are field elements from the previous section. For example, the string phrase is interpreted as the polynomial x5 + {23}x4 + {3}x3 + {29}x2 + {19}x + {25}.

Valid strings

Valid Bech32 values (after expansion of the human-readable part, and conversion of the data part) are strings which, when prefixed with {1}, have a corresponding polynomial that equals {1} modulo the generator polynomial g(x) = x6 + {29}x5 + {22}x4 + {20}x3 + {21}x2 + {29}x + {18}. The bech32_polymod function, above, in fact computes this modulus, with the six resulting coefficients packed into a 30-bit integer. The prefix {1} is there to make sure prefixing zeroes invalidates the string. The modulus is required to be {1} is to make sure that suffixing zeroes invalidates the string.

BCH parameters

We define the extension field E as polynomials over F modulo x2 + x + {3}. It has an element alpha = {9}x + {15} with multiplicative order 1023, whose 3 consecutive powers alpha997, alpha998, alpha999 have minimal polynomials in F: x2 + {30}x + {30}, x2 + {27}x + {4}, and x2 + {24}x + {14}. The lcm of these minimal polynomials is the generator g(x). This usage of a generator that is the product of minimal polynomials for subsequent powers of an element in an extension field makes it a BCH code.

Selected properties

BCH codes constructed using the mechanism given above guarantee that any two different valid strings up to length 1023 differ in at least 4 characters (it has a Hamming distance (HD) of 4), or in other words, no 3 errors will ever go undetected. However, some codes result in better properties for shorter lengths (or equivalently, for codes whose errors occur within a restricted window), but randomly chosen codes tend to behave very badly just past their design bound. The specific code chosen here is the result of:

  • Starting with a list of all 159605 BCH codes designed to have HD4 or HD5 for length 93, 151, 165, 341, 1023 or 1057.
  • From those, requiring the detection of 4 errors up to length 71, resulting in 28825 remaining codes.
  • From those, choosing the codes with the best worst-case window for 5-character errors, resulting in 310 codes with identical missed detection rate for any number of character errors and window lengths.
  • From those, picking the code with the lowest chance for not detecting small numbers of bit errors.
These analyses were performed using a meet-in-the-middle approach to make the exhaustive searches tractable. The code can be found here.

The following table summarizes the chances for detection failure (as multiples of 1 in 109).

Window length Number of wrong characters
Length Description ≤4 5 6 7 8 ≥9
8 Longest with HD 7 0 1.127 0.909 n/a
18 Longest with HD 6 0 0.965 0.929 0.932 0.931
19 Worst case for 6 errors 0 0.093 0.972 0.928 0.931
39 Length for a P2WPKH address 0 0.756 0.935 0.932 0.931
59 Length for a P2WSH address 0 0.805 0.933 0.931
71 Length for a 40-byte program address 0 0.830 0.934 0.931
89 Longest with HD 5 0 0.867 0.933 0.931
This means that when 6 changed characters occur randomly distributed in the 39 characters of a P2WPKH address, there is a chance of 0.935 per billion that it will go undetected. When those 6 changes occur randomly within a 19-character window, that chance goes up to 0.972 per billion. As the number of errors goes up, the chance converges towards 1 in 230 = 0.931 per billion.

Note on larger lengths

Even though the BCH code performs reasonably well up to 1023 characters, it is not intended to be used beyond 89 characters (excluding the separator). If there is a need for such long addresses, a longer BCH code should be used (11 character checksums exist that guarantee detection up to 6 errors in a window of 1023 characters).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment