Analysis of insertion in Bech32 strings
In this document I analyze the error detecting properties of Bech32, including insertion/erasure, and see how things could be improved.
The characters in Bech32 strings are interpreted as coefficients of polynomials over GF(32). The numbers in GF(32) will be represented here as integers in the range 0 to 31, inclusive. For details on the arithmetic on these numbers (which differs significantly from normal arithmetic), see this document.
Specifically, the polynomial corresponding to a Bech32 string can be found by first creating a list of coefficients as follows:
- Start with a 1.
- For every character c in the HRP, add asciival(c) >> 5 to the list.
- Add a 0.
- For every character c in the HRP, add asciival(c) & 31 to the list.
- For every character in the data part, add its conversion (according to the Bech32 character set) to the list.
Then use that list as the coefficients of the polynomial, from high degree to low degree. For example "a12uel5l" would expand to [1,3,0,1,10,28,25,31,20,31], corresponding to the polynomial x^9 + 3x^8 + x^6 + 10x^5 + 28x^4 + 25x^3 + 31x^2 + 20x + 31.
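The expansion steps above can be sketched in a few lines of Python. The function name `bech32_coefficients` is introduced here purely for illustration; the character set is the one from BIP173.

```python
# Bech32 character set from BIP173; a character's index is its GF(32) value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

def bech32_coefficients(s):
    """Return the coefficient list (high degree first) for a Bech32 string."""
    s = s.lower()
    pos = s.rfind("1")                      # the last '1' separates HRP and data part
    hrp, data = s[:pos], s[pos + 1:]
    coeffs = [1]                            # start with a 1
    coeffs += [ord(c) >> 5 for c in hrp]    # high bits of each HRP character
    coeffs.append(0)                        # add a 0
    coeffs += [ord(c) & 31 for c in hrp]    # low 5 bits of each HRP character
    coeffs += [CHARSET.index(c) for c in data]
    return coeffs

coeffs = bech32_coefficients("a12uel5l")
print(coeffs)  # [1, 3, 0, 1, 10, 28, 25, 31, 20, 31]
```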
A valid Bech32 string is one where the polynomial p(x) corresponding to the string satisfies the equation p(x) mod g(x) = 1, where g(x) is the Bech32 generator, equal to x^6 + 29x^5 + 22x^4 + 20x^3 + 21x^2 + 29x + 18. This is accomplished by choosing the last six characters of the data part as a checksum in such a way that this equation is true.
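As an illustration, the validity equation can be checked with the `bech32_polymod` routine from the BIP173 reference code, which returns the remainder modulo g(x) packed into a 30-bit integer (five bits per coefficient). The `unpack` helper is a name introduced here, not part of BIP173.

```python
def bech32_polymod(values):
    """BIP173 reference routine: remainder of ([1] + values) modulo g(x)."""
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1                                  # the leading coefficient 1
    for v in values:
        b = chk >> 25                        # coefficient about to overflow degree 5
        chk = ((chk & 0x1ffffff) << 5) ^ v   # multiply by x and add the next value
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def unpack(word):
    """Split a 30-bit integer into six GF(32) coefficients, high degree first."""
    return [(word >> 5 * (5 - i)) & 31 for i in range(6)]

# GEN[0] holds the coefficients of g(x) below the leading x^6 term.
g_low = unpack(0x3b6a57b2)

# Coefficient list of "a12uel5l"; the leading 1 is implied by polymod's initial state.
coeffs = [1, 3, 0, 1, 10, 28, 25, 31, 20, 31]
remainder = bech32_polymod(coeffs[1:])
print(g_low, remainder)
```

A remainder of 1 means the string is valid.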
In what follows there will be manipulations of polynomials modulo g(x). g(x) is not irreducible (it can be written as the product of 3 quadratic polynomials), and thus the polynomials modulo g(x) do not form a field. In particular, not every element has an inverse, but many do.
Detection of substitution errors
Given a valid polynomial p(x) and an error polynomial e(x) (whose coefficients represent the per-character difference; for small errors only a few coefficients of e will be nonzero), we wonder when p(x) + e(x) is valid as well. We know that p(x) mod g(x) = 1, and want to know when (p(x) + e(x)) mod g(x) = 1. By subtracting the first equation from the second, we get e(x) mod g(x) = 0. In other words: errors are not detected when their error polynomial is a multiple of g(x).
Through the BCH construction we know that any polynomial of degree below 1023 with at most 3 non-zero coefficients will not be a multiple of g(x). This follows from how g(x) was constructed as the least common multiple of minimal polynomials of 3 successive powers of an order-1023 element in an extension field. Through exhaustive analysis, we also know that any polynomial with degree below 89 with at most 4 non-zero coefficients will also not be a multiple of g(x). This guarantees that any 4 errors within a window of 89 characters will always be detected.
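The following sketch illustrates the criterion on a concrete example: a single substitution is detected, while an error whose polynomial is g(x) itself (seven substitutions in adjacent positions) goes unnoticed. The functions follow the BIP173 reference code; the HRP "test" and the data values are arbitrary choices made for this example.

```python
def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

# Coefficients of g(x), high degree first.
G = [1, 29, 22, 20, 21, 29, 18]

hrp = "test"                   # arbitrary example HRP
data = list(range(10))         # arbitrary example data values
valid = bech32_hrp_expand(hrp) + data + bech32_create_checksum(hrp, data)

# One substitution: e(x) has a single non-zero coefficient.
one_sub = list(valid)
one_sub[-3] ^= 7

# Error polynomial e(x) = g(x): added (XORed) onto the last seven positions.
multiple_of_g = list(valid)
for i, c in enumerate(G):
    multiple_of_g[len(valid) - 7 + i] ^= c

valid_ok = bech32_polymod(valid) == 1
one_sub_detected = bech32_polymod(one_sub) != 1
g_error_undetected = bech32_polymod(multiple_of_g) == 1
print(valid_ok, one_sub_detected, g_error_undetected)
```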
Detection of swaps of adjacent characters
To determine how Bech32 performs w.r.t. swaps of adjacent characters, let's again look at the effect on the difference in polynomials.
If we have a valid string "a b c d e f g h" (where each letter is a variable representing one character), and swap the "e" and "f" ones, the per-character differences with "a b c d f e g h" will be "0 0 0 0 (e-f) (f-e) 0 0". In GF(32), negation is the identity (subtraction is the same as addition), so this is identical to "0 0 0 0 (e+f) (e+f) 0 0" and we observe that such a swap is in fact a substitution error of the same value in two adjacent positions. This is generally the case for all adjacent swaps.
Translating the above to polynomials, this means that the error polynomial for a number of swaps is the error polynomial for just the changes to the right-hand character of each swap, multiplied by (x + 1). If that right-hand-only polynomial has at most 4 non-zero terms (corresponding to 4 swaps), we know from the previous section that it cannot be a multiple of g(x). Since (x + 1) is not a divisor of g(x) (and, being of degree 1, therefore shares no factor with it), multiplying by it cannot change the remainder modulo g(x) from nonzero to zero. As a result, any error that consists purely of up to 4 adjacent character swaps will also be detected.
Using exhaustive analysis, it can also be shown that one swap and two substitutions, or two swaps and one substitution, will also always be detected. I have not found an algebraic argument for this.
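As a small illustration of the single-swap case, the sketch below swaps every pair of adjacent, distinct characters in the data part of one valid string. It checks that the error is the same value in both positions and that the swap is detected. The functions follow the BIP173 reference code; the HRP "test" and the data values are arbitrary examples.

```python
def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

hrp = "test"                   # arbitrary example HRP
data = list(range(10))         # arbitrary example data values
valid = bech32_hrp_expand(hrp) + data + bech32_create_checksum(hrp, data)

start = len(bech32_hrp_expand(hrp))   # first position of the data part
results = []
for i in range(start, len(valid) - 1):
    if valid[i] == valid[i + 1]:
        continue                      # swapping equal characters changes nothing
    swapped = list(valid)
    swapped[i], swapped[i + 1] = swapped[i + 1], swapped[i]
    # Per-character difference; in GF(32) addition and subtraction are both XOR.
    error = [a ^ b for a, b in zip(valid, swapped)]
    same_value_twice = error[i] == error[i + 1] == (valid[i] ^ valid[i + 1])
    detected = bech32_polymod(swapped) != 1
    results.append((same_value_twice, detected))

print(all(s and d for s, d in results))
```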
Detection of insertion errors
In this section we will analyze single-insertion errors. Specifically, given lengths LBegin (≥ 0), LEnd (≥ 0), and LInsert (≥ 1), then what is the probability over all strings Begin, End, and Insert of those lengths for which Begin || End is valid, that Begin || Insert || End is also valid? In other words, when is inserting an LInsert-character string in position LBegin from the beginning and LEnd from the end, not detected?
Note that since deletion is the reverse of insertion, it does not need to be analyzed separately: if inserting a string is not detected, then deleting the same string from the result is also not detected.
- Let begin(x), end(x), and insert(x) be polynomials corresponding to the strings Begin, End, and Insert.
- The old, valid polynomial corresponding to Begin || End is then old(x) = x^LEnd begin(x) + end(x).
- The new polynomial corresponding to Begin || Insert || End is new(x) = x^(LEnd + LInsert) begin(x) + x^LEnd insert(x) + end(x).
- We know old(x) - 1 = 0 mod g(x).
- We want to know when new(x) - 1 = 0 mod g(x).
Given the knowledge that old(x) - 1 = 0 mod g(x), we can either:
- Eliminate begin(x):
- new(x) - 1 = 0 mod g(x)
- ⬄ (new(x) - 1) - x^LInsert (old(x) - 1) = 0 mod g(x).
- ⬄ (x^(LEnd + LInsert) begin(x) + x^LEnd insert(x) + end(x) - 1) - x^LInsert (x^LEnd begin(x) + end(x) - 1) = 0 mod g(x).
- ⬄ x^LEnd insert(x) + (1 - x^LInsert) (end(x) - 1) = 0 mod g(x).
- ⬄ end(x) = 1 + x^LEnd insert(x) / (x^LInsert - 1) mod g(x).
- Eliminate end(x):
- new(x) - 1 = 0 mod g(x)
- ⬄ (new(x) - 1) - (old(x) - 1) = 0 mod g(x).
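- ⬄ x^(LEnd + LInsert) begin(x) + x^LEnd insert(x) + end(x) - x^LEnd begin(x) - end(x) = 0 mod g(x).
- ⬄ x^LInsert begin(x) + insert(x) - begin(x) = 0 mod g(x) (dividing by x^LEnd, which is invertible modulo g(x) because g(x) has a nonzero constant term).
- ⬄ begin(x) = insert(x) / (1 - x^LInsert) mod g(x).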
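The first elimination can be sanity-checked numerically. The sketch below verifies, for random strings, the exact polynomial identity (new(x) - 1) - x^LInsert (old(x) - 1) = x^LEnd insert(x) + (1 - x^LInsert) (end(x) - 1). Only additions and multiplications by powers of x are needed, and addition in GF(32) is XOR. The helper names are introduced here for illustration.

```python
import random

def add(a, b):
    """Add two GF(32) polynomials (coefficient lists, high degree first)."""
    n = max(len(a), len(b))
    a = [0] * (n - len(a)) + list(a)
    b = [0] * (n - len(b)) + list(b)
    return [x ^ y for x, y in zip(a, b)]

def shift(p, k):
    """Multiply a polynomial by x^k."""
    return list(p) + [0] * k

def strip(p):
    """Drop leading zero coefficients so equal polynomials compare equal."""
    p = list(p)
    while len(p) > 1 and p[0] == 0:
        p.pop(0)
    return p

ONE = [1]
rng = random.Random(1)         # fixed seed, chosen arbitrarily
identity_holds = True
for _ in range(1000):
    l_begin, l_end, l_insert = rng.randint(1, 8), rng.randint(1, 8), rng.randint(1, 5)
    begin = [rng.randrange(32) for _ in range(l_begin)]
    end = [rng.randrange(32) for _ in range(l_end)]
    insert = [rng.randrange(32) for _ in range(l_insert)]
    old = begin + end              # Begin || End
    new = begin + insert + end     # Begin || Insert || End
    # Left side: (new - 1) - x^LInsert (old - 1); minus equals plus in GF(32).
    lhs = add(add(new, ONE), shift(add(old, ONE), l_insert))
    # Right side: x^LEnd insert + (1 - x^LInsert)(end - 1).
    end_m1 = add(end, ONE)
    rhs = add(shift(insert, l_end), add(end_m1, shift(end_m1, l_insert)))
    identity_holds = identity_holds and strip(lhs) == strip(rhs)

print(identity_holds)
```

The identity holds for arbitrary strings; validity of Begin || End is only used afterwards, to replace old(x) - 1 by 0 modulo g(x).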
In both results, the left and right side of the equation have independent distributions (because we pick insert(x) independently from old(x)). That means that if end(x) mod g(x), begin(x) mod g(x), or insert(x) mod g(x) is uniformly distributed over all 2^30 possibilities, the equation will only hold with probability 2^-30, which is acceptably low.
So when is this the case? Unfortunately, begin(x) and end(x) are not independently distributed. For example, when LBegin = 2, we know that begin(x) only takes 1024 possible values. As a result, because old(x) is a valid polynomial that incorporates both begin(x) and end(x), end(x) mod g(x) also only takes 1024 possible values.
When both LBegin and LEnd are at least 6 however, this is not an issue - both begin(x) mod g(x) and end(x) mod g(x) can then take on all values, and we're good.
When LInsert is at least 6, the right hand side is uniformly distributed and we're good as well.
When LEnd and LInsert are both less than 6, we can use the above equation end(x) = 1 + x^LEnd insert(x) / (x^LInsert - 1) mod g(x) to analyze the detection abilities. Simply iterate over all possible values of insert(x) (32^LInsert possibilities, which is at most about 33 million), and see when the result matches a possible end(x) for the given LEnd value (e.g. when LEnd is 4, end(x) will have degree at most 4-1). The results are in the table below:
The crosses in this table exactly correspond to the known issue in Bech32. For example, LInsert=2 LEnd=3 corresponds to inserting 2 characters before the 3rd last character. If the last 3 characters are "qqp", then it is indeed possible to insert "qq" before them according to that issue. Over all 2 character inserts and all 3 character suffixes, requiring them to be exactly "qqp" and "qq" fixes 5 characters, so this has a probability of 32^-5 = 2^-25.
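A minimal sketch of such a search for very small lengths follows. It differs from the procedure described above in one respect: it enumerates both insert(x) and end(x) and tests the equation x^LEnd insert(x) + (1 - x^LInsert) (end(x) - 1) = 0 mod g(x) directly, which avoids the division. The helper names are hypothetical, and GF(32) is assumed to be represented modulo z^5 + z^3 + 1, which is consistent with the constants in the BIP173 reference code.

```python
import itertools

G = [1, 29, 22, 20, 21, 29, 18]    # g(x), high degree first

def gf_mul(a, b):
    """Multiply in GF(32), assumed here to be GF(2)[z] / (z^5 + z^3 + 1)."""
    r = 0
    for i in range(5):
        if (b >> i) & 1:
            r ^= a << i
    for i in range(8, 4, -1):      # reduce bits 8 down to 5
        if (r >> i) & 1:
            r ^= 0b101001 << (i - 5)
    return r

def poly_mul(a, b):
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= gf_mul(ai, bj)
    return out

def poly_mod(p, g):
    """Remainder of p modulo the monic polynomial g, as len(g) - 1 coefficients."""
    n = len(g) - 1
    p = [0] * max(0, n - len(p)) + list(p)
    for i in range(len(p) - n):
        c = p[i]
        if c:
            for j, gj in enumerate(g):
                p[i + j] ^= gf_mul(c, gj)
    return p[-n:]

def undetected_insertions(l_insert, l_end):
    """List (insert, end) pairs for which the insertion is never detected."""
    x_end = [1] + [0] * l_end                   # x^LEnd
    factor = [1] + [0] * (l_insert - 1) + [1]   # x^LInsert + 1, equal to 1 - x^LInsert
    hits = []
    for ins in itertools.product(range(32), repeat=l_insert):
        lhs = poly_mod(poly_mul(x_end, ins), G)
        for end in itertools.product(range(32), repeat=l_end):
            end_minus_1 = list(end)
            end_minus_1[-1] ^= 1                # end(x) - 1
            rhs = poly_mod(poly_mul(factor, end_minus_1), G)
            if lhs == rhs:                      # the sum is zero modulo g(x)
                hits.append((ins, end))
    return hits

print(undetected_insertions(1, 1))   # value 0 is "q", value 1 is "p"
print(undetected_insertions(1, 2))
```

For these small lengths the only hits are all-"q" inserts placed before a run of "q" characters ending in "p". Larger lengths quickly become too slow in this naive form; the division-based approach from the text only has to loop over insert(x).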
When LBegin and LInsert are both less than 6, we may worry about similar concerns. However, since begin(x) always contains the expansion of the prefix (and in BIP173 addresses, the witness version), we can't simply assume it is going to be uniform to begin with. This is an advantage and a disadvantage: the advantage is that even with LBegin very low (close to the length of the HRP's expansion), begin(x) won't be a low-degree polynomial, so on average, the same issue does not occur. The disadvantage is that to make sure, we need to run the analysis on begin(x) = insert(x) / (1 - x^LInsert) mod g(x) specifically for every HRP/prefix we're interested in. When running the analysis for the "bc1" prefix used for BIP173, the behavior is as expected.
Improving detection of insertion errors
To see how we can do better, let's change the Bech32 equation from p(x) = 1 mod g(x) to the more general p(x) = m(x) mod g(x). The low-LEnd equation above then becomes end(x) = m(x) + x^LEnd insert(x) / (x^LInsert - 1) mod g(x). The low-LBegin equation is unaffected.
If we pick m(x) = 31x^5 + 31x^4 + 31x^3 + 31x^2 + 31x + 31, and run the low-LEnd analysis, we get the much more satisfying table below:
This would address all known weaknesses in the scheme, without worsening any known detection qualities. To see why, note that the substitution/swap detection property only depends on the choice of the generator g(x), and that the detection abilities for low-LBegin values are independent from the choice of m(x) (they depend on the HRP expansion instead)*.
It is also only a small code change. The BIP173 reference code would need to be modified from

```python
def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == 1

def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0,0,0,0,0,0]) ^ 1
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
```

to

```python
M = 0x3FFFFFFF

def bech32_verify_checksum(hrp, data):
    return bech32_polymod(bech32_hrp_expand(hrp) + data) == M

def bech32_create_checksum(hrp, data):
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0,0,0,0,0,0]) ^ M
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]
```
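To illustrate the effect of the constant, the sketch below builds a valid string ending in "p" under each constant and inserts one or two "q" characters right before that final "p". The checksum constant is passed as a parameter, which is a convenience of this example rather than how the reference code is structured. The HRP "test", the data values, and the helper names are arbitrary choices.

```python
def bech32_polymod(values):
    GEN = [0x3b6a57b2, 0x26508e6d, 0x1ea119fa, 0x3d4233dd, 0x2a1462b3]
    chk = 1
    for v in values:
        b = chk >> 25
        chk = ((chk & 0x1ffffff) << 5) ^ v
        for i in range(5):
            chk ^= GEN[i] if ((b >> i) & 1) else 0
    return chk

def bech32_hrp_expand(hrp):
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def create_checksum(hrp, data, const):
    """Checksum such that the polymod of the full string equals const."""
    values = bech32_hrp_expand(hrp) + data
    polymod = bech32_polymod(values + [0, 0, 0, 0, 0, 0]) ^ const
    return [(polymod >> 5 * (5 - i)) & 31 for i in range(6)]

def valid_string_ending_in_p(hrp, const):
    """Vary the last data value until the checksum ends in 'p' (value 1)."""
    for d in range(32):
        data = [5, 6, d]           # arbitrary example data
        checksum = create_checksum(hrp, data, const)
        if checksum[-1] == 1:
            return bech32_hrp_expand(hrp) + data + checksum
    raise ValueError("no suitable example found")

def insert_q_before_final_p(values, count):
    return values[:-1] + [0] * count + values[-1:]   # 'q' has value 0

results = {}
for const in (1, 0x3FFFFFFF):
    values = valid_string_ending_in_p("test", const)
    results[const] = [
        bech32_polymod(insert_q_before_final_p(values, k)) == const
        for k in (1, 2)
    ]

# True means the modified string still passes verification (error not detected).
print(results)
```

With the constant 1 both insertions pass verification, while with 0x3FFFFFFF both are rejected.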
Conclusion
Using various types of analysis, it is possible to determine that Bech32 behaves well in the presence of substitution and character swapping errors. Furthermore, among a wide class of single-insertion or single-deletion errors (of potentially multiple characters), the possibility of inserting/deleting "q" characters right before a final "p" is the only unexpected deviation from its intended detection abilities (a failure rate of no more than about 1 in a billion for errors other than up to 4 substitutions, which are always detected).
Finally, it is possible to modify a constant in Bech32 to obtain a variant that fixes this issue, without worsening any of the other qualities that were analyzed.