This document describes the search for a checksum algorithm to be used with Monero's new addressing scheme Jamtis. The purpose of the checksum is to detect accidental corruption of the address string.
Since Jamtis encodes addresses in base32, it allows the use of cyclic codes, which can provide guaranteed error detection, unlike hash-based checksums.
Monero currently uses a 32-bit hash-based checksum for its addresses. Hash-based checksums have a flat false positive rate regardless of the number of errors. Bitcoin's bech32 address format uses a degree-6 cyclic code optimized for short addresses of up to 89 characters. The cashaddr address format of Bitcoin Cash uses a degree-8 cyclic code that can detect 5 errors for lengths of up to 130 characters.
The following table shows the performance of the various checksums when applied to a 196-character Jamtis address. The column "detection" refers to the number of errors that can always be detected. "Max5" and "Max6" are the maximum spans when 5 and 6 errors can always be detected, respectively. "Rate5" and "Rate6" are the worst false positive rates for lengths of up to 196 characters (multiplied by 2^40
).
Checksum | size | used by | detection | Max5 | Max6 | Rate5 *2^40 |
Rate6 *2^40 |
---|---|---|---|---|---|---|---|
code(J3PBP3J1) | 40 bits | BCH (cashaddr) | 4 | 130 | 33 | 0.692608 | 1.001845 |
40-bit hash | 40 bits | - | 0 | - | - | 1.000000 | 1.000000 |
32-bit hash | 32 bits | XMR | 0 | - | - | 256.000000 | 256.000000 |
code(TMKLTI) | 30 bits | BTC (bech32) | 3 | 18 | 8 | 1007.952788 | 1068.701701 |
This shows that a degree-8 cyclic code can offer the best performance even for relatively long addresses.
We will search for a degree-8 cyclic code that has the best performance at 196 characters.
Since there are over 1 trillion (2^40) possible degree-8 codes over GF(2^5), we will only search a random subset of this space and select the best perfoming code.
The gen_crc.py file (attached to this document) is a python script that will deterministically generate random 10 million degree-8 codes (encoded in base32hex). It avoids polynomials with a zero constant term since those perform like degree-7 codes.
"Crccollide" is a tool written by Bitcoin developers to test the performance of cyclic codes. The source code is available in a forked repository.
The tool accepts a generator polynomial (encoded in base32hex) and will examine its ability to detect a small number of errors up to a specified length.
The crc_res.py file (attached to this document) will read the outputs of crccollide in a specified directory and print the best performing codes based on the following 4 indicators:
- The maximum length when 4 errors can always be detected. Degree-8 codes can typically always detect 4 errors up to the reference length of 196, so this will filter out codes that don't reach this baseline performance.
- The combined maximum error rates for 5 and 6 errors up to the reference length. Degree-8 codes have a 6-error rates typically around 1.0x2^-40 at 196 characters, so this metric allows codes with a slightly worse ability to detect 6 errors if they have an exceptional ability to detect 5 errors.
- The average 5-error rate at the reference length of 196 characters.
- The average 6-error rate at the reference length of 196 characters.
pypy3 gen_crc.py | sort -u > gen_list1.txt
git clone https://github.com/tevador/ezbase32
g++ ezbase32/crccollide.cpp -o crccollide -lpthread -O3
mkdir results1
parallel -a gen_list1.txt ./crccollide {} 5 120 ">" results1/{}.txt
find results1 -name "*.txt" -type f -size -2k -delete
pypy3 crc_res.py results1 196 64 > results1.txt
The first command will generate 9999947 unique generators, which can be checked with wc -l gen_list1.txt
.
The fifth command will test each generator with the crccollide tool. To speed things up, crccollide is configured to exit early if the generator cannot detect 5 errors at 120 characters. Note that this step may run for days or even weeks (took over 4 days using AMD Threadripper 3970X with 64 threads in parallel).
The sixth command will clean up the files of the generators that cannot detect 5 errors at 120 characters (these files will have a size of less than 2 KB). The results directory should contain 17122 files after this step, which can be checked with ls results1 | wc -l
.
The seventh command will print the top 64 generators. The resulting file results1.txt is attached to this document.
The top two codes (U1PIRGA7
and AJ4RJKVB
) significantly outperform the remaining ones since they can detect 5 errors at the reference length of 196 characters.
The other 62 codes have 5-error false positive rates roughly 3x lower than a 40-bit hash.
The main search only tested the performance of the generators up to the length of 210 characters. Since the checksum might need to be used for longer strings in the future (such as certified addresses or invoices), we tested the performance of the top generators for lengths of up to 500 characters:
grep -oP "^[0-9A-V]{8}" results1.txt | sort -u > gen_list2.txt
g++ ezbase32/crccollide.cpp -o crccollide_500 -lpthread -O3 -DLENGTH=500
mkdir results2
parallel -a gen_list2.txt ./crccollide_500 {} ">" results2/{}.txt
pypy3 crc_res.py results2 196 16 > results2.txt
(The fourth step took about 5 hours using AMD Threadripper 3970X with 64 threads in parallel).
The file results2.txt, attached to this document, lists the best performing 16 generators. This refining process eliminated the generators that cannot always detect 4 error at 500 characters. Notably, the codes U1PIRGA7
and AJ4RJKVB
can still detect 5 errors at this length. The remaining 14 codes have very similar performance for both 5 and 6 errors.
The main search only tested the performance of the generators for up to 6 errors. To further refine the results, we tested the top 16 generators for their ability to detect 7 and 8 errors in spans of up to 50 characters:
grep -oP "^[0-9A-V]{8}" results2.txt | sort -u > gen_list3.txt
g++ ezbase32/crccollide.cpp -o crccollide_50_4 -lpthread -O3 -DLENGTH=50 -DERRORS=4 -DTHREADS=4
mkdir results3
parallel -a gen_list3.txt ./crccollide_50_4 {} ">" results3/{}.txt
The resulting 16 files were examined manually to extract the maximum false positive rates for 7 and 8 errors.
generator | err7 | err8 |
---|---|---|
1.790741 | 4.440434 | |
0.994695 | 1.776173 | |
5155TAUI | 0.999173 | 1.024186 |
1.004357 | 1.937644 | |
AJ4RJKVB | 1.133229 | 1.211027 |
0.994857 | 4.440434 | |
1.564244 | 1.011930 | |
2.310268 | 1.211028 | |
JKHOGKBA | 1.075304 | 1.211028 |
0.987055 | 4.440434 | |
NHU34U1C | 0.994869 | 1.372498 |
O2IM6DP1 | 1.150664 | 1.007043 |
QKDE8B2P | 1.109647 | 1.121159 |
1.443917 | 4.440434 | |
U1PIRGA7 | 1.043527 | 1.080609 |
4.692732 | 1.008940 |
We have eliminated the 9 codes that have false positive rates for 7 or 8 errors exceeding 1.5.
The remaining 7 codes can be split into two groups.
The two "super-generators" can always detect 5 errors, but have a slightly degraded ability to detect 6 errors, with a false positive rate of 15.346780 at the length of 15 characters (similar rate as a 36-bit hash). The generator U1PIRGA7
has a slightly better performance, so it's the winner in this group.
generator | max5 | max6 | err5 | err6 | err7 | err8 |
---|---|---|---|---|---|---|
U1PIRGA7 | 500 | 13 | 0.000000 | 15.346780 | 1.043527 | 1.080609 |
AJ4RJKVB | 500 | 13 | 0.000000 | 15.346780 | 1.133229 | 1.211027 |
The remaining 5 generators do not significantly underperform in any situation compared to a 40-bit hash. The generator 5155TAUI
stands out with its overall lowest false positive rates and the ability to detect 5 errors in spans of 146 characters.
generator | max5 | max6 | err5 | err6 | err7 | err8 |
---|---|---|---|---|---|---|
JKHOGKBA | 127 | 22 | 0.274547 | 1.080168 | 1.075304 | 1.211028 |
5155TAUI | 146 | 23 | 0.315105 | 0.991495 | 0.999173 | 1.024186 |
QKDE8B2P | 132 | 23 | 0.287547 | 1.108732 | 1.109647 | 1.121159 |
O2IM6DP1 | 121 | 32 | 0.327585 | 0.995260 | 1.150664 | 1.007043 |
NHU34U1C | 122 | 25 | 0.300546 | 1.093295 | 0.994869 | 1.372498 |
The checksum algorithm proposal in file jamtis_polymod.py is attached to this document. The proposal uses the code based on the generator U1PIRGA7
(or 5155TAUI
as an alternative).
Most of the calculations were performed on a Monero Research Computing machine equipped with an AMD Threadripper CPU.
Thanks to Pieter "sipa" Wuille for developing the crccollide tool and providing some consultations.