Skip to content

Instantly share code, notes, and snippets.

@miguelmota
Last active June 7, 2020 02:11
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save miguelmota/ff591873da4f76393ce48efe62d49fd1 to your computer and use it in GitHub Desktop.
Save miguelmota/ff591873da4f76393ce48efe62d49fd1 to your computer and use it in GitHub Desktop.
C++ base58 encode
// NOTICE! The below implementation is BROKEN as pointed out by @HetDerwel. Please take a look at his implementation instead:
// https://gist.github.com/miguelmota/ff591873da4f76393ce48efe62d49fd1#gistcomment-3321715
// ---------------------------------------------------------------------------------------
// original gist:
// source from: https://bitcoin.stackexchange.com/questions/76480/encode-decode-base-58-c
const char * const ALPHABET =
"123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz";
const char ALPHABET_MAP[128] = {
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, 0, 1, 2, 3, 4, 5, 6, 7, 8, -1, -1, -1, -1, -1, -1,
-1, 9, 10, 11, 12, 13, 14, 15, 16, -1, 17, 18, 19, 20, 21, -1,
22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, -1, -1, -1, -1, -1,
-1, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, -1, 44, 45, 46,
47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, -1, -1, -1, -1, -1
};
int EncodeBase58(const string input, int len, unsigned char result[]) {
unsigned char const* bytes = (unsigned const char*)(input.c_str());
unsigned char digits[len * 137 / 100];
int digitslen = 1;
for (int i = 0; i < len; i++) {
unsigned int carry = (unsigned int) bytes[i];
for (int j = 0; j < digitslen; j++) {
carry += (unsigned int) (digits[j]) << 8;
digits[j] = (unsigned char) (carry % 58);
carry /= 58;
}
while (carry > 0) {
digits[digitslen++] = (unsigned char) (carry % 58);
carry /= 58;
}
}
int resultlen = 0;
// leading zero bytes
for (; resultlen < len && bytes[resultlen] == 0;)
result[resultlen++] = '1';
// reverse
for (int i = 0; i < digitslen; i++)
result[resultlen + i] = ALPHABET[digits[digitslen - 1 - i]];
result[digitslen + resultlen] = 0;
return digitslen + resultlen;
}
std::string word = "hello world";
int len = word.length();
unsigned char x[0]; // = something
unsigned char encoded[(len) * 137 / 100];
EncodeBase58(word, len, encoded);
printf("%s", encoded); // StV1DL6CwTryKyV
@HetDerwel
Copy link

HetDerwel commented May 28, 2020

Take this array...
unsigned char digits[len * 137 / 100]; // fix it by adding a = {0};
and then this statement...
carry += (unsigned int) (digits[j]) << 8;

What's going to happen if the array isn't initialised to 0? I was running this code in Release Mode with most optimisations on and that array didn't get initialised and so I was getting all sorts of weird values out of the above code. Took me quite a bit of figuring out to track it down to this, as in Debug Mode the array was initialised on the architecture I was running it on.

Here are some additional thoughts though:

  1. You usually want to encode binary data to convert it to human readable text when using a Base58 encoder, but for some reason your input is a string. Why? Binary data isn't really suited to a string, so wouldn't it be better to take say, std::vector? This also gives you the added advantage in that you can lose the len argument as in the algorithm you can do a .size() on the vector.
  2. Taking a char array as somewhere to dump the result of the encode is probably a bad idea here. If someone doesn't provide the right amount of space, then it's all going to go badly. Might be an idea to return a string, since that's what the algorithm is trying to produce. I see no need to return a length as this can easily be obtained from the std::string and you don't need to null terminate.

The above points do add a little overhead because you're using containers rather than static arrays, but it's rare that you'll ever need to Base58 encode vast amounts of data so it should be negligible. This is what I ended up with...

inline static constexpr const uint8_t base58map[] = {
	'1', '2', '3', '4', '5', '6', '7', '8',
	'9', 'A', 'B', 'C', 'D', 'E', 'F', 'G',
	'H', 'J', 'K', 'L', 'M', 'N', 'P', 'Q',
	'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y',
	'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g',
	'h', 'i', 'j', 'k', 'm', 'n', 'o', 'p',
	'q', 'r', 's', 't', 'u', 'v', 'w', 'x',
	'y', 'z' };

std::string EncodeBase58(const std::vector<uint8_t>& data, const uint8_t* mapping)
{
	std::vector<uint8_t> digits((data.size() * 138 / 100) + 1);
	size_t digitslen = 1;
	for (size_t i = 0; i < data.size(); i++)
	{
		uint32_t carry = static_cast<uint32_t>(data[i]);
		for (size_t j = 0; j < digitslen; j++)
		{
			carry = carry + static_cast<uint32_t>(digits[j] << 8);
			digits[j] = static_cast<uint8_t>(carry % 58);
			carry /= 58;
		}
		for (; carry; carry /= 58)
			digits[digitslen++] = static_cast<uint8_t>(carry % 58);
	}
	std::string result;
	for (size_t i = 0; i < (data.size() - 1) && !data[i]; i++)
		result.push_back(mapping[0]);
	for (size_t i = 0; i < digitslen; i++)
		result.push_back(mapping[digits[digitslen - 1 - i]]);
	return result;
}

Usage:
	std::vector<uint8_t> data{ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd' };
	std::string result = EncodeBase58(data, base58map);

Passing in a character mapping allows for Base58 encoding where the dictionary is different from the standard one. I've found examples of this elsewhere.

@miguelmota
Copy link
Author

Thanks for the detailed explanation @HetDerwel! I follow what you're saying.

I'll leave a note on the snippet and hopefully people use your improved implementation if they stumble across this gist. Thanks!

@HetDerwel
Copy link

HetDerwel commented Jun 7, 2020

As it turns out, there is another couple of issues with the original code that was pasted from StackExchange. This is a bug in how zeros are counted on the start of the input data. This bug only happens if your data starts with null bytes, but the best way to see this problem is if you try to encode a single NULL byte. The original code you pasted will output as a result "11", when it should in fact just be "1". The reason why this happens is because of this loop:

for (size_t i = 0; i < digitslen; i++)
        result.push_back(mapping[digits[digitslen - 1 - i]]);

This loop is ALWAYS executed at least once, but if your data is just a single byte, this loop shouldn't execute at all. Leading null bytes should be dealt with by the previous loop, but the easiest way to solve this one is to allow this loop to always run, but stop the previous loop running for a single null byte. So have also fixed that in the above code with:

for (size_t i = 0; i < (data.size() - 1) && !data[i]; i++)

Lastly, the size for your digits array is incorrect and will lead to random crashing. That's also been fixed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment