Skip to content

Instantly share code, notes, and snippets.

@bostontrader
Last active February 27, 2021 00:40
Show Gist options
  • Save bostontrader/37ad3aba39d77e6f8a4e8212c02b25aa to your computer and use it in GitHub Desktop.
Save bostontrader/37ad3aba39d77e6f8a4e8212c02b25aa to your computer and use it in GitHub Desktop.
Crypto currencies frequently use numbers with uncommonly high precision. Dealing with these numbers is difficult for a variety of reasons. But there are solutions available.

On Crypto Currency Amounts

Crypto currencies frequently use numbers with uncommonly high precision. Dealing with these numbers is difficult for a variety of reasons:

  • The precision of these numbers easily exceeds the ordinary limits of software. For example, although 32 bit ints are ubiquitous, they are not big enough to do the job.
  • There are usually several layers of software working together in a stack. Figuring this out with just one piece of the stack is not enough. Our solution must satisfy all the members of the stack.
  • We must understand rounding and round-off error.
  • We must understand the difference between storage and internal manipulation of the numbers vs. presentation in a UI.

A Solution

The bookwerx-core-rust and bookwerx-ui-elm products are permeated with these issues. In this essay I will describe one particular solution and explore the choices we had and the reasoning behind the choices we took.

TL;DR The basic solution is that we’re going to use a system of arbitrary length integer arithmetic with radix and sign stored separately. Let’s see how this works…

Generic Number Format

Any financial amount can be written as several digits with a decimal point between some of them (or implicitly appended to the right side). We can represent this number using three fields:

  1. An integer significand. This is an unsigned integer that contains all the digits of the amount.
  2. A signed integer exponent. We can multiply the significand by 10 ^ exp in order to determine the position of the decimal point.
  3. A sign flag for positive or negative. We use a separate sign field as a means to simplify the handling of anything to do with the sign. If we attempt to use a signed significand instead, we encounter numerous tedious issues. Fuck it, just use a sign flag.

Integer Limits

The main problem we have with this scheme comes from implementing the significand.

UINT32 has a max of 4294967295. Considering that the amounts we encounter can easily have 8 digits of precision to the right of the decimal point, it should be obvious that it would be easy to find a plausible amount that won’t fit into an UINT32.

UINT64 has a max of 18 446 744 073 709 551 615. Given 8 digits of precision to the right of the decimal point, a UINT64 looks big enough for anything in real life. Even a DevCoin billionaire will have no problems with UINT64.

But 8 decimal places is not enough. ETH goes out to 18 decimal places. In addition, 64 bits is already too big for the general software stack. Javascript, JSON, and Elm in particular cannot do this.

UINT128 would obviously give us the ability to deal with much larger amounts. But if your software stack is already choking at 64 bits, it’s not going to like 128 bits.

These limits are the fundamental reasons that we chose to implement the significand using a variable length array of digits.

For the exponent we use a signed INT8. This is far, far above anything we would ever encounter in real-life so it’s a non-issue.

Floating Point vs. Integers

We would still have the same issue of precision with any floating point format so using them solves nothing. Not content to merely add no benefit, using floats also adds the additional headache of dealing with the least significant digits in the floating point representation. Much easier to do this with our integer format.

What Number Base to Use

Suppose we tried to use an array of unsigned INT8 in order to implement integers of arbitrary length. We would thus be representing our integers using base 256. We can easily generalize this to an array of INT32 and thus be representing our integers using base 2^32. Either way works and the actual mechanics of arithmetic using these arrays would be the same. But when considering the actual operations we’ll have to perform using these arbitrary length ints, let's consider the pros and cons re: selection of the number base?

  • Running time. The actual number of operations required to perform arithmetic will be higher when using lower number bases, as compared to higher number bases. How much arithmetic are doing doing and is this important?

  • Storage efficiency. Any array is going to need some indicator of the array size. Said indicator gives us some overhead. Our system of variable length integers is less efficient for small and simple numbers as compared to ints. How difference does this make in real-life and is this important?

  • Base conversion. Whatever number format we use will have to be converted to and from base 10. We start with base 10 that must be converted into an internal format and said internal format must later be converted back to base 10 for a UI. Perhaps it’s best to just use base 10?

These are the general considerations. Three obvious choices quickly emerge: base 10, base 2^8, and base 2^32.

The Stack

Together, bookwerx-core-rust and bookwerx-ui-elm use MySQL, Rust, HTTP, JSON, Javascript, and Elm. Given a particular choice of number base, what pros and cons do each of the members of this stack have? In the rest of this essay we'll document what we know about these issues as well as our experience with particular number base choices.

MySQL

The party starts with MySQL. We use MySQL for the underlying storage. This db necessarily records financial amounts when it stores transaction information. The db is a heavily linked thing that relates currencies, accounts, categories, transactions, and distributions (the DR and CR amounts of a transaction.) MySQL is very adept at doing this part of the task.

An important consideration here is that we’re not going to use MySQL for any numeric calculation. We only use it for storage.

As per our prior discussion, MySQL’s 64 bit BIGINT type is not suitable for the amount field.

The DECIMAL type might be workable but we have not explored it very much and our intuition says to not use it. It looks like said numbers must have maximum lengths pre-defined and will be stored as either strings or as an array of floats. Too many things going on here for our taste.

Base 10 We can store an arbitrary base 10 integer using VARCHAR(40) and only the base 10 digits plus a minus sign. If this limit is not big enough, we can easily change it. VARCHAR at this size uses 8 BITS per base 10 digit, given the ASCII character set, plus an additional byte to record the length. Obviously a more compact representation could be had, but in real-life this is a non-issue.

One drawback with this method is that we must deal with the possibility of non-numeric characters in this field. We solve that by using a CHECK constraint on this field and regex magic.

Reading ahead... The main reason we do this is because we'll need to send these amounts to the server using http query parameters which only likes strings. We'll also need these numbers in base 10 string format for the UI. Finally, when it comes time to round these values for the UI, we'll be glad we have base 10 strings to manipulate.

Rust Never Sleeps

The next step in our journey is to figure this out in Rust. Bookwerx-core-rust is a RESTful API server that receives http requests, uses MySQL, and sends results in JSON.

Rust does give us INT128 and that should be big enough for our purposes, but that’s too big for the rest of the stack.

*** Base 10 ***

POST /distributions and PUT /distributions use http requests which contain a body of query parameters. Said parameters are all strings. We can very simply INSERT and UPDATE our db using these strings, subject to validation constraints.

Speaking of... even though MySQL has a CHECK constraint to ensure that the amount field can only contain numeric digits (and an optional minus sign as a first character) the error messages it emits are, ahem, less then helpful. Especially to an end-user who might be tormented by them. So therefore the relevant POST and PUT handlers in bookwerx-core-rust also perform the same validation, but emit slightly friendlier error messages.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment