Pre-proposal: Safer Decimal Calculations

Quoting from the book “The Swift Programming Language”: “Swift adopts safe programming patterns…”; “Swift is friendly to new programmers”. The words “safe” and “safety” are found many times in the book and in online documentation. The usual rationale for safe features is, to quote a typical sentence, “…enables you to catch and fix errors as early as possible in the development process”.

One frequent stumbling point for both new and experienced programmers stems from the vagaries of binary floating-point arithmetic. This tentative pre-proposal suggests one possible way to make the dangers somewhat more clear.

My intention here is to start a discussion on this to inform the ongoing (and future) reasoning on extending and regularising arithmetic in Swift.

Motivation

Floating-point hardware on most platforms that run Swift — that is, Intel and ARM CPUs — uses the binary representation forms of the IEEE 754-2008 standard. Although a few mainframes and software libraries implement the decimal representations, this is not currently leveraged by Swift. Apple's NSDecimal and NSDecimalNumber implementations are awkward to use in Swift, especially as the standard arithmetic operators cannot be applied to them directly.

Although it is possible to express floating-point constants in hexadecimal with a binary exponent (0x123A.Bp-4), decimal-form floating-point constants (123.45 or 1.2345e2) are extremely common in practice.

Unfortunately, it is tempting to use floating-point arithmetic for financial calculations or other purposes such as labelling graphical or statistical data. Constants such as 0.1, 0.01, 0.001 and variations or multiples thereof will certainly be used in such applications — and almost none of these constants can be precisely represented in binary floating-point format.

Rounding errors will therefore be introduced at the outset, causing unexpected or outright buggy behaviour down the line which will be surprising to the user and/or the programmer. This will often happen at some point when the results of a calculation are compared to a constant or to another result.
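
As a concrete illustration (a quick sketch; the exact digits involved vary with the types used, but the failed comparison does not):

```swift
let total = 0.1 + 0.2     // neither 0.1 nor 0.2 is exactly representable in binary

print(total == 0.3)       // false: each side carries its own rounding error
print(total - 0.3)        // a tiny but non-zero difference, about 5.5e-17
```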

Current Solution

As things stand, Swift's default print() function, Xcode playgrounds, etc. do some discreet rounding or truncation to make the problem less apparent: a Double initialized with the literal 0.1 prints out as 0.1 instead of the exact value of the internal representation, something like 0.100000000000000005551115123125782702118158340454101562.
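
This is easy to verify in a playground. The snippet below is a small sketch that assumes Foundation is available, using String(format:) to bypass the default rounding:

```swift
import Foundation

let d = 0.1
print(d)                            // prints "0.1": the default description rounds for display
print(String(format: "%.55f", d))   // prints the stored binary value,
                                    // 0.1000000000000000055511151231257827021181583404541015625
```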

This, unfortunately, masks the underlying problem in settings such as “toy” programs or educational playgrounds, leading programmers to be surprised later when things don't work. A cursory search on StackOverflow reveals tens of thousands of questions with headings like “Is floating point math broken?”.

Warning on imprecise literals

To make decimal-format floating-point literals safe, I suggest that the compiler should emit a warning whenever a literal is used that cannot be represented as an exact value of the expected type. (Note that 0.1 cannot be represented exactly in any binary floating-point type.)
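
To illustrate what such a check involves, here is a rough sketch in library code (not a compiler implementation) of deciding whether a plain, non-negative decimal literal has an exact Double representation; it assumes the literal has no exponent part and fits in an Int64 once the decimal point is removed:

```swift
// Sketch only: a decimal literal n / 10^k is exactly representable in binary
// floating point iff 5^k divides n and the remaining odd factor fits in the
// 53-bit significand of a Double.
func hasExactDoubleRepresentation(_ literal: String) -> Bool {
    let parts = literal.split(separator: ".", omittingEmptySubsequences: false)
    guard parts.count <= 2, !parts.isEmpty else { return false }
    let intDigits = parts[0].isEmpty ? "0" : String(parts[0])
    let fracDigits = parts.count == 2 ? String(parts[1]) : ""
    guard let intPart = Int64(intDigits),
          let fracPart = Int64(fracDigits.isEmpty ? "0" : fracDigits) else { return false }

    // Scale the literal to an integer n, so that the value is n / 10^k.
    let k = fracDigits.count
    var n = intPart
    for _ in 0..<k { n *= 10 }
    n += fracPart

    // 10^k = 2^k * 5^k, so all factors of 5 must cancel against n.
    for _ in 0..<k {
        guard n % 5 == 0 else { return false }
        n /= 5
    }
    // Strip factors of two; what remains must fit in the 53-bit significand.
    while n != 0 && n % 2 == 0 { n /= 2 }
    return n < (1 << 53)
}

hasExactDoubleRepresentation("0.1")    // false: would deserve a warning
hasExactDoubleRepresentation("0.25")   // true: 1/4 is a dyadic rational
```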

The experienced programmer will, however, be willing to accept some imprecision under circumstances that cannot be reliably determined by the compiler. I suggest, therefore, that this acceptance be indicated by an annotation to the literal; a form such as ~0.1 might be easiest to read and implement, as the prefix ~ operator currently has no meaning for a floating-point value. A “fixit” would be easily implemented to insert the missing notation.

Conversely, to keep inexperienced or hurried programmers from strewing ~s everywhere, it would be useful to warn, and offer to fix, if the ~ is present but the literal does have an exact representation.

Tolerances

A parallel idea is that of tolerances: introducing an ‘epsilon’ value to be used in comparisons. Unfortunately, an effective epsilon depends on the magnitude of the operands, and there are many edge cases.
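
For example, a magnitude-relative comparison helper (a sketch only; the default tolerances here are arbitrary) shows why no single fixed epsilon can work:

```swift
// Sketch of an approximate comparison: the acceptable difference must scale
// with the operands, and very small or special values still need care.
func nearlyEqual(_ a: Double, _ b: Double,
                 relativeTolerance: Double = 1e-12,
                 absoluteTolerance: Double = 1e-300) -> Bool {
    if a == b { return true }                    // covers exact matches and infinities
    let difference = abs(a - b)
    let scale = max(abs(a), abs(b))
    return difference <= max(relativeTolerance * scale, absoluteTolerance)
}

nearlyEqual(0.1 + 0.2, 0.3)    // true: the difference is tiny relative to 0.3
nearlyEqual(1e-30, 2e-30)      // false: relatively speaking these are far apart
```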

Introducing a special type along the lines of “floating point with tolerances” — using some accepted engineering notation for literals like 100.5±0.1 — might be useful for specialised applications but will not solve this specific problem. Expanding existing constructs to accept an optional tolerance value, as has been proposed elsewhere, may be useful in those specific instances but would not contribute to raising programmer awareness of unsafe literals.

Full Decimal type proposal

There are cogent arguments that prior art/habits and the already complex interactions between Double, Float, Float80 and CGFloat are best left alone.

However, there remains a need for a precise implementation of a workable Decimal value type for financial calculations. IMHO repurposing the existing NSDecimalNumber from Objective-C is not the best solution.

As most experienced developers know, the standard solution for financial calculations is to internally store fixed-point values — usually but not always in cents — and then print the “virtual” point (or decimal comma, for the rest of us) on output.
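
A minimal sketch of that idea follows; the Money type and its formatting are purely illustrative and not part of this proposal:

```swift
// Amounts are kept as integer cents; the "virtual" decimal point only appears
// when formatting for output, so no binary rounding error can creep in.
struct Money {
    var cents: Int64                       // 1999 represents 19.99

    static func + (lhs: Money, rhs: Money) -> Money {
        Money(cents: lhs.cents + rhs.cents)
    }

    var formatted: String {
        let sign = cents < 0 ? "-" : ""
        let magnitude = cents.magnitude
        let fraction = magnitude % 100
        let fractionText = fraction < 10 ? "0\(fraction)" : "\(fraction)"
        return "\(sign)\(magnitude / 100).\(fractionText)"
    }
}

let subtotal = Money(cents: 1999)
let tax = Money(cents: 160)
print((subtotal + tax).formatted)          // 21.59, with no 0.1-style surprises
```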

I propose, therefore, an internal data layout like this:

  1. UInt16 - position of the “virtual” point, starting at 0
  2. UInt16 - data array size - 1
  3. [Int32] - contiguous data array, little-endian order, grown as needed.

Note that both UInt16 fields being zero implies that the number is reduced to a 32-bit integer. Number literals in Swift can be up to 2048 bits in size, so the maximum data array size would be 64, although it could conceivably grow beyond that. The usual cases of the virtual point position being 0 or 2 could be aggressively optimized for the normal arithmetic operators.
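
In Swift terms, the layout might look roughly like this (field names are illustrative only):

```swift
// Rough sketch of the proposed storage; both UInt16 fields at zero mean the
// value is simply the 32-bit integer held in the single data element.
struct Decimal {
    var pointPosition: UInt16   // digits to the right of the "virtual" point
    var sizeMinusOne: UInt16    // data array size - 1
    var data: [Int32]           // contiguous little-endian limbs, grown as needed
}

// 0.01 is represented exactly: the integer 1 with the point two places in.
let oneCent = Decimal(pointPosition: 2, sizeMinusOne: 0, data: [1])
```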

Needless to say such a Decimal number would accept and represent literals such as 0.01 with no problems. It would also serve as a BigNum implementation for most purposes.

Implementing this type in the standard library would no doubt allow for highly optimized versions on all major CPU platforms. In particular, the data array should probably be [Int64] on 64-bit platforms.

Acknowledgement

Thanks to Erica Sadun for their help with an early version of this pre-proposal.

Some references

  1. http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
  2. https://docs.python.org/2/tutorial/floatingpoint.html
  3. https://en.wikipedia.org/wiki/IEEE_floating_point
  4. https://randomascii.wordpress.com/category/floating-point/
  5. http://code.jsoftware.com/wiki/Essays/Tolerant_Comparison