A few years ago, I worked on a type that could process double
values received as real-time market data updates and helped to convert them to be displayed for humans on a UI.
The type was used to process market data updates of U.S. Treasury Notes. The pricing of U.S. Treasury Notes is detailed on the cmegroup site. In this post I will focus on the pricing of bond and note products while keeping futures aside.
The prices of these products are displayed in fractions. The price of such a product may look as 99-032
. This means it is 99 full points plus 3/32s of a point plus 2/8 of 1/32s of a point. The last digit of the price can take up values of 0
2
, +
and 6
, where plus denotes 4/8 or 1/2.
Hence 99-032
equals 99 + 3/32 + 2/256 = 99.1015625. Note, that the last digit is always a multiple of 1/4th of a 1/32nd. Another example may be 100-23+
which equals 100.734375. However, this post will focus on displaying such data, converting it from a double
such as 100.734375 to 100-23+
.
A few years back .NET was less performant. In this post, I will revisit the original implementation as well as review 3 implementations I would consider building today to take advantage of the performance benefits of the modern .NET. Even the original implementation is significantly faster when run on .NET 8 due to the number of optimizations that have been built into the BCL and the JIT.
Please note, that the code samples in this post expect non-evil inputs. This post does not showcase production grade code snippets. Please don't use them as-is in real-world applications. The examples assume en-us as the default culture.
I don't recall the exact original implementation and I omit edge cases that are not relevant for U.S. Treasury Notes, the original implementation could take shape as shown below.
public class FractionalClassic(int Integer, int Fraction32, int Fraction4)
{
public double AsDouble() => Integer + (Fraction32 / 32.0) + (Fraction4 / 128.0);
public override string ToString()
{
char lastChar = Fraction4 switch
{
0 => '0',
1 => '2',
2 => '+',
3 => '6',
_ => throw new ArgumentOutOfRangeException()
};
return $"{Integer}-{Fraction32:00}{lastChar}";
}
}
Here I am using C# 12's primary constructor. The type accepts the Integer value which represents the full points, Fraction32 the 1/32s and Fraction4 is 1/4 of a 1/32nd. It has an additional method on top of the default methods inherited from object
. AsDouble()
returns the original value as a double
. It also has the ToString()
method overridden which uses C# interpolated string feature to format the full and fractional points into the desired string
object. The original implementation probably concatenated the string
as return Integer.ToString() + "-" + Fraction32.ToString("00") + lastChar;
given at the time of implementation string interpolation was not yet available.
The difference between the interpolated string version above and the referenced string concatenation becomes apparent when one looks at the memory allocations. Based on my measurements with 100 fractional double values the string concatenation allocates 13 KB on the heap while the version with string interpolations allocates 7.31 KB.
Each of the following implementations shown in this post is benchmarked with BenchmarkDotNet. The benchmarks are given a double
array with 100 values to be converted into string
objects. The benchmark splits the double
value into full and fractions points, then uses the showcased fractional type implementation to create a string
.
The corresponding benchmark of the above method is implemented as:
[Benchmark]
public void ClassicToString()
{
int i = 0;
foreach (double price in Prices)
{
var integerPart = (int)price;
var remainder = (price - integerPart) * 32;
byte fraction32 = (byte)remainder;
byte fraction4 = (byte)((remainder - fraction32) * 4);
var fractional = new FractionalClassic(integerPart, fraction32, fraction4);
DisplayPrices[i++] = fractional.ToString();
}
}
Note, that the result string
objects are not thrown away, but referenced by the DisplayPrices
array. Double
values that do not conform to a valid fractional number are rounded down to the nearest fraction.
In the second part of this blog post I will incrementally improve the current solution.
Let's introduce Fractional
type. Compared to the original implementation the Fractional
is a value type implemented as a record struct. This avoids an extra heap allocation, given this is typically a short lived. When passed around it gets copied by value. It has different type of fields too: it uses an int
for the full points and uses byte
for the fractions. Given fractional point values are limited to 32 and 4 respectively, the smaller data type is enough to represent them. The type uses required properties with essential validation in the setters. For example, it does not allow negative prices (note, that in rare real-life use-cases a negative price could be valid) and does not allow larger fractions to 31 and 4.
public record struct Fractional : ISpanFormattable
{
private int _integer;
private byte _fraction32;
private byte _fraction4;
public required int Integer
{
get => _integer; init
{
if (value < 0)
throw new ArgumentOutOfRangeException();
_integer = value;
}
}
public required byte Fraction32
{
get => _fraction32; init
{
if (value > 31 || value < 0)
throw new ArgumentOutOfRangeException();
_fraction32 = value;
}
}
public required byte Fraction4
{
get => _fraction4; init
{
if (value > 3 || value < 0)
throw new ArgumentOutOfRangeException();
_fraction4 = value;
}
}
public double AsDouble() => _integer + (Fraction32 / 32.0) + (Fraction4 / 128.0);
public string ToString(string? format, IFormatProvider? formatProvider)
{
Span<char> destination = stackalloc char[14];
if (TryFormat(destination, out int written, format, formatProvider))
return destination[0..written].ToString();
throw new ArgumentOutOfRangeException();
}
public bool TryFormat(Span<char> destination, out int charsWritten, ReadOnlySpan<char> format = default, IFormatProvider? provider = default)
{
charsWritten = 0;
if (!Integer.TryFormat(destination, out int currentWritten))
return false;
charsWritten += currentWritten;
if (charsWritten + 1 > destination.Length)
return false;
destination[charsWritten++] = '-';
if (!Fraction32.TryFormat(destination[charsWritten..], out currentWritten, "00"))
return false;
charsWritten += currentWritten;
if (charsWritten + 1 > destination.Length)
return false;
char lastChar = Fraction4 switch
{
0 => '0',
1 => '2',
2 => '+',
3 => '6'
};
destination[charsWritten++] = lastChar;
return true;
}
public override string ToString() => ToString(null, null);
}
Another key difference is how it creates the string
object. The ToString()
method allocates an array on the stack for 14 characters. In a real application this should be a larger buffer, or in case the latter invoked TryFormat()
fails, it could allocate a regular array on the heap. For simplicity this code only uses the stack allocated array. The type implements the ISpanFormattable
interface that comes with the premise of a TryFormat()
method. This method writes the fields of the type as a string
into the provided destination buffer. Both the int
and byte
types implement the ISpanFormattable
interface, so the code defers formatting to their TryFormat()
method. The -
character and the final digit are written as regular char
s.
This solution brings performance improvements in terms of mean execution time and allocated memory as well:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
| FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 | 4.06 KB |
| ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 | 7.31 KB |
The next iteration is called creatively the FastFractional
. This post inlines the discussed methods of the type, while the complete implementation can be found on GitHub.
One of the two key differences with this type is that instead of having Fraction32 and Fraction4 properties we only have a single Fractions128 property. This not just makes the struct
smaller and inherently faster to copy by value, but it also simplifies parsing the double
input values.
The second key difference is the ToString()
method implementation:
public record struct FastFractional : ISpanFormattable
{
private byte _fractions128;
public required uint Integer { get; init; }
public required byte Fractions128
{
// ...
}
public string ToString(string? format, IFormatProvider? formatProvider)
{
return string.Create(CountDigits(Integer) + 4, ((Integer, Fractions128)), static (destination, state) =>
{
if (state.Integer > 89 && state.Integer < 128)
BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination), GetInteger((int)state.Integer));
else
state.Integer.TryFormat(destination, out _);
BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination[^4..]), GetFractions(state.Fractions128));
});
}
// ...
private static ulong GetInteger(int value)
{
ReadOnlySpan<ulong> table = new ulong[] { 0x20002000300039, 0x20002000310039, ... };
return table[value - 90];
}
private static ulong GetFractions(int index)
{
ReadOnlySpan<ulong> table = new ulong[] { 0x3000300030002d, 0x3200300030002d, ... };
return table[index];
}
private static int CountDigits(uint value)
{
// Algorithm based on https://lemire.me/blog/2021/06/03/computing-the-number-of-digits-of-an-integer-even-faster.
ReadOnlySpan<long> table = new long[]
{
4294967296,
8589934582,
//...
};
long tableValue = Unsafe.Add(ref MemoryMarshal.GetReference(table), uint.Log2(value));
return (int)((value + tableValue) >> 32);
}
}
The ToString()
method uses the string.Create with the SpanAction overload to create the result string
.
CountDigits()
counts the number of char
s required for the full points. The fractional points always use 3 char
s and an additional -
concatenating full and fractional points. The CountDigits()
method uses an algorithm described by Daniel Lemire, and also used by .NET 8. The idea behind the algorithm is to calculate the number of char
s required when the number is formatted as a binary number and convert that to decimal using log2 to log10 conversion formula. Negative prices are not considered by this implementation.
The SpanAction
of string.Create
method provides a callback, which writes the full and fractional points as number with two operations. A key observation for these numbers is that most of the values in practice fall in the [90..128] range. There might be prices higher or lower, but most real-life examples fall within this range. This gives an opportunity for an interesting optimization: we may pre-calculate the underlying byte values of the formatted number instead of formatting the integer
value every time. The second optimization is that the numbers within this range are 2 or 3 characters. The 3 chars require 6 bytes, so we may use a larger, 8 byte ulong
type to encode the underlying bytes of the original number. This optimization is also used by .NET 7. Finally, instead of writing the 3 characters with instructions, we can copy the ulong
value with WriteUInt64LittleEndian()
method as a single operation.
When the full points of the price does not fall within the [90..128] range the regular int.TryFormat()
method is used to format the number as a string
.
The fractional part of the price also utilizes the same idea: all values between -000
and -316
can be pre-encoded ulong
numbers. The results are kept in a ulong
array of 128 items. Finding the right number is looking it up in an array. Then formatting the value into the string
happens by writing it using WriteUInt64LittleEndian
to the destination buffer.
This approach yields a significant jump in the mean execution time on the given input sample. Note that the values of the input sample fall within the optimized range.
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
| FastFractionalToString | 736.3 ns | 9.31 ns | 8.71 ns | 0.6628 | 0.0095 | 4.06 KB |
| FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 | 4.06 KB |
| ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 | 7.31 KB |
The final iteration of the type is called CopyFractional
. The sample below highlights the differences compared to the previous solutions, while the complete example can be found on GitHub.
public string ToString(string? format, IFormatProvider? formatProvider)
{
return string.Create(CountDigits(Integer) + 4, ((Integer, Fractions128)), static (destination, state) =>
{
ReadOnlySpan<char> calculated = "090-000090-002...";
var startIndex = (int)(state.Integer - 90) * (7 * 128) + state.Fractions128 * 7;
if (state.Integer > 99 && state.Integer < 128)
{
calculated.Slice(startIndex, 7).CopyTo(destination);
}
else if (state.Integer > 89 && state.Integer <= 99)
{
calculated.Slice(startIndex + 1, 6).CopyTo(destination);
}
else
{
state.Integer.TryFormat(destination, out _);
BinaryPrimitives.WriteUInt64LittleEndian(MemoryMarshal.AsBytes(destination[^4..]), GetFractions(state.Fractions128));
}
});
}
This approach uses a similar idea to the previous solution. However, instead of writing the full and fractional parts in two iterations it uses a single copy when the value falls within the [90..128] range. All values are pre-formatted and concatenated in a single string
that is stored in the application binaries. This string
can be accessed using the ReadOnlySpan<char> calculated =
instruction. ToString()
method finds the corresponding slice of the preformatted string
and copies it to the destination buffer.
The benchmarks show that this approach as the fastest on my test machine:
| Method | Mean | Error | StdDev | Gen0 | Gen1 | Allocated |
|----------------------- |-----------:|---------:|---------:|-------:|-------:|----------:|
| FastFractionalToString | 736.3 ns | 9.31 ns | 8.71 ns | 0.6628 | 0.0095 | 4.06 KB |
| CopyFractionalToString | 676.9 ns | 13.01 ns | 11.54 ns | 0.6628 | 0.0095 | 4.06 KB |
| FractionalToString | 5,071.4 ns | 47.27 ns | 41.91 ns | 0.6561 | 0.0076 | 4.06 KB |
| ClassicToString | 6,857.8 ns | 52.74 ns | 44.04 ns | 1.1902 | 0.0153 | 7.31 KB |
However, this approach comes with a significant trade-off: the size of the binary increases by more than 60 KB as the whole preformatted string
is compiled into the IL.
A careful reader may wonder if Single instruction, multiple data, SIMD could be leveraged for the above task. While it is possible to vectorize parts of this algorithm, the final string
object has to be created with a non-SIMD instruction. So far, I have not found an efficient solution to vectorize the algorithm, but if there is a managed way achieving this, I would be more than happy to learn about it. The fact that eventually the vectorized loop needs to iterate all the values of the vector one-by-one to create the individual string
objects diminishes the performance benefit of using SIMD. My best attempts to vectorize the algorithm reached a mean execution time within the range of ~1000 ns.