reinderien/insanity.md

## insanity.md

      
    Raw
  

              insanity.md
            
          
    Welcome to the descent into madness that is Excel units.
xlsxwriter has
this lovely nugget
that, unless you really care about exact column widths, you probably never had the urge to look up:
# Convert column width from user units to character width.
# For Calabri 11.
max_digit_width = 7
padding = 5
# ...
width = int((int(width * max_digit_width + 0.5) + padding)
            / float(max_digit_width) * 256.0) / 256.0
If you're trying to exactly match the width of some reference spreadsheet columns, this becomes a pesky source of error.
The interesting questions are:

How much error?
What number can I feed into this thing so that a certain number goes out?

Unfortunately, there is no exact solution, because floor is lossy. But not all hope is lost! Do some math. A
straightforward way to approximate this creature is, for both floor functions, calculate linear upper and lower
bounds. Keep the "lower lower" and "upper upper", and discard the "lower upper" and "upper lower". Some math
later, we get exact lower and upper bounds for the error added:
lower = (5 - 0.5)/7 - 1/256 + 1/7/256 = 573/896
upper = (5 + 0.5)/7 = 11/14

To make sure that I'm not insane, run this numerical simulator for a million or so iterations:
#!/usr/bin/env python3

from math import floor
from random import random

dmin = 573/896
dmax = 11/14
total = 0
n = 0
best_uerror = best_lerror = float('inf')

while True:
    try:
        x = 100 * random()
        y = floor(256/7 * (5 + floor(7*x + 0.5))) / 256
        delta = y - x

        uerror = dmax - delta
        lerror = delta - dmin

        if uerror < 0:
            raise ValueError()
        if lerror < 0:
            raise ValueError()

        best_lerror = min(best_lerror, lerror)
        best_uerror = min(best_uerror, uerror)

        total += delta
        n += 1
        if n % 100 == 0:
            print(n, end='\r')

    except KeyboardInterrupt:
        break

print()
print(f'mean error = {total/n / (dmin + dmax)*2 - 1}')
print(f'best lerror = {best_lerror}')
print(f'best uerror = {best_uerror}')
This outputs:
1438000
mean error = 2.801041729827425e-05
best lerror = 1.1766504732779737e-06
best uerror = 1.216649843205353e-06

The lower and upper bound are extremely accurate. The mean from simulation is also very
close to the average of the lower and upper bound.
TLDR: xlsxwriter adds about 0.7126 on average to all column widths.