Skip to content

Instantly share code, notes, and snippets.

@reinderien
Created August 15, 2019 03:50
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save reinderien/cc21cee33a2803003964812f7c5d21c2 to your computer and use it in GitHub Desktop.
Save reinderien/cc21cee33a2803003964812f7c5d21c2 to your computer and use it in GitHub Desktop.
xlsxwriter and insane units

Welcome to the descent into madness that is Excel units.

xlsxwriter has this lovely nugget that, unless you really care about exact column widths, you probably never had the urge to look up:

# Convert column width from user units to character width.
# For Calabri 11.
max_digit_width = 7
padding = 5
# ...
width = int((int(width * max_digit_width + 0.5) + padding)
            / float(max_digit_width) * 256.0) / 256.0

If you're trying to exactly match the width of some reference spreadsheet columns, this becomes a pesky source of error. The interesting questions are:

  • How much error?
  • What number can I feed into this thing so that a certain number goes out?

Unfortunately, there is no exact solution, because floor is lossy. But not all hope is lost! Do some math. A straightforward way to approximate this creature is, for both floor functions, calculate linear upper and lower bounds. Keep the "lower lower" and "upper upper", and discard the "lower upper" and "upper lower". Some math later, we get exact lower and upper bounds for the error added:

lower = (5 - 0.5)/7 - 1/256 + 1/7/256 = 573/896
upper = (5 + 0.5)/7 = 11/14

To make sure that I'm not insane, run this numerical simulator for a million or so iterations:

#!/usr/bin/env python3

from math import floor
from random import random

dmin = 573/896
dmax = 11/14
total = 0
n = 0
best_uerror = best_lerror = float('inf')

while True:
    try:
        x = 100 * random()
        y = floor(256/7 * (5 + floor(7*x + 0.5))) / 256
        delta = y - x

        uerror = dmax - delta
        lerror = delta - dmin

        if uerror < 0:
            raise ValueError()
        if lerror < 0:
            raise ValueError()

        best_lerror = min(best_lerror, lerror)
        best_uerror = min(best_uerror, uerror)

        total += delta
        n += 1
        if n % 100 == 0:
            print(n, end='\r')

    except KeyboardInterrupt:
        break

print()
print(f'mean error = {total/n / (dmin + dmax)*2 - 1}')
print(f'best lerror = {best_lerror}')
print(f'best uerror = {best_uerror}')

This outputs:

1438000
mean error = 2.801041729827425e-05
best lerror = 1.1766504732779737e-06
best uerror = 1.216649843205353e-06

The lower and upper bound are extremely accurate. The mean from simulation is also very close to the average of the lower and upper bound.

TLDR: xlsxwriter adds about 0.7126 on average to all column widths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment