I was recently interfacing with the tm struct in my language, Neat. For context, on Unix systems, tm, or "broken-down time", is a data structure that decodes timestamps, which are simple numeric values, into human-understandable values in the local calendar - year, day, hour, minute and so on.
It's a simple struct, so I just copied it from the manpage. Now, int in Neat is always 32-bit, like in D, and long is always 64-bit, so I had to translate the C types. Luckily, C int and Neat int are roughly equivalent:
struct tm {
    int tm_sec;
    int tm_min;
    int tm_hour;
    int tm_mday;
    int tm_mon;
    int tm_year;
    int tm_wday;
    int tm_yday;
    int tm_isdst;
    /* Seconds East of UTC */
    long int tm_gmtoff
Hang on...? Was long in C 32-bit or 64-bit? I think it was still 32-bit even on a 64-bit system, right? Right. I mean, it makes no sense, right? Why would a field encoding offset to GMT in seconds ever need to be 64-bit? The normal 32-bit offset gives you ±68 years of offset to GMT, which is already grossly excessive. Yeah, gotta be 32-bit.
    int tm_gmtoff;
Like a car crash in slow motion, you can probably already see the bug report coming. No, it turns out C long int is not 32-bit on 64-bit Unix systems, but 64-bit. long int is guaranteed to be at least 32-bit, but can be bigger.
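A two-line check shows what your own platform does; the expected values in the comments are platform conventions (LP64 vs. LLP64), not guarantees from the standard:

#include <limits.h>
#include <stdio.h>

int main(void) {
    /* The C standard only promises that long is at least 32 bits. */
    /* On LP64 platforms (64-bit Linux, macOS) this prints 8; on LLP64 (64-bit Windows) it prints 4. */
    printf("sizeof(long) = %zu\n", sizeof(long));
    printf("LONG_MAX     = %ld\n", LONG_MAX);
    return 0;
}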
But that left me with a mystery on my hands.
Why in the blue blazes is the offset to UTC in seconds encoded as a signed 64-bit number on my system? No, on any system?
Do we need to handle eldritch timezones from deep antiquity? Is there a TZ=Rome/Nero setting? Who did this?
The manual just says it originates from 4.3BSD. What is that even? In fact, let's shed some blame and look at git blame...
... huh. Initial commit? 1995? Wait, 4.3BSD is from 1986?
The old-timers reading this will have already realized the issue. We young 'uns think of int as the 32-bit type and long int as the 32- to 64-bit type, but that's in truth merely an artifact of the now-decades of dominance of 32-bit and 64-bit machines. 1986 is far enough back that the code of the day would have run on 16-bit machines. In fact, the C standard, which arose around the same time, only guarantees that int is at least 16 bits large, not 32!
Looking at all the other fields with fresh eyes: seconds, hours, days... all of those fields, which I interpreted as 32-bit by habit, would (and did) fit snugly into a signed 16-bit value.
But 16 signed bits would not have been enough for a timezone offset in seconds. That only gets you about nine hours, well short of the ±12 hours (these days up to +14) that real time zones span. The developers of the day could have decided to redefine tm_gmtoff in tens of seconds, but why bother? long int is right there - and is conveniently guaranteed in the C standard (i.e. K&R) to have "at least 32 bits" (preposterous, who would ever use more than that) of storage. Of course, nowadays we would probably use int16_t and int32_t with fixed-size guarantees. But this set of types did not exist yet. long int was the only type guaranteed to have the size required.
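To make that arithmetic concrete, here is a sketch of what a fixed-width version might look like today - the struct name is hypothetical and purely for illustration, no libc declares it like this:

#include <stdint.h>

/* Hypothetical fixed-width re-declaration, for illustration only. */
struct tm_fixed {
    int16_t tm_sec;     /* 0..60 (leap second), fits 16 bits easily */
    int16_t tm_min;     /* 0..59 */
    int16_t tm_hour;    /* 0..23 */
    int16_t tm_mday;    /* 1..31 */
    int16_t tm_mon;     /* 0..11 */
    int16_t tm_year;    /* years since 1900 */
    int16_t tm_wday;    /* 0..6 */
    int16_t tm_yday;    /* 0..365 */
    int16_t tm_isdst;   /* DST flag */
    int32_t tm_gmtoff;  /* seconds east of UTC: +-12 hours is +-43200, beyond int16_t's 32767 */
};

/* INT16_MAX is 32767 seconds, roughly nine hours - not enough for real offsets. */
_Static_assert(INT16_MAX < 12 * 3600, "a 16-bit offset cannot even reach UTC+12");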
And that's why, in 2023, my computer represents the timezone offset with a field large enough to exceed the current age of the universe by a factor of 40.
When the 64-bit types started to emerge, there was a period of confusion, and it is still not completely settled; see https://unix.org/version2/whatsnew/lp64_wp.html
And the true answer in C is:
sizeof(char) == 1
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
sizeof(type*) depends on the type and can be smaller or larger than sizeof(int) or sizeof(long)
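A quick way to see where your own platform lands - only the orderings above are guaranteed, the concrete numbers in the comment are merely typical:

#include <stdio.h>

int main(void) {
    /* Typical LP64 (64-bit Linux/macOS): 1 2 4 8 8 8.  LLP64 (64-bit Windows): 1 2 4 4 8 8. */
    printf("%zu %zu %zu %zu %zu %zu\n",
           sizeof(char), sizeof(short), sizeof(int),
           sizeof(long), sizeof(long long), sizeof(void *));
    return 0;
}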
Also note that short is really short int and long is really long int (and long long is really long long int). But see above, the prefixes don't necessarily force the type to be any shorter or longer.
There are now (since C99, 24 - soon 25 - years ago) exact-size types (intNN_t) and minimum-size types.
I once coded on an SILP64 system.
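For completeness, a minimal sketch of the C99 <stdint.h> flavors mentioned above - exact-width, minimum-width, and the "fast" variants:

#include <stdint.h>
#include <stdio.h>

int main(void) {
    int32_t       exact = 0;  /* exactly 32 bits, if the platform provides such a type */
    int_least32_t least = 0;  /* smallest type with at least 32 bits, always available */
    int_fast32_t  fast  = 0;  /* "fastest" type with at least 32 bits */
    printf("%zu %zu %zu\n", sizeof exact, sizeof least, sizeof fast);
    return 0;
}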