@FeepingCreature
Last active January 7, 2024 13:40
tm_gmtoff: A Typesystem Mystery

I was recently interfacing with the tm struct in my language, Neat. For context, on Unix systems, tm, or "broken-down time", is a data structure that holds a timestamp, which is a simple numeric value, decoded into human-understandable values in the local calendar: year, day, hour, minute and so on. It's a simple struct, so I just copied it from the manpage. Now, int in Neat is always 32-bit, like in D, and long is always 64-bit, so I had to translate the C types. Luckily, C int and Neat int are roughly equivalent:

struct tm {
  int tm_sec;
  int tm_min;
  int tm_hour;
  int tm_mday;
  int tm_mon;
  int tm_year;
  int tm_wday;
  int tm_yday;
  int tm_isdst;
  /* Seconds East of UTC */
  long int tm_gmtoff

Hang on...? Was long in C 32-bit or 64-bit? I think it was still 32-bit even on a 64-bit system, right? Right. I mean, it makes no sense, right? Why would a field encoding offset to GMT in seconds ever need to be 64-bit? A signed 32-bit offset already gives you roughly 68 years of offset to GMT in either direction, which is grossly excessive. Yeah, gotta be 32-bit.

int tm_gmtoff;

Like a car crash in slow motion, you can probably already see the bug report coming. No, it turns out that on 64-bit Unix systems like mine, C long int is not 32-bit, but 64-bit. long int is only guaranteed to be at least 32 bits wide; it can be bigger.
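
If you want to check what your own platform does, you can just ask the compiler. The following is a minimal sketch, not code from Neat or from the manpage: the helper names `bits_of_int`, `bits_of_long` and `bits_of_gmtoff` are made up for illustration, and since `tm_gmtoff` is a BSD/GNU extension rather than standard C, the `_DEFAULT_SOURCE` define is an assumption that should hold for glibc and musl.

```c
#define _DEFAULT_SOURCE /* assumption: glibc/musl; exposes the tm_gmtoff extension */
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <time.h>

/* The C standard only promises minimum widths; verify them at compile time. */
static_assert(sizeof(int) * CHAR_BIT >= 16, "int has at least 16 bits");
static_assert(sizeof(long) * CHAR_BIT >= 32, "long has at least 32 bits");

/* Hypothetical helpers: storage width, in bits, on the platform compiling this. */
size_t bits_of_int(void) { return sizeof(int) * CHAR_BIT; }
size_t bits_of_long(void) { return sizeof(long) * CHAR_BIT; }

/* sizeof does not evaluate its operand, so the null pointer is never dereferenced. */
size_t bits_of_gmtoff(void) { return sizeof(((struct tm *)0)->tm_gmtoff) * CHAR_BIT; }
```

On a typical 64-bit Linux system I would expect `bits_of_int()` to report 32 and the other two to report 64. The only thing the standard promises is the two compile-time checks at the top.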

But that left me with a mystery on my hands.

Why in the blue blazes is seconds offset to UTC encoded as a signed 64-bit number on my system? No, on any system? Do we need to handle eldritch timezones from deep antiquity? Is there a TZ=Rome/Nero setting? Who did this? The manual just says it originates from 4.3BSD. What is that even? In fact, let's shed some blame and look at git blame...

... huh. Initial commit? 1995? Wait, 4.3BSD is from 1986?

The old-timers reading this will have already realized the issue. We young 'uns think of int as the 32-bit type and long int as the 32- to 64-bit type, but that's in truth merely an artifact of the now-decades of dominance of 32-bit and 64-bit machines. 1986 is far enough back that the code of the day would have run on 16-bit machines. In fact, the C standard, which arose around the same time, only guarantees that int is at least 16 bits large, not 32!

Looking at all the other fields with fresh eyes: seconds, hours, days... all of those fields, which I interpreted as 32-bit by habit, would (and did) fit snugly into a signed 16-bit value.

But 16 signed bits would not have been enough for a timezone offset in seconds. That only gets you to about 9 hours (32767 seconds), well short of the 24 required. The developers of the day could have decided to redefine tm_gmtoff in tens of seconds, but why bother? long int is right there, and is conveniently guaranteed in the C standard (i.e. K&R) to have "at least 32 bits" (preposterous, who would ever use more than that) of storage. Of course, nowadays we would probably use int16_t and int32_t with fixed size guarantees. But this set of types did not exist yet. long int was the only type guaranteed to have the size required.
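
The arithmetic is easy to check with the fixed-size types we have today. This is only a sketch: the function names and the `DAY_SECONDS` constant are hypothetical, chosen to mirror the 24-hour figure above, and none of this comes from any libc.

```c
#include <stdbool.h>
#include <stdint.h>

/* The 24 hours of offset mentioned above, in seconds (illustrative constant). */
#define DAY_SECONDS (24L * 60L * 60L)

/* Whole hours that fit into a signed 16-bit count of seconds. */
long int16_max_hours(void) { return INT16_MAX / 3600L; }

/* Does an offset in seconds fit into a signed 16-bit field? */
bool offset_fits_int16(long seconds) {
  return seconds >= INT16_MIN && seconds <= INT16_MAX;
}

/* Does it fit into a signed 32-bit field? */
bool offset_fits_int32(long seconds) {
  return seconds >= INT32_MIN && seconds <= INT32_MAX;
}
```

`int16_max_hours()` comes out at 9, since 32767 / 3600 rounds down to 9. `offset_fits_int16(DAY_SECONDS)` is false, because 86400 is larger than 32767, while `offset_fits_int32(DAY_SECONDS)` is true with plenty of room to spare.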

And that's why, in 2023, my computer represents timezone offset with a field large enough to exceed the lifetime of the universe by a factor of 40.

@jhi
jhi commented Dec 30, 2023

Was long in C 32-bit or 64-bit? I think it was still 32-bit even on a 64-bit system, right? Right.

When the 64-bit types started to emerge, there was a period of confusion, and it is still not completely settled. https://unix.org/version2/whatsnew/lp64_wp.html

And the true answer in C is:

sizeof(char) == 1
sizeof(char) <= sizeof(short) <= sizeof(int) <= sizeof(long) <= sizeof(long long)
sizeof(type*) depends on the type and can be smaller or larger than sizeof(int) or sizeof(long)

Also note that short is really short int and long is really long int (and long long is really long long int). But see above, the prefixes don't necessarily force the type to be any shorter or longer.

There are now (since C99, 24 soon 25 years ago) exact size types (intNN_t) and minimum size types.

I once coded in an SILP64 system.
