@Keith-S-Thompson
Last active March 22, 2023 03:06
Discussion of korn.c, 1987 IOCCC entry, mentioned in http://stackoverflow.com/a/19214007/827263

korn.c is the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell).

korn.hint, as the name implies, offers some hints.

A commenter on Stack Overflow asked for some clarification. I didn't want to post spoilers on the site, so I'm posting them here instead. If you haven't already (and if you're familiar with the rules of C) I encourage you to study the program for a while first.

=====

Here's the code:

main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

This was written 2 years before the ANSI standard was published. Modern compilers are likely to accept it with warnings, but a few changes are needed to bring it into conformance:

#include <stdio.h>
int main(void) { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

But that's not quite as much fun -- and it still depends on unix being a predefined macro that expands to 1. (Depending on the compiler, you can probably address that by compiling with -Dunix.)

Commenter Sebastian wrote:

Hmm... Whenever I try to evaluate the first arg to printf in my head, I get "21%six", but not "%six" as I would expect. Can anyone enlighten me where it went wrong?

You missed a couple of things. The format string starts with \021, an octal escape that expands (well, contracts) to a single character with the value 21 octal, or 17 decimal. (A \0 is a null character only when it is followed by something other than another octal digit. Here the 2 and the 1 belong to the same escape, so the string does not begin with a null character followed by "21".) The \012 expands to character 10, which on most systems (any that use ASCII) is the same as \n; \021 was probably chosen for symmetry with \012. The value of \021 doesn't matter, because that character is skipped.
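As a quick check of how the escapes are counted, here is a minimal sketch. Only the string literal comes from korn.c; the array name korn_format and the two helper functions are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* The literal from korn.c. Each octal escape (\021, \012, \0) is one char. */
static const char korn_format[] = "\021%six\012\0";

/* Bytes in the array: six visible characters, the explicit \0,
   and the terminating null that the compiler appends. */
size_t korn_format_size(void)
{
    return sizeof korn_format;
}

/* Length as a C string: counting stops at the first null character. */
size_t korn_format_length(void)
{
    return strlen(korn_format);
}
```

Here korn_format_size() gives 8 and korn_format_length() gives 6, and korn_format[0] is 17, not a null character.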

Remember that the array indexing operator is commutative, as discussed here, and that unix (for some compilers in some modes) expands to 1. So the first argument to printf is:

&unix["\021%six\012\0"]

which is equivalent to:

&"\021%six\012\0"[1]

That's a string literal indexed by 1, which refers to the second character of the string, the %. Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0", or, equivalently, "%six\n".

So the format string is "%six\n".
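The commutative indexing can be tried in isolation. This is a sketch, not part of the original entry: the enum constant ONE stands in for the unix macro (assumed to expand to 1), and korn_first_arg is a hypothetical helper name.

```c
/* Stand-in for the `unix` macro, which is assumed to expand to 1. */
enum { ONE = 1 };

/* Returns the pointer that printf receives as its format string. */
const char *korn_first_arg(void)
{
    /* ONE["..."] means *(ONE + "..."), which is the same as "..."[ONE].
       Taking its address yields a pointer to the '%' character. */
    return &ONE["\021%six\012\0"];
}
```

On an ASCII system, comparing the result against "%six\n" with strcmp should report the two strings as equal.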

The second argument is:

(unix)["have"]+"fun"-0x60

which, once you realize that unix expands to 1, that indexing is commutative, and that the ASCII value of 'a' is 0x61, is equivalent to the string "un". (I might go into more detail on this later.)
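A sketch of the same arithmetic, under a few assumptions: ASCII character values, the enum constant ONE standing in for unix, and a hypothetical helper name korn_second_arg. Note that it regroups the expression so the integer offset is computed first; the original adds 'a' to the pointer before subtracting 0x60.

```c
/* Stand-in for the `unix` macro, which is assumed to expand to 1. */
enum { ONE = 1 };

/* Returns the pointer that printf receives for the %s conversion. */
const char *korn_second_arg(void)
{
    /* ONE["have"] is "have"[1], i.e. 'a', which is 0x61 in ASCII,
       so the offset works out to 1. */
    int offset = ONE["have"] - 0x60;

    /* "fun" + 1 points at the 'u', i.e. at the string "un". */
    return "fun" + offset;
}
```

Computing the offset before touching the pointer keeps every intermediate value inside the bounds of "fun".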

Taking all this into account, the printf call is equivalent to this:

printf("%six\n", "un");

and therefore to:

printf("unix\n");
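To confirm the result without relying on the compiler predefining unix, something like the following could be used. It is a sketch under stated assumptions: the #ifndef guard supplies the macro when it is missing, the helper name korn_output is invented here, the second argument is regrouped so the offset is computed before the pointer addition, and snprintf requires C99 or later.

```c
#include <stddef.h>
#include <stdio.h>

/* Assumption: supply the macro ourselves when the compiler does not. */
#ifndef unix
#define unix 1
#endif

/* Formats the program's output into buf instead of printing it.
   Returns whatever snprintf returns (the length of the full output). */
int korn_output(char *buf, size_t size)
{
    return snprintf(buf, size, &unix["\021%six\012\0"],
                    "fun" + ((unix)["have"] - 0x60));
}
```

Called with a 16-byte buffer on an ASCII system, korn_output returns 5 and leaves "unix\n" in the buffer.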
@Keith-S-Thompson
Author

I've fixed the URLs and corrected a typo or two.

@rajrana22

Sorry, I'm not too great at pointer arithmetic. Could you explain to me why 'a'+"fun"-0x60 results in "un"?

Also, could you explain this part:

Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0"

I'm not sure why the address of the character % would point to that string.

@josephcsible

Could you explain to me why 'a'+"fun"-0x60 results in "un"?

The ASCII value of 'a' is 0x61, so adding that and subtracting 0x60 is a net change of 1. Adding 1 to a pointer to a string yields a pointer to the substring without the first character.

Also, could you explain this part:

Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0"

I'm not sure why the address of the character % would point to that string.

The address of a character within a string is the same as the address of a substring starting at that character. Pointers to strings are represented in the exact same way as pointers to individual characters.

@rajrana22

@josephcsible Thank you for the explanation! Makes perfect sense to me now.

@Keith-S-Thompson
Author

Could you explain to me why 'a'+"fun"-0x60 results in "un"?

Naively, 'a' in ASCII has the value 0x61, so 'a'+"fun"-0x60 would be equivalent to "fun" + ('a' - 0x60), or "fun" + 1, which yields a pointer to the character 'u', which is also a pointer to the string "un".

If we break it down in terms of behavior defined by the C standard, it's a bit more complicated.

Pointer addition is commutative, and can be expressed either as integer + pointer or pointer + integer. The latter usually makes more sense.

'a' is the integer value 0x61 (97 decimal), so 'a'+"fun" is a pointer to a character in memory 97 bytes after the beginning of the string "fun". This goes past the bounds of the array object, so the behavior is undefined. In practice, most implementations don't check this kind of thing (they're not required to), and the result is very probably a valid address at the machine level even though it's not a valid C pointer value.

We then subtract 0x60 from that pointer value, which, if we assume a straightforward memory layout, gives us a pointer to the u.

It could fail if the string "fun" happens to be stored near the end of a memory segment, or if the compiler assumes the behavior is defined and performs some optimization that breaks if that assumption is invalid. In practice, in this particular case, it's likely to "work".

And of course the code was written before the 1989 ANSI standard was published. There was no explicit concept of "undefined behavior" at the time. The C Reference Manual in the back of K&R1 (1978) doesn't address what happens if pointer addition goes outside the bounds of an array.

@rajrana22

rajrana22 commented Aug 2, 2022

@Keith-S-Thompson I see. I now understand what you guys meant by undefined behavior in this instance. Thanks!
