@Keith-S-Thompson
Last active March 22, 2023 03:06
Discussion of korn.c, 1987 IOCCC entry, mentioned in http://stackoverflow.com/a/19214007/827263

korn.c is the "Best One Liner" winner of the 1987 International Obfuscated C Code Contest, by David Korn (yes, the author of the Korn Shell).

korn.hint, as the name implies, offers some hints.

A commenter on Stack Overflow asked for some clarification. I didn't want to post spoilers on the site, so I'm posting them here instead. If you haven't already (and if you're familiar with the rules of C) I encourage you to study the program for a while first.

=====

Here's the code:

main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

This was written 2 years before the ANSI standard was published. Modern compilers are likely to accept it with warnings, but a few changes are needed to bring it into conformance:

#include <stdio.h>
int main(void) { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}

But that's not quite as much fun -- and it still depends on unix being a predefined macro that expands to 1. (Depending on the compiler, you can probably address that by compiling with -Dunix.)

Commenter Sebastian wrote:

Hmm... Whenever I try to evaluate the first arg to printf in my head, I get "21%six", but not "%six" as I would expect. Can anyone enlighten me where it went wrong?

You missed a couple of things. The format string starts with \021, an octal escape that expands (well, contracts) to a single character with the value 21 octal or 17 decimal. (The \0 at the start of \021 is not a null character on its own, because an octal escape takes up to three octal digits; it would be one if it were followed by something other than another octal digit.) The \012 expands to character 10, which on most systems is the same as \n; probably \021 was chosen for symmetry with \012. The value of \021 doesn't matter, because that character is skipped.

Remember that the array indexing operator is commutative, as discussed here, and that unix (for some compilers in some modes) expands to 1. So the first argument to printf is:

&unix["\021%six\012\0"]

which is equivalent to:

&"\021%six\012\0"[1]

That's a string literal indexed by 1, which refers to the second character of the string, the %. Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0", or, equivalently, "%six\n".

So the format string is "%six\n".

The second argument is:

(unix)["have"]+"fun"-0x60

which, once you realize unix expands to 1 and indexing is commutative and that the ASCII value of 'a' is 0x61, is equivalent to the string "un". (I might go into more detail on this later.)

Taking all this into account, the printf call is equivalent to this:

printf("%six\n", "un");

and therefore to:

printf("unix\n");
@mcornella

If you want to compile the original source, you can include stdio.h and define unix from the command line:
gcc -include "stdio.h" -Dunix main.c -o main.exe

This also works, and it has unix cats :)

main() { printf(&unix["\021%six\012\0"],(unix)["cats"]+"run"-0x60);}


ghost commented Jan 10, 2017

cool

@josephcsible

josephcsible commented Oct 13, 2018

There's actually a bit of undefined behavior in this program. Consider the process of evaluating the second argument. After evaluating (unix)["have"] to 'a', you're left with 'a'+"fun"-0x60. By order of operations, this is evaluated as ('a'+"fun")-0x60. "fun" is a char array of size 4 ({'f', 'u', 'n', '\0'}), and 'a' is equal to 97. The result of the addition is a pointer that points neither into nor just beyond said array.

From the C standard:

The behavior is undefined in the following circumstances: [...] Addition or subtraction of a pointer into, or just beyond, an array object and an integer type produces a result that does not point into, or just beyond, the same array object (6.5.6).

So it doesn't matter that the subsequent subtraction of 0x60 (96) "should" result in a pointer to the 'u' in "fun" (even though it does on practically every platform), as the initial addition has already rendered the entire program undefined. Clang has a warning about this:

korn.c:1:57: warning: the pointer incremented by 97 refers past the end of the
      array (that contains 4 elements) [-Warray-bounds-pointer-arithmetic]
        main() { printf(&unix["\021%six\012\0"],(unix)["have"]+"fun"-0x60);}
                                                ~~~~~~~~~~~~~~ ^

Were the second argument to instead be -0x60+(unix)["have"]+"fun", it would then be well-defined to behave as required.

@jhudsoncedaron

@josephcsible: It's not undefined behavior; the old platform defined general pointer comparison and arithmetic to work.

@josephcsible

@jhudsoncedaron: What's "the old platform"? Where does it say those things are defined to work?

@jhudsoncedaron

@josephcsible: The reference copy of the C standard library included a copy of malloc that depended on arbitrary pointer arithmetic and comparison working. I'm not exactly sure which versions I looked at anymore, but it wasn't until C was being ported off unix that weird pointer arithmetic became itself undefined.

@Keith-S-Thompson
Author

@jhudsoncedaron "Undefined behavior" is defined by the C standard as "behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements". It means the standard doesn't define the behavior. If something else (say, a secondary standard like POSIX) happens to define the behavior, it's still undefined behavior in the context of the C standard.

It's perfectly legitimate for code that implements the C standard library to have undefined behavior in this sense -- as long as it works correctly as part of the implementation. Code that implements malloc, for example, doesn't even have to be written in C. (If you ported that code to a platform where the pointer comparisons fail, the result would be a non-conforming implementation.)

@c-harding

@Keith-S-Thompson The links to IOCCC are broken. Updated links: IOCCC, korn.c, hint.

@Keith-S-Thompson
Author

I've fixed the URLs and corrected a typo or two.

@rajrana22

Sorry, I'm not too great at pointer arithmetic. Could you explain to me why 'a'+"fun"-0x60 results in "un"?

Also, could you explain this part:

Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0"

I'm not sure why the address of the character % would point to that string.

@josephcsible

Could you explain to me why 'a'+"fun"-0x60 results in "un"?

The ASCII value of 'a' is 0x61, so adding that and subtracting 0x60 is a net change of 1. Adding 1 to a pointer to a string yields a pointer to the substring without the first character.

Also, could you explain this part:

Taking the address of that character gives us a string pointer (note: a pointer to a string is by definition a pointer to the string's first character) pointing to a string with the value "%six\012\0"

I'm not sure why the address of the character % would point to that string.

The address of a character within a string is the same as the address of a substring starting at that character. Pointers to strings are represented in the exact same way as pointers to individual characters.

@rajrana22

@josephcsible Thank you for the explanation! Makes perfect sense to me now.

@Keith-S-Thompson
Author

Could you explain to me why 'a'+"fun"-0x60 results in "un"?

Naively, 'a' in ASCII has the value 0x61, so 'a'+"fun"-0x60 would be equivalent to "fun" + ('a' - 0x60), or "fun" + 1, which yields a pointer to the character 'u', which is also a pointer to the string "un".

If we break it down in terms of behavior defined by the C standard, it's a bit more complicated.

Pointer addition is commutative, and can be expressed either as integer + pointer or pointer + integer. The latter usually makes more sense.

'a' is the integer value 0x61 (97 decimal), so 'a'+"fun" is a pointer to a character in memory 97 bytes after the beginning of the string "fun". This goes past the bounds of the array object, so the behavior is undefined. In practice, most implementations don't check this kind of thing (they're not required to), and the result is very probably a valid address at the machine level even though it's not a valid C pointer value.

We then subtract 0x60 from that pointer value, which, if we assume a straightforward memory layout, gives us a pointer to the u.

It could fail if the string "fun" happens to be stored near the end of a memory segment, or if the compiler assumes the behavior is defined and performs some optimization that breaks if that assumption is invalid. In practice, in this particular case, it's likely to "work".

And of course the code was written before the 1989 ANSI standard was published. There was no explicit concept of "undefined behavior" at the time. The C Reference Manual in the back of K&R1 (1978) doesn't address what happens if pointer addition goes outside the bounds of an array.

@rajrana22

rajrana22 commented Aug 2, 2022

@Keith-S-Thompson I see. I now understand what you guys meant by undefined behavior in this instance. Thanks!
