Skip to content

Instantly share code, notes, and snippets.

@23Skidoo
Created October 22, 2012 11:14
Show Gist options
  • Save 23Skidoo/3931049 to your computer and use it in GitHub Desktop.
Save 23Skidoo/3931049 to your computer and use it in GitHub Desktop.
Don Stewart's 2004 Obfuscated Haskell Contest entry updated to work with GHC 7.4.2
-- This is the original README from here: http://www.haskell.org/haskellwiki/Obfuscation
How does it work?
-----------------
* unsafeCoerce# in GHC 6.3
* indexWordArray#
* behaviour of a [Int] optimisation in GHC
More detail
-----------
We use unsafeCoerce# and indexWordArray# to walk the heap looking for
values. Happily, GHC optimises [Int] such that we can directly index
the list without dereferencing any pointers...
The full story
--------------
Firstly, some rewriting has been applied to the code to make it
syntactically obfuscated, by replacing keywords with lambda
abstractions, and renaming values as symbols. Without this
obfuscation, the code is:
module Main where
import GHC.Base
main = print ( tail ( ( \x -> map ( \y ->
C# ( unsafeCoerce# ( indexWordArray#
( unsafeCoerce# y ) (4#) ))) x) (
[106,117,115,116,32,97,32,115,105,109,112,108,101,
32,102,117,110,99,116,105,111,110,97,108,32,108,
97,110,103,117,97,103,101, 0x11 ] :: [Int]
)))
The final output chars are firstly ord'ed to their ascii value.
Crucially, GHC optimises some Int lists, containing only small values,
into consecutive unboxed cells in the heap. Each cell appears at
consecutive slots in the heap 5 words apart, starting 5 words in from
the beginning of the list. We depend on this behaviour ;)
Using -keep-hc-files we can see what code GHC generates:
> Hp[-170]=(W_)(17);
SET_HDR_(Hp-169,(P_)&GHCziBase_ZC_con_info,0,0);
Hp[-168]=(W_)(Hp-171);
Hp[-167]=(W_)((P_)&GHCziBase_ZMZN_closure);
SET_HDR_(Hp-166,(P_)&GHCziBase_Izh_con_info,0,0);
> Hp[-165]=(W_)(101);
SET_HDR_(Hp-164,(P_)&GHCziBase_ZC_con_info,0,0);
Hp[-163]=(W_)(Hp-166);
Hp[-162]=(W_)(Hp-169);
SET_HDR_(Hp-161,(P_)&GHCziBase_Izh_con_info,0,0);
> Hp[-160]=(W_)(103);
SET_HDR_(Hp-159,(P_)&GHCziBase_ZC_con_info,0,0);
Hp[-158]=(W_)(Hp-161);
Hp[-157]=(W_)(Hp-164);
SET_HDR_(Hp-156,(P_)&GHCziBase_Izh_con_info,0,0); > Hp[-155]=(W_)(97);
SET_HDR_(Hp-154,(P_)&GHCziBase_ZC_con_info,0,0);
Hp[-153]=(W_)(Hp-156);
Hp[-152]=(W_)(Hp-159);
SET_HDR_(Hp-151,(P_)&GHCziBase_Izh_con_info,0,0);
> Hp[-150]=(W_)(117);
So our Int fields (i.e, (W_)(117)) have been slotted in at Hp[-5],
Hp[-10] .. Hp[-170]. Note that the first element in the list starts at
Hp[-5], not Hp[0]. So Hp[0] will actually contain meta data, and is
not a value we want. We have to discard this value later using tail.
Strangely, this behaviour only occurs if you annotate the whole list
as :: [Int]. If you instead constrain one of the values of the list to
type Int, the optimisation won't occur, and you'll just get a list of
pointers. So GHC treates [Int], where the values are small, magically.
So, in the heap is a list of Int cells, with their Int# fields
unpacked. We then map unsafeCoerce# over the list, coercing each cell
to a ByteArray#. This lets us treat each 5 word cell (including STG
machine data) as a raw byte array, letting us roam around and inspect
the cell. We can use unsafeCoerce# to ignore the type system, and
worry instead about low-level representations and core dumps. Fun! ;)
Also, it appears from GHC 6.2 to 6.3 the typing of unsafeCoerce
changed to allow coercion of unboxed values, which we depend upon. So
our code only typechecks with 6.3. With 6.2 or 6.0.1, we get kind
errors:
Couldn't match `#' against `*'
When matching types `Char#' and `b'
Expected type: Char#
Inferred type: b
So what we want to do is extract the Int# field manually. So we index
the 5th word of the cell using indexWordArray#. Thanks to GHC's
unpacking, this field is available, otherwise there would be a level
of indirection. Pointers make this unsafeCoerce/byteArray trick much
harder.
We then apply unsafeCoerce# to force the raw Int# field to Char#. This
works because Char# are the same size as Word#/Int# in GHC. We can
then box up the raw field with the C# constructor, generating a Char.
Our map thus generates a String.
However, there is a bogus char at the front (the metadata at Hp[0]),
and there needs to be an extra garbage field at the end of the list,
otherwise the map truncates our 'string' in the heap, losing the last
cell. This is marked by the 0x11 element at the tail of the list.
(0x11 is the smallest value that is still unpacked directly into the
heap, smaller values are turned into pointers to magic closures).
So we drop the first garbage char with tail, giving us the original
list, which we can then print out. The quote itself is SPJ's remark
about the simplicity of the GHC Core language.
-- Don
{-# LANGUAGE MagicHash #-}
module Main where
import GHC.Base
main = let l =
[106,117,115,116,32,97,32,115,105,109,112,108,101,
32,102,117,110,99,116,105,111,110,97,108,32,108,
97,110,103,117,97,103,101,0x11] :: [Int]
in
print $ tail (map (\v -> (C# (unsafeCoerce#
(indexWordArray# (unsafeCoerce# (plusAddr# (unsafeCoerce# v) 3#)) 3#)))) l)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment