Instantly share code, notes, and snippets.

# zetter/deconstruct.markdown Created Jan 31, 2012

therubygame deconstruct by @czetter

# therubygame deconstruct by @czetter

This is a deconstuction of matematikaadit's submission to therubygame challenge 5; 'Roman numerals. What are they good IV?'. The goal of the challenge is to take a string representing a roman numeral as input and return the integer that the numeral represents.

matematikaadit currently has the honour of the shortest (by character count) submission for this challenge. At first glance I didn't understand how it worked so I re-wrote and analyzed it until I did.

```def to_arabic_numeral(roman)
n=s=0;roman.bytes{|c|s+=n-2*n%n=10**(205558%c%7)%9995};s+n
end```

That's pretty unreadable to me. Lets apply some formatting:

```def to_arabic_numeral(roman)
n = s = 0
roman.bytes { |c|
s += n - 2 * n % n = 10 ** (205558 % c % 7) % 9995
}
s + n
end```

and some bracketing:

```def to_arabic_numeral(roman)
n = s = 0
roman.bytes { |c|
s += n - ((2 * n) % (n = ((10 ** (205558 % c % 7)) % 9995)))
}
s + n
end```

and introduce some variables:

```def to_arabic_numeral(roman)
n = s = 0
roman.bytes { |c|
last_n = n
n = (10 ** (205558 % c % 7)) % 9995
i = last_n - ((2 * last_n) % n)
s += i
};
s + n
end```

and some better named variables:

```def to_arabic_numeral(roman)
value = sum = 0
roman.bytes { |char_code|
last_value = value
value = (10 ** (205558 % char_code % 7)) % 9995
increment = last_value - ((2 * last_value) % value)
sum += increment
}
sum + value
end```

```def to_arabic_numeral(roman)
value = sum = 0
roman.bytes { |char_code|
last_value = value
value = (10 ** (205558 % char_code % 7)) % 9995
increment = last_value - ((2 * last_value) % value)
sum += increment

puts [
"char:#{char_code.chr}",
"char_code:#{char_code}",
"value:#{value.to_s.ljust(4)}",
"last_value:#{last_value.to_s.ljust(4)}",
"increment:#{increment.to_s.ljust(4)}",
"sum:#{sum}"
].join('  ')
}
sum + value
end

to_arabic_numeral('MCMXCIX') #=> 1999```

When run this prints:

``````char:M  char_code:77  value:1000  last_value:0     increment:0     sum:0
char:C  char_code:67  value:100   last_value:1000  increment:1000  sum:1000
char:M  char_code:77  value:1000  last_value:100   increment:-100  sum:900
char:X  char_code:88  value:10    last_value:1000  increment:1000  sum:1900
char:C  char_code:67  value:100   last_value:10    increment:-10   sum:1890
char:I  char_code:73  value:1     last_value:100   increment:100   sum:1990
char:X  char_code:88  value:10    last_value:1     increment:-1    sum:1989
``````

We see that the `sum` is always one iteration behind and the last value is added to `sum` after the loop is finished.

There's two separate complicated lines here; the conversion of `char_code` into `value` and the calculation of `increment`.

### Converting from numeral to integer

First lets look at the calculation of `value` which converts the ascii character code of a numeral to the integer the numeral represents:

`value = (10 ** (205558 % char_code % 7)) % 9995 `

Lets wrap the to a function that takes a character:

```def roman_numeral_to_integer(char)
char_code = char.ord
(10 ** ((205558 % char_code) % 7)) % 9995
end```

remind ourselves of the expected mappings: I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000

and see if it works:

```roman_numeral_to_integer('I') # => 1
roman_numeral_to_integer('C') # => 100

roman_numeral_to_integer('i') # => 1000
roman_numeral_to_integer('Z') # => 5```

It appears to work fine for any of the roman numerals and is undefined for other characters. Lets plug some more values into it:

```numerals = "IVXLCDM"
('A'..'Z').each { |c|
value = roman_numeral_to_integer(c)
puts "char:#{c} value:#{value.to_s.ljust(4)}  #{'*NUMERAL*' if numerals.include?(c) }"
}```

Which prints:

``````char:A  value:1
char:B  value:500
char:C  value:100   *NUMERAL*
char:D  value:500   *NUMERAL*
char:E  value:1
char:F  value:1000
char:G  value:500
char:H  value:1
char:I  value:1     *NUMERAL*
char:J  value:5
char:K  value:100
char:L  value:50    *NUMERAL*
char:M  value:1000  *NUMERAL*
char:N  value:1
char:O  value:1
char:P  value:1000
char:Q  value:50
char:R  value:1000
char:S  value:10
char:T  value:1000
char:U  value:1
char:V  value:5     *NUMERAL*
char:W  value:10
char:X  value:10    *NUMERAL*
char:Y  value:10
char:Z  value:5
``````

There's no simple pattern to exploit in the relationship between ascii character codes and the distribution or magnitude of roman numerals. Instead `roman_numeral_to_integer` is a cleverly constructed function that fits all the points on the above graph, mapping all the roman numerals to integers. A lookup hash is certainly more easily constructed, read, and maintainable but this wins on cleverness and character count.

For more about a similar 'Magic Formula' for the same purpose, and how you could construct it using brute force methods see Golf: Magic Formula for Roman Numerals.

If you want to play around with this function try Wolfram Alpha: `plot (10 ^ ((205558 mod floor(x)) mod 7)) mod 9995, x=65 to 90`

### Calculating the increment

Now lets look at `increment`. Increment is the value that should be added to `sum`. Because the sum calculation is one iteration behind it is `last_value` and not `value` that should be added.

`increment = last_value - ((2 * last_value) % value)`

When a smaller numeral is before a larger one it should be subtracted rather than added to the total so the increment will be `-last_value` when `last_value < value`, and else will just be `last_value`

The code works because, for the natural numbers `x` and `y`:

``````x % y ≡ x, when x < y
x % y ≡ 0, when x >= y and y is a factor of x
``````

Note that for any numeral value `x` all smaller numeral values are factors. For example, L (50) is greater than I (1), V (5), X (10) and all of these are factors.

So when `last_value >= value` it just added to `sum` (the right hand side of the subtraction will be `0`) and when `last_value < value` it is turned negative (the right hand side of the subtraction will be `last_value * 2`).

Going back to the original program it looks like we only save 1 character doing it this way over a ternary if:

```l-2*l%n
# vs.
n>l?-l:l```

But remember that in the original program `last_value` and `value` are both stored in the single variable `n`. The program relies on the order of evaluation to use the correct value of `n` whereas a ternary if will evaluate the conditional first so would require a new variable to be used.

### Summing up

The method has clever use of a mathematical function for mapping numerals and modulus operation to avoid an if statement. Most of this is done in a single expression and trusts the evaluation order semantics of Ruby.

### Trevoke commented Feb 1, 2012

 Does this work with UTF8?
Owner Author

### zetter commented Feb 1, 2012

 Trevoke - It will work fine for 'A-Z' in UTF8 since these are encoded using 1 byte in UTF8 as they are in ascii. It won't work for use of the UTF-8 Roman numeral characters (such as 'Ⅶ') (but nor will any other of the lookup based approaches).

### Trevoke commented Feb 1, 2012

 That's ridiculously awesome.

### asavige commented Feb 29, 2012

 The magic formula used in matematikaadit's submission was described in 2009 in node_id 761053 at perlmonks.org. I believe his submission can be further shortened by two strokes via: s=0;roman.bytes{|c|s+=(n=10*_(205558%c%7)%9995)-s%n_2};s

### asavige commented Mar 4, 2012

 Though longer, this pure-functional version is arguably more elegant: roman.bytes.map{|c|10*_(205558%c%7)%9995}.reduce(0){|t,n|t+n-t%n_2} Less pure, but a bit shorter is: roman.bytes.reduce(0){|t,c|t+(n=10*_(205558%c%7)%9995)-t%n_2}

### pmarreck commented Mar 7, 2013

 Has anyone taken a shot at to_roman_numeral from the arabic? :O