Create a gist now

Instantly share code, notes, and snippets.

What would you like to do?
therubygame deconstruct by @czetter

therubygame deconstruct by @czetter

This is a deconstuction of matematikaadit's submission to therubygame challenge 5; 'Roman numerals. What are they good IV?'. The goal of the challenge is to take a string representing a roman numeral as input and return the integer that the numeral represents.

matematikaadit currently has the honour of the shortest (by character count) submission for this challenge. At first glance I didn't understand how it worked so I re-wrote and analyzed it until I did.

Making it readable

matematikaadit's original submission:

def to_arabic_numeral(roman)

That's pretty unreadable to me. Lets apply some formatting:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    s += n - 2 * n % n = 10 ** (205558 % c % 7) % 9995
  s + n

and some bracketing:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    s += n - ((2 * n) % (n = ((10 ** (205558 % c % 7)) % 9995)))
  s + n

and introduce some variables:

def to_arabic_numeral(roman)
  n = s = 0
  roman.bytes { |c|
    last_n = n
    n = (10 ** (205558 % c % 7)) % 9995 
    i = last_n - ((2 * last_n) % n)
    s += i
  s + n

and some better named variables:

def to_arabic_numeral(roman)
  value = sum = 0
  roman.bytes { |char_code|
    last_value = value
    value = (10 ** (205558 % char_code % 7)) % 9995 
    increment = last_value - ((2 * last_value) % value)
    sum += increment
  sum + value

and lastly add some logging:

def to_arabic_numeral(roman)
  value = sum = 0
  roman.bytes { |char_code|
    last_value = value
    value = (10 ** (205558 % char_code % 7)) % 9995 
    increment = last_value - ((2 * last_value) % value)
    sum += increment
    puts [
         ].join('  ')
  sum + value

to_arabic_numeral('MCMXCIX') #=> 1999

When run this prints:

char:M  char_code:77  value:1000  last_value:0     increment:0     sum:0
char:C  char_code:67  value:100   last_value:1000  increment:1000  sum:1000
char:M  char_code:77  value:1000  last_value:100   increment:-100  sum:900
char:X  char_code:88  value:10    last_value:1000  increment:1000  sum:1900
char:C  char_code:67  value:100   last_value:10    increment:-10   sum:1890
char:I  char_code:73  value:1     last_value:100   increment:100   sum:1990
char:X  char_code:88  value:10    last_value:1     increment:-1    sum:1989

We see that the sum is always one iteration behind and the last value is added to sum after the loop is finished.

There's two separate complicated lines here; the conversion of char_code into value and the calculation of increment.

Converting from numeral to integer

First lets look at the calculation of value which converts the ascii character code of a numeral to the integer the numeral represents:

value = (10 ** (205558 % char_code % 7)) % 9995 

Lets wrap the to a function that takes a character:

def roman_numeral_to_integer(char)
  char_code = char.ord
  (10 ** ((205558 % char_code) % 7)) % 9995

remind ourselves of the expected mappings: I = 1, V = 5, X = 10, L = 50, C = 100, D = 500, M = 1000

and see if it works:

roman_numeral_to_integer('I') # => 1
roman_numeral_to_integer('C') # => 100

roman_numeral_to_integer('i') # => 1000
roman_numeral_to_integer('Z') # => 5

It appears to work fine for any of the roman numerals and is undefined for other characters. Lets plug some more values into it:

numerals = "IVXLCDM"
('A'..'Z').each { |c| 
  value = roman_numeral_to_integer(c)
  puts "char:#{c} value:#{value.to_s.ljust(4)}  #{'*NUMERAL*' if numerals.include?(c) }"

Which prints:

char:A  value:1     
char:B  value:500   
char:C  value:100   *NUMERAL*
char:D  value:500   *NUMERAL*
char:E  value:1     
char:F  value:1000  
char:G  value:500   
char:H  value:1     
char:I  value:1     *NUMERAL*
char:J  value:5     
char:K  value:100   
char:L  value:50    *NUMERAL*
char:M  value:1000  *NUMERAL*
char:N  value:1     
char:O  value:1     
char:P  value:1000  
char:Q  value:50    
char:R  value:1000  
char:S  value:10    
char:T  value:1000  
char:U  value:1     
char:V  value:5     *NUMERAL*
char:W  value:10    
char:X  value:10    *NUMERAL*
char:Y  value:10    
char:Z  value:5

There's no simple pattern to exploit in the relationship between ascii character codes and the distribution or magnitude of roman numerals.

Numeral value vs. ascii code

Instead roman_numeral_to_integer is a cleverly constructed function that fits all the points on the above graph, mapping all the roman numerals to integers. A lookup hash is certainly more easily constructed, read, and maintainable but this wins on cleverness and character count.

For more about a similar 'Magic Formula' for the same purpose, and how you could construct it using brute force methods see Golf: Magic Formula for Roman Numerals.

If you want to play around with this function try Wolfram Alpha: plot (10 ^ ((205558 mod floor(x)) mod 7)) mod 9995, x=65 to 90

Calculating the increment

Now lets look at increment. Increment is the value that should be added to sum. Because the sum calculation is one iteration behind it is last_value and not value that should be added.

increment = last_value - ((2 * last_value) % value)

When a smaller numeral is before a larger one it should be subtracted rather than added to the total so the increment will be -last_value when last_value < value, and else will just be last_value

The code works because, for the natural numbers x and y:

x % y ≡ x, when x < y 
x % y ≡ 0, when x >= y and y is a factor of x

Note that for any numeral value x all smaller numeral values are factors. For example, L (50) is greater than I (1), V (5), X (10) and all of these are factors.

So when last_value >= value it just added to sum (the right hand side of the subtraction will be 0) and when last_value < value it is turned negative (the right hand side of the subtraction will be last_value * 2).

Going back to the original program it looks like we only save 1 character doing it this way over a ternary if:

# vs.

But remember that in the original program last_value and value are both stored in the single variable n. The program relies on the order of evaluation to use the correct value of n whereas a ternary if will evaluate the conditional first so would require a new variable to be used.

Summing up

The method has clever use of a mathematical function for mapping numerals and modulus operation to avoid an if statement. Most of this is done in a single expression and trusts the evaluation order semantics of Ruby.

Trevoke commented Feb 1, 2012

Does this work with UTF8?


zetter commented Feb 1, 2012

Trevoke - It will work fine for 'A-Z' in UTF8 since these are encoded using 1 byte in UTF8 as they are in ascii. It won't work for use of the UTF-8 Roman numeral characters (such as 'Ⅶ') (but nor will any other of the lookup based approaches).

Trevoke commented Feb 1, 2012

That's ridiculously awesome.

asavige commented Feb 29, 2012

The magic formula used in matematikaadit's submission was described in 2009 in node_id 761053 at I believe his submission can be further shortened by two strokes via:


asavige commented Mar 4, 2012

Though longer, this pure-functional version is arguably more elegant:{|c|10*_(205558%c%7)%9995}.reduce(0){|t,n|t+n-t%n_2}

Less pure, but a bit shorter is:


pmarreck commented Mar 7, 2013

Has anyone taken a shot at to_roman_numeral from the arabic? :O

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment