Created
January 2, 2014 07:48
A simple English-language number parser. How's the coding style? How would I extend this to handle a broader range of natural-language number expressions?
#
# Parse textual numbers into actual numbers
#
class NumberParser
  ATOMS = {
    'zero' => 0,
    'one' => 1,
    'two' => 2,
    'three' => 3,
    'four' => 4,
    'five' => 5,
    'six' => 6,
    'seven' => 7,
    'eight' => 8,
    'nine' => 9,
    'ten' => 10,
    'eleven' => 11,
    'twelve' => 12,
    'thirteen' => 13,
    'fourteen' => 14,
    'fifteen' => 15,
    'sixteen' => 16,
    'seventeen' => 17,
    'eighteen' => 18,
    'nineteen' => 19,
    'twenty' => 20,
    'thirty' => 30,
    'forty' => 40,
    'fifty' => 50,
    'sixty' => 60,
    'seventy' => 70,
    'eighty' => 80,
    'ninety' => 90
  }

  #
  # Return true if the text is a number.
  #
  def number?(possible_number)
    parse(possible_number)
    true
  rescue StandardError
    false
  end

  #
  # Convert text into a number. E.g.,
  # "forty-one" -> 41
  #
  def parse(text)
    # Clean up the input
    phrase = text.downcase.strip

    # Handle the easy case: it's an atom.
    result = ATOMS[phrase]
    return result unless result.nil?

    # Compounds are e.g. "forty-one"
    result = handle_compound(phrase)
    return result unless result.nil?

    # Handle a hundred-based number.
    # E.g., "five hundred eleven"
    result = handle_hundred_based(phrase)
    return result unless result.nil?

    raise "Couldn't parse the number: \"#{text}\""
  end

  private

  def handle_hundred_based(tok)
    # \A and \z anchor to the whole string; ^ and $ only anchor to a
    # single line, so multi-line input could slip through.
    if tok =~ /\A([a-z\-]+) hundred ([a-z\-]+)\z/
      return parse($1) * 100 + parse($2)
    end

    if tok =~ /\A([a-z\-]+) hundred\z/
      return parse($1) * 100
    end

    nil
  end
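
  #
  # Convert a compound phrase into
  # a number or return nil if I
  # can't.
  #
  # E.g. "forty-five" -> 45
  #
  def handle_compound(tok)
    return nil unless tok =~ /\A([a-z]+)-([a-z]+)\z/

    tens = ATOMS[$1]
    ones = ATOMS[$2]

    # Unknown words: return nil as documented instead of raising
    # NoMethodError on nil.
    return nil if tens.nil? || ones.nil?

    # Only accept a tens word (twenty..ninety) followed by one..nine,
    # so "one-one" or "ten-five" is rejected rather than summed.
    return nil unless tens >= 20 && (tens % 10).zero? && ones.between?(1, 9)

    tens + ones
  end
end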
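
On the question of extending this to a broader range of expressions: one common approach is to stop matching whole phrases with regular expressions and instead split the input into words and accumulate a running value. The following is only a minimal sketch under stated assumptions, not a drop-in replacement. `ExtendedNumberParser` and its `SMALL` and `SCALES` tables are hypothetical names that are not in the gist, the supported scales stop at "million", and the word "and" is simply ignored.

```ruby
# Hypothetical sketch: ExtendedNumberParser, SMALL and SCALES are
# illustrative names, not part of the original gist.
module ExtendedNumberParser
  # Words that add directly to the group being built.
  SMALL = {
    'zero' => 0, 'one' => 1, 'two' => 2, 'three' => 3, 'four' => 4,
    'five' => 5, 'six' => 6, 'seven' => 7, 'eight' => 8, 'nine' => 9,
    'ten' => 10, 'eleven' => 11, 'twelve' => 12, 'thirteen' => 13,
    'fourteen' => 14, 'fifteen' => 15, 'sixteen' => 16,
    'seventeen' => 17, 'eighteen' => 18, 'nineteen' => 19,
    'twenty' => 20, 'thirty' => 30, 'forty' => 40, 'fifty' => 50,
    'sixty' => 60, 'seventy' => 70, 'eighty' => 80, 'ninety' => 90
  }.freeze

  # Words that close the current group and scale it.
  SCALES = { 'thousand' => 1_000, 'million' => 1_000_000 }.freeze

  # Convert text to an Integer.
  # Raises ArgumentError when a word is not recognized.
  def self.parse(text)
    # Treat hyphens as spaces and drop the filler word "and".
    words = text.downcase.tr('-', ' ').split - ['and']
    if words.empty?
      raise ArgumentError, "Couldn't parse the number: #{text.inspect}"
    end

    total = 0    # sum of the groups already closed by a scale word
    current = 0  # value of the group currently being built

    words.each do |word|
      if SMALL.key?(word)
        current += SMALL[word]
      elsif word == 'hundred'
        current *= 100
      elsif SCALES.key?(word)
        total += current * SCALES[word]
        current = 0
      else
        raise ArgumentError, "Couldn't parse the number: #{text.inspect}"
      end
    end

    total + current
  end
end

puts ExtendedNumberParser.parse('two thousand three hundred and forty-one')
```

This sketch is deliberately permissive: it does not check word order, so a malformed phrase such as "one one" is summed rather than rejected. Adding new scale words only requires another entry in `SCALES`; stricter validation would need an explicit grammar or state machine on top of the same token loop.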