Unicode 'property' Web app parser
# This snippet is for parsing output from the Unicode 'property' Web app at
# http://unicode.org/cldr/utility/properties.html
#
# Be sure to check the 'Abbreviate' and 'UCD format' boxes, and leave
# 'Escape' unchecked. Copy the hex ranges to an external file, and set its
# path in FILE below.
# Last tested on 2011/06/12
#
# Written by Abe Voelker (http://abevoelker.com)
# Released to the public domain
require 'json' # Could be replaced by active_support
FILE = '' # Set input file here
raise ArgumentError.new("You must set an input file to parse") if FILE.empty?
# Monkeypatch the String class to convert a "START..END" hex range
# (e.g. "0041..005A") into a Range of integer code points
class String
  def to_range
    raise ArgumentError.new("Couldn't convert to Range: #{self}") \
      if self.count('.') != 2
    elements = self.split('..')
    return Range.new(elements[0].hex, elements[1].hex)
  end
end
# Returns a truthy value (the match index) if the string contains '..'
def is_range(range_str)
  return /\.\./ =~ range_str
end
master_array = []
File.foreach(FILE) do |line|
  range_str = line.split(' ')[0]
  next if range_str.nil? # Skip blank lines; calling .hex on nil would raise
  if !is_range(range_str)
    master_array.push(range_str.hex)
  else
    master_array.concat(range_str.to_range.to_a)
  end
end
puts master_array.to_json
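
The following is a minimal, self-contained sketch of the same expansion, run on an inline sample instead of a file, to show what input the snippet expects and what it prints. The sample lines are made up for illustration and only approximate the web app's abbreviated UCD output; `SAMPLE`, `RESULT` and `expand_code_points` are hypothetical names that do not appear in the original snippet. It assumes Ruby 2.3 or later for the `<<~` heredoc.

```ruby
require 'json'

# Hypothetical sample input: each line starts with a hex code point or a
# "START..END" hex range, optionally followed by other fields.
SAMPLE = <<~UCD
  0041..0043 ; Lu # LATIN CAPITAL LETTER A..C
  00AA ; Lo # FEMININE ORDINAL INDICATOR

  1F600..1F601
UCD

# Expand the lines into a flat array of integer code points.
def expand_code_points(text)
  text.each_line.flat_map do |line|
    token = line.split(' ')[0]
    # Skip blank lines and comment-only lines
    next [] if token.nil? || token.start_with?('#')
    if token.include?('..')
      first, last = token.split('..')
      (first.hex..last.hex).to_a
    else
      [token.hex]
    end
  end
end

RESULT = expand_code_points(SAMPLE)
puts RESULT.to_json
# prints [65,66,67,170,128512,128513]
```

Unlike the snippet above, this sketch uses a plain method rather than a monkeypatch on String, so it can be pasted into another script without changing core classes.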