Created June 13, 2011 17:47
Unicode 'property' Web app parser
# This snippet parses output from the Unicode 'property' Web app at
# http://unicode.org/cldr/utility/properties.html
# and prints every code point it lists as a JSON array of integers.
#
# Be sure to check the 'Abbreviate' and 'UCD format' boxes, and leave
# 'Escape' unchecked. Copy the hex ranges to an external file, and set its
# path in FILE below.
# Last tested on 2011/06/12
#
# Written by Abe Voelker (http://abevoelker.com)
# Released to the public domain
require 'json' # Could be replaced by active_support

FILE = '' # Set the input file path here
raise ArgumentError, 'You must set an input file to parse' if FILE.empty?
# Monkeypatch the String class to turn "XXXX..YYYY" hex strings into Ranges
class String
  def to_range
    raise ArgumentError, "Couldn't convert to Range: #{self}" if count('.') != 2
    first, last = split('..')
    Range.new(first.hex, last.hex)
  end
end
# Returns a truthy value if range_str contains "..", i.e. denotes a range
def is_range(range_str)
  /\.\./ =~ range_str
end
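master_array = []
File.foreach(FILE) do |line|
  range_str = line.split(' ')[0]
  # Skip blank lines (range_str is nil) and comment lines
  next if range_str.nil? || range_str.start_with?('#')

  if is_range(range_str)
    # Expand "XXXX..YYYY" into every code point in the inclusive range
    master_array.concat(range_str.to_range.to_a)
  else
    # A single hex code point
    master_array.push(range_str.hex)
  end
end

puts master_array.to_json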
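For reuse outside a one-off script, the same expansion can be wrapped in a method that reads from any IO object. The following is a minimal sketch, not part of the original gist: `parse_code_points` is a hypothetical name, and it assumes UCD-format lines where the first field is a single hex code point or an `XXXX..YYYY` range, optionally followed by `;` fields and a `#` comment. The sample input is illustrative only.

```ruby
require 'json'
require 'stringio'

# Hypothetical helper (not from the original gist). Assumes each line looks
# like "0041..005A ; ..." or "00AD ; ...", possibly with a trailing "# ..." comment.
def parse_code_points(io)
  io.each_line.flat_map do |line|
    # Drop any "#" comment, then keep only the first ";"-separated field
    field = line.split('#', 2).first.to_s.split(';', 2).first.to_s.strip
    next [] if field.empty? # blank or comment-only line

    if field.include?('..')
      lo, hi = field.split('..', 2)
      (lo.hex..hi.hex).to_a # every code point in the inclusive range
    else
      [field.hex] # a single code point
    end
  end
end

# Illustrative input, not real output from the Web app
sample = StringIO.new(<<~UCD)
  # hypothetical sample input
  0041..0043 ; Example
  00AD       ; Example

UCD

puts parse_code_points(sample).to_json # prints [65,66,67,173]
```

Taking an IO instead of a hard-coded `FILE` lets the same method read `File.open(path)` or a `StringIO` in tests, and keeping the range parsing inside the method avoids reopening `String` for every caller.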