@jcheng5
Created March 23, 2011 19:45
Generate wide character detection code
#!/usr/bin/ruby
require 'pp'
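# Input: http://unicode.org/Public/UNIDATA/EastAsianWidth.txt
data = File.read('EastAsianWidth.txt')
# Strip comments, then remove all whitespace inside each line so entries read
# "XXXX;W" or "XXXX..YYYY;W" even if the fields are padded with spaces.
data = data.gsub(/#.*/, '').lines.map { |line| line.gsub(/\s+/, '') }
# Keep only the Fullwidth (F) and Wide (W) entries.
data = data.select { |x| x =~ /;[FW]\z/ }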
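# Expand each entry into the list of codepoints it covers.
data = data.collect do |line|
  if line =~ /^([\w]+);/
    # Single codepoint, e.g. "3000;F"
    [$1.hex]
  elsif line =~ /^([\w]+)\.\.([\w]+);/
    # Inclusive range, e.g. "1100..115F;W"
    $1.hex.upto($2.hex).to_a
  end
end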
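# Coalesce consecutive codepoints into [first, last] ranges.
# This relies on the data file listing codepoints in ascending order.
current_range = nil
ranges = []
data.each do |range|
  range.each do |codepoint|
    if current_range && codepoint == (current_range[1] + 1)
      # Adjacent to the current range: extend it.
      current_range[1] = codepoint
    else
      # Gap found: start a new range.
      current_range = [codepoint, codepoint]
      ranges << current_range
    end
  end
end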
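# Emit a C-style boolean expression (no dangling "||" after the last range).
puts ranges.map { |range| "c >= 0x%04X && c <= 0x%04X" % range }.join(" ||\n")
puts
# Emit a regex alternation (no dangling "|"). \uXXXX holds only four hex digits,
# so use \u{...}, which Ruby and JavaScript (with the u flag) accept for
# codepoints above U+FFFF; other regex flavors may need a different escape.
puts ranges.map { |range| "[\\u{%04X}-\\u{%04X}]" % range }.join('|')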
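
The range-merging step above can also be tried without downloading the data file. The following is a minimal sketch, not part of the original gist: `SAMPLE`, `wide_ranges`, and `wide?` are hypothetical names introduced here, and the sample holds only a handful of entries written in the style of EastAsianWidth.txt. It merges adjacent F/W entries as they are read instead of expanding every codepoint first, then uses a binary search over the merged ranges as one possible way to consume the result from Ruby directly.

```ruby
# Illustrative sample only; the real EastAsianWidth.txt has many more entries.
SAMPLE = <<~DATA
  0041;Na         # narrow, ignored
  1100..115F;W    # wide range
  3000;F          # fullwidth single codepoint
  3001..3003;W    # adjacent to 3000, so it merges with it
  FF01..FF60;F    # fullwidth range
DATA

# Parse the text and return sorted [first, last] ranges of F/W codepoints.
# Assumes entries appear in ascending codepoint order, as in the data file.
def wide_ranges(text)
  ranges = []
  text.each_line do |line|
    entry = line.sub(/#.*/, '').gsub(/\s+/, '')
    m = entry.match(/\A(\h+)(?:\.\.(\h+))?;([A-Za-z]+)\z/)
    next unless m && %w[F W].include?(m[3])
    first = m[1].hex
    last = (m[2] || m[1]).hex
    if ranges.any? && first == ranges.last[1] + 1
      ranges.last[1] = last       # extend the previous range
    else
      ranges << [first, last]     # start a new range
    end
  end
  ranges
end

# Binary search: find the first range whose upper bound reaches the
# codepoint, then check that its lower bound does not exceed it.
def wide?(codepoint, ranges)
  hit = ranges.bsearch { |_lo, hi| hi >= codepoint }
  !hit.nil? && hit[0] <= codepoint
end

ranges = wide_ranges(SAMPLE)
ranges.each { |r| puts "0x%04X..0x%04X" % r }
puts wide?(0x3002, ranges)  # => true
puts wide?(0x0041, ranges)  # => false
```

Merging while parsing avoids building an array with one element per codepoint, which matters for the large CJK ranges in the full file. `Array#bsearch` requires Ruby 2.0 or later and the squiggly heredoc requires Ruby 2.3 or later.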