@spect88
Created October 23, 2015 20:03
Translator of Unicode regex ranges
#!/usr/bin/env ruby
# Prints the characters that the \uXXXX escapes and \uXXXX-\uYYYY ranges of a
# regex character class actually stand for, reading one line at a time from STDIN
#
# Example input:
#
# [\u0021-\u0027\u002A-\u002E\u003F\u0041-\u005A\u005C\u005F-\u007A\u00AA\u00B5\u00BA\u00C0-\u00D6\u00D8-\u00F6]
#
# Output:
#
# 0021 - 0027: !"#$%&'
# 002A - 002E: *+,-.
# 0041 - 005A: ABCDEFGHIJKLMNOPQRSTUVWXYZ
# 005F - 007A: _`abcdefghijklmnopqrstuvwxyz
# 00C0 - 00D6: ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ
# 00D8 - 00F6: ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö
# 003F: ?
# 005C: \
# 00AA: ª
# 00B5: µ
# 00BA: º
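def unicode_regexp_to_characters(input)
  output = []
  # Pass 1 (gsub): expand each \uXXXX-\uYYYY range and delete it, so that
  # pass 2 (scan) only sees the standalone \uXXXX escapes that remain.
  # \h matches a hex digit; \w would also accept non-hex letters and "_".
  input
    .gsub(/\\u(\h{4})-\\u(\h{4})/) do
      chars = ($1.hex..$2.hex).map { |code| [code].pack('U') }.join
      output << "#{$1} - #{$2}: #{chars}"
      '' # drop the expanded range from the string
    end
    .scan(/\\u(\h{4})/) do
      char = [$1.hex].pack('U')
      output << "#{$1}: #{char}"
    end
  output
end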
STDIN.each_line do |line|
  puts unicode_regexp_to_characters(line).join("\n")
end
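As a cross-check, and to cover classes the escape parser cannot read (negated `[^...]` classes, or shorthand such as `\d`), one could let Ruby's own regex engine do the work: compile the input with `Regexp.new`, test every Basic Multilingual Plane code point against it, and collapse consecutive hits into ranges. This is a sketch that assumes each input line is a valid Ruby regex. `SURROGATES`, `matched_ranges` and `format_range` are hypothetical names, and surrogates are skipped because they cannot be encoded as UTF-8 characters.

```ruby
# Hypothetical cross-check: let Ruby's regex engine decide which code
# points the class matches, instead of parsing \uXXXX escapes by hand.

# UTF-16 surrogates cannot be encoded as UTF-8 characters, so skip them.
SURROGATES = (0xD800..0xDFFF)

# Returns an array of Integer ranges covering every code point in
# 0..limit (the BMP by default) that the pattern matches.
def matched_ranges(pattern, limit: 0xFFFF)
  regexp = Regexp.new(pattern)
  codes = (0..limit).reject { |c| SURROGATES.cover?(c) }
                    .select { |c| regexp.match?(c.chr(Encoding::UTF_8)) }
  # Start a new run wherever two adjacent hits are not consecutive.
  codes.slice_when { |a, b| b != a + 1 }.map { |run| run.first..run.last }
end

# Formats a range in the same "XXXX - YYYY: chars" style as the script.
def format_range(range)
  chars = range.map { |c| c.chr(Encoding::UTF_8) }.join
  if range.size == 1
    format('%04X: %s', range.first, chars)
  else
    format('%04X - %04X: %s', range.first, range.last, chars)
  end
end
```

Results come out sorted by code point, so standalone characters are interleaved with the ranges instead of being listed after them. To drive it from STDIN like the original script, the loop body would become `puts matched_ranges(line.strip).map { |r| format_range(r) }`.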
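The script only recognizes the four-digit `\uXXXX` form, so an escape written in Ruby's braced form, such as `\u{1F600}`, is silently skipped. A minimal sketch of a tokenizer that accepts both forms, assuming 1 to 6 hex digits inside the braces, could look like this. `HEX_ESCAPE`, `TOKEN` and `describe_escapes` are hypothetical names.

```ruby
# Hypothetical tokenizer for both escape spellings Ruby accepts:
# \uXXXX (exactly four hex digits) and \u{X...} (assumed 1 to 6 digits).
HEX_ESCAPE = '(?:\h{4}|\{\h{1,6}\})'
TOKEN = /\\u(?<from>#{HEX_ESCAPE})(?:-\\u(?<to>#{HEX_ESCAPE}))?/

# Returns one "XXXX - YYYY: chars" or "XXXX: char" line per escape or
# range, in the order they appear in the input.
def describe_escapes(input)
  input.scan(TOKEN).map do |from, to|
    first = from.delete('{}').hex
    last = to ? to.delete('{}').hex : first
    # Integer#chr raises RangeError for surrogates and values above 0x10FFFF.
    chars = (first..last).map { |c| c.chr(Encoding::UTF_8) }.join
    label = to ? format('%04X - %04X', first, last) : format('%04X', first)
    "#{label}: #{chars}"
  end
end
```

For example, `describe_escapes('[\u0041-\u0043\u{1F600}]')` yields one line for the `A` to `C` range followed by one line for U+1F600. Unlike the script, this keeps input order rather than listing ranges first.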