@sobataro
Created December 17, 2016 21:55
#! /usr/bin/env ruby
#load 'unicode.rb' # ActiveSupport::Multibyte::Unicode.unpack_graphemes

def lesser_graphemes_counter(str)
  codepoints = str.unpack("U*")
  variation_selectors = codepoints.count { |c| (0xFE00..0xFE0F).include?(c) }
  skintone_modifiers = codepoints.count { |c| (0x1F3FB..0x1F3FF).include?(c) }
  regional_indicators = codepoints.count { |c| (0x1F1E6..0x1F1FF).include?(c) }
  zero_width_joiners = codepoints.count(0x200D)
  enclosing_keycaps = codepoints.count(0x20E3)
  codepoints.count - variation_selectors - skintone_modifiers - (regional_indicators / 2) - (zero_width_joiners * 2) - enclosing_keycaps
end

puts "RUBY_VERSION = #{RUBY_VERSION}"

[
  [0x31,    0xFE0F,  0x20E3],                             # Keycap Digit One 1️⃣
  [0x1F44F, 0x1F3FD],                                     # Clapping Hands Sign, Type-4 👏🏽
  [0x1F468, 0x200D,  0x1F468, 0x200D,  0x1F466],          # Family: Man, Man, Boy 👨‍👨‍👦
  [0x1F1EF, 0x1F1F5],                                     # 🇯🇵
  [0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5], # 🇯🇵🇯🇵🇯🇵
].map { |e| e.pack("U*") }.map do |emoji|
  puts "\n"
  puts "\"#{emoji}\".length = #{emoji.length}"
  puts "\"#{emoji}\".codepoints.count = #{emoji.codepoints.count}"
  puts "\"#{emoji}\".scan(/\\X/).count = #{emoji.scan(/\X/).count}"
  # puts "ActiveSupport::Multibyte::Unicode.unpack_graphemes(\"#{emoji}\").count = #{ActiveSupport::Multibyte::Unicode.unpack_graphemes(emoji).count}"
  puts "lesser_graphemes_counter(\"#{emoji}\") = #{lesser_graphemes_counter(emoji)}"
end

matt17r commented Apr 7, 2022

For anyone else coming across this: in Ruby 2.5 or later this is a lot simpler with str.grapheme_clusters.length:

puts "RUBY_VERSION = #{RUBY_VERSION}"

[
  [0x31,    0xFE0F,  0x20E3],                             # Keycap Digit One 1️⃣
  [0x1F44F, 0x1F3FD],                                     # Clapping Hands Sign, Type-4 👏🏽
  [0x1F468, 0x200D,  0x1F468, 0x200D,  0x1F466],          # Family: Man, Man, Boy 👨‍👨‍👦
  [0x1F1EF, 0x1F1F5],                                     # 🇯🇵
  [0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5], # 🇯🇵🇯🇵🇯🇵
].map { |e| e.pack("U*") }.map do |emoji|
  puts "\n"
  puts "\"#{emoji}\".length = #{emoji.length}"
  puts "\"#{emoji}\".codepoints.count = #{emoji.codepoints.count}"
  puts "\"#{emoji}\".scan(/\\X/).count = #{emoji.scan(/\X/).count}"
  puts "\"#{emoji}\".grapheme_clusters.length = #{emoji.grapheme_clusters.length}"
end
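
If you just want the count as a reusable method, something like the following might do. It is only a sketch: the names grapheme_count and SAMPLES are made up for illustration and are not part of the gist, and it assumes Ruby 2.5 or later so that String#grapheme_clusters is available. How a sequence is segmented depends on the Unicode version your Ruby ships with, so treat the printed counts as what your interpreter reports rather than a fixed truth.

```ruby
# Hypothetical helper (not from the gist): number of user-perceived characters,
# i.e. extended grapheme clusters. Assumes Ruby 2.5+ for String#grapheme_clusters.
def grapheme_count(str)
  str.grapheme_clusters.length
end

# The same codepoint sequences used in the scripts above, keyed by a short label.
SAMPLES = {
  "keycap digit one"     => [0x31, 0xFE0F, 0x20E3],
  "clapping hands + tone" => [0x1F44F, 0x1F3FD],
  "family (ZWJ sequence)" => [0x1F468, 0x200D, 0x1F468, 0x200D, 0x1F466],
  "one flag"             => [0x1F1EF, 0x1F1F5],
  "three flags"          => [0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5, 0x1F1EF, 0x1F1F5],
}.transform_values { |codepoints| codepoints.pack("U*") }.freeze

# Print codepoint count next to grapheme count for each sample.
SAMPLES.each do |label, emoji|
  puts format("%-22s codepoints=%d graphemes=%d",
              label, emoji.codepoints.size, grapheme_count(emoji))
end
```

The point of wrapping it is only to keep call sites short; str.grapheme_clusters.length inline works just as well.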
