@javan

javan/emoji-squares.md

Last active Aug 29, 2015

IntegrityTest#test_"images on disk have no duplicates" [/Users/javan/Code/gemoji/test/integrity_test.rb:26]: These images share the same checksum: /emoji/unicode/25fc.png, /emoji/unicode/2b1b.png.
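The test that produced this failure is not shown here. As a rough sketch of how a duplicate-checksum check over image files might look (the `duplicate_images` helper and the directory layout are assumptions for illustration, not gemoji's actual test code):

```ruby
require "digest/md5"

# Hypothetical helper: group the PNG files under `dir` by the MD5 of their
# contents and return only the groups that contain more than one file.
def duplicate_images(dir)
  Dir.glob(File.join(dir, "**", "*.png"))
     .sort
     .group_by { |path| Digest::MD5.file(path).hexdigest }
     .values
     .select { |paths| paths.length > 1 }
end
```

An empty result would mean every image on disk has a distinct checksum; each returned group lists the paths that share one.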

--

◼️ "BLACK MEDIUM SQUARE" Unicode: U+25FC U+FE0F, UTF-8: E2 97 BC EF B8 8F

⬛️ "BLACK LARGE SQUARE" Unicode: U+2B1B U+FE0F, UTF-8: E2 AC 9B EF B8 8F

(best viewed in Safari)
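The UTF-8 bytes listed above can be reproduced from the code points in plain Ruby. This is only a small check; `utf8_hex` is a hypothetical helper name, and U+FE0F is VARIATION SELECTOR-16, which requests emoji presentation:

```ruby
# Encode a sequence of code points as UTF-8 and render the bytes in hex.
def utf8_hex(*codepoints)
  codepoints.pack("U*").bytes.map { |b| format("%02X", b) }.join(" ")
end

puts utf8_hex(0x25FC, 0xFE0F) # BLACK MEDIUM SQUARE + VARIATION SELECTOR-16
puts utf8_hex(0x2B1B, 0xFE0F) # BLACK LARGE SQUARE + VARIATION SELECTOR-16
```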

--

Find and confirm glyph IDs

>> ttf.cmap.unicode.first.code_map.select { |k, v| k == "25fc".to_i(16) }
=> {9724=>82}
>> ttf.postscript.glyph_for(82)
=> "u25FC"

>> ttf.cmap.unicode.first.code_map.select { |k, v| k == "2b1b".to_i(16) }
=> {11035=>163}
>> ttf.postscript.glyph_for(163)
=> "u2B1B"
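If `code_map` behaves like a plain Hash keyed by integer code point (the `select` output above suggests so, but that is an assumption), the same lookup can be written as a direct key access. The sketch below uses a stand-in Hash with only the two entries from the session, and `glyph_id_for` is a hypothetical helper:

```ruby
# Stand-in for ttf.cmap.unicode.first.code_map, holding only the two
# entries shown in the session above (code point => glyph ID).
CODE_MAP = { 9724 => 82, 11035 => 163 }.freeze

# Hypothetical helper: convert a hex code point string to an integer
# and look up its glyph ID; returns nil when the code point is unmapped.
def glyph_id_for(code_map, hex)
  code_map[hex.to_i(16)]
end

p glyph_id_for(CODE_MAP, "25fc")
p glyph_id_for(CODE_MAP, "2b1b")
```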

I added some logging to ttfunk to inspect the offset and byte length of each glyph's PNG data:

>> ttf.sbix.bitmap_data_for(82, 4)
"Offset and byte length for glyph_id 82: 7855049, 1658"

>> ttf.sbix.bitmap_data_for(163, 4)
"Offset and byte length for glyph_id 163: 8178441, 1658"
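The two glyphs report different offsets but the same byte length, which is consistent with two separately stored copies of the same image, though matching lengths alone do not prove the bytes are equal. A minimal sketch of comparing two byte ranges directly, using toy data in place of the font file (`read_range` is a hypothetical helper, not part of ttfunk):

```ruby
require "stringio"

# Hypothetical helper: read `length` bytes starting at `offset` from an IO.
def read_range(io, offset, length)
  io.seek(offset)
  io.read(length)
end

# Toy data standing in for a font file: the same 4-byte payload stored twice.
io = StringIO.new("xxPNG!yyyPNG!zz".b)
first  = read_range(io, 2, 4)
second = read_range(io, 9, 4)
puts first == second
```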

--

Glyphs 82 and 163 are the only two with identical PNG data:

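require "digest/md5"

# Map each MD5 digest to the list of glyph IDs whose PNG data produces it
md5s = {}

ttf.maximum_profile.num_glyphs.times do |glyph_id|
  # Skip glyphs for which no bitmap data is returned
  if (bitmap = ttf.sbix.bitmap_data_for(glyph_id, 4))
    digest = Digest::MD5.hexdigest(bitmap.data.read)
    md5s[digest] ||= []
    md5s[digest].push(glyph_id)
  end
end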

>> md5s.keys.uniq.size
=> 845

>> md5s.select { |k, v| v.length > 1 }
=> {"34b981a2dd163f1cb8a453189edc446c"=>[82, 163]}