@jdunck
Created September 28, 2016 20:14
Handmade CJK coverage util
CJK Radicals Supplement ,2E80-2EFF
Kangxi Radicals ,2F00-2FDF
CJK symbols and punctuation ,3000-303F
Hiragana ,3040-309F
Katakana ,30A0-30FF
CJK strokes ,31C0-31EF
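Katakana Phonetic Extensions ,31F0-31FF
Enclosed CJK Letters and Months (plus CJK Compatibility) ,3200-33FF
CJK Compatibility ,3300-33FF
CJK Unified Ideographs ,4E00-9FFF
CJK Compatibility Ideographs ,F900-FAFF
CJK Compatibility Forms ,FE30-FE4F
Halfwidth and Fullwidth Forms ,FF00-FFEF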
Kana Supplement ,1B000-1B0FF
$ ls -1 font-repo/
HelveticaNeue.ttf
LucidaGrande.ttc
meiryo.ttc
ヒラギノ丸ゴ ProN W4.ttc
ヒラギノ角ゴシック W8.ttc
$ python handmade-cjk.py
num desired: 23216
num unsupported: 6640
extras (supported but not desired): 6563
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import os
from itertools import chain

from fontTools.ttLib import TTFont


def hex_range_to_dec(hex_range):
    """Turn an inclusive hex range such as '3040-309F' into a range of code points."""
    lower, upper = (int(bound, 16) for bound in hex_range.split('-'))
    # hex ranges are inclusive, and python's range end is exclusive, so +1
    return range(lower, upper + 1)


def desired_chars(range_lines):
    """Union of all code points covered by the given hex ranges."""
    needed = set()
    for line in range_lines:
        needed |= set(hex_range_to_dec(line))
    return needed


def supported_by(font_list):
    """Union of all code points mapped by any cmap subtable of the given fonts."""
    provided = set()
    for font_name in font_list:
        # fontNumber=0 reads only the first face of a .ttc collection
        ttf = TTFont(font_name, fontNumber=0)
        provided |= set(chain.from_iterable(table.cmap.keys() for table in ttf["cmap"].tables))
    return provided


if __name__ == '__main__':
    with open('handmade-cjk.txt', 'r') as f:
        # each line is "<label> ,<hex range>"; keep only the part after the last comma
        desired_ranges = [line.strip().rsplit(',', 1)[1] for line in f if line.strip()]
    desired = desired_chars(desired_ranges)
    repo_path = './font-repo/'
    files = [os.path.join(repo_path, fn) for fn in os.listdir(repo_path)]
    supported = supported_by(files)
    unsupported = desired - supported
    extra = supported - desired
    print("num desired: %s" % len(desired))
    print("num unsupported: %s" % len(unsupported))
    print("extras (supported but not desired): %s" % len(extra))
jdunck commented Sep 28, 2016

Strictly in "works for me" territory, but:
it requires fontTools, assumes your font repo dir is ./font-repo, and assumes handmade-cjk.txt holds one "label ,START-END" entry per line, where the hex ranges are inclusive.
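
For anyone adapting the range file, here is a minimal sketch of a stricter parser for that line format. It is not part of the gist: `parse_range_line` is a hypothetical helper name, and the two sample lines are copied from the range list above. It splits on the last comma, as the script does, but reports malformed lines instead of failing with an opaque error.

```python
def parse_range_line(line):
    """Parse one 'label ,START-END' line into (label, lower, upper).

    Hypothetical helper, not from the gist. Bounds are hex and inclusive.
    """
    # Split on the LAST comma, so a label may itself contain commas.
    label, _, hex_range = line.strip().rpartition(',')
    lower_hex, sep, upper_hex = hex_range.partition('-')
    if not sep:
        raise ValueError("expected START-END after the last comma: %r" % line)
    lower, upper = int(lower_hex, 16), int(upper_hex, 16)
    if lower > upper:
        raise ValueError("range start is above range end: %r" % line)
    return label.strip(), lower, upper


# Two lines taken from the range file; the second one has no label.
sample = ["Hiragana ,3040-309F", ",FE30-FE4F"]
parsed = [parse_range_line(line) for line in sample]
for label, lower, upper in parsed:
    # upper is inclusive, so the block holds upper - lower + 1 code points
    size = upper - lower + 1
    print("%-10s U+%04X..U+%04X (%d code points)" % (label or "(no label)", lower, upper, size))
```

Keeping the label alongside the bounds would also make it possible to report missing characters per block rather than as a single total.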

@prestoncmatterport

It looks like "num desired" is measured in bytes, and "num supported" is measured in characters
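
One way to check the units is to recount the ranges without any fonts. The sketch below is not from the gist: it hard-codes the fourteen ranges from the range file (`RANGES` and `code_points` are illustrative names) and builds the same kind of set the script builds, whose members are integer code points. Under that reading, "num desired" is a count of distinct code points, and the other two figures are sizes of set differences against the integer keys of the fonts' cmap tables.

```python
# Ranges copied from the gist's range file; bounds are inclusive hex code points.
RANGES = [
    "2E80-2EFF", "2F00-2FDF", "3000-303F", "3040-309F", "30A0-30FF",
    "31C0-31EF", "31F0-31FF", "3200-33FF", "3300-33FF", "4E00-9FFF",
    "F900-FAFF", "FE30-FE4F", "FF00-FFEF", "1B000-1B0FF",
]


def code_points(hex_range):
    """Expand an inclusive 'START-END' hex range into a range of integers."""
    lower, upper = (int(bound, 16) for bound in hex_range.split('-'))
    return range(lower, upper + 1)


# A set of integers: each member is one Unicode code point, not a byte.
desired = set()
for hex_range in RANGES:
    desired.update(code_points(hex_range))

# Summing the range sizes counts 3300-33FF twice, since it sits inside 3200-33FF.
total = sum(len(code_points(hex_range)) for hex_range in RANGES)

print(len(desired))  # 23216, matching "num desired" in the run above
print(total)         # 23472, which is 256 more because of the overlapping range
```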