@pklaus
Forked from starrhorne/gist:1637310
Last active January 26, 2024 18:20
Extracting font names from TTF/OTF files using Python and fontTools
#!/usr/bin/env python
"""
From
https://github.com/gddc/ttfquery/blob/master/ttfquery/describe.py
and
http://www.starrhorne.com/2012/01/18/how-to-extract-font-names-from-ttf-files-using-python-and-our-old-friend-the-command-line.html
ported to Python 3
"""
import sys
from fontTools import ttLib

FONT_SPECIFIER_NAME_ID = 4
FONT_SPECIFIER_FAMILY_ID = 1


def shortName( font ):
    """Get the short name from the font's names table"""
    name = ""
    family = ""
    for record in font['name'].names:
        if b'\x00' in record.string:
            name_str = record.string.decode('utf-16-be')
        else:
            name_str = record.string.decode('utf-8')
        if record.nameID == FONT_SPECIFIER_NAME_ID and not name:
            name = name_str
        elif record.nameID == FONT_SPECIFIER_FAMILY_ID and not family:
            family = name_str
        if name and family: break
    return name, family


tt = ttLib.TTFont(sys.argv[1])
print("Name: %s Family: %s" % shortName(tt))
@tirsky

tirsky commented Aug 26, 2016

Hello, I used your script but got an error:

name_str = record.string.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa9 in position 25: invalid start byte

Then I tried this encoding instead:

.decode('latin-1')

And it works well:

b'Digitized data copyright \xa9 2010-2011, Google Corporation.'
b'Open Sans'
b'Bold'
b'Ascender - Open Sans Bold Build 100'
b'Open Sans Bold'

My system details:


$ uname -a
Linux dev 4.5.0-1-amd64 #1 SMP Debian 4.5.1-1 (2016-04-14) x86_64 GNU/Linux
$ locale
LANG=en_US.UTF-8
LANGUAGE=en_US:en
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

$ python
Python 3.5.1+ (default, Apr 17 2016, 16:14:06) 
[GCC 5.3.1 20160409] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
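
The failure above is consistent with a name record stored in a single-byte legacy encoding: byte 0xa9 is the copyright sign in both Latin-1 and Mac Roman, but it is not a valid UTF-8 start byte. A minimal sketch of a fallback chain follows. The helper name `decode_with_fallback` and the order of encodings are assumptions for illustration and are not part of the gist:

```python
def decode_with_fallback(raw, encodings=("utf-8", "mac_roman", "latin-1")):
    """Try each encoding in turn and return (text, encoding_used)."""
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # latin-1 maps every byte, so this is only reached if it was left out.
    return raw.decode("latin-1", errors="replace"), "latin-1"


raw = b"Digitized data copyright \xa9 2010-2011, Google Corporation."
text, used = decode_with_fallback(raw)
print(used, text)
```

A fallback chain only guesses; the platform and encoding IDs stored in each name record are the more reliable source for the right codec.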

@S-M-R-Sadeghi

Hello.
I used your script but got an error:

for record in font['name'].names:
TypeError: string indices must be integers

I'm using Python 3.7.4

Please help me solve this problem: I have many TTF files and need to know their names to decide which ones to install.
Thanks a lot.

@pklaus
Author

pklaus commented Jul 21, 2020

You have to call the function shortName() with the object returned by ttLib.TTFont, which is of type fontTools.ttLib.ttFont.TTFont. ttLib is part of the fonttools package (pip install fonttools):

tt = ttLib.TTFont("DejaVuSansMono-Bold.ttf")
print("Name: %s  Family: %s" % shortName(tt))

Name: DejaVu Sans Mono Bold Family: DejaVu Sans Mono

Hope that helps.
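
The reported TypeError appears when a plain path string is passed where a TTFont object is expected. A small sketch of a guard that accepts either form is shown here; `as_ttfont` is a hypothetical helper, not part of the gist or of fontTools:

```python
import os


def as_ttfont(font_or_path):
    """Return a TTFont-like object, opening the file if a path was given."""
    if isinstance(font_or_path, (str, bytes, os.PathLike)):
        # Imported lazily so the helper can be defined without fontTools.
        from fontTools import ttLib
        return ttLib.TTFont(font_or_path)
    return font_or_path


# Indexing a plain string with a table tag reproduces the reported error.
path = "DejaVuSansMono-Bold.ttf"
try:
    path["name"]
except TypeError as exc:
    print("TypeError:", exc)
```

With such a guard, shortName(as_ttfont(sys.argv[1])) would work whether the caller has a path or an already opened font.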

@S-M-R-Sadeghi

S-M-R-Sadeghi commented Aug 26, 2020 via email

@SunHaozhe

Can I use your code in my research project? Do you have a License requirement?

My research project will be published as open source as soon as it's ready.

@leoblum

leoblum commented Oct 28, 2020

Python 3.8 (will work on lower versions too). Requirements: pip3 install fonttools

import os
from contextlib import redirect_stderr
from fontTools import ttLib


def font_name(font_path):
    font = ttLib.TTFont(font_path, ignoreDecompileErrors=True)
    with redirect_stderr(None):
        names = font['name'].names

    details = {}
    for x in names:
        if x.langID == 0 or x.langID == 1033:
            try:
                details[x.nameID] = x.toUnicode()
            except UnicodeDecodeError:
                details[x.nameID] = x.string.decode(errors='ignore')

    return details[4], details[1], details[2]

print(font_name('myfont.ttf'))  # ('Century Bold Italic', 'Century', 'Bold Italic') – name, family, style

@leoblum

leoblum commented Oct 28, 2020

Can I use your code in my research project? Do you have a License requirement?

My research project will be published as open source as soon as it's ready.

You can use https://gist.github.com/pklaus/dce37521579513c574d0#gistcomment-3507444; it's free.

@moi15moi

moi15moi commented May 13, 2022

You should never try to decode the font name yourself.

Here is how you should do it:

from fontTools import ttLib
font = ttLib.TTFont(fontPath)
fontFamilyName = font['name'].getDebugName(1)
fullName = font['name'].getDebugName(4)

The numbers 1 and 4 are name IDs. If you need anything more, read this documentation about name IDs: https://docs.microsoft.com/en-us/typography/opentype/spec/name#name-ids

Here is fonttools documentation about the naming table: https://fonttools.readthedocs.io/en/latest/ttLib/tables/_n_a_m_e.html
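
As a sketch of how the getDebugName() approach above could be extended to several name IDs at once: the helper `describe_names` and the label strings are assumptions for illustration, while the numeric IDs come from the OpenType name table specification linked above.

```python
# Name IDs defined by the OpenType 'name' table specification.
NAME_IDS = {1: "family", 2: "subfamily", 4: "full_name", 6: "postscript_name"}


def describe_names(font, name_ids=NAME_IDS):
    """Map labels to strings using the name table's getDebugName()."""
    table = font["name"]
    # getDebugName() returns None when the font has no record for an ID.
    return {label: table.getDebugName(name_id)
            for name_id, label in name_ids.items()}


# Usage, assuming fontTools is installed and the (hypothetical) file exists:
#   from fontTools import ttLib
#   print(describe_names(ttLib.TTFont("myfont.ttf")))
```

The function only relies on font["name"].getDebugName, so it works with any object that offers that interface.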

@S-M-R-Sadeghi

S-M-R-Sadeghi commented May 22, 2022 via email

@500cm

500cm commented Jul 10, 2022

Great, but does not work with woff2 files

@moi15moi

Great, but does not work with woff2 files

@tsastsin Could you share your woff2 files?
I don't know much about that font format, but I just tested with a woff2 file and it worked.

Also, have you tried the method I wrote?

@500cm

500cm commented Jul 10, 2022

@moi15moi My mistake, sorry, the brotli installation was missing.
Now there are no problems with woff2 files.
Yes, I am using your method, thank you!
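
Since the woff2 problem came down to a missing Brotli installation, a pre-flight check can make that failure easier to diagnose. This is a rough sketch; the helper names are hypothetical, and the assumption that either the `brotli` or the `brotlicffi` module satisfies fontTools should be verified against the installed fontTools version:

```python
import importlib.util


def brotli_available():
    """Return True if a Brotli binding can be imported."""
    return any(
        importlib.util.find_spec(mod) is not None
        for mod in ("brotli", "brotlicffi")
    )


def check_woff2_support(path):
    """Raise early with a clear message when a .woff2 file cannot be read."""
    if str(path).lower().endswith(".woff2") and not brotli_available():
        raise RuntimeError("WOFF2 needs Brotli: try 'pip install brotli'")
    return path
```

Calling check_woff2_support(path) before ttLib.TTFont(path) turns a confusing import-time failure into an explicit message.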

@tatarize

MIT License.
Here's a version that doesn't even need fonttools. It's also built for speed. We do not read any unneeded bytes or parse anything that isn't directly helping us find these values. In testing this parsed about 31 files per millisecond.

import struct


def query_name(filename):
    def get_string(f, off, length):
        string = None
        try:
            location = f.tell()
            f.seek(off)
            string = f.read(length)
            f.seek(location)
            return string.decode("UTF-16BE")
        except UnicodeDecodeError:
            try:
                return string.decode("UTF8")
            except UnicodeDecodeError:
                return string

    with open(filename, "rb") as f:
        (
            sfnt_version,
            num_tables,
            search_range,
            entry_selector,
            range_shift,
        ) = struct.unpack(">LHHHH", f.read(12))

        name_table = False
        for i in range(num_tables):
            tag, checksum, offset, length = struct.unpack(">4sLLL", f.read(16))
            if tag == b"name":
                f.seek(offset)
                name_table = True
                break
        if not name_table:
            return None, None, None

        # We are now at the name table.
        table_start = f.tell()
        (
            fmt,
            count,
            strings_offset,
        ) = struct.unpack(">HHH", f.read(6))
        # In a format 1 table the language-tag records come after the name
        # records, so the name records always start right after this header.

        font_family = None
        font_subfamily = None
        font_name = None
        for record_index in range(count):
            (
                platform_id,
                platform_specific_id,
                language_id,
                name_id,
                length,
                record_offset,
            ) = struct.unpack(">HHHHHH", f.read(2 * 6))
            pos = table_start + strings_offset + record_offset
            if name_id == 1:
                font_family = get_string(f, pos, length)
            elif name_id == 2:
                font_subfamily = get_string(f, pos, length)
            elif name_id == 4:
                font_name = get_string(f, pos, length)
            if font_family and font_subfamily and font_name:
                break
        return font_family, font_subfamily, font_name
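
To make the byte layout that the parser above walks through easier to follow, here is a small round-trip sketch that packs a format 0 name table in memory and reads it back. Both function names are made up for illustration, and the table holds only Windows (platform 3, encoding 1) records encoded as UTF-16BE:

```python
import struct


def build_name_table(names):
    """Pack {name_id: text} into a format 0 'name' table (Windows, UTF-16BE)."""
    records, storage = b"", b""
    for name_id, text in sorted(names.items()):
        data = text.encode("utf-16-be")
        # platformID=3, encodingID=1, languageID=0x409, nameID, length, offset
        records += struct.pack(">HHHHHH", 3, 1, 0x409, name_id,
                               len(data), len(storage))
        storage += data
    # Header: format, record count, offset from table start to string storage.
    header = struct.pack(">HHH", 0, len(names), 6 + len(records))
    return header + records + storage


def read_name_table(table):
    """Return {name_id: text} from a table produced by build_name_table()."""
    fmt, count, storage_offset = struct.unpack_from(">HHH", table, 0)
    result = {}
    for i in range(count):
        _, _, _, name_id, length, offset = struct.unpack_from(
            ">HHHHHH", table, 6 + 12 * i)
        start = storage_offset + offset
        result[name_id] = table[start:start + length].decode("utf-16-be")
    return result


table = build_name_table({1: "Open Sans", 2: "Bold", 4: "Open Sans Bold"})
print(read_name_table(table))
```

Real fonts usually carry several records per name ID (different platforms and languages), which is why the choice of record and encoding matters in practice.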

@moi15moi

@tatarize If you really want to decode it yourself, you should use the right encoding.

Here is what GDI (Windows) does:

    def get_name_encoding(name: NameRecord) -> Optional[str]:
        """
        Parameters:
            name (NameRecord): Name record from the naming table
        Returns:
            The cmap codepoint encoding.
            If GDI does not support the name, return None.
        """
        # From: https://github.com/MicrosoftDocs/typography-issues/issues/956#issuecomment-1205678068
        if name.platformID == 3:
            if name.platEncID == 3:
                return "cp936"
            elif name.platEncID == 4:
                if name.nameID == 2:
                    return "utf_16_be"
                else:
                    return "cp950"
            elif name.platEncID == 5:
                if name.nameID == 2:
                    return "utf_16_be"
                else:
                    return "cp949"
            else:
                return "utf_16_be"
        elif name.platformID == 1 and name.platEncID == 0:
            return "iso-8859-1"

        return None

    @staticmethod
    def get_decoded_name(name: NameRecord) -> str:
        """
        Parameters:
            name (NameRecord): Name record from the naming table
        Returns:
            The decoded name
        """

        encoding = FontParser.get_name_encoding(name)

        if name.platformID == 3 and encoding != "utf_16_be":
            # Compatibility for really old fonts
            name_to_decode = name.string.replace(b"\x00", b"")
        else:
            name_to_decode = name.string

        return name_to_decode.decode(encoding)
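
The effect of the zero-byte stripping above can be seen on a synthetic record. The sketch below covers only the Windows platform (platformID 3) branch of the quoted mapping; the function name and the sample bytes are assumptions made up for illustration, not data from a real font:

```python
# Windows platform (platformID 3) encoding IDs mapped to Python codecs,
# following the table quoted above; other IDs are treated as UTF-16BE.
LEGACY_CODECS = {3: "cp936", 4: "cp950", 5: "cp949"}


def decode_windows_name(raw, plat_enc_id, name_id):
    """Decode a platformID 3 name string following the quoted mapping."""
    codec = LEGACY_CODECS.get(plat_enc_id, "utf_16_be")
    if plat_enc_id in (4, 5) and name_id == 2:
        codec = "utf_16_be"
    if codec != "utf_16_be":
        # Mirrors the quoted code: drop zero bytes before a legacy codec.
        raw = raw.replace(b"\x00", b"")
    return raw.decode(codec)


# Synthetic record: ASCII letters padded to two bytes, then a GBK character.
raw = b"\x00S\x00i\x00m" + "宋".encode("cp936")
print(decode_windows_name(raw, plat_enc_id=3, name_id=1))
```

Without the stripping step, the padded zero bytes would survive decoding as NUL characters inside the name.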
