@maurelian
Last active February 4, 2022 02:56
Nameprep/stringprep library search

Nameprep search

Issue

Cannot find a maintained implementation of Nameprep for the registrar package.

Options

1. Use existing libraries:

Search terms used include: stringprep, nameprep, IDN, IDNA, IETF, RFC

2. Reimplement Nameprep

Tables from stringprep

  • Repertoire: Unicode 3.2
  • Mapping: Tables B.1, B.2
  • Normalize: Unicode normalization form KC (NFKC)
  • Prohibit: Tables C.1.2, C.2.2, C.3, C.4, C.5, C.6, C.7, C.8, C.9
  • Check (bidi): bidirectional strings are checked as described in Section 6 of STRINGPREP (RFC 3454)
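As a rough illustration of option 2, the profile above could be sketched in Python with the standard-library stringprep module, which exposes the RFC 3454 tables as lookup functions. The names nameprep_sketch and PROHIBITED are made up for this example. The sketch treats its input as a query string, so it skips the unassigned-code-point check (Table A.1), and it has not been validated against the RFC test vectors.

```python
import stringprep
# Unicode 3.2 database, matching the Nameprep repertoire.
from unicodedata import ucd_3_2_0 as ucd

# Lookup functions for the tables the profile prohibits.
PROHIBITED = (
    stringprep.in_table_c12, stringprep.in_table_c22, stringprep.in_table_c3,
    stringprep.in_table_c4, stringprep.in_table_c5, stringprep.in_table_c6,
    stringprep.in_table_c7, stringprep.in_table_c8, stringprep.in_table_c9,
)


def nameprep_sketch(label: str) -> str:
    """Hypothetical helper: apply the Nameprep steps to a single label."""
    # Map: drop Table B.1 characters, case-fold the rest with Table B.2.
    mapped = "".join(
        stringprep.map_table_b2(ch)
        for ch in label
        if not stringprep.in_table_b1(ch)
    )

    # Normalize: NFKC using the Unicode 3.2 data.
    prepared = ucd.normalize("NFKC", mapped)

    # Prohibit: reject any character found in one of the C tables.
    for ch in prepared:
        if any(in_table(ch) for in_table in PROHIBITED):
            raise UnicodeError(f"prohibited character {ch!r}")

    # Check (bidi): RFC 3454 section 6, requirements 2 and 3.
    if any(stringprep.in_table_d1(ch) for ch in prepared):
        if any(stringprep.in_table_d2(ch) for ch in prepared):
            raise UnicodeError("RandALCat and LCat characters are mixed")
        if not (stringprep.in_table_d1(prepared[0])
                and stringprep.in_table_d1(prepared[-1])):
            raise UnicodeError("RandALCat string must start and end with RandALCat")

    return prepared
```

Keeping the table lookups in one tuple makes it easy to compare the list against the "Prohibit" line of the profile when reviewing.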

Background:

Stringprep

  • Purpose: prepare Unicode text strings so that string input and string comparison are more likely to work in ways that make sense for typical users throughout the world
  • Preparation steps:
    • Mapping
    • Normalize
    • Prohibit
    • Check (bidi): check for bidirectional characters

Nameprep is the process of case-folding a name to lowercase and removing some generally invisible code points so that the result is suitable to represent a domain name or other such canonical name. It is used by IDNA (the 2003 version, RFC 3490) and relies on the Unicode standard's NFKC normalization.
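For comparison, and only as an illustration in case Python is an option, the standard library ships a Nameprep implementation as encodings.idna.nameprep. It follows the 2003 rules and assumes query strings, so unassigned code points are allowed. A minimal sketch of the two effects described above:

```python
from encodings.idna import nameprep

# U+00AD (soft hyphen) is generally invisible and is mapped to nothing;
# the remaining characters are case-folded.
print(nameprep("Ex\u00adAMPLE"))

# Case folding can change the length of a label: U+00DF (sharp s) folds to "ss".
print(nameprep("Stra\u00dfe"))
```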

NFKC: Normalization Form Compatibility Composition. Characters are decomposed by compatibility, then recomposed by canonical equivalence.
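A small sketch of what NFKC does in practice, using Python's unicodedata module. The sample characters were chosen for this example and do not come from the notes above; the helper name nfkc is likewise made up.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Hypothetical shorthand for NFKC normalization."""
    return unicodedata.normalize("NFKC", text)


# Compatibility decomposition: presentation variants collapse to plain characters.
ligature = nfkc("\ufb01")    # LATIN SMALL LIGATURE FI becomes the two letters "fi"
fullwidth = nfkc("\uff21")   # FULLWIDTH LATIN CAPITAL LETTER A becomes "A"

# Canonical recomposition: "e" followed by a combining acute accent
# becomes the single precomposed character U+00E9.
composed = nfkc("e\u0301")

print(ligature, fullwidth, composed, len(composed))
```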

Internationalized domain names in applications: An internationalized domain name (IDN) is an Internet domain name that contains at least one label that is displayed in software applications, in whole or in part, in a language-specific script or alphabet, such as Arabic, Chinese, Cyrillic, Tamil, Hebrew or the Latin alphabet-based characters with diacritics or ligatures, such as French. These writing systems are encoded by computers in multi-byte Unicode. Internationalized domain names are stored in the Domain Name System as ASCII strings using Punycode transcription.
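To make the Punycode step concrete, here is a short sketch using Python's built-in "punycode" and "idna" codecs. The "idna" codec implements the 2003 rules, including Nameprep, and the domain bücher.example is just an example name.

```python
# Raw Punycode for a single label, without the "xn--" ACE prefix.
raw = "bücher".encode("punycode")

# Full conversion of a domain name: each non-ASCII label is nameprepped,
# Punycode-encoded, and prefixed with "xn--"; ASCII labels pass through.
ascii_name = "Bücher.example".encode("idna")

# The reverse direction recovers the Unicode form (lowercased by Nameprep).
unicode_name = ascii_name.decode("idna")

print(raw, ascii_name, unicode_name)
```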

Refs:
