@mikesamuel
Last active June 21, 2018 19:30
GH Markdown auto-identifier corner cases

Ambiguous

Ambiguous

Ambiguous

NFC Å - Å

NFKC Äffin - Äffin

NFD Å - Å

NFKD Äffin - Äffin

HTML character reference in header - abcd

123 starts with numerals

Mixed case Roman numerals - Ⅰ Ⅱ Ⅲ Ⅳ ⅰ ⅱ ⅲ ⅳ

Extra syntax # {#custom id}

ASCII - !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~��������������������������������� ¡¢£¤¥¦§¨©ª«¬­®¯°±²³´µ¶·¸¹º»¼½¾¿ÀÁÂÃÄÅÆÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖ×ØÙÚÛÜÝÞßàáâãäåæçèéêëìíîïðñòóôõö÷øùúûüýþÿ

Non-ASCII - ZWNJ=‌ Punctuation=⁂

Lower ASCII - _!_"_#_$_%_&_'_(_)_*_+_,_-_._/_

Tab_ _

Extra spaces

-dashes-


It looks like GitHub auto-ids work as follows:

  • Proceed top to bottom. When an id has already been auto-assigned, use sprintf("%s-%d", id, n), where n is the smallest integer >= 1 for which that result has not been seen before.
  • Codepoints < '0' are dropped, except for spaces and -. Each space becomes a -.
  • Punctuation and non-graphical codepoints >= '0' are dropped.
  • Adjacent -s are not collapsed.
  • HTML character references are decoded, and the decoded letters are kept rather than replaced with -.
  • {#...} extension syntax is neither recognized nor exempted.
  • IDs may start with ASCII numerals.
  • Non-ASCII letters are not lowercased.
  • No Unicode normalization is done. See below.
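
The observations above can be approximated in a short sketch. This is a guess at the behavior, not GitHub's implementation: `autoId` and `makeIdAssigner` are hypothetical names, the `user-content-` prefix seen on the rendered ids is left out, and characters the list does not settle (for example `_`) are simply dropped here as punctuation.

```javascript
// Hypothetical helper: turns heading text into an id following the
// observations listed above. Not GitHub's actual code.
function autoId(headingText) {
  let out = '';
  for (const ch of headingText) {                  // iterate by code point
    if (ch === ' ') {
      out += '-';                                  // each space becomes its own '-'
    } else if (ch === '-') {
      out += ch;                                   // dashes are kept, never collapsed
    } else if (/[A-Z]/.test(ch)) {
      out += ch.toLowerCase();                     // only ASCII letters are lowercased
    } else if (/[\p{L}\p{M}\p{N}]/u.test(ch)) {
      out += ch;                                   // letters, marks, numbers kept as-is
    }
    // Everything else (punctuation, controls, format characters) is dropped.
    // Dropping '_' is an assumption; the notes do not settle it.
  }
  return out;                                      // no Unicode normalization applied
}

// Hypothetical helper: remembers ids already handed out and appends
// -1, -2, ... to repeats, picking the smallest unused n >= 1.
function makeIdAssigner() {
  const seen = new Set();
  return function assign(headingText) {
    const base = autoId(headingText);
    let id = base;
    for (let n = 1; seen.has(id); n++) {
      id = `${base}-${n}`;
    }
    seen.add(id);
    return id;
  };
}

const assign = makeIdAssigner();
console.log(['Ambiguous', 'Ambiguous', 'Ambiguous'].map(h => assign(h)));
// → [ 'ambiguous', 'ambiguous-1', 'ambiguous-2' ]
console.log(assign('123 starts with numerals'));   // → 123-starts-with-numerals
console.log(assign('-dashes-'));                   // → -dashes-
```

The assigner keeps its own `Set` so that each page (or test) can start from a clean slate by calling `makeIdAssigner()` again.
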
// Run in dev console to see what the auto-id assigner does to Normalization examples from unicode.org.
// Collect every id attribute on the page.
let ids = Array.from(document.querySelectorAll('*[id]')).map(x => x.id);
// Maps each id to the names of the normal forms that leave it unchanged.
let idToIdentityNormalForms = {};
let normalForms = [
  ['identity', x => x],
  ['NFC', x => x.normalize('NFC')],
  ['NFD', x => x.normalize('NFD')],
  ['NFKC', x => x.normalize('NFKC')],
  ['NFKD', x => x.normalize('NFKD')],
];

for (let id of ids) {
  let matches = [];
  for (let nf of normalForms) {
    if (nf[1](id) === id) { matches.push(nf[0]) }
  }
  idToIdentityNormalForms[id] = matches;
}
console.log(JSON.stringify(idToIdentityNormalForms, null, 2));

/*
produces
...
  "user-content-nfc-Å---Å": [
    "identity"
  ],
  "user-content-nfkc-Äffin---Äffin": [
    "identity",
    "NFC"
  ],
  "user-content-nfd-Å---å": [
    "identity"
  ],
  "user-content-nfkd-Äffin---äffin": [
    "identity"
  ],
...
*/
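
To help read the output above, here is a small standalone sketch that reports which normal forms leave a given string unchanged. `stableForms` is a hypothetical name, and the sample strings are assumptions chosen for illustration (the precomposed letter U+00C5, the decomposed pair U+0041 U+030A, and the angstrom sign U+212B), not necessarily the exact characters in the headings.

```javascript
// Hypothetical helper: lists the Unicode normal forms under which s is
// already stable, using the standard String.prototype.normalize.
const FORMS = ['NFC', 'NFD', 'NFKC', 'NFKD'];

function stableForms(s) {
  return FORMS.filter(form => s.normalize(form) === s);
}

console.log(stableForms('\u00C5'));   // precomposed Å  → [ 'NFC', 'NFKC' ]
console.log(stableForms('A\u030A'));  // A + ring above → [ 'NFD', 'NFKD' ]
console.log(stableForms('\u212B'));   // angstrom sign  → []
```

An id that matches only "identity" in the output above therefore contains at least one character that every normal form would rewrite, which is consistent with no normalization being applied.
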