@thanpolas
Created June 6, 2021 09:34
Normalize all UTF quotes in Javascript
/**
 * Normalizes quotes in a given string. The Unicode character set contains
 * many variations of the double quote; this function attempts to convert
 * each of them to the standard QUOTATION MARK, U+0022: "
 *
 * @param {string} str The string to normalize.
 * @return {string} Normalized string.
 * @see https://unicode-table.com/en/sets/quotation-marks/
 */
helpers.stdQuote = (str) => {
  // Each variant is listed once; the comment gives its code point.
  const allQuotes = [
    '“', // U+201C
    '”', // U+201D
    '«', // U+00AB
    '»', // U+00BB
    '„', // U+201E
    '‟', // U+201F
    '❝', // U+275D
    '❞', // U+275E
    '〝', // U+301D
    '〞', // U+301E
    '〟', // U+301F
    '＂', // U+FF02 (fullwidth quotation mark, not the ASCII one)
  ];
  const stdQuote = '"'; // U+0022
  // Replace every occurrence of each variant, one variant at a time.
  const normalized = allQuotes.reduce((strNorm, quoteChar) => {
    // eslint-disable-next-line security/detect-non-literal-regexp
    const re = new RegExp(quoteChar, 'g');
    return strNorm.replace(re, stdQuote);
  }, str);
  return normalized;
};
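
The same normalization can also be sketched as a single pass over the string, using one character-class regular expression instead of one `replace` call per character. This is an alternative under stated assumptions, not the gist's own method: `normalizeDoubleQuotes` and `DOUBLE_QUOTE_VARIANTS` are hypothetical names, and the class covers only the code points listed above.

```javascript
// Hypothetical single-pass variant. The character class is written with
// Unicode escapes so that no visually ambiguous literal appears in the source.
// Ranges used: U+201C..U+201F and U+301D..U+301F.
const DOUBLE_QUOTE_VARIANTS =
  /[\u00AB\u00BB\u201C-\u201F\u275D\u275E\u301D-\u301F\uFF02]/g;

function normalizeDoubleQuotes(str) {
  // Every matched variant becomes the ASCII quotation mark, U+0022.
  return str.replace(DOUBLE_QUOTE_VARIANTS, '"');
}

console.log(normalizeDoubleQuotes('\u201CHello\u201D \u00ABworld\u00BB'));
// prints: "Hello" "world"
```

Building the pattern from escapes also avoids constructing a `RegExp` from a runtime string, so the ESLint suppression is not needed in this variant.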
@ndtreviv

Great! Have you got a single-quote version that takes backticks etc. into account?

@DesignByOnyx

DesignByOnyx commented Jun 15, 2023

This gave me a good start. I needed single quotes and dashes covered too, so I made a more expansive list and made sure to use the Unicode escapes. Relying on the visual rendering alone is very prone to human error, especially when it comes to the dashes.

const sanitizeChars: Array<[string, '-' | "'" | '"']> = Object.entries({
  // DASHES / HYPHENS (normalized to HYPHEN-MINUS, U+002D)
  '\u1806': '\u002D', // '᠆'
  '\u2010': '\u002D', // '‐'
  '\u2011': '\u002D', // '‑'
  '\u2012': '\u002D', // '‒'
  '\u2013': '\u002D', // '–'
  '\uFE58': '\u002D', // '﹘'
  '\uFE63': '\u002D', // '﹣'
  '\uFF0D': '\u002D', // '－'

  // SINGLE QUOTES (U+0027)
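  '\u003C': '\u0027', // '<' (ASCII less-than sign; this also rewrites comparisons and markup)
  '\u003E': '\u0027', // '>' (ASCII greater-than sign; same caveat)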
  '\u2018': '\u0027', // '‘'
  '\u2019': '\u0027', // '’'
  '\u201A': '\u0027', // '‚'
  '\u201B': '\u0027', // '‛'
  '\u2039': '\u0027', // '‹'
  '\u203A': '\u0027', // '›'
  '\u275B': '\u0027', // '❛'
  '\u275C': '\u0027', // '❜'
  '\u276E': '\u0027', // '❮'
  '\u276F': '\u0027', // '❯'
  '\uFF07': '\u0027', // '＇'
  '\u300C': '\u0027', // '「'
  '\u300D': '\u0027', // '」'

  // DOUBLE QUOTES (U+0022)
  '\u00AB': '\u0022', // '«'
  '\u00BB': '\u0022', // '»'
  '\u201C': '\u0022', // '“'
  '\u201D': '\u0022', // '”'
  '\u201E': '\u0022', // '„'
  '\u201F': '\u0022', // '‟'
  '\u275D': '\u0022', // '❝'
  '\u275E': '\u0022', // '❞'
  '\u2E42': '\u0022', // '⹂'
  '\u301D': '\u0022', // '〝'
  '\u301E': '\u0022', // '〞'
  '\u301F': '\u0022', // '〟'
  '\uFF02': '\u0022', // '＂'
  '\u300E': '\u0022', // '『'
  '\u300F': '\u0022', // '』'
});

/** Normalizes non-standard quotes and dashes into a uniform format.  */
const sanitizeString = (query: string) => {
  return sanitizeChars.reduce((acc, [char, stdChar]) => {
    return acc.replaceAll(char, stdChar);
  }, query);
};
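
To illustrate why Unicode escapes are safer than pasted glyphs, a small helper can print the code point of every character in a string. This is only a sketch: `codePoints` is a hypothetical name and is not part of the snippet above.

```javascript
// Hypothetical helper: format each character of a string as "U+XXXX".
function codePoints(str) {
  // Array.from iterates by code point, so astral characters stay intact.
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  ).join(' ');
}

// Three dashes that are hard to tell apart on screen:
// HYPHEN-MINUS, HYPHEN, and EN DASH.
console.log(codePoints('\u002D\u2010\u2013'));
// prints: U+002D U+2010 U+2013
```

Running a suspect string through a helper like this makes it easy to confirm which entry in the mapping it should match.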
