@kmckelvin
Last active August 9, 2017 14:03
Turn the emoji spec at http://www.unicode.org/Public/emoji/6.0/emoji-data.txt into a regular expression
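For orientation, here is a minimal sketch of the transformation on a few data lines. The sample lines are a simplified, illustrative excerpt in the style of emoji-data.txt, not a verbatim copy, and `toAlternative` is a hypothetical helper name used only for this example.

```javascript
// Illustrative excerpt: a comment line, an entry below U+0100,
// a range entry, a single code point entry, and a blank line.
const sample = [
  '# emoji-data.txt (illustrative excerpt)',
  '0023          ; Emoji   # number sign',
  '231A..231B    ; Emoji   # watch..hourglass',
  '1F004         ; Emoji   # mahjong red dragon',
  '',
].join('\n');

// Hypothetical helper: turn 'XXXX' or 'XXXX..YYYY' into one regex alternative.
const toAlternative = field => {
  const [first, last] = field.split('..');
  return last === undefined
    ? `\\u{${first}}`
    : `[\\u{${first}}-\\u{${last}}]`;
};

const samplePattern = sample
  .split('\n')
  .filter(line => line.trim() && !line.trim().startsWith('#')) // data lines only
  .filter(line => !line.startsWith('00'))                      // skip 00xx entries
  .map(line => line.split(/\s+/)[0])                           // first field
  .map(toAlternative)
  .join('|');

console.log(samplePattern); // [\u{231A}-\u{231B}]|\u{1F004}
```

The `0023` entry is dropped by the `00` prefix filter, the range becomes a character class, and the single code point becomes a bare escape.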
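const fs = require('fs');

// Keep data lines only: skip blank lines and '#' comments.
const notCommented = str => str.trim().length && str.trim()[0] !== '#';
// Skip code points written as 00xx (everything below U+0100, not only ASCII).
const notASCII = str => !str.startsWith('00');
// The first space-delimited field is a code point or a 'first..last' range.
const ranges = str => str.substr(0, str.indexOf(' '));
const isRange = str => str.indexOf('..') !== -1;
const firstRange = str => str.substr(0, str.indexOf('..'));
const secondRange = str => str.substr(str.indexOf('..') + 2);
// \u{...} code point escapes need the regex `u` flag when the output is compiled.
const esc = str => `\\u{${str}}`;
const rangeToRegex = str => `[${esc(firstRange(str))}-${esc(secondRange(str))}]`;
const lineToRegex = str => esc(str);
const convert = str => isRange(str) ? rangeToRegex(str) : lineToRegex(str);

// The path to emoji-data.txt is taken from the first command-line argument.
const raw = fs.readFileSync(process.argv[2]).toString();
const lines = raw.split('\n').filter(notCommented).filter(notASCII).map(ranges).map(convert);
// The same code point can appear under several properties, so de-duplicate before joining.
console.log(Array.from(new Set(lines)).join('|'));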
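The script only prints a pattern string; it does not say how to compile it. The sketch below shows one way to use the output. It assumes a short hand-written `pattern` in the same shape as the script's output (the real output is far longer), and `hasEmoji` and `findEmoji` are hypothetical helper names.

```javascript
// Hand-written stand-in for the script's output, limited to three alternatives.
const pattern = '\\u{231A}|[\\u{1F600}-\\u{1F64F}]|\\u{1F004}';

// The `u` flag is required: without it, \u{...} is not read as a code point escape.
const emojiRegex = new RegExp(pattern, 'u');
const emojiRegexGlobal = new RegExp(pattern, 'gu');

// Hypothetical helpers built on the compiled pattern.
const hasEmoji = str => emojiRegex.test(str);
const findEmoji = str => str.match(emojiRegexGlobal) || [];

console.log(hasEmoji('hello \u{1F600}'));         // true
console.log(hasEmoji('hello'));                   // false
console.log(findEmoji('a\u{1F600}b\u{231A}c'));   // the two matched characters, U+1F600 and U+231A
```

Keeping a separate non-global regex for `test` avoids the `lastIndex` state that a `g` regex carries between calls.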