Ruslan Gainanov GRomR1

GRomR1 / regex_tokenizer.js
Last active March 18, 2016 12:47 — forked from raisch/regex_tokenizer.js
Regular Expression Sentence Tokenizer (English)
// tokenize(str)
// extracts semantically useful tokens from a string containing English-language sentences
// @param {String} str the string to tokenize
// @returns {Array} containing extracted tokens
function tokenize(str) {
var punct = '\\[' + '\\!' + '\\"' + '\\#' + '\\$' + // since JavaScript does not
'\\%' + '\\&' + '\\\'' + '\\(' + '\\)' + // support POSIX character
'\\*' + '\\+' + '\\,' + '\\\\' + '\\-' + // classes, we'll need our