Wraps any regexps, strings, comments, keywords, predefined objects, numbers, brackets and operators it finds in t-tags with a matching class.
Requires a RegExp that matches these token types, e.g.:
var re = /(?![\d\w]\s*)(\/[^\/\*][^\n\/]*\/[gi])|(".*?"|'.*?')|(\/\/.*?\n|\/\*[\x00-\xff\u00\uffff]*?\*\/)|(?:\b)(abstract|boolean|break|byte|case|catch|char|class|const|continue|debugger|default|delete|do|double|else|enum|export|extends|false|final|finally|float|for|function|goto|if|implements|import|in|instanceof|int|interface|long|native|new|null|package|private|protected|public|return|short|static|super|switch|synchronized|this|throw|throws|transient|true|try|typeof|var|void|volatile|while|with)(?:\b)|(?:\b)(Array|Boolean|Date|Function|Math|Number|Object|RegExp|String|document|window|arguments)(?:\b)|(\d[\d\.eE]*)|([\x28-\x2b\x2d\x3a-\x3f\x5b\x5d\x5e\x7b-\x7e]+|\x2f|(?=\D)\.(?=\D))/g;
Provides a filter inserting t-tags* with the following classNames:
- f-1 = regexp
- f1 = string
- f2 = comment
- f3 = keyword
- f4 = predefined object
- f5 = number
- f6 = operator, bracket
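As a rough sketch of how such a filter could work (not the entry's actual code): the callback form of `String.prototype.replace` can check which capture group took part in the match and pick the class from that. The reduced pattern, the `classNames` table and the `highlight` name are assumptions made for this illustration; only four of the token types above are covered.

```javascript
// Reduced, hypothetical pattern: one capture group per token type
// (string, comment, keyword, number). Not the pattern from the entry.
var re = /(".*?"|'.*?')|(\/\/.*|\/\*[\s\S]*?\*\/)|\b(var|function|return|if|else)\b|(\d[\d.eE]*)/g;

// Maps the capture group index to the class names listed above.
var classNames = [null, 'f1', 'f2', 'f3', 'f5'];

function highlight(source) {
  return source.replace(re, function (match) {
    // arguments[1..4] hold the capture groups; exactly one of them matched.
    for (var i = 1; i < classNames.length; i++) {
      if (arguments[i] !== undefined) {
        return '<t class="' + classNames[i] + '">' + match + '</t>';
      }
    }
    return match;
  });
}
```

Calling `highlight('var x = 42; // note')` wraps `var`, `42` and the comment in t-tags and leaves everything else untouched.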
Remember to read the source with innerText/firstChild.data instead of innerHTML: innerHTML converts characters to HTML entities, which cannot be matched here. "&" needs to be escaped beforehand, otherwise it will be transformed when the HTML is reinserted.
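A minimal sketch of that read, escape, reinsert sequence, assuming the element holds a single text node. `escapeAmp` and `highlightElement` are hypothetical names, and `highlight` stands for whatever filter function is in use:

```javascript
// Escape "&" so that text such as "&lt;" in the source is not turned
// into "<" when the result is written back through innerHTML.
function escapeAmp(text) {
  return text.replace(/&/g, '&amp;');
}

// el: an element whose first child is the text node with the source code.
// highlight: the filter function that inserts the t-tags.
function highlightElement(el, highlight) {
  var raw = el.firstChild.data; // plain text, no HTML entities
  el.innerHTML = highlight(escapeAmp(raw));
}
```

In a browser this would be called as, for example, `highlightElement(document.getElementById('code'), highlight)`, where the id `code` is made up for the example.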
* IE6-8 need a JS shim to allow the non-standard tag:
document.createElement('t');
This was created with the 140byt.es homepage in mind, too :-)
Here is a simple example where both your approach and mine fail.
Everything between the two slashes is highlighted as a regular expression. I'm not sure how to fix this.
Edit 1: I think the only possibility is to use capturing parentheses instead of a lookbehind, e.g. `([^\s\w)]\s*)(...)`. The problem is that you will need to prepend this part of the match to the result.
Edit 2: While commenting on @jed's enlink I learned a new trick. Replace `(?!...)` with `\B`. This will not solve all issues, but it helps in cases where the slash is preceded by a word character. `\B` checks that there is no word boundary. A similar trick for matching dots outside of numbers is not possible, unfortunately.
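To make the two edits concrete, here is a small sketch. It uses a simplified regex-literal pattern rather than the one from the entry, and the names `literal`, `naive`, `guarded`, `prefixed` and `markRegexps` are invented for the illustration. The leading `^` alternative in `prefixed` is also an addition, so that a literal at the very start of the input can still match.

```javascript
// Simplified regex-literal pattern (an assumption, not the entry's pattern).
var literal = '\\/[^\\/*\\n][^\\/\\n]*\\/[gim]*';

// Edit 2: \B only matches where there is NO word boundary, so a slash
// directly after a word character is not taken as the start of a regexp.
var naive = new RegExp(literal);
var guarded = new RegExp('\\B' + literal);

naive.test('a/b/c');     // true: "/b/" looks like a regexp
guarded.test('a/b/c');   // false: the slash follows the word character "a"
guarded.test('a / b / c'); // true: the space defeats \B, so this case stays unsolved

// Edit 1: capture whatever precedes the slash and prepend it to the result.
var prefixed = new RegExp('(^|[^\\s\\w)]\\s*)(' + literal + ')', 'g');

function markRegexps(source) {
  return source.replace(prefixed, function (match, before, regexp) {
    // "before" was consumed by the match, so it has to be put back.
    return before + '<t class="f-1">' + regexp + '</t>';
  });
}
```

With these definitions, `markRegexps('x = /b/g;')` wraps only `/b/g`, while `markRegexps('a / b / c')` returns its input unchanged because the first slash is preceded by a word character and a space.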