To extract the specific `[<mediatype>][;base64]` portions of Data URLs (per MDN's doc), the following aspects were explored:
- Hard-coding a specific list of allowed `<type>/<subtype>` values into the expression, versus hard-coding only the `<type>`.
- A more complete capture of the `*( ";" parameter )` portion (per RFC 2397), returning the attribute-value pairs and/or the trailing `;base64` portion separately.
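A minimal sketch of the first trade-off follows. It assumes a hypothetical allow-list (`ALLOWED_MIME_TYPES`) and uses illustrative names (`listMatcher`, `typeOnlyMatcher`, `captureMime`), none of which come from #28614. The type-only variant reuses the `<type>/<subtype>` portion of the expression recommended below. Both variants capture only the media type and stop at the first `;` or `,` via a lookahead.

```javascript
// Hypothetical allow-list; the real set of accepted subtypes would be a project decision.
const ALLOWED_MIME_TYPES = ['text/javascript', 'application/javascript', 'application/json'];

// Escape regex metacharacters so each MIME string is matched literally.
const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// Variant A: only the listed <type>/<subtype> pairs are accepted.
const listMatcher = new RegExp(
  `^(${ALLOWED_MIME_TYPES.map(escapeRegExp).join('|')})(?=[;,])`,
  'i',
);

// Variant B: only the <type> is hard-coded. Any subtype that starts and ends
// with a letter, with letters, digits, '-' or '.' in between, is accepted.
const typeOnlyMatcher = /^((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)(?=[;,])/i;

// Returns the captured <type>/<subtype>, or null when the head does not match.
function captureMime(re, head) {
  const match = re.exec(head);
  return match === null ? null : match[1];
}

for (const head of ['text/javascript,', 'text/x-custom;a=b,', 'image/png;base64,']) {
  console.log(head, captureMime(listMatcher, head), captureMime(typeOnlyMatcher, head));
}
```

With these inputs, both variants capture `text/javascript`. Only the type-only variant accepts `text/x-custom`, and neither accepts `image/png`. The allow-list is stricter, but it has to be edited whenever another subtype should be accepted.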
The recommended expression for #28614 would roughly be the following (pending obvious refinements if it ends up being used):

```js
const matcher =
  /^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i;
```
Note: See the annotated code snippet for more details.
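The expression returns all `;attribute=value` segments as one string in its second capture. Returning them as separate pairs, as considered in the second aspect above, can be layered on top. The following is a minimal sketch; `parseDataUrlHead` and the shape of its return value are hypothetical. It relies on the expression's character classes excluding `;` and `=` from attribute names and values, so splitting on `;` and then on the first `=` is unambiguous.

```javascript
const matcher =
  /^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i;

// Hypothetical helper: names the captures and splits group 2 into [attribute, value] pairs.
function parseDataUrlHead(head) {
  const match = matcher.exec(head);
  if (match === null) return null;
  const [, mime, attributes, base64] = match;
  const params = attributes
    .split(';')
    .slice(1) // group 2 is either '' or starts with ';', so the first piece is always ''
    .map((segment) => {
      // Names and values cannot contain '=', so the first '=' is the separator.
      const eq = segment.indexOf('=');
      return [segment.slice(0, eq), segment.slice(eq + 1)];
    });
  return { mime, params, base64: base64 !== undefined };
}

// returns { mime: 'text/javascript', params: [['a', 'b']], base64: true }
console.log(parseDataUrlHead('text/javascript;a=b;base64,'));
// returns null
console.log(parseDataUrlHead('text/javascript;,'));
```

Doing the split outside the expression keeps the regex unchanged. A JavaScript regex cannot return a variable number of captures from one repeated group, which is why group 2 holds the segments concatenated.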
The expression is expected to work as follows:
- `matcher.exec('text/javascript,')`

  NOTE: Assuming `text/javascript;,` to be invalid.

  ```js
  [
    // 0: valid data-uri head
    'text/javascript,',
    // 1: mime
    'text/javascript',
    // 2: attributes
    '',
    // 3: base64
    undefined,
  ];
  ```
- `matcher.exec('text/javascript;base64,')`

  NOTE: Assuming `;base64,` and `base64,` to be invalid.

  ```js
  [
    // 0: valid data-uri head
    'text/javascript;base64,',
    // 1: mime
    'text/javascript',
    // 2: attributes
    '',
    // 3: base64
    ';base64',
  ];
  ```
- `matcher.exec('text/javascript;a=b;base64,')`

  ```js
  [
    // 0: valid data-uri head
    'text/javascript;a=b;base64,',
    // 1: mime
    'text/javascript',
    // 2: attributes
    ';a=b',
    // 3: base64
    ';base64',
  ];
  ```
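The three examples, together with the inputs the notes assume to be invalid, can be checked in one pass. This is a minimal sketch for Node.js or any modern JavaScript runtime; `cases` and `sameCaptures` are illustrative names. Only the numbered captures are compared, because `RegExp.prototype.exec()` results also carry `index`, `input` and `groups` properties.

```javascript
const matcher =
  /^(?:((?:text|application)\/(?:[A-Z][-.0-9A-Z]*)?[A-Z]+)((?:;[A-Z][!%'()*\-.0-9A-Z_~]*=[!%'()*\-.0-9A-Z_~]*)*)(;base64)?),/i;

// [input, expected exec() captures]; null marks inputs the notes assume to be invalid.
const cases = [
  ['text/javascript,', ['text/javascript,', 'text/javascript', '', undefined]],
  ['text/javascript;base64,', ['text/javascript;base64,', 'text/javascript', '', ';base64']],
  ['text/javascript;a=b;base64,', ['text/javascript;a=b;base64,', 'text/javascript', ';a=b', ';base64']],
  ['text/javascript;,', null],
  [';base64,', null],
  ['base64,', null],
];

// Compares only the numbered captures of an exec() result against the expected array.
const sameCaptures = (actual, expected) =>
  actual === null || expected === null
    ? actual === expected
    : actual.length === expected.length && expected.every((value, i) => actual[i] === value);

for (const [input, expected] of cases) {
  const verdict = sameCaptures(matcher.exec(input), expected) ? 'ok' : 'FAIL';
  console.log(verdict, JSON.stringify(input));
}
```

With the expression as written, every case prints `ok`. The table of cases can be extended as the pending refinements are made.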