@bgrins
Last active December 18, 2023 18:57
Detect if a string is a data URL. Doesn't try to parse it or determine validity, just a quick check if a string appears to be a data URL. See http://jsfiddle.net/bgrins/aZWTB/ for a demo.
// Detecting data URLs
// data URI - MDN https://developer.mozilla.org/en-US/docs/data_URIs
// The "data" URL scheme: http://tools.ietf.org/html/rfc2397
// Valid URL Characters: http://tools.ietf.org/html/rfc2396#section-2
function isDataURL(s) {
  return !!s.match(isDataURL.regex);
}
isDataURL.regex = /^\s*data:([a-z]+\/[a-z]+(;[a-z\-]+\=[a-z\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i;
var yes = [
"",
"",
"  ",
" data:,Hello%2C%20World!",
" data:,Hello World!",
" data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D",
" data:text/html,%3Ch1%3EHello%2C%20World!%3C%2Fh1%3E",
"data:,A%20brief%20note",
"data:text/html;charset=US-ASCII,%3Ch1%3EHello!%3C%2Fh1%3E"
];
var no = [
"dataxbase64",
"data:HelloWorld",
"data:text/html;charset=,%3Ch1%3EHello!%3C%2Fh1%3E",
"data:text/html;charset,%3Ch1%3EHello!%3C%2Fh1%3E", "data:base64,iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEUAAAD///+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8yw83NDDeNGe4Ug9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAElFTkSuQmCC",
"",
"http://wikipedia.org",
"base64",
"iVBORw0KGgoAAAANSUhEUgAAABAAAAAQAQMAAAAlPW0iAAAABlBMVEUAAAD///+l2Z/dAAAAM0lEQVR4nGP4/5/h/1+G/58ZDrAz3D/McH8yw83NDDeNGe4Ug9C9zwz3gVLMDA/A6P9/AFGGFyjOXZtQAAAAAElFTkSuQmCC"
];
var log = document.createElement("pre");
document.body.appendChild(log);
function printError(msg) {
  var message = document.createElement("span");
  message.style.color = "red";
  message.textContent = msg + "\n";
  log.appendChild(message);
}
function printSuccess(msg) {
  var message = document.createElement("span");
  message.style.color = "green";
  message.textContent = msg + "\n";
  log.appendChild(message);
}
yes.forEach(function(s) {
  if (!isDataURL(s)) {
    printError("Expected yes, got no: " + s);
  }
  else {
    printSuccess("Expected yes, got yes: " + s);
  }
});
no.forEach(function(s) {
  if (isDataURL(s)) {
    printError("Expected no, got yes: " + s);
  }
  else {
    printSuccess("Expected no, got no: " + s);
  }
});
dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
mediatype  := [ type "/" subtype ] *( ";" parameter )
data       := *urlchar
parameter  := attribute "=" value

where "urlchar" is imported from [RFC2396], and "type", "subtype", "attribute" and "value" are the corresponding tokens from [RFC2045], represented using URL escaped encoding of [RFC2396] as necessary.

Attribute values in [RFC2045] are allowed to be either represented as tokens or as quoted strings. However, within a "data" URL, the "quoted-string" representation would be awkward, since the quote mark is itself not a valid urlchar. For this reason, parameter values should use the URL Escaped encoding instead of quoted string if the parameter values contain any "tspecial".

The ";base64" extension is distinguishable from a content-type parameter by the fact that it doesn't have a following "=" sign.
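
The grammar above can be followed by hand without a regex. Below is a minimal sketch, not part of the gist, that splits a string into the three pieces named in the `dataurl` rule. The name `splitDataURL` is hypothetical, and the sketch assumes the first comma ends the header, which holds when parameter values are URL-escaped as the RFC text recommends. Like the gist, it does not validate the media type or the data.

```javascript
// Hypothetical helper (not in the gist): split a string following
//   dataurl := "data:" [ mediatype ] [ ";base64" ] "," data
// Returns null when the "data:" prefix or the separating comma is missing.
function splitDataURL(s) {
  if (s.slice(0, 5).toLowerCase() !== "data:") {
    return null;
  }
  // Assumption: the first comma separates the header from the data.
  var comma = s.indexOf(",");
  if (comma === -1) {
    return null;
  }
  var header = s.slice(5, comma);
  var data = s.slice(comma + 1);
  // ";base64" has no "=" after it, which distinguishes it from a parameter.
  var isBase64 = /;base64$/i.test(header);
  if (isBase64) {
    header = header.slice(0, -";base64".length);
  }
  return { mediatype: header, base64: isBase64, data: data };
}

console.log(splitDataURL("data:text/plain;base64,SGVsbG8="));
console.log(splitDataURL("data:,A%20brief%20note"));
console.log(splitDataURL("data:HelloWorld"));
```

An empty `mediatype` in the result means the URL omitted it. The sketch leaves that as an empty string rather than substituting a default.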

guag commented May 13, 2015

I like this, but just a question: is there a particular reason you went with String.match(regex) instead of the faster regex.test(String)? According to this Stack Overflow Q&A, test() can be 30-60% faster, and it seems like the better option here since you don't need the array of results that match() provides.

Take care :)
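
A minimal sketch of the `test()` variant suggested in this comment. The pattern is the gist's original regex copied unchanged; the names `dataUrlRegex` and `isDataURLTest` are hypothetical and chosen only for this example. No timing claim is made here; the sketch only shows that the boolean result is the same for these inputs.

```javascript
// The gist's original pattern, copied unchanged.
var dataUrlRegex = /^\s*data:([a-z]+\/[a-z]+(;[a-z\-]+\=[a-z\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i;

// Hypothetical name, not part of the gist.
function isDataURLTest(s) {
  // RegExp.prototype.test already returns a boolean, so no !! coercion
  // is needed. The regex has no "g" flag, so calls do not share state.
  return dataUrlRegex.test(s);
}

console.log(isDataURLTest("data:,Hello%2C%20World!"));
console.log(isDataURLTest("data:text/html;charset=US-ASCII,%3Ch1%3EHello!%3C%2Fh1%3E"));
console.log(isDataURLTest("data:HelloWorld"));
console.log(isDataURLTest("http://wikipedia.org"));
```

One behavioral difference to be aware of: `s.match(...)` throws when `s` is `null` or `undefined`, while `test()` converts its argument to a string first and simply returns `false` for those values.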

guag commented May 17, 2015

One other thing I noticed: this seems to fail for data URLs of types that aren't images or text, such as audio/mp3 and video/x-ms-wmv. Check out my fork of your fiddle to see what I mean: http://jsfiddle.net/guag/o1xaL3e9/

ghost commented Jun 23, 2015

I tweaked the regex to

isDataURI.regex = /^\s*data:([a-z]+\/[a-z0-9\-]+(;[a-z\-]+\=[a-z\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i;

and then it handles those additional content types. (I just added 0-9\- to the part following the forward slash).

Mottie commented Jan 16, 2016

The regex doesn't work with data:image/svg+xml;base64,... or data:image/svg+xml;charset=utf-8,... (demo)

Change the regex to the following to fix it (demo)

isDataURL.regex = /^\s*data:([a-z]+\/[a-z0-9\-\+]+(;[a-z\-]+\=[a-z0-9\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i;

tansongyang commented Jun 28, 2016

I made a slight tweak so that the regex will accept types with a . character, like application/vnd.ms-excel. I also removed some unnecessary backslashes (\).

isDataURL.regex = /^\s*data:([a-z]+\/[a-z0-9-+.]+(;[a-z-]+=[a-z0-9-]+)?)?(;base64)?,([a-z0-9!$&',()*+;=\-._~:@\/?%\s]*)\s*$/i;
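
A small check of the media types raised in this thread, comparing the gist's original pattern with the one in the comment above. The variable names and the short `AAAA` payloads are made up for illustration; only the two regexes come from the thread.

```javascript
// Original pattern from the gist.
var originalRegex = /^\s*data:([a-z]+\/[a-z]+(;[a-z\-]+\=[a-z\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*\s*$/i;
// Pattern from the comment above (allows digits, "-", "+" and "." in the subtype).
var relaxedRegex = /^\s*data:([a-z]+\/[a-z0-9-+.]+(;[a-z-]+=[a-z0-9-]+)?)?(;base64)?,([a-z0-9!$&',()*+;=\-._~:@\/?%\s]*)\s*$/i;

// Illustrative inputs; the payloads are placeholders, not real media.
var samples = [
  "data:audio/mp3;base64,AAAA",
  "data:video/x-ms-wmv;base64,AAAA",
  "data:image/svg+xml;charset=utf-8,%3Csvg%3E%3C%2Fsvg%3E",
  "data:application/vnd.ms-excel;base64,AAAA"
];

var results = samples.map(function (s) {
  return { input: s, original: originalRegex.test(s), relaxed: relaxedRegex.test(s) };
});

results.forEach(function (r) {
  console.log(r.input + " original=" + r.original + " relaxed=" + r.relaxed);
});
```

For these four inputs the original pattern rejects each one and the relaxed pattern accepts each one, which matches the failures reported earlier in the thread.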

Pamblam commented Feb 3, 2017

thanks op. you da real mvp.

@ecarrera

Thanks! The regex is very useful! Works like a charm! :) @bgrins

killmenot commented Feb 13, 2018

I used this solution to create the npm package valid-data-url and received an email explaining that the package is vulnerable to a ReDoS exploit. I checked the regex in this gist and it is vulnerable too. I found a tool that helps check regexes for such exploits. Take a look: https://github.com/NicolaasWeideman/RegexStaticAnalysis

Going back to the original solution at the top, I recommend:

  1. drop \s* from the beginning and from the end of the regex
  2. use !!s.trim().match(isDataURL.regex); to keep the existing behavior and fix the exploit at the same time

Hope this helps
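
The two recommendations above can be sketched as follows. The names `strictRegex` and `isDataURLTrimmed` are hypothetical; the pattern is the gist's original with only the leading and trailing `\s*` removed. The trailing `\s*` overlaps with the `\s` inside the data character class, which looks like the ambiguity a static analyzer would flag, but this sketch has not been re-checked with the tool linked above.

```javascript
// Original pattern with the leading and trailing \s* removed.
var strictRegex = /^data:([a-z]+\/[a-z]+(;[a-z\-]+\=[a-z\-]+)?)?(;base64)?,[a-z0-9\!\$\&\'\,\(\)\*\+\,\;\=\-\.\_\~\:\@\/\?\%\s]*$/i;

// Hypothetical name, not part of the gist.
function isDataURLTrimmed(s) {
  // String.prototype.trim strips surrounding whitespace up front, so the
  // regex no longer has to account for it.
  return !!s.trim().match(strictRegex);
}

console.log(isDataURLTrimmed(" data:,Hello World!"));
console.log(isDataURLTrimmed(" data:text/plain;base64,SGVsbG8sIFdvcmxkIQ%3D%3D"));
console.log(isDataURLTrimmed("  "));
console.log(isDataURLTrimmed("dataxbase64"));
```

Whitespace inside the data part is still accepted because `\s` remains in the character class; only the surrounding whitespace handling moves from the regex to `trim()`.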

S3gillu commented Feb 20, 2018

Very informative regex. But I need something like this,

{"/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxQTEhUTExMVF…WG4ph55lqTIlqWXUz1QpOnUjSPboyKtOndht+T//Z":"jpg","UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlb…bGVzLnhtbFBLBQYAAAAACwALAMECAAAvKQAAAAA=": "docx","KioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqK…lL2RvY3MvZ2V0dGluZy1zdGFydGVkLmh0bWwNCg==": "txt","UEsDBBQABAAIAGJivkClq/f+WgEAADMEAAATAAAAW0NvbnRlb…Y29yZS54bWxQSwUGAAAAAAoACgB8AgAAlRUAAAAA": "xlsx"}

from

{0: ""/9j/4AAQSkZJRgABAQAAAQABAAD/2wCEAAkGBxQTEhUTExMVF…WG4ph55lqTIlqWXUz1QpOnUjSPboyKtOndht+T//Z": "jpg"", 1: ""UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlb…bGVzLnhtbFBLBQYAAAAACwALAMECAAAvKQAAAAA=": "docx"", 2: ""KioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqK…lL2RvY3MvZ2V0dGluZy1zdGFydGVkLmh0bWwNCg==": "txt"", 3: ""UEsDBBQABAAIAGJivkClq/f+WgEAADMEAAATAAAAW0NvbnRlb…Y29yZS54bWxQSwUGAAAAAAoACgB8AgAAlRUAAAAA": "xlsx""}

I am a novice in this area, so could you please help me out?
Thank you in advance.
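
One possible reading of the question above, sketched under loud assumptions: each value in the source object is taken to be a string holding a single `"key": "extension"` pair, and the goal is one merged object. The name `mergeFragments` and the short placeholder keys are hypothetical; the real base64 strings in the question are truncated and are not reproduced here. This is unrelated to data URL detection and is only a guess at what was being asked.

```javascript
// Hypothetical helper: merge string fragments of the form '"key": "value"'
// into a single object. Assumes every fragment is valid JSON once wrapped
// in braces; JSON.parse throws otherwise.
function mergeFragments(fragments) {
  var merged = {};
  Object.keys(fragments).forEach(function (index) {
    var parsed = JSON.parse("{" + fragments[index] + "}");
    Object.keys(parsed).forEach(function (key) {
      merged[key] = parsed[key];
    });
  });
  return merged;
}

// Placeholder input; the keys stand in for the base64 strings in the question.
var input = {
  0: '"AAAA": "jpg"',
  1: '"BBBB==": "docx"'
};

console.log(JSON.stringify(mergeFragments(input)));
```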

@Mohamed-Manil

thanks

@KOUISAmine

Thanks, it works great, here is a demo