@curtisz
Created November 9, 2015 23:07
RFC 3986 URL Parsing Regular Expression (JavaScript)
/* ***********************************************************************************
The heroic authors of RFC 3986 (http://www.ietf.org/rfc/rfc3986.txt) gave us,
in Appendix B, this regex for parsing (well-formed) URLs into their
constituent pieces:

    ^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?

Which for the following URL:

    http://www.ics.uci.edu/pub/ietf/uri/#Related

Yields the following subexpression matches:

    $1 = http:
    $2 = http
    $3 = //www.ics.uci.edu
    $4 = www.ics.uci.edu
    $5 = /pub/ietf/uri/
    $6 = <undefined>
    $7 = <undefined>
    $8 = #Related
    $9 = Related

where <undefined> indicates that the component is not present, as is
the case for the query component in the above example. Therefore, we
can determine the value of the five components as

    scheme    = $2
    authority = $4
    path      = $5
    query     = $7
    fragment  = $9
*********************************************************************************** */
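var parseURL = function( url ) {
    // Regex from RFC 3986, Appendix B. Every group is optional, so it matches
    // any string: it splits a URL into parts but does not validate it.
    var regex = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;
    var matches = url.match(regex);
    // Components that are absent from the URL come back as undefined.
    return {
        scheme: matches[2],
        authority: matches[4],
        path: matches[5],
        query: matches[7],
        fragment: matches[9]
    };
};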
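
As a rough usage sketch, the snippet below runs the same RFC 3986 regex against the example URL from the comment above, plus a relative reference with no scheme or authority. The name `parseURLSafe` and its `null` return for non-string input are assumptions made for this illustration; they are not part of the original gist.

```javascript
// Hypothetical variant of parseURL (the name and the type guard are
// illustrative assumptions, not part of the original gist).
function parseURLSafe(url) {
    // Non-string input has no .match method, so bail out early.
    if (typeof url !== "string") {
        return null;
    }
    // Same regex as RFC 3986, Appendix B. All groups are optional,
    // so exec() succeeds for every string.
    var regex = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;
    var matches = regex.exec(url);
    return {
        scheme: matches[2],
        authority: matches[4],
        path: matches[5],
        query: matches[7],
        fragment: matches[9]
    };
}

// The example URL from RFC 3986: no query component is present.
var parts = parseURLSafe("http://www.ics.uci.edu/pub/ietf/uri/#Related");
console.log(parts.scheme);    // "http"
console.log(parts.authority); // "www.ics.uci.edu"
console.log(parts.path);      // "/pub/ietf/uri/"
console.log(parts.query);     // undefined
console.log(parts.fragment);  // "Related"

// A relative reference: scheme, authority and fragment are all absent.
var relative = parseURLSafe("/a/b?x=1");
console.log(relative.path);   // "/a/b"
console.log(relative.query);  // "x=1"
```

Because the regex accepts any string, a result object only says how the input splits into components; it does not confirm that the input is a valid URL.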