// Regular Expression for URL validation
// Author: Diego Perini
// Updated: 2010/12/05
// License: MIT
// Copyright (c) 2010-2013 Diego Perini (
// Permission is hereby granted, free of charge, to any person
// obtaining a copy of this software and associated documentation
// files (the "Software"), to deal in the Software without
// restriction, including without limitation the rights to use,
// copy, modify, merge, publish, distribute, sublicense, and/or sell
// copies of the Software, and to permit persons to whom the
// Software is furnished to do so, subject to the following
// conditions:
// The above copyright notice and this permission notice shall be
// included in all copies or substantial portions of the Software.
// the regular expression composed & commented
// could be easily tweaked for RFC compliance,
// it was expressly modified to fit & satisfy
// these tests for a URL shortener:
// Notes on possible differences from a standard/generic validation:
// - utf-8 char class takes into consideration the full Unicode range
// - TLDs have been made mandatory so single names like "localhost" fails
// - protocols have been restricted to ftp, http and https only as requested
// Changes:
// - IP address dotted notation validation, range: -
// first and last IP address of each class is considered invalid
// (since they are broadcast/network addresses)
// - Added exclusion of private, reserved and/or local networks ranges
// - Made starting path slash optional (
// - Allow a dot (.) at the end of hostnames (
// Compressed one-line versions:
// Javascript version
// /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i
// PHP version
// _^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)(?:\.(?:[a-z\x{00a1}-\x{ffff}0-9]-*)*[a-z\x{00a1}-\x{ffff}0-9]+)*(?:\.(?:[a-z\x{00a1}-\x{ffff}]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$_iuS
var re_weburl = new RegExp(
"^" +
// protocol identifier
"(?:(?:https?|ftp)://)" +
// user:pass authentication
"(?:\\S+(?::\\S*)?@)?" +
"(?:" +
// IP address exclusion
// private & local networks
"(?!(?:10|127)(?:\\.\\d{1,3}){3})" +
"(?!(?:169\\.254|192\\.168)(?:\\.\\d{1,3}){2})" +
"(?!172\\.(?:1[6-9]|2\\d|3[0-1])(?:\\.\\d{1,3}){2})" +
// IP address dotted notation octets
// excludes loopback network
// excludes reserved space >=
// excludes network & broadcast addresses
// (first & last IP address of each class)
"(?:[1-9]\\d?|1\\d\\d|2[01]\\d|22[0-3])" +
"(?:\\.(?:1?\\d{1,2}|2[0-4]\\d|25[0-5])){2}" +
"(?:\\.(?:[1-9]\\d?|1\\d\\d|2[0-4]\\d|25[0-4]))" +
"|" +
// host name
"(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
// domain name
"(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
// TLD identifier
"(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))" +
// TLD may end with dot
"\\.?" +
")" +
// port number
"(?::\\d{2,5})?" +
// resource path
"(?:[/?#]\\S*)?" +
"$", "i"
);

In PHP (for use with preg_match), this becomes:


Thanks for the regex Diego, I’ve added it to the test case and it seems to pass all the tests :) Nice job!
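As a quick illustration of running the pattern against a few inputs, here is a minimal usage sketch built on the compressed one-line JavaScript version from the gist header. The helper name `isWebUrl` and the sample URLs are assumptions made for this example; they are not part of the gist or of the test case mentioned above.

```javascript
// Compressed one-line JavaScript version, copied from the gist header.
var re_weburl = /^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?)(?::\d{2,5})?(?:[/?#]\S*)?$/i;

// Hypothetical helper: a thin wrapper around RegExp.prototype.test.
function isWebUrl(candidate) {
  return re_weburl.test(candidate);
}

console.log(isWebUrl("http://example.com"));                              // true
console.log(isWebUrl("ftp://user:pass@example.org:8080/path?q=1#frag"));  // true
console.log(isWebUrl("http://142.42.1.1"));                               // true
console.log(isWebUrl("http://localhost"));                                // false (no TLD)
console.log(isWebUrl("http://10.1.1.1"));                                 // false (private range)
console.log(isWebUrl("http://192.168.1.1"));                              // false (private range)
```

The regex has no `g` flag, so repeated calls to `test` do not carry state between inputs.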


dperini commented Dec 6, 2010

I have added simple network ranges validation, the rules I used are:
- valid range -, network addresses above and including are reserved addresses
- first and last IP address of each class is excluded since they are used as network broadcast addresses
since I don't think this is worth implementing completely in a regular expression, a following pass should exclude the Intranet address space: - - -
the loopback and the automatic configuration address space: - -
while the local, multicast and the reserved address spaces: - (SPECIAL-IPV4-LOCAL-ID-IANA-RESERVED) - 239.255.255 (MCAST-NET) - (SPECIAL-IPV4-FUTURE-USE-IANA-RESERVED)
should already be excluded by the above regular expression.

This is a very minimal list of tests to add to your test suite:



Need testing :)


dperini commented Dec 6, 2010

I need to mention that I took the idea of validating the possible IP address ranges in the URL while looking at other developers' regular expressions I have seen in your tests, especially the one from @scottgonzales. He also sliced up the Unicode ranges :=), that's the reason his one is so long :)

jgornick commented Dec 6, 2010

Awesome stuff Diego!!


dperini commented Dec 6, 2010

Added IP address validation tweaking and optimizations suggested by @abozhilov


dperini commented Dec 9, 2010

Added exclusion of private, reserved, auto-configuration and local network ranges as described in the previous message.
Network and all networks >= are excluded by the second validation block.
The second validation block also takes care of excluding IP address terminating with 0 or 255 (non usable network and broadcast addresses of each class C network).
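The same exclusion rules can also be written as a numeric second pass instead of regex lookaheads. The sketch below mirrors the ranges visible in the expanded regex at the top of the page (10.x, 127.x, 169.254.x, 192.168.x, 172.16-31.x, first octet limited to 1-223, last octet limited to 1-254). The function name `isExcludedIPv4` is an assumption for this example and is not part of the gist.

```javascript
// Hypothetical helper: returns true when a dotted-quad host would be
// rejected by the IP rules encoded in the gist's regex.
function isExcludedIPv4(host) {
  var m = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/.exec(host);
  if (!m) return false; // not a dotted-quad address, nothing to exclude
  var a = Number(m[1]), b = Number(m[2]), d = Number(m[4]);
  if (a === 10 || a === 127) return true;            // private and loopback
  if (a === 169 && b === 254) return true;           // auto-configuration
  if (a === 192 && b === 168) return true;           // private
  if (a === 172 && b >= 16 && b <= 31) return true;  // private
  if (a === 0 || a >= 224) return true;              // first octet outside 1-223
  if (d === 0 || d === 255) return true;             // last octet outside 1-254
  return false;
}

console.log(isExcludedIPv4("10.1.1.1"));     // true
console.log(isExcludedIPv4("172.20.0.5"));   // true
console.log(isExcludedIPv4("172.32.0.5"));   // false
console.log(isExcludedIPv4("142.42.1.255")); // true
console.log(isExcludedIPv4("142.42.1.1"));   // false
```

Doing the range checks on numbers keeps each rule readable and easy to drop when a different scope is needed.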

It is easy to just remove the unwanted parts of the validation to fit different scopes (length, precision) so I will probably add more options like the list of existing TLD (possibly grouped), the list of existing protocols and/or a fall back for a more generic protocol match too.

Hey, just randomly came across this... my JavaScript URI parsing library does strict URI validation as per RFC 3986. It uses a much larger regular expression than this one. Code can be found at:

xttam commented Feb 8, 2013

I changed it a little bit so that it's valid in Ruby. Here it is:


Hi Diego,

Just came across this awesome code. I'd like to use this as a basis, and I'm hoping you can help me with a simple tweak. I'd like to let through URLs without the protocol specified (HTTP(S) or FTP). For some reason I can't seem to get it to work.


jpillora commented Jun 2, 2013

Hey Diego, nice work. You could make it a bit shorter though:


Similarly with the subnets

@dperini Can you assign a license to this? MIT or BSD?

+1 for the license information

dkart commented Oct 11, 2013

+1 for the license information from me, too

utopiaio commented Nov 1, 2013

+infinity on the license Diego


dperini commented Nov 5, 2013

I have added the MIT License to the gist as requested.

Thank you all for the support.

@dperini: Could you add support for URLs such as this?



pjacobs commented Dec 4, 2013

Is there a Java version of the regex available? That would be great for my android app!

@mparodi Ruby version untouched by markdown


ixti commented Dec 19, 2013

Ruby port:

class Regexp


    # protocol identifier

    # user:pass authentication

      # IP address exclusion
      # private & local networks

      # IP address dotted notation octets
      # excludes loopback network
      # excludes reserved space >=
      # excludes network & broadcast addresses
      # (first & last IP address of each class)
      # host name

      # domain name

      # TLD identifier

    # port number

    # resource path



And specs:

# encoding: utf-8

require "spec_helper"

describe "Regexp::PERFECT_URL_PATTERN" do

  ].each do |valid_url|
    it "matches #{valid_url}" do
      expect(Regexp::PERFECT_URL_PATTERN =~ valid_url).to eq 0

    " should be encoded",
    ":// should fail",
    " quux",
  ].each do |invalid_url|
    it "does not match #{invalid_url}" do
      expect(Regexp::PERFECT_URL_PATTERN =~ invalid_url).to be_nil


Very good, thank you for sharing.

I added support for punycoded domain names:


dperini commented Feb 14, 2014

Updated the gist with reductions/shortenings suggested by "jpillora".

Thank you !


dperini commented Feb 14, 2014


to do that you can change line 65 from:

"(?:(?:https?|ftp)://)" +

to:

"(?:(?:(?:https?|ftp):)?//)" +

this way the protocol and colon become an optional match.
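To see the effect of that change in isolation, the sketch below plugs both protocol prefixes into a reduced pattern that keeps only the host, domain and TLD parts of the gist plus the resource path. The builder name `buildUrlRegex` is an assumption for this example, and the user:pass, IP and port parts are left out to keep it short.

```javascript
// Host + domain + TLD portion, copied from the expanded gist regex.
var HOST =
  "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))";

// Hypothetical builder: wraps a protocol prefix around the reduced pattern.
function buildUrlRegex(protocolPart) {
  return new RegExp("^" + protocolPart + HOST + "(?:[/?#]\\S*)?$", "i");
}

var strict = buildUrlRegex("(?:(?:https?|ftp)://)");
var relaxed = buildUrlRegex("(?:(?:(?:https?|ftp):)?//)");

console.log(strict.test("//example.com/a"));     // false
console.log(relaxed.test("//example.com/a"));    // true
console.log(relaxed.test("http://example.com")); // true
console.log(relaxed.test("example.com"));        // false, the double slash is still required
```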

You can also just leave the double slash on that line if no URLs have the protocol prefix:

"//" +

ghost commented Mar 11, 2014

Why can't the maximum range for Unicode strings extend to U0010ffff (instead of uffff)?
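One possible answer, offered as a sketch rather than the author's stated reasoning: without the `u` flag a JavaScript regex works on UTF-16 code units, so a character above U+FFFF is seen as two surrogate code units, and both surrogates already fall inside `\u00a1-\uffff`. The snippet below uses U+1F600 as an arbitrary sample character.

```javascript
var astral = "\u{1F600}"; // one code point above U+FFFF, stored as two code units

console.log(astral.length);                           // 2
console.log(/^[\u00a1-\uffff]{2}$/.test(astral));     // true: each surrogate is in range
console.log(/^[\u00a1-\uffff]$/.test(astral));        // false: one class position is not enough
console.log(/^[\u00a1-\u{10ffff}]$/u.test(astral));   // true: with the u flag the class sees code points
```

Because the gist's character classes are followed by `+` or `{2,}`, a surrogate pair is accepted as two consecutive matches even though the range stops at `\uffff`.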

What about relative URLs?


@stevenvachon relatives wouldn't be URLs they would be paths, which wouldn't need this validation at that point.

jkj commented Jun 6, 2014

I recently needed this but have a dumb question. In the very last part for the resource path, why do you use [^\\s] rather than \\S ? To my understanding they are equivalent, with the latter being a bit shorter.

dimroc commented Jun 9, 2014

For the following Regex and the one pasted by ixti:

    URL = /\A(?:(?:https?):\/\/)?(?:\S+(?::\S*)?@)?(?:(?:(?:[a-z0-9][a-z0-9\-]+)*[a-z0-9]+)(?:\.(?:[a-z0-9\-])*[a-z0-9]+)*(?:\.(?:[a-z]{2,})(:\d{1,5})?))(?:\/[^\s]*)?\z/i

You will end up with extremely slow matching, to the point where you suspect an infinite loop, if you have a long subdomain for a URL ending with a period:


it { should_not match "http://aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa.randomstring." }

The longer the subdomain "aaa....", the longer it'll take.

phiyangt commented Jun 9, 2014

Fixed the URL Regex to make the subdomain match non-recursive thereby improving performance. Long story short: it passed our existing test suite and improved performance dramatically.

    URL = /\A(?:(?:https?):\/\/)?(?:\S+(?::\S*)?@)?(?:(?:([a-z0-9][a-z0-9\-]*)?[a-z0-9]+)(?:\.(?:[a-z0-9\-])*[a-z0-9]+)*(?:\.(?:[a-z]{2,})(:\d{1,5})?))(?:\/[^\s]*)?\z/i

Anyone have a python port? My recollection was that the python regexp engine does have some differences.

@dperini you should add support for 32bit addresses and ipv6 addresses.

I vote that this should be turned into a git repository with multi-language ports.

I'm also using the top of the page gist regex in JS and finding it very slow to process long invalid URLs such as:

The more letters added there the slower the response.

It sounds like what @phiyangt is referring to above.

Is there any solution for this for JS?



dperini commented Jun 24, 2014

@DMroc @Feendish try using Firefox with the same code you say is slow. Maybe you are just using the wrong browser for your tests/objectives; you didn't specify any code or environment info to replicate.
However I guess you are using Chrome :) if not, please provide more info.


dperini commented Jun 24, 2014

Well, after a few tests I can say the slowdown and subsequent browser crash is a Chrome-only problem.
I tried the same in Firefox and everything works correctly with these REGEXP, no problem or slowdown.

I have reduced the original REGEXP to a minimal to be able to show the problem.
Try the following line in Chrome console, it will crash the browser:


So I believe this is just a bug in Chrome RE engine.

Hi Diego,

Yeah I'm on latest stable Chrome (Version 35.0.1916.153 m).

This is the "bad" url I'm checking http://qweqweqweqwesadasdqweeqweqwsdqweqweqweqwesadasdqweeqweqwsd

The original regex I'm using (the one from the Gist on top - 1 liner or full version) locks the browser in Chrome as you say. It also locks up IE11.

In Firefox 29 it gave this error:
InternalError: an error occurred while executing regular expression

I updated to latest Firefox v30. The regex runs and gives false which is correct.

From some research online it appears Chrome does not halt execution when there is catastrophic backtracking in a regex. Safari, Firefox and IE could just report 'no match' after some arbitrary number of backtracks.

I also tried your recent regex above and it doesn't lock any browsers.

However it returns true for 'isjdfofjasodfjsodifjosadifjsdoiafjaisdjfisdfjs' which is invalid.
It also returns false for '' which is incorrect.

Are you sure there isn't a runaway loop in there somewhere?


dperini commented Jun 25, 2014

I don't know why copying and pasting the above RE in Chrome console mangles some character, it actually doesn't crash the console of the browser window.

Please try to cut and paste the RE from this tweet:

I retested it and it actually crashes the console in that it doesn't answer to commands anymore after running that RE test that you can find in the above tweet.

The fact that the original RE also works on Safari pushes me to believe it's a Chrome problem but I need to do more tests. The "weburl" RE also works in PHP and other environments.

I am testing on the same Chrome Version 35.0.1916.153 under OS X 10.9.3.

Suggestions and help on this matter are welcome !

@dperini This seems to be a V8 issue. Relevant bug ticket:

@dperini I ran the RE from the tweet in RegexBuddy analyser and it says "Your regular expression leads to "catastrophic backtracking", making it too complex to be run to completion."

It locks up Chrome & Opera but not Firefox. As the ticket @mathiasbynens linked to suggests, certain browsers are more lenient when catastrophic backtracking happens. Chrome V8 seems to not have any fail limit for this and puts the onus on the regex format.


dperini commented Jun 28, 2014

can you contact me via email ?
I have a newer version of the RE that doesn't crash Chrome.
Maybe you can try it and give me some feedback before I push it to a new gist.

Sure sent it there now. Thanks.

EtaiG commented Jul 2, 2014

@dperini, we've found this issue too... looks like there's a highly exponential recursion into infinity on simple strings.

I've managed to reduce this to the way the hostname check is written (since it's followed later (eventually) by TLD).
It's this simple format that will cause the problem:

var regx = new RegExp('^(\\w+)*[^\\w]$');
regx.test('aaaaaaaaaaaaaaaaaaaaaaaaaa');  //chrome will crash

In other words, when you have a repeat of something 1 -> infinity times, and this group is repeated 0->infinity times, and the next match is for anything not in the group (obviously... but I put [^\w] just to illustrate), then chrome will keep recursing to search for a possible group of (1->n) which repeats (0->m) times which has that letter matching.

Of course, internally, the regex should first be run 'greedily' to check if there's a possible match by making sure required letters are there..

Essentially, if I were to write the implementation for a regex, when encountering such a group, I would internally be doing this:

var regx = new RegExp('^(?=\\w*[^\\w])(?:\\w+)*[^\\w]$');
regx.test('aaaaaaaaaaaaaaaaaaaaaaaaaa');  //chrome will not crash

because first I'm doing a positive lookahead to check if this is even possible... though the complexity for this rises as the nested groups become more complex
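The guard described above can be checked with a short sketch. Note that inside a string passed to `new RegExp` the backslashes have to be doubled. The variable names are assumptions for this example, and the unguarded pattern is only run on short inputs so the snippet stays fast.

```javascript
var unguarded = new RegExp('^(\\w+)*[^\\w]$');
var guarded = new RegExp('^(?=\\w*[^\\w])(?:\\w+)*[^\\w]$');

// The lookahead fails in linear time when no non-word character exists,
// so the nested quantifier is never explored for this input.
console.log(guarded.test('aaaaaaaaaaaaaaaaaaaaaaaaaa')); // false

// Both patterns agree on inputs that can match.
console.log(guarded.test('aaa!'));   // true
console.log(unguarded.test('aaa!')); // true

// Short non-matching input is still safe for the unguarded form.
console.log(unguarded.test('aaaaaaaa')); // false
```

The guard only helps when the required character is missing entirely. An input that contains it but still cannot match, such as a long run of letters followed by `!a`, would still backtrack through the nested group.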

Finally, I think this can be fixed here, by changing the host name from:




which is really the same thing, if you think about it.

EtaiG commented Jul 2, 2014

In fact, I believe the whole host-domain-TLD identifier is the same as this (but this should be more performant and not crash):

      // host name
      "(?:[a-z\\u00a1-\\uffff0-9]-?)*[a-z\\u00a1-\\uffff0-9]+" +
      // domain name
      "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-?)*[a-z\\u00a1-\\uffff0-9])*" +
      // TLD identifier
      "\\.[a-z\\u00a1-\\uffff]{2,}" +

There's no need to add non-capturing groups if you're not doing anything with the group... if you plan to modify a group with a repeater, lookahead or just use an OR operator in it, then use a group, but otherwise there's really no point (since all you want, is to make sure everything in the group is present... which you don't need to use a group for!)


dperini commented Jul 4, 2014

Thank you @EtaiG,
your expression looks good too.

However I have been pushed to "re-read" the specifications thoroughly and was answered on a V8 ticket here:
In post #21 @erik suggested I consider rewriting the labels matching parts using lookahead.

Since most wanted a Javascript to use as a pattern checking inputs I did tests in Javascript only.

This is the result of following his advice, no ftp protocol no special IP handling, only the minimal:

var re_weburl = new RegExp(
    "^" +
        // protocol identifier (optional) + //
        "(?:(?:https?:)?//)?" +
        // user:pass authentication (optional)
        "(?:\\S+(?::\\S*)?@)?" +
        // host (optional) + domain + tld
        "(?:(?!-)[-a-z0-9\\u00a1-\\uffff]*[a-z0-9\\u00a1-\\uffff]+(?!./|\\.$)\\.?){2,}" +
        // server port number (optional)
        "(?::\\d{2,5})?" +
        // resource path (optional)
        "(?:/\\S*)?" +
    "$", "i"
);

This RE fits in a tweet ! But let's see how it works for you.

I also changed [^\s] with a \S as suggested by @jkj and relaxed the match on protocol identifiers.

Consecutive hyphens are allowed by the specifications but they must not be found in both 3rd and 4th positions; those sequences are reserved for "xn--" and similar ASCII Compatible Encodings. If that exclusion were necessary maybe a simple negative lookahead (?!..--) will help there too.

EtaiG commented Jul 5, 2014

@dperini , thanks for responding.
I read all the specifications too last week (RFC's 5890 - 5894 and RFC 3492, several times), due to this issue. I'm also poster #24 in the google v8 thread.

Please note that I will be analysing this issue in depth below, and if I come off critical - that is not my intent, so I apologize in advance.

I disagree with the negative lookaheads. There are rare cases when they are truly useful.
I believe in minimizing them whenever possible, especially when repeating something up to an 'infinite' amount of times, since they can cause dreadful performance for complicated matches..

I like being more explicit about the regex- which may make it more verbose, but it's very clear what the javascript engine needs to do to match it.

For example, when you have:

// host (optional) + domain + tld
        "(?:(?!-)[-a-z0-9\\u00a1-\\uffff]*[a-z0-9\\u00a1-\\uffff]+(?!./|\\.$)\\.?){2,}" +

This part can match long strings in too many different ways, and the regex is too general, so for characters which would match both the first character group and the second (namely, almost anything except for a dot and a hyphen), it can match an exponential number of times.

For example, it can match 'ab' as:
a b | ab
and it can match 'abc' as:
a b c | a bc | abc | ab c
and it can match 'abcd' as:
a b c d | a b cd | a bc d | a bcd | abcd | ab c d | ab cd | abc d

It's easy to see that for a string of length n, it has 2^(n-1) possible matches.
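The count can be verified with a small recursive sketch that enumerates the ways of cutting a string into consecutive non-empty pieces, which is the choice the nested quantifier faces. The function name `countSplits` is an assumption for this example.

```javascript
// Hypothetical helper: number of ways to split s into consecutive,
// non-empty pieces (the empty string has exactly one split: no pieces).
function countSplits(s) {
  if (s.length === 0) return 1;
  var total = 0;
  for (var i = 1; i <= s.length; i++) {
    // Take the first i characters as one piece, then split the rest.
    total += countSplits(s.slice(i));
  }
  return total;
}

console.log(countSplits("ab"));   // 2
console.log(countSplits("abc"));  // 4
console.log(countSplits("abcd")); // 8
console.log(countSplits("abcdefghij") === Math.pow(2, 9)); // true
```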

The way a greedy quantifier works is that it will stop as soon as it finds a possible match - otherwise it will try the next possibility in order to continue matching the regular expression.
This means that a sufficiently long string (i.e n = 21) which would result in a non-match, such as:
'aaaaaaaaaaaaaaaaaaaa.' (note the period at the end)
can cause it to take extremely long, and possibly crash (2^20 > 1,000,000)
Ignoring what's actually placed in memory and checked during a regex, by putting this in console, you can see what I mean:

var i=0, len = 2<<20;
// approximately 8s

You can test out your regex against that string (the one with the period at the end) and you'll see what I mean.

Also, note that 'aaaaaaaaaaaaaaaaaaaaaaaaaa' will match your regex although it's invalid.

This is because of the generalization of the check using greedy quantifiers, enabled by the negative lookahead (?!./|\.$) (or by both of them?)

This is why I don't like negative lookaheads and prefer to be more declarative. You're almost forced to be more declarative when you don't use the negative lookaheads... but in the end, you are giving 'better instructions' to the javascript engine.

That's why I liked this better (for the host/domain/tld):


Note that this is the same as what I posted above, with the exception of switching out the -? for -* (in both host and domain) to allow for as many hyphens in between letters.

This doesn't take care of the xn-- and 3rd/4th position issue, but unless you're allowing someone to register a domain by you, this is less of an issue (since for most cases, it's for a link, and people only need to link to something that is allowed and exists)... and even then, serverside validation would be necessary.


dperini commented Jul 6, 2014

@EtaiG many thanks for the review and the good suggestions.
After trying myself your tweaks I have to completely agree with your points.
I still believe that by moving the dot matching to the end of the RE the host/domain/tld part can be reduced to only two main groups (since the only label we don't want followed by a dot is the TLD):

// host (optional) + domain + tld
"(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+\\.)+" +
   "(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+" +

I am not sure I should consider digits as valid in the TLD group (also it is considered a label itself).

Now the tests do not lock up Chrome and it also seem the overall speed for URL validation is faster.


dperini commented Jul 6, 2014

The gist has been corrected/updated so it doesn't lock up Chrome Javascript.
I haven't reduced the host / domain / tld matching groups but I will do after testing.
Many thanks to @EtaiG for the help and the suggestions to resolve the problem.

schbetsy commented Jul 7, 2014

I believe the slash before query params is optional. should pass, but it currently does not.

Changing line 93 to

"(?:/?\\S*)?" +

solves that issue, but might break other query-parameter specifications that aren't covered in the test cases.
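To make the trade-off concrete, the sketch below compares three resource-path variants on a reduced pattern (host, domain and TLD parts copied from the gist; user:pass, IP and port omitted). The third variant, `[/?#]`, is the one that appears in the expanded regex at the top of this page. The builder name `buildWithPath` is an assumption for this example.

```javascript
var HOST =
  "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))";

// Hypothetical builder: reduced pattern with a pluggable resource-path part.
function buildWithPath(pathPart) {
  return new RegExp("^(?:https?|ftp)://" + HOST + pathPart + "$", "i");
}

var slashOnly = buildWithPath("(?:/\\S*)?");       // slash required
var anyNonSpace = buildWithPath("(?:/?\\S*)?");    // slash optional
var delimiters = buildWithPath("(?:[/?#]\\S*)?");  // path, query or fragment start

var query = "http://example.com?foo=bar";
var junk = "http://example.com_junk";

console.log(slashOnly.test(query));   // false
console.log(anyNonSpace.test(query)); // true
console.log(delimiters.test(query));  // true
console.log(anyNonSpace.test(junk));  // true, any trailing non-space text is accepted
console.log(delimiters.test(junk));   // false
```

Making the slash optional accepts the query-only form but also lets arbitrary text follow the TLD, whereas the character class accepts the query-only form and nothing more.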


dperini commented Jul 10, 2014

@schbetsy I am not sure it is optional either.
Anyway your change fixes that if it becomes necessary for some reader.
What I can see is that browsers accept that but then they insert a slash in it when finished.
I am curious to try the effects of this change on my current tests.
Thank you for pointing that out.

eluck commented Aug 7, 2014

Hey @dperini,

Thanks for your great work! Please note that this regex fails on the following url: http://localhost:8080


dperini commented Aug 8, 2014

it is written in the comments: 'TLDs have been made mandatory so single names like "localhost" fails'.
The regex was built to match URLs having a real domain name (at least 2 labels separated by a dot).
However it will be very easy to add 'localhost' as an acceptable exception.
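One way such an exception could look, as a sketch only: keep the main pattern untouched and accept a second, narrow pattern for `localhost`. The names `reducedWebUrl`, `localhostUrl` and `isWebUrlOrLocalhost` are assumptions for this example, and `reducedWebUrl` is a shortened stand-in for the gist's regex (no user:pass or IP handling).

```javascript
// Shortened stand-in for the gist's regex: protocol, host/domain/TLD, port, path.
var reducedWebUrl = /^(?:https?|ftp):\/\/(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,}))\.?(?::\d{2,5})?(?:[/?#]\S*)?$/i;

// Narrow exception: the literal host "localhost" with the same port and path rules.
var localhostUrl = /^(?:https?|ftp):\/\/localhost(?::\d{2,5})?(?:[/?#]\S*)?$/i;

// Hypothetical helper combining both checks.
function isWebUrlOrLocalhost(url) {
  return reducedWebUrl.test(url) || localhostUrl.test(url);
}

console.log(isWebUrlOrLocalhost("http://localhost:8080"));  // true
console.log(isWebUrlOrLocalhost("http://localhost/admin")); // true
console.log(isWebUrlOrLocalhost("http://example.com"));     // true
console.log(isWebUrlOrLocalhost("http://localhostx"));      // false
```

Keeping the exception in a separate pattern avoids loosening the mandatory-TLD rule for every other single-label name.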

lysk88 commented Sep 3, 2014


can you help me make this URI valid ""

thanks ahead!

PYTHON PORT (cc @brifordwylie):

import re
URL_REGEX = re.compile(
    # protocol identifier
    # user:pass authentication
    # IP address exclusion
    # private & local networks
    # IP address dotted notation octets
    # excludes loopback network
    # excludes reserved space >=
    # excludes network & broadcast addresses
    # (first & last IP address of each class)
    # host name
    # domain name
    # TLD identifier
    # port number
    # resource path
    , re.UNICODE)

I did make one change: the "-*" in both domain and host was (incorrectly) succeeding against "" so I changed it to "-?" - I'm not sure why that's in the gist above, I'd think it would fail on a JS unit test also.


dperini commented Sep 9, 2014

it seems the URL "" you are testing against is actually a valid URL.
As is """. Just test it, it exists and resolves correctly to a Georgia State page.
I have been directed to read the relevant specs here:
and the validity criteria are here:
Thank you for the Python port !

Can you support international URLs?
For example: http://xn--80aaxitdbjk.xn--p1ai


dperini commented Sep 13, 2014

the regexp already supports international URLs, just write them using natural UTF-8 encoding.
The following is the UTF-8 version of the URL you typed above:
It would be hard to type or remember IDN URLs like the one you typed, nobody will do.
This has been written to validate URLs typed by users and/or found in log files.
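The difference between the two spellings can be observed with a reduced pattern that keeps the gist's host, domain and TLD parts. The TLD class contains no digits or hyphens, so a Punycode TLD such as `xn--p1ai` cannot match, while a hostname written in native characters does. The Unicode sample below (`例子.测试`, written with escapes) is an arbitrary illustration and is not the URL referred to above; the variable name `reduced` is also an assumption.

```javascript
// Reduced pattern: protocol + host/domain/TLD from the gist + optional path.
var reduced = new RegExp(
  "^(?:https?|ftp)://" +
  "(?:(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*" +
  "(?:\\.(?:[a-z\\u00a1-\\uffff]{2,}))" +
  "\\.?(?:[/?#]\\S*)?$", "i"
);

// Native Unicode labels are inside the \u00a1-\uffff range.
console.log(reduced.test("http://\u4f8b\u5b50.\u6d4b\u8bd5")); // true

// The Punycode form fails because of the hyphens and digit in the TLD.
console.log(reduced.test("http://xn--80aaxitdbjk.xn--p1ai"));  // false

// Punycode in a non-TLD label is accepted, since those labels allow digits and hyphens.
console.log(reduced.test("http://xn--80aaxitdbjk.com"));       // true
```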

Arkni commented Sep 29, 2014

@dperini thanks for sharing 👍

I'm using the chai.js assert library to write a simple test for a js object in my rails app. This is for initial client side form validation. Some of the uri formats as tested in @ixti spec above are failing to return false, here's the list.


Here's my code
#= require regex-weburl
class @FormValidators
  uri: (uri)->
#= require ../spec_helper
describe 'FormValidators', ->
  describe '#uri', ->
    beforeEach ->
      @formValidators = new FormValidators()
    it 'returns false for invalid urls', ->
      assert.notOk @formValidators.uri("http://")
      assert.notOk @formValidators.uri("http://.")
      assert.notOk @formValidators.uri("http://..")
      assert.notOk @formValidators.uri("http://../")
      assert.notOk @formValidators.uri("http://?")
      assert.notOk @formValidators.uri("http://??")
      assert.notOk @formValidators.uri("http://??/")
      assert.notOk @formValidators.uri("http://#")
      assert.notOk @formValidators.uri("http://##")
      assert.notOk @formValidators.uri("http://##/")
      assert.notOk @formValidators.uri(" should be encoded")
      assert.notOk @formValidators.uri("//")
      assert.notOk @formValidators.uri("//a")
      assert.notOk @formValidators.uri("///a")
      assert.notOk @formValidators.uri("///")
      assert.notOk @formValidators.uri("http:///a")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("rdar://1234")
      assert.notOk @formValidators.uri("http://")
      assert.notOk @formValidators.uri(":// should fail")
      assert.notOk @formValidators.uri(" quux")
      assert.notOk @formValidators.uri("http://-error-.invalid/")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("http://123.123.123")
      assert.notOk @formValidators.uri("http://3628126748")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")
      assert.notOk @formValidators.uri("")

Just thought I would take the time out to let you know. I'm not sure if something changed recently, or if you are even supporting this script anymore. Good work by the way, saved me a ton of time.

@adamrofer fix of changing ( -* ) to ( -? ) in the host and domain name section fixed the js unit test for me


dperini commented Oct 14, 2014

I suggest you check your tests and/or the port of the Regular Expression you are currently using.
In the list of URLs failing validation that you sent above only the first one is a valid URL ("") all the others are not validating against the regex.

I tested them once more within my environment (Javascript) and everything works as expected.

Thanks Diego for your hard work! 👍 to @CMCDragonkai's comment, though: IpV6 support and a Git repo with ports to multiple languages are both really great ideas.

Hi @dperini

I love the expression, but I'm wondering what modification I would need to make, to make the pattern ignore a URL if it is preceded by either a " or = or ] or > and followed by either a " or [/ or </

It is so that the following won't be validated:



<a href=""></a>

The reason is that I currently use a modified version of Gruber's regex as part of a php auto url function in the following manner, but I would like to use yours instead:

// Regular expression for URLs
// Based on
// Improved to only pick up links beginning with http https ftp ftps mailto and www
$regex = "_(?i)\b((?:https?|ftps?|mailto|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:'\".,<>?«»“”‘’]))_iuS";

// If markup is TRUE, convert URLs to html markup
if ($markup == TRUE) $string = preg_replace_callback($regex, array(&$this, 'auto_url'), $string);

Thanks, Matt

Additionally, my thinking behind this question is to be able to allow the manual coding of links, using html or bbcode.


dperini commented Nov 19, 2014

just saw this ... as a quick suggestion you can try something like:


haven't tried it, not sure it does exactly what you asked/depicted.
It's a start anyway 😄


dperini commented Nov 19, 2014

a better approach to match corresponding open/close brackets and quotes would require more work:


again, I haven't tested it.


dperini commented Nov 19, 2014

yes I believe it would be a good idea to move this to a Git repo.

However I disagree about having patterns that will never be typed by users like "IPV6" and "PunyCode". I am inclined to also remove IPV4 validation from the base regex; nobody remembers these numbers and they will most likely change in time.

Nobody will type/remember "PunyCode" URLs and the regex already supports international UTF-8 URLs.
The above is also true for decimal notations, various forms of IPV6 URLs and other "non-human" URLs.

Thanks for sharing, Diego.
I put this in a repo:

sanbor commented Jan 16, 2015

Thanks @MarQuisKnox, @dperini and @mathiasbynens, it is really helpful!

Hey guys, here is my extended version
It builds upon your regular expression @dperini but has support for more features:

  • IPv6 addresses (actual validation via filter_var).
  • Punycode support.
  • URLs which are not in NFC form are invalid.
  • URLs with a dash on the third and fourth position are invalid.

Would you mind if I release my code with the Unlicense license? I used MIT because you used MIT, but I'm more into total freedom.

halloamt commented Feb 4, 2015

Hi, is a valid URL but the last dot is usually not written by convention. See Paragraph 3.1. works in Firefox and IE

Veers01 commented Feb 6, 2015

Just a small comment about broadcast and network addresses. These addresses can be valid in a CIDR (classless) network. Ex: If a provider has two classes like and, they can combine the two in a classless network: In that network, and are two valid and usable addresses.
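The point can be illustrated numerically. The sketch below computes the network and broadcast addresses of a CIDR block and shows that addresses ending in .255 and .0 can be ordinary hosts inside a /23. The block `192.0.2.0/23` and all function names are assumptions chosen for this example; they are not the addresses from the comment above.

```javascript
// Convert dotted-quad text to a non-negative integer (plain arithmetic,
// avoiding 32-bit signed bitwise operators).
function ipToInt(ip) {
  return ip.split(".").reduce(function (acc, octet) {
    return acc * 256 + Number(octet);
  }, 0);
}

function intToIp(n) {
  return [24, 16, 8, 0].map(function (shift) {
    return Math.floor(n / Math.pow(2, shift)) % 256;
  }).join(".");
}

// Network and broadcast addresses of the block containing `base`.
function cidrBounds(base, prefix) {
  var size = Math.pow(2, 32 - prefix);
  var network = Math.floor(ipToInt(base) / size) * size;
  return { network: intToIp(network), broadcast: intToIp(network + size - 1) };
}

// A host is usable when it lies strictly between network and broadcast.
function isUsableHost(ip, base, prefix) {
  var bounds = cidrBounds(base, prefix);
  var n = ipToInt(ip);
  return n > ipToInt(bounds.network) && n < ipToInt(bounds.broadcast);
}

console.log(cidrBounds("192.0.2.0", 23));                  // { network: '192.0.2.0', broadcast: '192.0.3.255' }
console.log(isUsableHost("192.0.2.255", "192.0.2.0", 23)); // true
console.log(isUsableHost("192.0.3.0", "192.0.2.0", 23));   // true
console.log(isUsableHost("192.0.3.255", "192.0.2.0", 23)); // false
```

A regex that rejects every address ending in 0 or 255 therefore assumes /24 boundaries, which is the limitation raised in this comment.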

ngduc commented Feb 6, 2015

Any regex can extract URLs from below cases?

"" (string contains double quotes)
'' (string contains single quote)
[] (string contains brackets)</br> (string contains html tags)

puzrin commented Feb 12, 2015

here is JS demo with full unicode support, including astral characters.

The final regexp is ~6K and is generated automatically. Src is here: . Since astral characters take 2 positions, a [^negative] class is impossible. Negative lookahead is used instead.

NOTE that the package does fuzzy search, not strict validation. For strict validation, anchors (^...$) are required.

I changed the last block for the resource path to look like this:


This will allow URLs like or or

While they may not technically be valid, it is something I could see a user typing, and most browsers will fix it for them. If they copy it out and paste it back into a browser it still works, so they may not know what's wrong with it upon visual inspection.

This is exactly what I've been looking for.
Thank you. The only pattern it won't match for me (Using it in a Java Regex) is where the IP address is '0'(ZERO) padded, like:

Which I get as input from other tools.

Thanks again for the GREAT regex!!

anyone have a port?

'VB Port that handles domains with or without a hostname

    Public Sub MatchUrl(url As String)
    Dim rxs As String = ""
    'protocol identifier

    rxs = rxs + "(?:(?:https?)://)"
    ' user:pass authentication
    rxs = rxs + "(?:\S+(?::\S*)?@)?"
    rxs = rxs + "(?:"
    'IP address exclusion
    'private & local networks
    rxs = rxs + "(?!(?:10|127)(?:\.\d{1,3}){3})"
    rxs = rxs + "(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})"
    rxs = rxs + "(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})"
    'IP address dotted notation octets
    'excludes loopback network
    'excludes reserved space >=
    'excludes network & broadcast addresses
    '(first & last IP address of each class)
    rxs = rxs + "(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])"
    rxs = rxs + "(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}"
    rxs = rxs + "(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))"
    rxs = rxs + "|"
    'host name
    rxs = rxs + "(?:(?:[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
    'domain name
    rxs = rxs + "(?:(?:\.[a-z\u00a1-\uffff0-9]-?)*[a-z\u00a1-\uffff0-9]+)"
    ' TLD identifier
    rxs = rxs + "(?:\.(?:[a-z\u00a1-\uffff]{2,}))"
    rxs = rxs + ")"
    ' port number
    rxs = rxs + "(?::\d{2,5})?"
    ' resource path
    rxs = rxs + "(?:/\S*)?"

    ' Requires: Imports System.Text.RegularExpressions
    Dim rx As Regex = New Regex(rxs, RegexOptions.IgnoreCase)
    Dim match As Match = rx.Match(url)
    If match.Success Then
        Console.WriteLine("match")
    Else
        Console.WriteLine("not a match")
    End If

    End Sub

I also discovered that underscores are not valid if you follow this RegExp.


will fail.

Here's a link to a relevant StackOverflow question:

This is my PHP port...

I added (?=\s|$) to the end to prevent matches like (no path-slash).

I added (?<=^|\s) at the beginning to use it within text.

Additionally, I reordered the hostname parts to get it working with preg_replace_callback (I had some BACKTRACE LIMIT EXCEEDED errors).


The full expression:

const RX_LINK_ALL = '#

jnovack commented May 23, 2015

is a VALID HOST IP for a host within a subnet or larger.

  • First IP
  • Last IP

At a minimum, there are only two always-invalid IPs in the 10. subnet. I suggest only testing the following:

  • - Subnet address in (largest possible 10. subnet)
  • - Broadcast address in (largest possible 10. subnet)
  • - For validation testing.

(I'm French)
I don't understand why it doesn't match my string, using the JavaScript version of the regex?

function fTest() {

var str = "aaa bbb ccc aaa bbb eee";

var res = str.match("/^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:\.\d{1,3}){3})(?!(?:169\.254|192\.168)(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]-*)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/\S*)?$/i");


--> res is empty

Could anybody explain to me why it doesn't work?

Thx !

@danyboy85 This is because the RegExp is conceived to validate strings, not to match URLs within a string. The ^ at the start of the RegExp means that the string must start with the URL protocol, and the $ at the end means that the string must end with the URL pathname.
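A minimal sketch of the difference, using a deliberately simplified stand-in pattern (`https?://\S+`) rather than the gist's full regex. The variable names and the sample text are made up for illustration:

```javascript
// Simplified stand-in for the gist's pattern (assumption: the real regex is much stricter).
const core = "https?:\\/\\/\\S+";

// Anchored: the WHOLE string must be a URL (validation).
const validate = new RegExp("^" + core + "$", "i");

// Unanchored with the global flag: find URLs anywhere inside a longer text (extraction).
const extract = new RegExp(core, "gi");

const text = "aaa http://example.com/a bbb https://example.org ccc";

console.log(validate.test(text));                   // false: the text does not start with a scheme
console.log(validate.test("http://example.com/a")); // true
console.log(text.match(extract));                   // the two URLs embedded in the text
```

Note also that the pattern has to be passed as a regex literal (or a properly escaped string); wrapping the whole `/.../i` literal in quotes, as in the question, turns it into a plain string.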

bmuessig commented Jun 4, 2015

I am not sure if anybody mentioned it before, but some of the "invalid" URLs are in fact valid!
Here is an example: must parse as valid. could parse as valid.
If you disagree, read the actual specifications. The domain should actually be suffixed by a .

peter-fu commented Jun 8, 2015

Shouldn't this be valid?

peter-fu commented Jun 8, 2015

Just noted the workaround provided by @johnjaylward worked.

This regex and everyone's comments have been really informative! Thanks for writing this.

I'm confused about this regex's handling of UTF-8 characters. The RFC spec does not allow "" characters, so why does the regex use "" to match UTF-8 characters? From the spec:

" URI producing applications must not use percent-encoding in host unless it is used
to represent a UTF-8 character sequence. When a non-ASCII registered
name represents an internationalized domain name intended for
resolution via the DNS, the name must be transformed to the IDNA
encoding [RFC3490] prior to name lookup. URI producers should
provide these registered names in the IDNA encoding, rather than a
percent-encoding, if they wish to maximize interoperability with
legacy URI resolvers."

So, UTF-8 characters other than alphanumeric characters should be represented using % encoding and IDNA encoding. I'll post the regex I have in mind later on.

I answered my own question. Browsers reduce UTF-8 in URIs to punycode now, so from the perspective of the RFC spec, the URI actually sent over the wire will be valid.


dperini commented Jun 23, 2015

Many thanks to everybody for the comments and the suggestions.

I have updated the gist:

- Made starting path slash optional (
- Allow a dot (.) at the end of hostnames (

dperini commented Jun 25, 2015

This is an answer to @halloamt & @muessigb questions.
They are related to having/allowing a trailing dot at the end of the hostname.
I answered this question previously on Twitter; here is an interesting link with additional info:

The title of the article says it all: "The danger of the trailing dot in the domain name".
As you can see from the previous message I recently allowed it in my regular expression.
So be careful if you use a trailing dot at the end of the domain name, it may not work in all situations.

dmose commented Jul 6, 2015

Looks like the "allowed a trailing dot" clause is missing a backslash in front of the dot, so it in fact allows a trailing character of any type, including whitespace, since that is the semantics of the . character in a RegExp.
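To make the difference concrete, here is a small sketch with a simplified host pattern (an assumption for illustration, not the gist's full regex). Only the last token differs between the two expressions:

```javascript
// Simplified host pattern for illustration only; not the gist's full regex.
const unescaped = /^[a-z]+\.[a-z]{2,}.?$/;  // ".?"  = any single optional character
const escaped   = /^[a-z]+\.[a-z]{2,}\.?$/; // "\.?" = an optional literal dot

for (const s of ["example.com", "example.com.", "example.com ", "example.com!"]) {
  // Prints the input followed by the result of each pattern.
  console.log(JSON.stringify(s), unescaped.test(s), escaped.test(s));
}
```

Both patterns accept `example.com` and `example.com.`, but only the unescaped one also accepts a trailing space or `!`.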


dperini commented Jul 14, 2015

You are correct @dmose, thank you for noticing that.
I just fixed that both in the Javascript and in the PHP versions.


dperini commented Jul 14, 2015

I added the following example URLs to my tests:

All the above URLs now pass the tests correctly!

@dperini: I don't believe your javascript one liner will match against the period in front of the TLD without two backslashes. I found this out the hard way when I put a question mark after the protocol match, making it optional.... and discovered it was passing any word ex: sethnewton

I forked and made the change here: ... hopefully it's of some use to you.


dperini commented Jul 24, 2015

I did a cut&paste of the one liner in your gist inside my tests and most of the tests fail.
It seems you have added the double backslash in the wrong place (not after the TLD block).
If you look at the one-liner regular expression, there is no place where a backslash needs to be escaped.
It is only inside the new RegExp() constructor that it is necessary to double the backslashes (escape them).
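A short sketch of that rule, using a tiny made-up pattern instead of the full one-liner:

```javascript
const literal = /\d+\.\d+/;                // regex literal: single backslashes
const escaped = new RegExp("\\d+\\.\\d+"); // string form: every backslash doubled
const broken  = new RegExp("\d+\.\d+");    // the string literal drops the backslashes

console.log(literal.source); // \d+\.\d+
console.log(escaped.source); // \d+\.\d+
console.log(broken.source);  // d+.d+

console.log(literal.test("3.14"), escaped.test("3.14"), broken.test("3.14")); // true true false
```

The `broken` variant silently becomes a different expression (`d+.d+`), which is why a pattern pasted into a quoted string can appear to accept or reject the wrong inputs.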

nhahtdh commented Jul 30, 2015

There is a subtle inefficiency in this construct:


On a string without any -, the regex degenerates to [a-z\\u00a1-\\uffff0-9]*[a-z\\u00a1-\\uffff0-9]+, which is of the form A*A*. It will cause quadratic complexity in worst case. The effect is not very visible, until the length of the non-matching string goes up to a few thousand to tens of thousands characters.

This is my suggested fix:


It can still only start and end with [a-z\\u00a1-\\uffff0-9], and any stretch of - or [a-z\\u00a1-\\uffff0-9] is still allowed. Likewise, the minimum matching length is still 1.
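For illustration, here is the construct from the gist's one-liner next to one pattern that has the properties described above. The second pattern is an assumption (runs of allowed characters separated by runs of hyphens) and is not necessarily the exact fix proposed in this comment; the variable names are made up.

```javascript
// Character class used by the gist for host labels.
const A = "[a-z\\u00a1-\\uffff0-9]";

// Construct from the gist's one-liner: behaves like A*A+ when the input has no "-".
const original = new RegExp("^(?:" + A + "-*)*" + A + "+$", "i");

// Assumed alternative: runs of A separated by runs of "-".
// Each character can be consumed by only one part of the pattern.
const alternative = new RegExp("^" + A + "+(?:-+" + A + "+)*$", "i");

const samples = ["a", "a-b", "a--b", "foo-bar-baz", "-a", "a-", "", "a_b"];
for (const s of samples) {
  // Both patterns are expected to agree on every sample.
  console.log(JSON.stringify(s), original.test(s), alternative.test(s));
}
```

To observe the slowdown with the first pattern, try a long run of letters followed by a character outside the class, and compare timings in your own engine.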

jbardu commented Aug 5, 2015

Control-F Perl .. nothing.

A Perl version is the one-line JavaScript version with \x{00a1}-\x{ffff} instead of \u00a1-\uffff
Tested against the test-case list and passed.

gburtini commented Aug 7, 2015

This doesn't seem to allow http://3628126748

It is a decimal address which resolves to an IP owned by The Coca Cola Corp (not an internal IP).

bazzargh commented Aug 7, 2015

The patterns for username/password are overly lax and allow you to put in almost anything as a url, if you finish with something that looks like eg re_weburl.test(""), or re_weburl.test("http://???/")
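A sketch of the effect, assuming a simplified host part (only the userinfo group mirrors the gist). The `tight` variant is a hypothetical restriction for comparison, not something proposed in this thread:

```javascript
// Simplified host part for illustration; only the userinfo group mirrors the gist.
const host = "[a-z0-9.-]+\\.[a-z]{2,}";

// Userinfo as in the gist: any run of non-space characters before "@".
const lax = new RegExp("^https?:\\/\\/(?:\\S+(?::\\S*)?@)?" + host + "$", "i");

// Hypothetical tighter variant: userinfo may not contain "/", "?", "#" or "@".
const tight = new RegExp("^https?:\\/\\/(?:[^\\s/?#@:]+(?::[^\\s/?#@]*)?@)?" + host + "$", "i");

const odd = "http://???/???@example.com";
console.log(lax.test(odd), tight.test(odd)); // true false

const ok = "http://user:pass@example.com";
console.log(lax.test(ok), tight.test(ok));   // true true
```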

@gburtini Actually although browsers allow and resolve URLs with IP addresses that are in hexadecimal, octal or without a dot-notation, these formats are made invalid in a URL by RFC 3986: section 7.4 Rare IP Address Formats
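For readers wondering what such a dotless decimal host stands for, here is a small sketch that converts it to dotted notation. The function name is made up for illustration:

```javascript
// Convert a 32-bit decimal host (as in http://3628126748) to dotted-quad notation.
function decimalToDottedQuad(n) {
  return [n >>> 24, (n >>> 16) & 255, (n >>> 8) & 255, n & 255].join(".");
}

console.log(decimalToDottedQuad(3628126748)); // 216.64.210.28
```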

'' check failed

o5 commented Aug 31, 2015

A few changes are visible here.

  1. Line 85 - - is also a valid address
  2. Line 97 - - port could be < 10

lucasvrm commented Sep 1, 2015

Thanks for the great regex! I am trying to use it within a custom validation rule in Laravel.

But it's not validating anything at all...

My code below:

$regex = '_^(?:(?:https?|ftp)://)(?:\S+(?::\S*)?@)?(?:(?!(?:10|127)(?:.\d{1,3}){3})(?!(?:169.254|192.168)(?:.\d{1,3}){2})(?!172.(?:1[6-9]|2\d|3[0-1])(?:.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\x{00a1}-\x{ffff}0-9]-)[a-z\x{00a1}-\x{ffff}0-9]+)(?:.(?:[a-z\x{00a1}-\x{ffff}0-9]-)[a-z\x{00a1}-\x{ffff}0-9]+)(?:.(?:[a-z\x{00a1}-\x{ffff}]{2,})).?)(?::\d{2,5})?(?:[/?#]\S)?$_iuS';

$rules = array('hewit' => array('required', 'regex:'. $regex));

Am I doing something wrong?

Hello Diego,
Awesome work!!!

In reference to the link:
These 2 test cases (should return false) are returning true:

worenga commented Sep 18, 2015

Hi @dperini, very nice RegEx!
I ran the regex on thousands of user input text data and it seems that URLs like are recognized by the regex; I'm not sure whether this is intended or not.

Thanks for the great work.
I also found that someone created an npm package for this gist:

Hi @dperini !
I have a question.
Your regexp for JS:
var urlReStr = '^(?:(?:https?|ftp)://)(?:\S+(?::\S_)?@)?(?:(?!(?:10|127)(?:.\d{1,3}){3})(?!(?:169.254|192.168)(?:.\d{1,3}){2})(?!172.(?:1[6-9]|2\d|3[0-1])(?:.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]-)[a-z\u00a1-\uffff0-9]+)(?:.(?:[a-z\u00a1-\uffff0-9]-)[a-z\u00a1-\uffff0-9]+)_(?:.(?:[a-z\u00a1-\uffff]{2,})).?)(?::\d{2,5})?(?:[/?#]\S*)?$';

var urlRe = new RegExp(urlReStr, 'i');

It works perfectly for some tricky cases, but doesn't fail for such simple case as http://dddddddddddddd.
Maybe I'm doing something wrong. Please give me some advice.

Unfortunately... - Fails
http://google - Passes

goa commented Nov 23, 2015

- Fails, although it is a valid YouTube URL.

http://localhost fails it's a crucial one 👍

@diegoperini's version converted to javascript:

var match_url_re=/^(?:(?:https?|ftp):\/\/)(?:\S+(?::\S*)?@)?(?:(?!10(?:\.\d{1,3}){3})(?!127(?:\.\d{1,3}){3})(?!169\.254(?:\.\d{1,3}){2})(?!192\.168(?:\.\d{1,3}){2})(?!172\.(?:1[6-9]|2\d|3[0-1])(?:\.\d{1,3}){2})(?:[1-9]\d?|1\d\d|2[01]\d|22[0-3])(?:\.(?:1?\d{1,2}|2[0-4]\d|25[0-5])){2}(?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(?:(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)(?:\.(?:[a-z\u00a1-\uffff0-9]+-?)*[a-z\u00a1-\uffff0-9]+)*(?:\.(?:[a-z\u00a1-\uffff]{2,})))(?::\d{2,5})?(?:\/[^\s]*)?$/i;


megamos commented Jan 5, 2016

Thanks for this lovely gist!
Anyone got an updated Ruby version of the regex?

umutm commented Feb 18, 2016

Wouldn't it be better if we update host name validation part from
so that the sub-domains allow underscore character?

Edit: this update fails on but not an requires some improvement. Any suggestions?

Right now shows to be true and shows to be false.

http://xn--j1ail.xn--p1ai/ - fails, but it is a valid URL

mockdeep commented May 12, 2016

@derekshull isn't a valid url, since underscores aren't valid in domain names.

@mockdeep you were right, an underscore is not valid in a domain name, but it is valid in URIs. So is not valid, but is valid

For anyone trying to get this to work with local host names (those without any '.'), replacing the final '*' with a '?' on line 90 did the trick for me.

The regexp sadly doesn't match Twitter's short links:

dperini commented Jul 9, 2016

I have seen that some users (like @yang7229693 and @sircharleswatson) are trying to validate host names like:

without a needed protocol identifier (schema) so this will fail since these are not Web URLs.

In case you need to make the "protocol identifier" optional change the schema related line at the beginning from:

 // protocol identifier
 "(?:(?:https?|ftp)://)" +

to the following (added a question mark to the end, before the closing double quote):

 // protocol identifier
 "(?:(?:https?|ftp)://)?" +

dperini commented Jul 9, 2016

@umutm & @derekshull
as already said by user @mockdeep the underscore character is not allowed by specifications !
It is not supported either in the domain or in the sub-domain parts (even if you can configure your DNS to accept it).


dperini commented Jul 9, 2016

it works if you write it as plain UTF-8 instead of puny-encoded, I mean written like: http://кто.рф/
It has already been explained in previous comments: nobody will remember and type puny-encoded URLs.
Try to validate the http://кто.рф/ as is, you will see it passes the validation (see my last comment on 19 Nov 2014).

amogil commented Jul 18, 2016

Made a gem for ruby. Thanks, @dperini and @mathiasbynens!

kofifus commented Jul 25, 2016

validating without a protocol is tricky .. just by adding the question mark above you'll end up with any 'word.word' as a valid url which is usually not what you want.
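A quick sketch of that effect, using a simplified stand-in for the host and path part (an assumption for illustration, not the gist's full regex):

```javascript
// Simplified stand-in for the host/path part; not the gist's full regex.
const rest = "[a-z0-9-]+(?:\\.[a-z0-9-]+)*\\.[a-z]{2,}(?:\\/\\S*)?";

const schemeRequired = new RegExp("^(?:(?:https?|ftp):\\/\\/)" + rest + "$", "i");
const schemeOptional = new RegExp("^(?:(?:https?|ftp):\\/\\/)?" + rest + "$", "i");

for (const s of ["http://google.bla", "google.bla", "word.word", "readme.txt"]) {
  // Prints the input followed by the result with a required and an optional scheme.
  console.log(s, schemeRequired.test(s), schemeOptional.test(s));
}
```

With the scheme optional, every `word.word` style input passes, including things like file names.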

I came up with the following, which is not perfect but works.. it will approve http://google.bla and but not google.bla

    function isUrl(s) {
        if (!isUrl.rx_url) {
            // taken from
            // valid prefixes
            isUrl.prefixes=['http:\/\/', 'https:\/\/', 'ftp:\/\/', 'www.'];
            // taken from

        if (!isUrl.rx_url.test(s)) return false;
        for (let i=0; i<isUrl.prefixes.length; i++) if (s.startsWith(isUrl.prefixes[i])) return true;
        for (let i=0; i<; i++) if (s.endsWith('.'[i]) || s.includes('.'[i]+'\/') ||s.includes('.'[i]+'?')) return true;
        return false;

amogil commented Jul 26, 2016

In that case you have to keep an up-to-date list of TLDs. It can be tricky.

luckydonald commented Sep 21, 2016

An up-to-date official TLD list can be found at, or as machine-parsable.

I am using the expression mentioned above - the JS version - at the below tool and it says "Pattern Error" at the first character -
When I remove the 1st character i.e. '/', the error goes away, but no match is found for the following inputs -

Please help !

marcopompili commented Oct 16, 2016

@kewlcoder You have to remove '/', you should copy only the content between '/' if you are using a regex with regex101. Also this regex is for matching one URL not a list of them, you have to use the correct flags (img) for multiple matching:

For those here that don't yet know, there is an implementation of the URL spec available via whatwg-url and in Node.js v7 (experimental). These discussions should probably move to one of these places.

spinus commented Jan 4, 2017

@dperini, you excluded localhost and a few private networks from the "valid" set. If the reason for it is to discourage people from accessing "local" resources, I would add "localhost.localdomain" as well.

gajus commented Jan 25, 2017

It might be worth stating that the current JavaScript regex version does not pass the test.


spence commented Feb 12, 2017

Published an Elixir library for this. Thanks @dperini and everyone here!


anchev commented Mar 1, 2017

Thanks. You also need to add ftps and sftp. BTW, this test does not seem quite up to date, because when testing in these URLs show as valid, and according to Mathias they should not be:

and this one is invalid (according to Mathias it should be):


Hi. I am using the Python validators package and validators.url, which is based on your URL validator. When I use a URL with double dashes after the first word, for example it fails. However, there are many valid URLs out there with double dashes. Just wondering if you have knowledge of that failure? Thank you.

evecalm commented Mar 24, 2017

I am trying to make this regex work with an XSD file. So far I am just disappointed with the restrictions of XSD. Even when I left out the negative lookahead for the IPs, I am still struggling with the Unicode ranges inside the character classes.

What i have so far:


Problem is line 5 atm: [a-zA-Z\u00a1-\uffff0-9] Has somebody any idea?

gorurs commented Jun 5, 2017

Hi Diego,

Is there a version of this regex for Java?



About page,
Latest regex " @diegoperini (502 chars) " does not completely match last valid example which is

indeed, there is a match, but it's

without the trailing '4'...?

Am I missing something? Below the Java Pattern I am using.


Why is that?

@gorurs: here is Diego's regex as a Java pattern:


Shouldn't the regex only validate matching parenthesis? I mean, syntactically it is valid to have orphan parenthesis inside the URL, but most of the time it does not happen, whereas having an URL inside parenthesis is pretty common:

This is a test (with url inside parenthesis: Done!

With this regex, the matched URL will be which is wrong. It should only validate matching parenthesis, because it is the most common use case.

Any way around that? I am a reaaaaal beginner at regexes. Don't even know how to adapt this regex in order to achieve this.
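One possible workaround, sketched here as an assumption rather than a change to the regex itself, is to post-process each extracted match and strip a trailing ")" only when it has no matching "(" inside the URL. The function name and example URLs are made up:

```javascript
// Hypothetical helper: drop trailing ")" characters that have no matching "(" in the URL.
function trimUnbalancedParens(url) {
  let result = url;
  while (result.endsWith(")")) {
    const opens = (result.match(/\(/g) || []).length;
    const closes = (result.match(/\)/g) || []).length;
    if (closes <= opens) break;   // balanced: the ")" belongs to the URL
    result = result.slice(0, -1); // unbalanced: treat it as surrounding punctuation
  }
  return result;
}

console.log(trimUnbalancedParens("http://example.com/page)"));          // http://example.com/page
console.log(trimUnbalancedParens("http://example.com/wiki/Foo_(bar)")); // left unchanged
```

This is only a heuristic: a URL that legitimately ends in an unmatched ")" would be shortened.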

BlaM commented Aug 14, 2017

@RenaudParis I don't think you can make that assumption. There is no reason why you could not end up with a closing ")" in an ID (for example) - which does not make the URL invalid:

rokoroku commented Aug 17, 2017

Hmm this regex does not match localhost uri like http://localhost:3000

Az0res commented Aug 17, 2017

Thanks for your work! Maybe you could add support for urls such as (with no protocol defined, but www in front)?

cguillemette commented Aug 21, 2017

@rokoroku: it does not consider localhost valid, per the documentation in the gist's comments.
See "TLDs have been made mandatory so single names like "localhost" fails".

dinbrca commented Sep 11, 2017

Hello, thank you for the hard work. I would like to note that doesn't pass validation. Can you fix it? Thanks

kaijMueller commented Oct 14, 2017

@rokoroku: you can add a '?' to the end of line 92 so the TLD is optional:
@dinbrca add the underscore to the host and domain regex:

According to , a host or domain can contain some special characters like -_& but not so many like \u00a1-\uffff. Also, a host can end with a '-', which is not possible with this regex.

idanen commented Oct 17, 2017

@dinbrca, I also had the same problem.
Just add _ to the hostname part.


dperini commented Nov 29, 2017

@gaius @anchev and @steelliberty
two consecutive hyphens ( are considered valid by the specification, see here:

This has been said repeatedly in this discussion, and a dot at the end of the domain is also perfectly valid:

dperini commented Nov 29, 2017

@dinbrca and @idanen
the underscore ('_') character is not a valid character for domain names and host names.


Trying to validate the following url:

with the regex:

I got only the group : WITHOUT ")" ?????

Any idea ? Thanks in advance !

Synchro commented Dec 14, 2017

This is considered valid, even though it contains incorrectly URL-encoded elements:

I assume this is considered valid because the URL scheme itself doesn't care about higher-level concerns, however, the standard PHP http extension will fail to parse a URL containing such encoding errors, which is exactly the kind of thing I'm likely to be checking a URL for before trying to request it:

$req = new http\Client\Request('HEAD', '');

PHP Warning:  http\Client\Request::__construct(): Failed to parse query; invalid percent encoding at pos 2 in 'a=%T='
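If you need to catch this before making the request, one option is a separate syntactic check for malformed percent-escapes. This is a sketch with made-up names; it only checks that every "%" is followed by two hex digits, not that the decoded bytes form valid UTF-8:

```javascript
// A "%" is only well-formed when it is followed by two hexadecimal digits.
const badPercent = /%(?![0-9a-f]{2})/i;

// Hypothetical helper: true when the string contains no malformed percent-escape.
function hasValidPercentEncoding(s) {
  return !badPercent.test(s);
}

console.log(hasValidPercentEncoding("a=%T="));  // false
console.log(hasValidPercentEncoding("a=%20b")); // true
console.log(hasValidPercentEncoding("100%"));   // false
```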

Synchro commented Dec 14, 2017

Some valid URL schemes are marked as invalid:


tel, fax and modem are defined in RFC2806 and are widely supported in mobile browsers. The callto scheme is supported by some conferencing apps, principally Skype.

alpeshp commented Dec 18, 2017

Hello sir

I want to allow users to enter the following URL schemes:


Does your RE accept URLs like geo:37.786971,-122.399677

and all the others?

Please let me know how I can use your RE.

Thanks and regards

mnogueron commented Jan 4, 2018

@pdalfarr Same problem for me, is not recognised even though it's a valid URL. In fact as soon as the last component of the IP has more than 2 digits, the last digit is not captured.

It seems that this part of the regex: (?:\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])) should be reordered so that it captures the whole 253.
Here is my solution: (?:\.(?:25[0-4]|1\d\d|2[0-4]\d|[1-9]\d?))
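A small sketch of why the order matters when the expression is used without a trailing `$` (for example when extracting rather than validating). Only the last-octet group is shown, and the variable names are made up:

```javascript
const original  = /\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])/; // short alternative first
const reordered = /\.(?:25[0-4]|1\d\d|2[0-4]\d|[1-9]\d?)/; // longest alternatives first

console.log(".253".match(original)[0]);  // .25
console.log(".253".match(reordered)[0]); // .253

// With anchors the original order still works, because "$" forces backtracking.
const anchored = /^\.(?:[1-9]\d?|1\d\d|2[0-4]\d|25[0-4])$/;
console.log(anchored.test(".253"));      // true
```

Alternation tries the branches left to right and stops at the first one that lets the overall match succeed, so without a following anchor `[1-9]\d?` wins after two digits.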

Hope it helps other people! :)

gajus commented Jan 17, 2018

@Mickael-van-der-Beek regarding:

I also discovered that underscores are not valid if you follow this RegExp.

Just add _ to the 17th matching group, i.e.

- ^((https?|ftp):\/\/)(\S+(:\S*)?@)?((?!10(\.\d{1,3}){3})(?!127(\.\d{1,3}){3})(?!169\.254(\.\d{1,3}){2})(?!192\.168(\.\d{1,3}){2})(?!172\.(1[6-9]|2\d|3[0-1])(\.\d{1,3}){2})([1-9]\d?|1\d\d|2[01]\d|22[0-3])(\.(1?\d{1,2}|2[0-4]\d|25[0-5])){2}(\.([1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(([a-z\u{00a1}-\u{ffff}0-9]+-?)*[a-z\u{00a1}-\u{ffff}0-9]+)(\.([a-z\u{00a1}-\u{ffff}0-9]+-?)*[a-z\u{00a1}-\u{ffff}0-9]+)*(\.([a-z\u{00a1}-\u{ffff}]{2,})))(:\d{2,5})?(\/[^\s]*)?$
+ ^((https?|ftp):\/\/)(\S+(:\S*)?@)?((?!10(\.\d{1,3}){3})(?!127(\.\d{1,3}){3})(?!169\.254(\.\d{1,3}){2})(?!192\.168(\.\d{1,3}){2})(?!172\.(1[6-9]|2\d|3[0-1])(\.\d{1,3}){2})([1-9]\d?|1\d\d|2[01]\d|22[0-3])(\.(1?\d{1,2}|2[0-4]\d|25[0-5])){2}(\.([1-9]\d?|1\d\d|2[0-4]\d|25[0-4]))|(([_a-z\u{00a1}-\u{ffff}0-9]+-?)*[a-z\u{00a1}-\u{ffff}0-9]+)(\.([a-z\u{00a1}-\u{ffff}0-9]+-?)*[a-z\u{00a1}-\u{ffff}0-9]+)*(\.([a-z\u{00a1}-\u{ffff}]{2,})))(:\d{2,5})?(\/[^\s]*)?$

I've just added the GraphQLURL scalar to my library. Used your regexp. Thanks, @dperini.

How can I add another kind of URL to this regex?
ex: www

wrone commented Feb 21, 2018

Has anyone been able to use it in Salesforce Apex?

I removed the "\u00a1-\uffff" ranges and was able to execute it. Code example:
String urlRegex = '^(?:(?:https?|ftp)://)(?:\\'+'S+(?::\\'+'S*)?@)?(?:(?!(?:10|127)(?:\\'+'.\\'+'d{1,3}){3})(?!(?:169\\'+'.254|192\\'+'.168)(?:\\'+'.\\'+'d{1,3}){2})(?!172\\'+'.(?:1[6-9]|2\\'+'d|3[0-1])(?:\\'+'.\\'+'d{1,3}){2})(?:[1-9]\\'+'d?|1\\'+'d\\'+'d|2[01]\\'+'d|22[0-3])(?:\\'+'.(?:1?\\'+'d{1,2}|2[0-4]\\'+'d|25[0-5])){2}(?:\\'+'.(?:[1-9]\\'+'d?|1\\'+'d\\'+'d|2[0-4]\\'+'d|25[0-4]))|(?:(?:[a-zA-Z0-9]-*)*[a-zA-Z0-9]+)(?:\\'+'.(?:[a-zA-Z0-9]-*)*[a-zA-Z0-9]+)*(?:\\'+'.(?:[a-zA-Z]{2,}))\\'+'.?)(?::\\'+'d{2,5})?(?:[/?#]\\'+'S*)?';

"\u00a1-\uffff" - these, as far as I understood, are some special characters. I believe they are rarely used; actually, I have never seen URLs with non-alphabetic characters. I will try to use my version, which should be enough, I think.
