Skip to content

Instantly share code, notes, and snippets.

@freerobby
Created October 11, 2012 17:41
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save freerobby/b3deafcd893b6747efbc to your computer and use it in GitHub Desktop.
Save freerobby/b3deafcd893b6747efbc to your computer and use it in GitHub Desktop.
"www.google.com foo".match(/^([A-Za-z0-9\-_~\.\+]*(%[0-9A-F]{2})*[0-9A-Za-z_\-\.~\+]*)*$/)
@freerobby
Copy link
Author

This regex halts indefinitely on ruby 1.8, 1.9 and JRuby. Why?

@slevithan
Copy link

The nested quantifiers you've used (without there being any required parts to the pattern that help narrow down potential paths the regex can take on the way to a match) lead to superlinear/catastrophic backtracking whenever the regex encounters something it can't match. The longer the string fragment that it can match ("www.google.com") before encountering a character that it cannot (in your case, a space), the more pronounced the problem will be. And it's not just Ruby. Any regex engine based on backtracking will encounter the same problem if given the same pattern and input, although some might short circuit after a million or so backtracking steps (or sooner if they realize they're going in circles), rather than running until the heat death of the universe.

Change your regex to the equivalent ^[A-Za-z0-9_~.+-]*(%[0-9A-F]{2}[A-Za-z0-9_~.+-]*)*$, and the performance problem will disappear entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment