@wwalker
Created March 7, 2013 15:59
A regex greediness problem
This regex should only match the middle paragraph.
/Exception in thread.*?TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30/m
Rubular (and testing in Perl too) shows I am wrong: it matches paragraphs 1 and 2. The ? should make it non-greedy.
TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30

Exception in thread 1310229058:
Traceback (most recent call last):
TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30

Exception in thread 1310249350:
TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30
@taylor

taylor commented Mar 7, 2013

/Exception in thread[^\n\r]*\n\r?TimeoutError: QueuePool limit of size \d* overflow \d* reached, connection timed out, timeout \d+/m

@taylor

taylor commented Mar 7, 2013

.*? is not being greedy. It says "match as little as possible up to the next TimeoutError line", and that succeeds for both paragraphs 2 and 3, but not for the first one, which has no "Exception in thread" line. Without the ? it will catch both paragraphs as one full match.
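
A minimal Ruby sketch of that difference, assuming the gist's three paragraphs are separated by blank lines. The names `err`, `log`, `lazy` and `greedy` are made up for illustration and are not from the gist.

```ruby
# The error line that ends every paragraph in the gist's sample data.
err = "TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30"

# Three paragraphs separated by blank lines (assumed layout of the sample).
log = [
  err,
  "Exception in thread 1310229058:\nTraceback (most recent call last):\n#{err}",
  "Exception in thread 1310249350:\n#{err}"
].join("\n\n") + "\n"

# In Ruby, /m lets . match newlines, so both patterns can span lines.
lazy   = /Exception in thread.*?#{Regexp.escape(err)}/m
greedy = /Exception in thread.*#{Regexp.escape(err)}/m

lazy_matches   = log.scan(lazy)    # stops at the first error line after each start
greedy_matches = log.scan(greedy)  # runs on to the last error line in the string

puts lazy_matches.length    # 2: paragraphs 2 and 3, as separate matches
puts greedy_matches.length  # 1: paragraphs 2 and 3 joined into one match
```

So the lazy version never merges the two paragraphs; it just finds each of them in turn.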

@wwalker
Author

wwalker commented Mar 7, 2013

It's grabbing both, at once, with the ?

@taylor

taylor commented Mar 7, 2013

To match only the middle paragraph:

/Exception in thread(?:[^\n\r]*\n\r?){2}TimeoutError: QueuePool limit of size \d* overflow \d* reached, connection timed out, timeout \d+/m

If the paragraph has more than those 3 lines, it will not match.
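
A small sketch of that limitation in Ruby. The `four_lines` sample, with one extra traceback frame, is hypothetical and does not appear in the gist's data; the variable names are illustrative too.

```ruby
# The {2} pattern: exactly two line endings between "Exception in thread" and the error line.
pattern = /Exception in thread(?:[^\n\r]*\n\r?){2}TimeoutError: QueuePool limit of size \d* overflow \d* reached, connection timed out, timeout \d+/m

err = "TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30"

# Header line, one traceback line, error line: the shape of the middle paragraph.
three_lines = "Exception in thread 1310229058:\nTraceback (most recent call last):\n#{err}\n"

# Hypothetical longer traceback with one extra frame line before the error.
four_lines = "Exception in thread 1310229058:\nTraceback (most recent call last):\n  File \"pool.py\", line 1, in connect\n#{err}\n"

three_ok = pattern.match?(three_lines)
four_ok  = pattern.match?(four_lines)

puts three_ok  # true
puts four_ok   # false: a third line sits where the pattern expects TimeoutError
```

The fixed count is what tells the middle paragraph apart from the last one, so loosening it to something like `{2,3}` would be a judgment call about how long a traceback can get.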

@wwalker
Author

wwalker commented Mar 7, 2013

Never mind, Rubular is showing successive matches. DOH
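
For anyone who hits the same thing, here is a rough Ruby sketch of "first match" versus "successive matches", again assuming blank lines between the paragraphs. `log`, `lazy`, `first_only` and `starts` are illustrative names only.

```ruby
err = "TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30"
log = [
  err,
  "Exception in thread 1310229058:\nTraceback (most recent call last):\n#{err}",
  "Exception in thread 1310249350:\n#{err}"
].join("\n\n") + "\n"

lazy = /Exception in thread.*?#{Regexp.escape(err)}/m

# String#[] with a Regexp returns only the first match (or nil).
first_only = log[lazy]

# scan keeps going after each match; record where every match starts,
# which is roughly what a tester highlighting all matches displays.
starts = []
log.scan(lazy) { starts << Regexp.last_match.begin(0) }

puts first_only     # the middle paragraph only
puts starts.length  # 2: two separate, non-overlapping matches
```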

@taylor

taylor commented Mar 7, 2013

d="TimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30\n \nException in thread 1310229058:\nTraceback (most recent call last):\nTimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30\n \nException in thread 1310249350:\nTimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30\n"

d.scan(/Exception in thread(?:[^\n\r]*\n\r?){2}TimeoutError: QueuePool limit of size \d* overflow \d* reached, connection timed out, timeout \d+/m)

=> ["Exception in thread 1310229058:\nTraceback (most recent call last):\nTimeoutError: QueuePool limit of size 40 overflow 10 reached, connection timed out, timeout 30"]
