@Groostav
Created January 27, 2016 08:13
Stack overflow in the Java regex engine
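import static org.assertj.core.api.Assertions.assertThat; // assumed: AssertJ fluent assertions

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.junit.Test;

public class RegexTests {

    // Assumed to hold the first ~300,000 lines of the Wikipedia dump described below;
    // loading it is not shown here.
    private static String TwentyMegDocument;

    @Test
    public void regex_can_match_massive_repetition_without_stack_overflow() {
        Pattern pattern = Pattern.compile("([^\\r\\n]*?\\r?\\n){266959}(?<cool>.{30})");
        // A pattern that runs successfully is (?>[^\\r\\n]?\\r?\\n){266959}(?<cool>.{30});
        // note the atomic group at the front.
        // In my case, TwentyMegDocument is the first 300,000 lines or so
        // of a Wikipedia database dump (abstract-9):
        // https://www.cs.purdue.edu/commugrate/data/wikipedia/dumps.wikimedia.org/enwiki/latest/
        Matcher matcher = pattern.matcher(TwentyMegDocument);
        boolean found = matcher.find();
        assertThat(found).isTrue();
        assertThat(matcher.group("cool")).isNotNull().isNotEmpty();
    }
}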
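The comment in the test points at an atomic group as the fix. Below is a self-contained sketch of that idea that does not need the Wikipedia dump: the helper `syntheticDocument` is hypothetical and simply builds numbered lines as a stand-in for `TwentyMegDocument`. One assumption to flag: the sketch uses `[^\r\n]*?` inside the atomic group, whereas the comment's pattern reads `[^\r\n]?`, which on its own only admits lines of at most one character. The reason the atomic version avoids deep recursion (a quantified atomic group being driven by an iterative loop rather than the recursive `Loop` node visible in the trace) is an assumption about the engine's internals, not something the gist confirms.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AtomicLineSkip {

    // Skips exactly `lines` newline-terminated lines, then captures the next 30 characters.
    // (?>...) is an atomic group: once a repetition has matched a line, the engine
    // will not backtrack into it. Because [^\r\n] cannot consume a line terminator,
    // each repetition has only one way to match, so the atomic group loses no matches.
    static String captureAfterLines(CharSequence document, int lines) {
        Pattern pattern = Pattern.compile(
                "(?>[^\\r\\n]*?\\r?\\n){" + lines + "}(?<cool>.{30})");
        Matcher matcher = pattern.matcher(document);
        return matcher.find() ? matcher.group("cool") : null;
    }

    // Hypothetical stand-in for TwentyMegDocument: numbered 37-character lines.
    static String syntheticDocument(int lineCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < lineCount; i++) {
            sb.append(String.format(Locale.ROOT, "line %08d padding padding padding", i))
              .append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int skip = 266_959;
        String doc = syntheticDocument(skip + 10);
        // Expected to print text starting with "line 00266959" rather than overflow the stack.
        System.out.println(captureAfterLines(doc, skip));
    }
}
```

Since every line in the synthetic document is longer than 30 characters, `find()` should succeed at index 0 and `cool` should hold the start of line 266,959 (0-based).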
java.lang.StackOverflowError
at java.util.regex.Pattern$CharProperty.match(Pattern.java:3775)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4250)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4772)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Ques.match(Pattern.java:4181)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4772)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
at java.util.regex.Pattern$Ques.match(Pattern.java:4181)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$Loop.match(Pattern.java:4772)
at java.util.regex.Pattern$GroupTail.match(Pattern.java:4717)
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3798)
... and around and around
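The trace shows the same seven frames (`GroupHead`, `Loop`, `GroupTail`, `BmpCharProperty`, `Ques`, `Curly.match0`, `Curly.match`) recurring, apparently once per repetition of the capturing group, so stack depth grows with the `{266959}` count. If the goal is only "the 30 characters at the start of line 266,959", a plain line scan sidesteps the regex engine for the counting part. This is a swapped-in technique, not the gist's approach; the class and method names are hypothetical, and it assumes `\n` line endings like the dump. For a real file, `java.nio.file.Files.newBufferedReader(path)` could replace the `StringReader`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class LineScan {

    // Returns the first 30 characters of the line after `skip` lines, or null when the
    // input has too few lines or that line is shorter than 30 characters.
    // Note: readLine() also treats a lone '\r' as a line break; the regex does not.
    static String captureAfterLines(String document, int skip) {
        try (BufferedReader reader = new BufferedReader(new StringReader(document))) {
            for (int i = 0; i < skip; i++) {
                if (reader.readLine() == null) {
                    return null; // ran out of lines before reaching the target
                }
            }
            String line = reader.readLine();
            return (line != null && line.length() >= 30) ? line.substring(0, 30) : null;
        } catch (IOException e) {
            // StringReader does not fail in practice; wrap to keep the signature simple
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "first\nsecond\n0123456789012345678901234567890123456789\n";
        System.out.println(captureAfterLines(doc, 2)); // prints 012345678901234567890123456789
    }
}
```

Stack usage here is constant regardless of how many lines are skipped. For `\n`-terminated input whose target line has at least 30 ordinary characters, this should usually agree with the regex's `cool` group.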