Ian Eldred Pudney - Google
In the Atheris Python fuzzer, we have a need to generate strings that match regular expressions. For example, given the pattern ab[cd]ef
, we might generate abcef
. We do this so that those strings can be added to the corpus, increasing the fuzzer's effectiveness. For “real” regular expressions (those that can be evaluated with a DFA or NFA), this operation can be performed in linear time. However, Perl-Compatible Regular Expressions, including Python regular expressions, support additional features beyond what a "real" regular expression can do. These features render match generation much more difficult. This document shows that it is NP-complete to find a match for an expression containing "lookaheads", one such feature. (This is true for lookbehinds, as well.)