Skip to content

Instantly share code, notes, and snippets.

Regular Expression Lookaheads are NP-Complete

Ian Eldred Pudney - Google

Summary

In the Atheris Python fuzzer, we have a need to generate strings that match regular expressions. For example, given the pattern ab[cd]ef, we might generate abcef. We do this so that those strings can be added to the corpus, increasing the fuzzer's effectiveness. For “real” regular expressions (those that can be evaluated with a DFA or NFA), this operation can be performed in linear time. However, Perl-Compatible Regular Expressions, including Python regular expressions, support additional features beyond what a "real" regular expression can do. These features render match generation much more difficult. This document shows that it is NP-complete to find a match for an expression containing "lookaheads", one such feature. (This is true for lookbehinds, as well.)

Background Details

Keybase proof

I hereby claim:

  • I am IanPudney on github.
  • I am ianpudney (https://keybase.io/ianpudney) on keybase.
  • I have a public key whose fingerprint is 477D 6A96 5442 F566 6CAC 7052 6AA8 C091 94D6 E0DB

To claim this, I am signing this object: