Skip to content

Instantly share code, notes, and snippets.

@jba
Last active September 5, 2021 12:14
Show Gist options
  • Save jba/b0ea153dc6e1785975bbd1e650497f12 to your computer and use it in GitHub Desktop.
Save jba/b0ea153dc6e1785975bbd1e650497f12 to your computer and use it in GitHub Desktop.

Determining whether two patterns overlap

(This is a more detailed and precise version of https://stackoverflow.com/a/63914397/9459301.)

A pattern is an alternating sequence of literals, which match only themselves, and wildcards, which match zero or more characters. For example, ab*c*de is a pattern with starting literal ab, interior literal c, and ending literal de.

Two patterns p1 and p2 overlap if there is some string that they both match. We can compute whether they overlap in linear time as follows.

If at least one pattern has no wildcards, just match it against the other pattern.

So assume both patterns have wildcards. Remove the common literal prefix and suffix from the patterns.

If both patterns now start with a literal, it must be different in each pattern. The patterns don't overlap. If both end with a literal, again, it is different and they don't overlap.

Otherwise they do overlap. Here is the proof. p1 and p2 now name the adjusted patterns, with common start and end literals removed.

Now we can assume that at most one pattern starts with a literal, call it S, and at most one ends with a literal, call it E. (Both S and E may be empty.) We construct a string C that both patterns match like so:

S + interior literals of p1 + interior literals of p2 + E

where + is concatentation. An interior literal is one surrounded by wildcards.

For example, if p1 is a*b* and p2 is *c*d, then C is abcd.

p1 matches C like so: if p1 starts with S, that matches the initial S of C; otherwise p1 starts with a wildcard, which also matches C's initial S. Then p1 matches its own interior literals with the wildcards preceding them matching the empty string. p1's final wildcard matches the interior literals of p2. p1 ends with either E or a wildcard, both of which match the end of C.

p2 matches C similarly: p2 starts with either S or a wildcard, matching the beginning of C; the first wildcard in p2 matches the interior literals of p1; then p2 matches its own interior literals with empty wildcards; and finally p2 ends with either E or a wildcard, matching the end of C.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment