Recently a colleague posted a security incident report about a single-character error in a regular expression. It got me thinking about the techniques I have used to reduce errors in my own regular expressions, and about how I could use this incident as a guide to applying them. In the process, I found that this is a case where elegance works against clarity; sometimes clarity needs to win, especially in security or safety applications.
Before I start, I want to make one thing clear: regular expressions are complex and brittle. Everyone makes this sort of mistake, so do not interpret this as a personal criticism of any individual, any group, any language, or any project. I am not second-guessing design decisions. Nobody cares about how I would have written the original code. Even I don't.
Very simply, the code is what it is; my goal is to see what we can learn from it and what a team can do to reduce the risk posed by regular expre