Created May 15, 2012 21:06
Comment explaining a regex
;; Some documentation of this regular expression from hell, so I have
;; some hope of debugging it later.
;;
;; Start with the core "inner" regex:
;;
;; [\w\-]([\.\w])+[\w]+@([\w\-]+\.)+[A-Za-z]{2,4}
;;
;; This matches a "bare" email address, like bhurt@spnz.org. Then we
;; decorate it: we want to match the email address even if it has a
;; name attached, like: "Brian Hurt" <bhurt@spnz.org>
;;
;; \"([^\"]|(\\.))*\"\s*<$bare$>
;;
;; where $bare$ is replaced with the bare-email matching regex above.
;; Except we want to match either a bare or decorated email address,
;; so it's really:
;;
;; ($bare$)|(\"([^\"]|(\\.))*\"\s*<$bare$>)
;;
;; Note that $bare$ now appears twice. Now we want to match a
;; comma-separated sequence of decorated (including bare) email
;; addresses, so we do:
;;
;; ($deco$\s*,\s*)*$deco$
;;
;; where $deco$ is the decorated or bare email address matcher above,
;; wrapped in its own group so that its top-level | stays contained.
;; Now we want to match a line that starts with "from:" followed by
;; a comma-separated list of email addresses, and nothing else:
;;
;; ^\s*from\s*:\s*($addrs$)\s*$
;;
;; where $addrs$ is the comma-separated address-list matcher above.
;; Lastly, we want to set the following flags:
;; i = case insensitive matching
;; d = unix new lines (only \n counts as a line terminator)
;; m = multiline matching
;;
;; so we prepend (?idm) to the regular expression.
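The layering described in the comment can be sketched by assembling the pattern from its pieces. This is a minimal sketch, not the original code: it assumes Clojure on the JVM (so the regex flavor is `java.util.regex`), and the names `bare`, `quoted-name`, `deco`, `addrs`, and `from-line` are hypothetical. The non-capturing group around `deco` is an addition made here so that the alternation survives substitution into the list pattern.

```clojure
;; Bare address, e.g. bhurt@spnz.org.
;; (str pattern) returns the pattern's source text, so the regex
;; literal can be copied verbatim from the comment above.
(def bare (str #"[\w\-]([\.\w])+[\w]+@([\w\-]+\.)+[A-Za-z]{2,4}"))

;; Quoted display name; backslash escapes are allowed inside the quotes.
(def quoted-name (str #"\"([^\"]|(\\.))*\""))

;; Either a bare address, or: "Name" <address>.
;; Assumption: wrapped in (?: ... ) so the | does not leak into
;; the surrounding pattern when deco is substituted below.
(def deco
  (str "(?:(" bare ")|(" quoted-name "\\s*<" bare ">))"))

;; Comma-separated list of one or more addresses.
(def addrs
  (str "(" deco "\\s*,\\s*)*" deco))

;; Whole "from:" line, with the i, d and m flags prepended.
;; Group 1 captures the address list.
(def from-line
  (re-pattern (str "(?idm)^\\s*from\\s*:\\s*(" addrs ")\\s*$")))

;; Hypothetical usage: pull the address list out of a header block.
(defn from-addresses
  "Returns the raw address list of the first from: line in text, or nil."
  [text]
  (second (re-find from-line text)))
```

Calling `(from-addresses "To: x\nFrom: \"Brian Hurt\" <bhurt@spnz.org>\n")` should return the decorated address text, and a line with trailing non-address content should return nil. Building from named pieces keeps each layer testable on its own, which is the debugging aid the comment is after.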