Created
January 10, 2017 08:10
Save ErikCorryGoogle/99825a2393bd174b9eda867595a4c51f to your computer and use it in GitHub Desktop.
I like that names have to be unique. Some regexp flavours allow duplicates and unify the storage between them,
but that feels complicated and difficult to spec. For something like this:
/(?<foo>..)((?<foo>..))*/
normally you would reset the capture whenever you iterate the *-loop, but it would be strange to delete the
foo capture on entering the loop, or would it?
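To make the loop-reset concern concrete, here is a small sketch using only numbered captures, plus a check that a duplicate name in the same alternative is rejected. It assumes an engine that already implements named groups; the variable names are illustrative only.

```javascript
// Captures inside a quantified group are cleared at the start of each
// iteration of the loop, so only the last iteration's captures survive.
const m = /((a)|(b))*/.exec("ab");

// Iteration 1 sets group 2 ("a"); iteration 2 resets it and sets group 3 ("b").
const loopResult = [m[0], m[1], m[2], m[3]];
console.log(loopResult); // [ 'ab', 'b', undefined, 'b' ]

// With unique names enforced, two groups called "foo" in the same
// alternative are a SyntaxError, so the reset question never arises.
let duplicateError = null;
try {
  new RegExp("(?<foo>..)((?<foo>..))*");
} catch (e) {
  duplicateError = e;
}
console.log(duplicateError instanceof SyntaxError); // true
```

If duplicates were allowed and shared storage, the second iteration of the loop would have to decide whether to clear the outer foo as well, which is the awkward case described above.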
By putting the named captures on the match object as properties, you are preventing any future standard from
ever adding a new property to the Match object, ever, since it might conflict with the name of a named capture.
Perhaps it makes more sense to add a .map property to the match of type Map and have string keys on that map.
This also avoids the question of what happens if someone makes a named capture called __proto__ or prototype.
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Map
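A rough sketch of the Map idea, to show why awkward names stop being a problem. The helper `capturesAsMap` and its `names` argument are hypothetical and not part of any proposal; the regexp uses plain numbered groups so the example does not depend on named-capture support.

```javascript
// Hypothetical helper: pairs each numbered capture with a name and stores
// the result in a Map, so names never collide with match-object properties.
function capturesAsMap(match, names) {
  const map = new Map();
  names.forEach((name, i) => map.set(name, match[i + 1]));
  return map;
}

const match = /(\d+)-(\w+)/.exec("42-abc");

// Deliberately awkward names: "index" already exists on the match object,
// and "__proto__" is special on ordinary objects.
const captures = capturesAsMap(match, ["index", "__proto__"]);

console.log(captures.get("index"));     // "42"
console.log(match.index);               // 0, the existing property is untouched
console.log(captures.get("__proto__")); // "abc"
```

Map keys are ordinary strings with no special meaning, so neither the existing `index` property nor `__proto__` needs special treatment.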
You picked the .NET syntax for backreferences, \k<name>, instead of the Python syntax, (?P=name). I think the
Python one was probably better, because JS does not raise a syntax error for unknown alpha escapes, so \k<name>
would previously match the literal string "k<name>", whereas (?P...) would previously cause a syntax error.
If you switch to the Python syntax for backreferences, you should of course do the same for the captures
themselves.
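The compatibility difference can be checked directly. This sketch assumes a non-unicode regexp with no named groups in it; `isSyntaxError` is just an illustrative helper.

```javascript
// Without the "u" flag and without any named group in the pattern,
// \k is an identity escape, so this matches the literal text "k<name>".
const legacyMatch = /\k<name>/.test("k<name>");
console.log(legacyMatch); // true

// Illustrative helper: reports whether a pattern source fails to compile.
function isSyntaxError(source) {
  try {
    new RegExp(source);
    return false;
  } catch (e) {
    return e instanceof SyntaxError;
  }
}

// The Python-style forms start with "(?P", which has never been valid in JS,
// so no existing pattern can depend on them.
const pythonCaptureRejected = isSyntaxError("(?P<name>..)");
const pythonBackrefRejected = isSyntaxError("(?P=name)");
console.log(pythonCaptureRejected, pythonBackrefRejected); // true true
```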
.NET will number all the unnamed captures first from left to right, then number the named ones from left to
right. Most others number all of them from left to right regardless of whether they are named. I think you went
with the non-.NET version, which feels right.
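As a quick illustration of the left-to-right numbering, assuming an engine that already implements named groups with the (?<name>...) syntax:

```javascript
// Named and unnamed groups share one numbering, assigned left to right.
const m = /(?<year>\d{4})-(\d{2})-(?<day>\d{2})/.exec("2017-01-10");

const numbered = [m[1], m[2], m[3]];
console.log(numbered); // [ '2017', '01', '10' ]

// Under .NET-style numbering the unnamed month group would come first
// (group 1), followed by year (2) and day (3).
```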
I think using maps in any context here (either as the match object itself, or as the match.groups object) would be a pretty significant performance hit.
+1 for storing captures on a sub-object, where the sub-object is a plain JS object with one property per named capture, and the sub-object exists if and only if the regexp has named captures.
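For reference, this is a sketch of how the sub-object approach looks in engines that ship named groups today, where the sub-object is called `groups`. Whether that matches the draft under discussion here is an assumption.

```javascript
// No named captures: the sub-object is absent.
const withoutNames = /(a)(b)/.exec("ab");
console.log(withoutNames.groups); // undefined

// At least one named capture: the sub-object has one property per name.
const withNames = /(?<first>a)(b)/.exec("ab");
console.log(withNames.groups.first);        // "a"
console.log(Object.keys(withNames.groups)); // [ 'first' ]

// Numbered access keeps working alongside the named sub-object.
console.log(withNames[1], withNames[2]); // a b
```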
I agree that adding named captures as properties on the match object is a bad idea, for the mentioned reasons.
One solution I could get behind is having a sub-object for the captures. Another way could be to change the match object to be a map, so that you could do something like:
/(?<foo>abc)/.exec("abc").get("foo")