@longzheng
Created December 23, 2011 04:48
Difference in regex quantifiers parsing of JAVA and .NET
So I discovered this interesting difference in how Java and .NET parse regex quantifiers by throwing myself against a brick wall (Twitter's regex library: https://github.com/twitter/twitter-text-java/blob/master/src/com/twitter/Regex.java).
In JAVA, if you do
[[a-z]\-]*
It actually matches zero or more characters that are each either "a-z" or "-". Java treats a bracketed class nested inside another class as a union, so everything inside the child square brackets is included in the outer class.
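A minimal sketch of the Java behaviour, assuming a standard JDK `java.util.regex`. The class name `NestedClassDemo` and the helper `matchesAll` are made up for illustration; only the pattern comes from the post.

```java
import java.util.regex.Pattern;

class NestedClassDemo {
    // The pattern from the post; the backslash is doubled for the Java string literal.
    static final Pattern NESTED = Pattern.compile("[[a-z]\\-]*");

    // True when the whole input consists only of characters the class accepts.
    static boolean matchesAll(String input) {
        return NESTED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("ab-cd--e")); // letters and dashes mixed
        System.out.println(matchesAll(""));         // zero repetitions
        System.out.println(matchesAll("a-]]"));     // "]" is not part of the union
        System.out.println(matchesAll("[a"));       // "[" is not part of the union either
    }
}
```

The first two calls should print true and the last two false, which fits the reading that the inner [a-z] is merged into the outer class rather than ending it.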
In .NET,
[[a-z]\-]*
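It does not match zero or more of "a-z" and "-". The first "]" closes the class, so it matches just one character from the class (a "[" or one of a-z), then a single "-", and the * ends up applying only to the trailing "]".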
So the solution is to remove the inner brackets.
[a-z\-]*
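As a quick check of the flattened pattern, here is a small Java sketch. The class name `FlatClassDemo` and the helper `matchesAll` are hypothetical. With no nested brackets there is a single plain character class, so there is nothing for the two engines to interpret differently; the sketch itself only exercises Java.

```java
import java.util.regex.Pattern;

class FlatClassDemo {
    // The flattened pattern from the post: one class containing a-z and a literal dash.
    static final Pattern FLAT = Pattern.compile("[a-z\\-]*");

    // True when every character of the input is a lowercase letter or a dash.
    static boolean matchesAll(String input) {
        return FLAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("ab-cd--e")); // letters and dashes mixed
        System.out.println(matchesAll(""));         // zero repetitions
        System.out.println(matchesAll("a-]]"));     // "]" is not in the class
        System.out.println(matchesAll("ABC"));      // uppercase is not in the class
    }
}
```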
@happy-dude

Hey Long Zheng, here is my friend (https://github.com/nicolasavru) and my take on the regex:

[[a-z]-]*

Break it down:
[ ] --> it's a character class; what did you put inside it? '[a-z'. It will match a [ OR any lowercase letter. Then it will match a single dash followed by zero or more ].

For this regex to match, you will need EXACTLY ONE of '[' or a lowercase letter, followed by EXACTLY ONE dash followed by ZERO OR MORE ']'.
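The reading above can be spelled out with explicit escapes so that no nested bracket is left for the engine to interpret. This is only an illustrative Java sketch: the escaped pattern `[\[a-z]-\]*`, the class name `LiteralReadingDemo`, and the helper `matchesAll` are assumptions made for the example, not something from the original post.

```java
import java.util.regex.Pattern;

class LiteralReadingDemo {
    // One of "[" or a-z, then exactly one "-", then zero or more "]".
    // The brackets meant literally are escaped so they cannot open or close a class.
    static final Pattern LITERAL = Pattern.compile("[\\[a-z]-\\]*");

    // True when the whole input has the shape described in the comment above.
    static boolean matchesAll(String input) {
        return LITERAL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesAll("a-"));    // letter, dash, no "]"
        System.out.println(matchesAll("[-]]]")); // "[", dash, three "]"
        System.out.println(matchesAll("ab-"));   // two letters before the dash
        System.out.println(matchesAll("-"));     // no leading class character
    }
}
```

The first two inputs fit that shape and the last two do not, which is a different set of strings from what a "zero or more of a-z or dash" class accepts.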

[a-z-]*

The "solution" you had will match ZERO OR MORE of a lowercase letter OR dash.

In the end, the question is: what exactly were you looking for? You might have been confused about what exactly is getting matched, because my friend and I believe that Java and .NET match the regex the same way.
