Created
March 22, 2012 01:16
A regexp to match named groups in a URL
# I've looked at the case at hand which is:
# foo.bar -> foo: foo.bar
# foo.bar -> foo: foo, format: bar
# This roughly translates to:
# For each part (foo, format), match as many subexpressions as possible, each consisting of multiple word characters or at most one non-word character
# (we might say \. explicitly, in this specific case). Do this lazily, except for the last part, since that one needs to gobble up the rest.
# And: If we have two or more parts, join them by an optional dot (\.?), which is not included in the named group.
# Examples
# 1 part:
#
p "foo".match(/(?<foo>(\w+|\W?)+)/) # => #<MatchData "foo" foo:"foo">
p "foo.bar".match(/(?<foo>(\w+|\W?)+)/) # => #<MatchData "foo.bar" foo:"foo.bar">
# 2 parts:
#
p "foo.bar".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+)/) # => #<MatchData "foo.bar" foo:"foo" bar:"bar">
p "foo.bar.bur".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+)/) # => #<MatchData "foo.bar.bur" foo:"foo" bar:"bar.bur">
# 3 parts:
#
p "foo.bar.bur".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+?)\.?(?<bur>(\w+|\W?)+)/) # => #<MatchData "foo.bar.bur" foo:"foo" bar:"bar" bur:"bur">
# (Note that the last expression is greedy – in the last example: (?<bur>(\w+|\W?)+) <- greedy – while the others are not, to gobble up the rest :) If you don't want this, don't make it greedy)
# So generally, what this means: If you have "pots", like foo, format etc. or let's say a, b, c, d, e, f… these regexps will distribute anything looking like:
# 1.2.3.4.5.6.7
# in these pots. If there's not enough to go around, e.g. with 1,2, then it will only be distributed to the first two pots.
# If there's too much to go around, it depends on whether the last expression is greedy or not:
#
p "foo.bar".match(/(?<foo>(\w+|\W?)+?)/) # => foo matched
p "foo.bar".match(/(?<foo>(\w+|\W?)+)/) # => foo.bar gobbled up
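The "pots" idea above can be generated instead of written out by hand. A minimal sketch, assuming a made-up helper name (pots_regexp is not part of the gist or of Sinatra): every group is lazy except the last, and the groups are joined by an optional dot.

```ruby
# Hypothetical helper: builds the regexp from the gist for any list of
# group names. All groups but the last are lazy; the last one is greedy
# so that it gobbles up the rest.
def pots_regexp(*names)
  part = '(\w+|\W?)+'
  groups = names.each_with_index.map do |name, index|
    lazy = index < names.size - 1 ? "?" : ""
    "(?<#{name}>#{part}#{lazy})"
  end
  # Groups are joined by an optional dot that is not part of any group.
  Regexp.new(groups.join('\.?'))
end

match = pots_regexp(:foo, :bar).match("foo.bar.bur")
p match[:foo] # => "foo"
p match[:bar] # => "bar.bur"
```

With three names, pots_regexp(:foo, :bar, :bur) produces the same expression as the "3 parts" example.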
Yeah, I meant a parser for the pattern -> regexp step, not the request path -> route parsing; I would still use a regexp there. The real question would be how to generate it. I would also like :name(.:format)? to be possible.
Yes, I got that! :)
Ok, rewriting the challenge as: Find a way to elegantly map the given patterns into their corresponding regexps, such that they work for all given examples.
Full Pattern - Regexp mapping:
/ | /^\/$/
/foo | /^\/foo$/
/f\u00F6\u00F6 | /^\/f%C3%B6%C3%B6$/
/:foo | /^\/([^\/?#]+)$/
/:foo/:bar | /^\/([^\/?#]+)\/([^\/?#]+)$/
/hello/:person | /^\/hello\/([^\/?#]+)$/
/?:foo?/?:bar? | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/
/* | /^\/(.*?)$/
/:foo/* | /^\/([^\/?#]+)\/(.*?)$/
/test.bar | /^\/test(?:\.|%2E)bar$/
/test$/ | /^\/test(?:\$|%24)\/$/
/te+st/ | /^\/te(?:\+|%2B)st\/$/
/test(bar)/ | /^\/test(?:\(|%28)bar(?:\)|%29)\/$/
/path with spaces | /^\/path(?:%20|(?:\+|%2B))with(?:%20|(?:\+|%2B))spaces$/
/foo&bar | /^\/foo(?:&|%26)bar$/
/*/foo/*/* | /^\/(.*?)\/foo\/(.*?)\/(.*?)$/
/:file.:ext | /^\/([^\/?#]+)(?:\.|%2E)([^\/?#]+)$/
/:name.?:format? | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/
/:user@?:host? | /^\/([^@%40\/?#]+)(?:@|%40)?([^@%40\/?#]+)?$/
/:name.?:format? | /^\/([^\.%2E\/?#]+)(?:\.|%2E)?([^\.%2E\/?#]+)?$/
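To make the challenge concrete, here is a rough sketch of the pattern -> regexp step for a subset of the rows above. It is not Sinatra's Base#compile; compile_pattern and its details are assumptions made for illustration. It covers literals, :name, * and percent-encoded alternatives for special characters, but not the rows for spaces, non-ASCII characters, or the separator-aware captures in the last two rows.

```ruby
# Hypothetical sketch, not Sinatra's implementation.
# Returns [regexp, keys] for a route pattern.
def compile_pattern(pattern)
  keys = []
  # Tokens: ":name", "*", or any single character that is not a word
  # character, "/" or "?". A "?" is passed through as a regexp quantifier.
  source = pattern.gsub(/:(\w+)|\*|[^\w\/?]/) do |token|
    if token == "*"
      keys << "splat"
      "(.*?)"
    elsif token.start_with?(":") && token.length > 1
      keys << token[1..-1]
      "([^/?#]+)"
    else
      # Accept the literal character or its percent-encoded form.
      escaped = Regexp.escape(token)
      encoded = token.bytes.map { |byte| format("%%%02X", byte) }.join
      "(?:#{escaped}|#{encoded})"
    end
  end
  [Regexp.new("^#{source}$"), keys]
end

regexp, keys = compile_pattern("/:file.:ext")
p keys                               # => ["file", "ext"]
p regexp.match("/pony.jpg").captures # => ["pony", "jpg"]
```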
I wouldn't mind working on this, if you don't mind :)
Ok, almost got it. Cleaning up the code and preparing for a pull request :)
Note: I'm assuming this example should actually be nil. At least it looks like that to me. If I am wrong, please tell me.
"/:name.?:format? | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | /.bar | [.bar, nil]"
Let's move this to: sinatra/sinatra#492 Cheers!
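For reference, the two regexps listed for /:name.?:format? in the table behave differently on exactly this input. A quick check, with the variable names loose and strict chosen here only for illustration:

```ruby
# Both regexps are copied from the mapping table above.
loose  = /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/
strict = /^\/([^\.%2E\/?#]+)(?:\.|%2E)?([^\.%2E\/?#]+)?$/

p loose.match("/.bar").captures     # => [".bar", nil]
p strict.match("/.bar")             # => nil
p loose.match("/foo.bar").captures  # => ["foo.bar", nil]
p strict.match("/foo.bar").captures # => ["foo", "bar"]
```

Note that the character class in the second regexp lists %, 2 and E as individual characters, so it also excludes a literal "2" or "E" from the captures.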
Good point with the "." not being a separator character. A parser feels like the way to go here, but I had a feeling it might be doable using regexps. I'll have a look at Base#compile.
Thanks, I believe the tests would benefit a lot from being in a tabular form, but that's just me :)