-
-
Save floere/2154980 to your computer and use it in GitHub Desktop.
# I've looked at the case at hand which is: | |
# foo.bar -> foo: foo.bar | |
# foo.bar -> foo: foo, format: bar | |
# This roughly translates to: | |
# For each part (foo, format), match as many possible subexpressions consisting of multiple word characters or one non-word characters | |
# (we might say \. explicitly, in this specific case). Do this lazily, except for the last part, since that one needs to gobble up the rest. | |
# And: If we have two or more parts, join them by a lazy match of a dot (\.?), which is not included in the named group. | |
# Examples | |
# 1 part: | |
# | |
p "foo".match(/(?<foo>(\w+|\W?)+)/) # => #<MatchData "foo" foo:"foo"> | |
p "foo.bar".match(/(?<foo>(\w+|\W?)+)/) # => #<MatchData "foo.bar" foo:"foo.bar"> | |
# 2 parts: | |
# | |
p "foo.bar".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+)/) # => #<MatchData "foo.bar" foo:"foo" bar:"bar"> | |
p "foo.bar.bur".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+)/) # => #<MatchData "foo.bar.bur" foo:"foo" bar:"bar.bur"> | |
# 3 parts: | |
# | |
p "foo.bar.bur".match(/(?<foo>(\w+|\W?)+?)\.?(?<bar>(\w+|\W?)+?)\.?(?<bur>(\w+|\W?)+)/) # => #<MatchData "foo.bar.bur" foo:"foo" bar:"bar" bur:"bur"> | |
# (Note that the last expression is greedy – in the last example: (?<bur>(\w+|\W?)+) <- greedy – while the others are not, to gobble up the rest :) If you don't want this, don't make it greedy) | |
# So generally, what this means: If you have "pots", like foo, format etc. or let's say a, b, c, d, e, f… these regexps will distribute anything looking like: | |
# 1.2.3.4.5.6.7 | |
# in these pots. If there's not enough to go around, e.g. with 1,2, then it will only be distributed to the first two pots. | |
# If there's too much to go around, it depends whether the last expression is greedy or not: | |
# | |
p "foo.bar".match(/(?<foo>(\w+|\W?)+?)/) # => foo matched | |
p "foo.bar".match(/(?<foo>(\w+|\W?)+)/) # => foo.bar gobbled up |
Working
=======
Pattern | Current Regexp | Example | Should Be | Is Currently
---------------------|----------------------------------------------------------|----------------------------------|--------------------------------------|--------------------------------------
"/" | /^\/$/ | "/" | [] | []
"/foo" | /^\/foo$/ | "/foo" | [] | []
"/:foo" | /^\/([^\/?#]+)$/ | "/foo" | ["foo"] | ["foo"]
"/:foo" | /^\/([^\/?#]+)$/ | "/foo?" | nil | nil
"/:foo" | /^\/([^\/?#]+)$/ | "/foo/bar" | nil | nil
"/:foo" | /^\/([^\/?#]+)$/ | "/foo%2Fbar" | ["foo%2Fbar"] | ["foo%2Fbar"]
"/:foo/:bar" | /^\/([^\/?#]+)\/([^\/?#]+)$/ | "/foo/bar" | ["foo", "bar"] | ["foo", "bar"]
"/:foo" | /^\/([^\/?#]+)$/ | "/" | nil | nil
"/:foo" | /^\/([^\/?#]+)$/ | "/foo/" | nil | nil
"/f\u00F6\u00F6" | /^\/f%C3%B6%C3%B6$/ | "/f%C3%B6%C3%B6" | [] | []
"/hello/:person" | /^\/hello\/([^\/?#]+)$/ | "/hello/Frank" | ["Frank"] | ["Frank"]
"/?:foo?/?:bar?" | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/ | "/hello/world" | ["hello", "world"] | ["hello", "world"]
"/?:foo?/?:bar?" | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/ | "/hello" | ["hello", nil] | ["hello", nil]
"/?:foo?/?:bar?" | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/ | "/" | [nil, nil] | [nil, nil]
"/?:foo?/?:bar?" | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/ | "" | [nil, nil] | [nil, nil]
"/*" | /^\/(.*?)$/ | "/" | [""] | [""]
"/*" | /^\/(.*?)$/ | "/foo" | ["foo"] | ["foo"]
"/*" | /^\/(.*?)$/ | "/" | [""] | [""]
"/*" | /^\/(.*?)$/ | "/foo/bar" | ["foo/bar"] | ["foo/bar"]
"/:foo/*" | /^\/([^\/?#]+)\/(.*?)$/ | "/foo/bar/baz" | ["foo", "bar/baz"] | ["foo", "bar/baz"]
"/:foo/:bar" | /^\/([^\/?#]+)\/([^\/?#]+)$/ | "/user@example.com/name" | ["user@example.com", "name"] | ["user@example.com", "name"]
"/:file.:ext" | /^\/([^\/?#]+)(?:\.|%2E)([^\/?#]+)$/ | "/pony.jpg" | ["pony", "jpg"] | ["pony", "jpg"]
"/:file.:ext" | /^\/([^\/?#]+)(?:\.|%2E)([^\/?#]+)$/ | "/pony%2Ejpg" | ["pony", "jpg"] | ["pony", "jpg"]
"/:file.:ext" | /^\/([^\/?#]+)(?:\.|%2E)([^\/?#]+)$/ | "/.jpg" | nil | nil
"/test.bar" | /^\/test(?:\.|%2E)bar$/ | "/test.bar" | [] | []
"/test.bar" | /^\/test(?:\.|%2E)bar$/ | "/test0bar" | nil | nil
"/test$/" | /^\/test(?:\$|%24)\/$/ | "/test$/" | [] | []
"/te+st/" | /^\/te(?:\+|%2B)st\/$/ | "/te+st/" | [] | []
"/te+st/" | /^\/te(?:\+|%2B)st\/$/ | "/test/" | nil | nil
"/te+st/" | /^\/te(?:\+|%2B)st\/$/ | "/teeest/" | nil | nil
"/test(bar)/" | /^\/test(?:\(|%28)bar(?:\)|%29)\/$/ | "/test(bar)/" | [] | []
"/path with spaces" | /^\/path(?:%20|(?:\+|%2B))with(?:%20|(?:\+|%2B))spaces$/ | "/path%20with%20spaces" | [] | []
"/path with spaces" | /^\/path(?:%20|(?:\+|%2B))with(?:%20|(?:\+|%2B))spaces$/ | "/path%2Bwith%2Bspaces" | [] | []
"/path with spaces" | /^\/path(?:%20|(?:\+|%2B))with(?:%20|(?:\+|%2B))spaces$/ | "/path+with+spaces" | [] | []
"/foo&bar" | /^\/foo(?:&|%26)bar$/ | "/foo&bar" | [] | []
"/:foo/*" | /^\/([^\/?#]+)\/(.*?)$/ | "/hello%20world/how%20are%20you" | ["hello%20world", "how%20are%20you"] | ["hello%20world", "how%20are%20you"]
"/*/foo/*/*" | /^\/(.*?)\/foo\/(.*?)\/(.*?)$/ | "/bar/foo/bling/baz/boom" | ["bar", "bling", "baz/boom"] | ["bar", "bling", "baz/boom"]
"/*/foo/*/*" | /^\/(.*?)\/foo\/(.*?)\/(.*?)$/ | "/bar/foo/baz" | nil | nil
"/:name.?:format?" | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | "/foo" | ["foo", nil] | ["foo", nil]
"/:name.?:format?" | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | "/.bar" | [".bar", nil] | [".bar", nil]
Broken
======
Pattern | Current Regexp | Example | Should Be | Is Currently
---------------------|----------------------------------------------------------|----------------------------------|--------------------------------------|--------------------------------------
"/:name.?:format?" | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | "/foo.bar" | ["foo", "bar"] | ["foo.bar", nil]
"/:name.?:format?" | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | "/foo%2Ebar" | ["foo", "bar"] | ["foo%2Ebar", nil]
"/:user@?:host?" | /^\/([^\/?#]+)(?:@|%40)?([^\/?#]+)?$/ | "/foo@bar" | ["foo", "bar"] | ["foo@bar", nil]
"/:user@?:host?" | /^\/([^\/?#]+)(?:@|%40)?([^\/?#]+)?$/ | "/foo.foo@bar" | ["foo.foo", "bar"] | ["foo.foo@bar", nil]
"/:user@?:host?" | /^\/([^\/?#]+)(?:@|%40)?([^\/?#]+)?$/ | "/foo@bar.bar" | ["foo", "bar.bar"] | ["foo@bar.bar", nil]
Code to generate list: https://gist.github.com/0c9b9e2521e0833a67b2
Great! Does the result need to be an Array? Is #match
used? Why not #split
?
I'm asking this because:
r = /[\/\.%2E]+/
p "/foo.bar" .split(r) # => ["", "foo", "bar"]
p "/foo%2Ebar" .split(r) # => ["", "foo", "bar"]
r = /[\/@]+/
p "/foo@bar" .split(r) # => ["", "foo", "bar"]
p "/foo.foo@bar".split(r) # => ["", "foo.foo", "bar"]
p "/foo@bar.bar".split(r) # => ["", "foo", "bar.bar"]
Which is very close to the expected behavior, while being relatively simple (to generate as well). Cheers!
match
is used. So you mean we should detect that scenario and "fix" it after matching?
I wonder about the usage of match
since using split
seems a much better fit (in terms of generating the regexps and mental mapping), still assuming that an array of parts is needed as a result.
No, I am suggesting to use split
if that is a possibility.
I don't get yet what the program flow would be like with split.
Can you point me to the match
in the Sinatra code, please? :) (Then I am able to help much better)
Note: From your table I assumed you actually need an array of the parts as output, which is why I offered the idea of using split
.
Thanks! I see.
Ok then, match
it is:
r = /^\/([^\/?#]+)(?:\.|%2E)+([^\/?#]+)?$/
p "/foo.bar".match(r)
p "/foo%2Ebar".match(r)
r = /^\/([^\/?#]+)(?:@|%40)+([^\/?#]+)?$/
p "/foo@bar".match(r)
p "/foo.foo@bar".match(r)
p "/foo@bar.bar".match(r)
(Both differ from the original in 1 position: ?
-> +
)
Cheers!
That doesn't match "/foo"
.
I'm beginning to understand. Also: How about being more constructive? :)
Haha, I'm sorry, I super thankful for your investigation so far. I just don't know if this is even possible.
Yeah, my problem was that I didn't understand what Sinatra was actually doing – but now I do. Source code reading helps :)
I believe I got one. On to the next.
r = /^\/([^\.%2E\/?#]+)(?:\.|%2E)?([^\.%2E\/?#]+)?$/
p "/foo".match(r)
p "/foo.bar".match(r)
p "/foo%2Ebar".match(r)
The second one is probably
r = /^\/([^@%40\/?#]+)(?:@|%40)?([^@%40\/?#]+)?$/
p "/foo".match(r)
p "/foo@bar".match(r)
p "/foo.foo@bar".match(r)
p "/foo@bar.bar".match(r)
Can you check, @rkh?
OK, that just means we'll have to do special parsing (i.e. recognize that it's of to have :format?
not match the .
). Might need to do proper parsing of the pattern then instead of just some replacements + Regexp.new
.
Unsure about that. The patterns I posted actually mimic a pattern that I've seen above, and which could be summarized as:
Match all in group except the separator character(s) up to the separator character(s), then continue matching all in group except the separator character(s).
E.g. /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/
Are these the routing tests? https://github.com/sinatra/sinatra/blob/v1.3.2/test/routing_test.rb Maybe we can add the nice table above to it? :) (If it's not in there already and I've overlooked it)
Yes, except that '.' is not a separator character. The issue is, at the moment we do a simple gsub
for :format
, so we don't know that :format
is actually part of :name.?:format?
and I got the feeling that in order to solve this properly we'd need a proper parser, as this is not describable with a regexp.
Are these the routing tests? https://github.com/sinatra/sinatra/blob/v1.3.2/test/routing_test.rb Maybe we can add the nice table above to it? :) (If it's not in there already and I've overlooked it)
All but the failing ones are actually in there already (that's were I got em from).
Good point with the "." not being a separator character. A parser feels like the way to go here, but I had a feeling it might be doable using regexps. I'll have a look at Base#compile.
Thanks, I believe the tests would benefit a lot from being in a tabular form, but that's just me :)
Yeah, I meant a parser for the pattern -> regexp
step, not the request path -> route
parsing, I would still use a regexp there. the real question would be how to generate it. I would also like :name(.:format)?
to be possible.
Yes, I got that! :)
Ok, rewriting the challenge as: Find a way to elegantly map the given patterns into their corresponding regexps, such that they work for all given examples.
Full Pattern - Regexp mapping:
/ | /^\/$/
/foo | /^\/foo$/
/f\u00F6\u00F6 | /^\/f%C3%B6%C3%B6$/
/:foo | /^\/([^\/?#]+)$/
/:foo/:bar | /^\/([^\/?#]+)\/([^\/?#]+)$/
/hello/:person | /^\/hello\/([^\/?#]+)$/
/?:foo?/?:bar? | /^\/?([^\/?#]+)?\/?([^\/?#]+)?$/
/* | /^\/(.*?)$/
/:foo/* | /^\/([^\/?#]+)\/(.*?)$/
/test.bar | /^\/test(?:\.|%2E)bar$/
/test$/ | /^\/test(?:\$|%24)\/$/
/te+st/ | /^\/te(?:\+|%2B)st\/$/
/test(bar)/ | /^\/test(?:\(|%28)bar(?:\)|%29)\/$/
/path with spaces | /^\/path(?:%20|(?:\+|%2B))with(?:%20|(?:\+|%2B))spaces$/
/foo&bar | /^\/foo(?:&|%26)bar$/
/*/foo/*/* | /^\/(.*?)\/foo\/(.*?)\/(.*?)$/
/:file.:ext | /^\/([^\/?#]+)(?:\.|%2E)([^\/?#]+)$/
/:name.?:format? | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/
/:user@?:host? | /^\/([^@%40\/?#]+)(?:@|%40)?([^@%40\/?#]+)?$/
/:name.?:format? | /^\/([^\.%2E\/?#]+)(?:\.|%2E)?([^\.%2E\/?#]+)?$/
I wouldn't mind working on this, if you don't mind :)
Ok, almost got it. Cleaning up the code and preparing for a pull request :)
Note: I'm assuming this example should actually be nil
. At least it looks like that to me. If I am wrong, please tell me.
"/:name.?:format? | /^\/([^\/?#]+)(?:\.|%2E)?([^\/?#]+)?$/ | /.bar | [.bar, nil]"
Let's move this to: sinatra/sinatra#492 Cheers!
Thanks!