rvm use @$(basename `pwd`) --create
The following are some ideas for a web framework I've wanted to build for some
time but haven't yet because I'm focusing on other things.

NOTE: these are mostly random thoughts and I probably need to spike something
out before they become coherent.
- Is not based on Rack

- Requests and responses are not buffered, and are instead treated as streams

- Uses at least one thread per core, each with an event loop listening to
  a socket and handling requests independently.

  - Look at cool.io for handling IO and http_parser.rb for parsing HTTP.

- Each request would be handled by a state machine that changes state and starts
  routing the request *as* the request is being streamed in.

  - When the request line is read in, specifically the method and URL, the
    state machine will attempt to find a matching resource immediately. If it
    cannot, it will return a 404. If it can, then it will ask the resource
    what headers it needs (which should be declared in the resource) and
    keep reading the input until they are seen, initializing the resource
    at that point.

  - Refer to the following flowchart when writing the request handler
    for the state machine: http://webmachine.basho.com/diagram.html

- All HTML partials/fragments are included using ESI. Each fragment is generated
  by a separate handler, and has an independent caching policy.

  - By default everything is cacheable forever. Handlers can configure if
    a resource cannot be cached, or if it can only be cached for a limited
    time.

- All responses include an ETag/Last-Modified header generated using the
  state of the resource object.

- Request headers are unknown to the event handlers unless explicitly
  declared. When declared they will be added to the response Vary header,
  because it will be assumed they were used in the decision making process.

- Instead of controllers there will be resource objects. Their constructor
  will load the object and set up the state.

  - GET/HEAD requests will not execute a handler, since they just return the
    state of the object which was set up in the constructor.

  - PUT/POST/DELETE handlers will be able to modify the state of the object.

  - Similar to this:
    http://roberthahn.ca/articles/2007/08/17/the-ideal-rest-framework/

- Each resource will be able to specify what methods are allowed given the
  request context.

  - OPTIONS handlers will use this to set the Allow response header.

- Conditional request handling should be baked in. Before a handler is
  invoked the system should check the appropriate If-* header and return
  a 304 or 412 depending on the method.

- I want to decouple input processing, the event handler, and output
  generation. The event handler will have no knowledge of input/output
  formats, etc. It will just be concerned with carrying out the state
  change (in the event of PUT/POST/DELETE, etc).

  - Input processing is handled by an object that works with streaming
    input. The proper handler will be negotiated based on the request
    Content-Type.

  - Output processing can be done by something similar to a presenter.
    It will be able to use content negotiation to choose the best
    representation for the resource.

  - Resource methods will be able to set state for the representation,
    add general/response headers, and return the HTTP status, but
    otherwise cannot affect the entity headers or representation directly.

  - The general/response headers should be written to the socket as soon
    as possible, ideally as soon as the resource method is finished. Then
    the entity headers can be written after negotiation.
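
The conditional request bullet above could be sketched roughly as follows. This is only an illustration and not part of the spike: the `precondition_status` and `etag_listed?` helpers and the plain-hash headers are hypothetical, the resource is assumed to exist, only strong ETag comparison is shown (no `W/` weak validators), and If-Modified-Since/If-Unmodified-Since are left out.

```ruby
# Hypothetical helpers: decide whether a request can be answered before the
# resource handler is invoked. Returns a status string, or nil when the
# handler should run. Only ETag-based preconditions are sketched here.
SAFE_METHODS = %w[get head].freeze

def etag_listed?(header_value, current_etag)
  return true if header_value.strip == '*'
  header_value.split(',').map(&:strip).include?(current_etag)
end

def precondition_status(method, headers, current_etag)
  if_match      = headers['If-Match']
  if_none_match = headers['If-None-Match']

  # If-Match fails when none of the listed ETags is the current one
  if if_match && !etag_listed?(if_match, current_etag)
    return '412 Precondition Failed'
  end

  # If-None-Match fails when the current ETag IS listed: safe methods get
  # a 304, every other method gets a 412
  if if_none_match && etag_listed?(if_none_match, current_etag)
    return SAFE_METHODS.include?(method) ? '304 Not Modified' : '412 Precondition Failed'
  end

  nil
end
```

A framework would call something like this after the resource constructor has produced the current ETag and before dispatching to the handler.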
#!/usr/bin/env ruby -Ku
# encoding: utf-8

require 'set'
require 'time' # for Time#httpdate
require 'rubygems'
require 'cool.io'
require 'http/parser' # http_parser.rb
require 'query_string_parser'

HOST = 'localhost'
PORT = 3000
CRLF = "\r\n"

STATUS_WITH_NO_ENTITY_BODY = Set.new((100..199).to_a << 204 << 205 << 304)

class Resources
  include Enumerable

  class Error < StandardError
    def to_s
      status_line
    end
  end

  class ClientError < Error; end
  class ServerError < Error; end

  class NotFound < ClientError
    attr_reader :path

    def initialize(path)
      @path = path
    end

    def status_line
      '404 Not Found'
    end

    def to_s
      "#{super} (#{path})"
    end
  end

  class MethodNotAllowed < ClientError
    # TODO: make a method that allows the response header to be modified and the Allow header added
    attr_reader :method, :path

    # path is optional so that `raise MethodNotAllowed, method` keeps working
    def initialize(method, path = nil)
      @method = method
      @path   = path
    end

    def status_line
      '405 Method Not Allowed'
    end

    def to_s
      "#{super} (#{method} for #{path})"
    end
  end

  class NotImplemented < ServerError
    attr_reader :method

    def initialize(method)
      @method = method
    end

    def status_line
      '501 Not Implemented'
    end

    def to_s
      "#{super} (#{method})"
    end
  end

  def initialize(resources = {})
    @resources = resources
  end

  # Find the resource class for the request and instantiate it
  def [](request)
    method = request.method
    assert_valid_method(method)
    lookup(request.path).assert_valid_method(method).new(request.query)
  end

  def each
    return to_enum unless block_given?
    @resources.each { |uri, klass| yield uri, klass }
    self
  end

  def assert_valid_method(method)
    raise NotImplemented, method unless method_allowed?(method)
    self
  end

  private

  def method_allowed?(method)
    Server.method_allowed?(method)
  end

  def lookup(path)
    # TODO: handle lookups using regexp constraints
    @resources.fetch(path) { raise NotFound, path }
  end
end

# Responsible for negotiating the best parser
class Request
  attr_reader :method, :path, :query, :headers
  attr_accessor :body

  def initialize(method, path, query, headers)
    @method  = method
    @path    = path
    @query   = query
    @headers = headers
  end

  def safe?
    # XXX: maybe make SafeRequest and UnsafeRequest classes
    #   - would allow #negotiate to be a noop
    #   - would allow @body to be a null object
    method == 'get' || method == 'head'
  end

  def negotiate
    @body = parser.new(self)
    self
  end

  # Call the resource method, passing the body only if the method accepts it
  def dispatch(resource, request_method = method)
    method = resource.public_method(request_method)
    case method.arity
    when 2 then method.call(headers, body)
    when 1 then method.call(headers)
    else
      raise ArgumentError, "Invalid number of arguments to #{resource.class}##{request_method}: #{method.arity}"
    end
  end

  private

  def parser
    # TODO: negotiate a parser for the request body, return 415 if not supported
    safe? ? NullParser : Parser
  end
end

# Responsible for parsing the request body
class Parser
  attr_reader :string # XXX: temporary until the subclasses provide accessors

  def initialize(request, string = '')
    @request = request
    @string  = string
  end

  def <<(chunk)
    @string << chunk # TODO: parse the chunk
    self
  end
end

class NullParser < Parser
  def <<(chunk)
    self
  end
end

# Responsible for negotiating the best representation
class Response
end

# Responsible for returning the response body
class Representation
end

class Handler
  attr_reader :last_modified

  # TODO: memoize this
  def self.method_allowed?(method)
    # TODO: should make sure the method is an actual valid HTTP method
    public_instance_methods.any? { |meth| meth.to_s == method }
  end

  def self.assert_valid_method(method)
    # the error classes live in the Resources namespace
    raise Resources::MethodNotAllowed, method unless method_allowed?(method)
    self
  end

  def initialize
    @last_modified = Time.now
  end

  def handle(request, request_method = request.method)
    status = request.dispatch(self, request_method)
    # TODO: coerce into a response object
    status
  end

  def options(headers, body)
    # TODO: set the Allow headers
    '204 No Content' # TODO: replace with a constant
  end
end

# Responsible for managing state for a URL
class Resource < Handler
  # TODO: should memoize the response for idempotent methods

  def initialize(query = {})
    super()
    @query = query
  end

  def get(headers)
    '200 OK' # TODO: replace with a constant
  end

  def head(headers)
    get(headers)
  end

  def post(headers, body)
    '201 Created' # TODO: replace with a constant
  end
end

class Server < Handler
  URI = '*'.freeze

  def self.resources(resources = {})
    @resources = Resources.new(resources.merge(URI => self))
  end

  # TODO: memoize this
  def self.method_allowed?(method)
    @resources.any? do |_uri, klass|
      if equal?(klass)
        super
      else
        klass.method_allowed?(method)
      end
    end
  end

  def initialize(*)
    super()
  end
end

class HttpServerConnection < Cool.io::TCPSocket
  def initialize(*)
    super
    @parser    = HTTP::Parser.new
    @resources = Server.resources(
      '/'            => Resource,
      '/favicon.ico' => Resource,
      '/{id}'        => Resource # XXX: does not work
    )
  end

  def on_connect
    @parser.on_headers_complete = lambda do |headers|
      receive_request(headers)
      load_resource
    end

    @parser.on_body = lambda do |chunk|
      parse_body(chunk)
    end

    @parser.on_message_complete = lambda do
      handle_request
      send_response
    end
  end

  def on_read(data)
    @parser << data
  rescue HTTP::Parser::Error
    send_error('400 Bad Request')
  rescue Resources::ClientError, Resources::ServerError => e
    send_error(e.status_line)
  end

  def write(data)
    super data + CRLF
  end

  private

  def receive_request(headers)
    @request = Request.new(
      @parser.http_method.downcase,                      # TODO: coerce into a method object
      @parser.request_path,                              # TODO: coerce into a path object
      QueryStringParser.qs_parse(@parser.query_string),  # TODO: coerce into a query object
      headers                                            # TODO: coerce into Header objects
    )

    @resource = @resources[@request]

    # TODO: authenticate the user agent, return 401 on failure
    # TODO: authorize the user agent, return 403 on failure

    # TODO: create a service object that coordinates conneg
    @request.negotiate

    # TODO: negotiate the representation based on Accept* headers, return 406 if not supported
    # TODO: perform conditional request handling using If-* headers, if not validated return 304 or 412 depending on the method.
  end

  def load_resource
    # FIXME: GET * HTTP/1.0 does not return a 405 for some reason

    # perform the GET request to populate the resource state
    unless @request.method == 'options'
      @status = @resource.handle(@request, 'get')
    end

    # TODO: abort the request if the status is a client or server error
    raise "Error: #{@status}" if @status && @status !~ /\A2\d{2}\b/ # FIXME once @status is a real response object
  end

  def parse_body(chunk)
    @request.body << chunk
  end

  def handle_request
    @status = @resource.handle(@request) unless @request.safe?
  end

  def send_response
    # TODO: send the general/response/entity headers (in that order)
    write "HTTP/1.1 #{@status}"
    write "Date: #{Time.now.httpdate}"
    write 'Transfer-Encoding: chunked' if message_body?

    # TODO: send the headers
    #   - ETag headers (if a GET/HEAD request)
    #   - Vary (based on which request headers were used)

    # TODO: allow the Cache-Control to be configured by the resource
    write 'Cache-Control: public' if @request.safe?

    # TODO: have the representation add the Last-Modified headers, which
    # can get it from the resource like below *OR* use the template mtime,
    # whichever is greater
    write "Last-Modified: #{@resource.last_modified.httpdate}" if @request.safe?

    # TODO: have the representation set this
    write 'Content-Type: text/plain' if message_body?
    write ''

    # TODO: have the response object handle writing, and should know when to skip it
    unless @request.method == 'head' || @status == '204 No Content'
      # TODO: pass the resource to the representation, and have it stream out the response
      chunks = [ 'Hello', ' ', 'World' ]
      chunks.each do |chunk|
        write chunk.bytesize.to_s(16)
        write chunk
      end

      # last chunk needs to be zero length; it is only sent when a body
      # was written, since HEAD and 204 responses must not carry one
      write '0'
      write ''
    end

    # TODO: allow persistent connections if supported
    on_write_complete { close }
  end

  def send_error(status_line, body = status_line)
    write "HTTP/1.0 #{status_line}"
    write "Date: #{Time.now.httpdate}"
    write 'Content-Type: text/plain'
    write "Content-Length: #{body.bytesize}"
    write ''
    write body
    on_write_complete { close }
  end

  def message_body?
    @message_body ||= !STATUS_WITH_NO_ENTITY_BODY.include?(@status.split(' ', 2).first.to_i)
  end
end

server = Cool.io::TCPServer.new(HOST, PORT, HttpServerConnection)
server.attach(Cool.io::Loop.default)

$stderr.puts "HTTP Server listening on #{HOST}:#{PORT}\n"
Cool.io::Loop.default.run
@solnic it's probably a concept better explained in code than in words since everyone writes their state machine code differently, and the popular ruby implementations are a bit weird. There are also actually two ideas in this sentence, so it makes it extra confusing.
Routing as the request is streamed in is the easiest to explain. When an HTTP request is sent, it sends over the method and URI immediately. It should be possible at that point in time to see if any routes exist for the URI at all. If not, then the system could possibly return 404 right away. Then as headers are read in, we can wait until the headers the handler needs are seen before we perform the action. There's no real need to wait for anything else to run the action and start the response if all the dependencies are met.
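
A rough sketch of that flow, assuming a parser that could emit an event for the request line and one per header (which, as far as I can tell, is not what http_parser.rb offers). The `RequestMachine` class, its state names, and the routes hash mapping a path to the headers its handler declares are all hypothetical, and header names are matched exactly rather than case-insensitively:

```ruby
# Hypothetical per-request state machine fed by parser events.
class RequestMachine
  attr_reader :state, :headers

  # routes maps a path to the list of header names its handler declares
  def initialize(routes)
    @routes  = routes
    @state   = :awaiting_request_line
    @headers = {}
  end

  # Called as soon as the request line has been read
  def on_request_line(method, path)
    @method = method
    @needed = @routes[path]
    @state  =
      if @needed.nil?
        :not_found         # a 404 can be sent without reading any headers
      elsif @needed.empty?
        :ready             # nothing to wait for, the handler can run now
      else
        :awaiting_headers
      end
    self
  end

  # Called once per header as it is read
  def on_header(name, value)
    return self unless @state == :awaiting_headers
    @headers[name] = value if @needed.include?(name)
    @state = :ready if (@needed - @headers.keys).empty?
    self
  end
end
```

With routes like `{ '/' => [], '/account' => ['Cookie'] }`, a request for an unknown path moves straight to `:not_found`, while `/account` stays in `:awaiting_headers` until the `Cookie` header shows up.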
Now, this is the ideal. In doing a code spike I found http_parser.rb doesn't really allow you to specify event handlers for "on_url" and "on_header"; you can only really specify a handler for after the headers are read. While the other approach I described would be faster in theory, it would also require writing my own HTTP parser lib, and I doubt I would be able to match http_parser.rb for speed and robustness (it's the same parser node.js uses, so it's also better tested than any one-off I could write).
Now the state machine is a bit more fuzzy since I don't have an implementation. I was looking at it because it has a couple of properties I like: state changes are explicit and easy to reason about, it's efficient in terms of memory, it should allow me to extend it with less effort than something where all the logic is inlined. I will probably be able to explain more once I have something spiked out (if I even do it for the spike, which I'm not sure is necessary to prove the overall approach yet).
@solnic one other thing I forgot to mention is that a state machine where the events and state changes are explicit gives you natural places to have hooks. So for example, if I had an event handler like "on_read_cookie", then I could have a before/after hook for that specific event.
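
Something like the following is what I have in mind, as a sketch only. The `HookedMachine` class and its `before`/`after`/`fire` methods are made up for illustration; `on_read_cookie` is just the example event from above:

```ruby
# Hypothetical sketch: explicit events with before/after hooks.
class HookedMachine
  attr_reader :log

  def initialize
    @hooks = Hash.new { |hash, key| hash[key] = [] }
    @log   = []
  end

  # Register a block to run before the named event
  def before(event, &block)
    @hooks[[:before, event]] << block
    self
  end

  # Register a block to run after the named event
  def after(event, &block)
    @hooks[[:after, event]] << block
    self
  end

  # Fire an event: run the before hooks, the event handler, then the after hooks
  def fire(event, *args)
    @hooks[[:before, event]].each { |hook| hook.call(*args) }
    result = send(event, *args)
    @hooks[[:after, event]].each { |hook| hook.call(*args) }
    result
  end

  private

  # Example event handler; a real one would change the machine's state
  def on_read_cookie(value)
    @log << "cookie: #{value}"
    :cookie_read
  end
end
```

Because every event goes through `fire`, the hook points come for free for any event handler that gets added later.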
Ok, I think I'm mostly done with this spike. I think I got to test all the hard stuff out, and the remaining TODOs are mostly trivial to implement.
If I continue with this I'll probably TDD it properly, and build it up in a real repo. If I decide to go further, I'll post the link to the new repo here.
@dkubb btw have you seen cramp framework?
@solnic yeah I have, and I think it's an awesome idea. I do think that some frameworks try too hard to emulate rails though, especially the ideas that aren't very good.
One thing I really dislike is the mapping between resources and routes. I would rather work at the resource level, and not hide the fact that I'm doing REST stuff. This may seem funny because it's the exact opposite of how I feel about how ORMs should work. I don't think introducing indirection into how requests are handled helps much.
I also don't like the idea of the resource method actually knowing too much about the request and response. One key difference I would have is that they are decoupled from the action the client asked to be performed. The input and outputs aren't something that the request handler should know about, directly anyway. It can receive some input that has been parsed and normalized into an object that hides the original format (if possible). The output can be performed by something presenter-like that knows how to render the view given the model. An object coordinates the input processing, resource method execution and presenter rendering.
In fact thinking about this more, without really intending it, what I've described is how MVC is supposed to work. :)
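
To make that split concrete, here is a small sketch. Every class in it (`JsonInput`, `Counter`, `TextPresenter`, `Coordinator`) is hypothetical and none of it exists in the spike; media types are matched exactly here, whereas real content negotiation over the Accept header is more involved:

```ruby
require 'json'

# Input: turns a raw body into a format-neutral hash
class JsonInput
  def parse(body)
    JSON.parse(body)
  end
end

# Resource: only changes state and reports a status; knows nothing of formats
class Counter
  attr_reader :value

  def initialize
    @value = 0
  end

  def post(input)
    @value += Integer(input.fetch('by', 1))
    '200 OK'
  end
end

# Output: renders the resource state, presenter style
class TextPresenter
  def render(resource)
    "count=#{resource.value}"
  end
end

# Coordinator: the only object that sees all three parts
class Coordinator
  def initialize(inputs, presenters)
    @inputs     = inputs      # keyed by request Content-Type
    @presenters = presenters  # keyed by the negotiated media type
  end

  # Returns [status, body]
  def call(resource, content_type, accept, body)
    input     = @inputs.fetch(content_type) { return ['415 Unsupported Media Type', ''] }
    presenter = @presenters.fetch(accept)   { return ['406 Not Acceptable', ''] }
    status    = resource.post(input.parse(body))
    [status, presenter.render(resource)]
  end
end
```

The resource method never sees the JSON or the rendered text, so adding another input format or representation only touches the two hashes handed to the coordinator.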
@dkubb I'm not sure I understand this part correctly: "Each request would be handled by a state machine that changes state and starts routing the request as the request is being streamed in". I'm trying to imagine how that would work, and I can't.