@dkubb
Created August 10, 2011 22:44
Web framework Spike
rvm use @$(basename `pwd`) --create
The following are some ideas for a web framework I've wanted to build for a
while but haven't yet because I'm focusing on other things.

NOTE: these are mostly random thoughts and I probably need to spike something
out before they become coherent.

- Is not based on Rack
- Requests and responses are not buffered, and are instead treated as streams
- Uses at least one thread per core, each with an event loop listening to
  a socket and handling requests independently.
- Look at cool.io for handling IO and http_parser.rb for parsing HTTP.
- Each request would be handled by a state machine that changes state and starts
  routing the request *as* the request is being streamed in.
- When the request line is read in, specifically the method and URL, the
  state machine will attempt to find a matching resource immediately. If it
  cannot, it will return a 404. If it can, then it will ask the resource
  what headers it needs (which should be declared in the resource) and
  keep reading the input until they are seen, initializing the resource
  at that point.
- Refer to the following flowchart when writing the request handler
  for the state machine: http://webmachine.basho.com/diagram.html
- All HTML partials/fragments are included using ESI. Each fragment is generated
  by a separate handler, and has an independent caching policy.
- By default everything is cacheable forever. Handlers can configure if
  a resource cannot be cached, or if it can only be cached for a limited
  time.
- All responses include an ETag/Last-Modified header generated using the
  state of the resource object.
- Request headers are unknown to the event handlers unless explicitly
  declared. When declared they will be added to the response Vary header,
  because it will be assumed they were used in the decision making process.
- Instead of controllers there will be resource objects. Their constructor
  will load the object and set up the state.
- GET/HEAD requests will not execute a handler, since they just return the
  state of the object which was set up in the constructor.
- PUT/POST/DELETE handlers will be able to modify the state of the object.
- Similar to this:
  http://roberthahn.ca/articles/2007/08/17/the-ideal-rest-framework/
- Each resource will be able to specify what methods are allowed given the
  request context.
- OPTIONS handlers will use this to set the Allow response header.
- Conditional request handling should be baked in. Before a handler is
  invoked the system should check the appropriate If-* header and return
  a 304 or 412 depending on the method.
- I want to decouple input processing, the event handler, and output
  generation. The event handler will have no knowledge of input/output
  formats, etc. It will just be concerned about carrying out the state
  change (in the event of PUT/POST/DELETE, etc).
- Input processing is handled by an object that works with streaming
  input. The proper handler will be negotiated based on the request
  Content-Type.
- Output processing can be done by something similar to a presenter.
  It will be able to use content negotiation to choose the best
  representation for the resource.
- Resource methods will be able to set state for the representation,
  add general/response headers, and return the HTTP status, but
  otherwise cannot affect the entity headers or representation directly.
- The general/response headers should be written to the socket as soon
  as possible, ideally as soon as the resource method is finished. Then
  the entity headers can be written after negotiation.

#!/usr/bin/env ruby -Ku
# encoding: utf-8

require 'set'
require 'time' # for Time#httpdate
require 'rubygems'
require 'cool.io'
require 'http/parser' # http_parser.rb
require 'query_string_parser'

HOST = 'localhost'
PORT = 3000
CRLF = "\r\n"

STATUS_WITH_NO_ENTITY_BODY = Set.new((100..199).to_a << 204 << 205 << 304)

class Resources
  include Enumerable

  class Error < StandardError
    def to_s
      status_line
    end
  end

  class ClientError < Error; end
  class ServerError < Error; end

  class NotFound < ClientError
    attr_reader :path

    def initialize(path)
      @path = path
    end

    def status_line
      '404 Not Found'
    end

    def to_s
      "#{super} (#{path})"
    end
  end

  class MethodNotAllowed < ClientError
    # TODO: make a method that allows the response header to be modified and the Allow header added
    attr_reader :method, :path

    # path is optional because Handler.assert_valid_method only knows the method
    def initialize(method, path = nil)
      @method = method
      @path   = path
    end

    def status_line
      '405 Method Not Allowed'
    end

    def to_s
      path ? "#{super} (#{method} for #{path})" : "#{super} (#{method})"
    end
  end

  class NotImplemented < ServerError
    attr_reader :method

    def initialize(method)
      @method = method
    end

    def status_line
      '501 Not Implemented'
    end

    def to_s
      "#{super} (#{method})"
    end
  end

  def initialize(resources = {})
    @resources = resources
  end

  # Look up the resource class for the request and instantiate it
  def [](request)
    method = request.method
    assert_valid_method(method)
    lookup(request.path).assert_valid_method(method).new(request.query)
  end

  def each
    return to_enum unless block_given?
    @resources.each { |uri, klass| yield uri, klass }
    self
  end

  def assert_valid_method(method)
    raise NotImplemented, method unless method_allowed?(method)
    self
  end

  private

  def method_allowed?(method)
    Server.method_allowed?(method)
  end

  def lookup(path)
    # TODO: handle lookups using regexp constraints
    @resources.fetch(path) { raise NotFound, path }
  end
end

# Responsible for negotiating the best parser
class Request
  attr_reader :method, :path, :query, :headers
  attr_accessor :body

  def initialize(method, path, query, headers)
    @method  = method
    @path    = path
    @query   = query
    @headers = headers
  end

  def safe?
    # XXX: maybe make SafeRequest and UnsafeRequest classes
    # - would allow #negotiate to be a noop
    # - would allow @body to be a null object
    method == 'get' || method == 'head'
  end

  def negotiate
    @body = parser.new(self)
    self
  end

  # Call the resource method, passing the body only when it accepts one
  def dispatch(resource, request_method = method)
    method = resource.public_method(request_method)
    case method.arity
    when 2 then method.call(headers, body)
    when 1 then method.call(headers)
    else
      raise ArgumentError, "Invalid number of arguments to #{resource.class}##{request_method}: #{method.arity}"
    end
  end

  private

  def parser
    # TODO: negotiate a parser for the request body, return 415 if not supported
    safe? ? NullParser : Parser
  end
end

# Responsible for parsing the request body
class Parser
  attr_reader :string # XXX: temporary until the subclasses provide accessors

  def initialize(request, string = '')
    @request = request
    @string  = string
  end

  def <<(chunk)
    @string << chunk # TODO: parse the chunk
    self
  end
end

class NullParser < Parser
  def <<(chunk)
    self
  end
end

# Responsible for negotiating the best representation
class Response
end

# Responsible for returning the response body
class Representation
end

class Handler
  attr_reader :last_modified

  # TODO: memoize this
  def self.method_allowed?(method)
    # TODO: should make sure the method is an actual valid HTTP method
    public_instance_methods.any? { |meth| meth.to_s == method }
  end

  def self.assert_valid_method(method)
    # MethodNotAllowed is defined inside the Resources namespace
    raise Resources::MethodNotAllowed, method unless method_allowed?(method)
    self
  end

  def initialize
    @last_modified = Time.now
  end

  def handle(request, request_method = request.method)
    status = request.dispatch(self, request_method)
    # TODO: coerce into a response object
    status
  end

  def options(headers, body)
    # TODO: set the Allow headers
    '204 No Content' # TODO: replace with a constant
  end
end

# Responsible for managing state for a URL
class Resource < Handler
  # TODO: should memoize the response for idempotent methods

  def initialize(query = {})
    super()
    @query = query
  end

  def get(headers)
    '200 OK' # TODO: replace with a constant
  end

  def head(headers)
    get(headers)
  end

  def post(headers, body)
    '201 Created' # TODO: replace with a constant
  end
end

class Server < Handler
  URI = '*'.freeze

  def self.resources(resources = {})
    @resources = Resources.new(resources.merge(URI => self))
  end

  # TODO: memoize this
  def self.method_allowed?(method)
    @resources.any? do |_uri, klass|
      if equal?(klass)
        super
      else
        klass.method_allowed?(method)
      end
    end
  end

  def initialize(*)
    super()
  end
end

class HttpServerConnection < Cool.io::TCPSocket
  def initialize(*)
    super
    @parser    = HTTP::Parser.new
    @resources = Server.resources(
      '/'            => Resource,
      '/favicon.ico' => Resource,
      '/{id}'        => Resource # XXX: does not work
    )
  end

  def on_connect
    @parser.on_headers_complete = lambda do |headers|
      receive_request(headers)
      load_resource
    end

    @parser.on_body = lambda do |chunk|
      parse_body(chunk)
    end

    @parser.on_message_complete = lambda do
      handle_request
      send_response
    end
  end

  def on_read(data)
    @parser << data
  rescue HTTP::Parser::Error
    send_error('400 Bad Request')
  rescue Resources::ClientError, Resources::ServerError => e
    send_error(e.status_line)
  end

  # Every line written to the socket is terminated with CRLF
  def write(data)
    super data + CRLF
  end

  private

  def receive_request(headers)
    @request = Request.new(
      @parser.http_method.downcase,                     # TODO: coerce into a method object
      @parser.request_path,                             # TODO: coerce into a path object
      QueryStringParser.qs_parse(@parser.query_string), # TODO: coerce into a query object
      headers                                           # TODO: coerce into Header objects
    )

    @resource = @resources[@request]

    # TODO: authenticate the user agent, return 401 on failure
    # TODO: authorize the user agent, return 403 on failure

    # TODO: create a service object that coordinates conneg
    @request.negotiate
    # TODO: negotiate the representation based on Accept* headers, return 406 if not supported

    # TODO: perform conditional request handling using If-* headers, if not validated return 304 or 412 depending on the method.
  end

  def load_resource
    # FIXME: GET * HTTP/1.0 does not return a 405 for some reason

    # perform the GET request to populate the resource state
    unless @request.method == 'options'
      @status = @resource.handle(@request, 'get')
    end

    # TODO: abort the request if the status is a client or server error
    raise "Error: #{@status}" if @status && @status !~ /\A2\d{2}\b/ # FIXME once @status is a real response object
  end

  def parse_body(chunk)
    @request.body << chunk
  end

  def handle_request
    @status = @resource.handle(@request) unless @request.safe?
  end

  def send_response
    # TODO: send the general/response/entity headers (in that order)
    write "HTTP/1.1 #{@status}"
    write "Date: #{Time.now.httpdate}"
    write 'Transfer-Encoding: chunked' if message_body?

    # TODO: send the headers
    # - ETag headers (if a GET/HEAD request)
    # - Vary (based on which request headers were used)

    # TODO: allow the Cache-Control to be configured by the resource
    write 'Cache-Control: public' if @request.safe?

    # TODO: have the representation add the Last-Modified headers, which
    # can get it from the resource like below *OR* use the template mtime,
    # whichever is greater
    write "Last-Modified: #{@resource.last_modified.httpdate}" if @request.safe?

    # TODO: have the representation set this
    write 'Content-Type: text/plain' if message_body?

    write ''

    # TODO: have the response object handle writing, and should know when to skip it
    if message_body? && @request.method != 'head'
      # TODO: pass the resource to the representation, and have it stream out the response
      chunks = [ 'Hello', ' ', 'World' ]
      chunks.each do |chunk|
        write chunk.bytesize.to_s(16)
        write chunk
      end

      # last chunk needs to be zero length; only sent when a chunked body
      # was actually written (never for HEAD or bodiless statuses)
      write '0'
      write ''
    end

    # TODO: allow persistent connections if supported
    on_write_complete { close }
  end

  def send_error(status_line, body = status_line)
    write "HTTP/1.0 #{status_line}"
    write "Date: #{Time.now.httpdate}"
    write 'Content-Type: text/plain'
    write "Content-Length: #{body.bytesize}"
    write ''
    write body
    on_write_complete { close }
  end

  # Not memoized: `||=` would re-evaluate a false result on every call anyway
  def message_body?
    !STATUS_WITH_NO_ENTITY_BODY.include?(@status.split(' ', 2).first.to_i)
  end
end

server = Cool.io::TCPServer.new(HOST, PORT, HttpServerConnection)
server.attach(Cool.io::Loop.default)
$stderr.puts "HTTP Server listening on #{HOST}:#{PORT}\n"
Cool.io::Loop.default.run
@solnic

solnic commented Aug 14, 2011

@dkubb I'm not sure if I understand that part correctly: "Each request would be handled by a state machine that changes state and starts routing the request as the request is being streamed in" - I'm trying to imagine how that would work and I fail.

@dkubb

dkubb commented Aug 14, 2011

@solnic it's probably a concept better explained in code than in words since everyone writes their state machine code differently, and the popular Ruby implementations are a bit weird. There are also actually two ideas in that sentence, which makes it extra confusing.

Routing as the request is streamed in is the easiest to explain. When an HTTP request is sent, the method and URI arrive first, on the request line. It should be possible at that point to see if any routes exist for the URI at all. If not, the system could return a 404 right away. Then as headers are read in, we can wait until the headers the handler needs are seen before we perform the action. There's no real need to wait for anything else to run the action and start the response if all the dependencies are met.
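
The early routing idea above can be sketched in a few lines. This is only an illustration under stated assumptions: `EarlyRouter`, `PendingRequest` and the route table are hypothetical names that do not exist in the spike, and the parser that would feed them the request line and individual headers is left out.

```ruby
# Hypothetical sketch: none of these classes exist in the spike.
class EarlyRouter
  # routes maps a path to the header names its handler declares it needs
  def initialize(routes)
    @routes = routes
  end

  # Called as soon as the request line has been read. Returns :not_found
  # right away, or a PendingRequest that waits for the declared headers.
  def request_line(verb, path)
    needed = @routes[path]
    needed ? PendingRequest.new(verb, path, needed) : :not_found
  end
end

class PendingRequest
  attr_reader :verb, :path, :seen

  def initialize(verb, path, needed)
    @verb   = verb
    @path   = path
    @needed = needed.map(&:downcase)
    @seen   = {}
  end

  # Called once per header as it arrives. Returns true once every declared
  # header has been seen, which is the point where the action could run.
  def header(name, value)
    key = name.downcase
    @seen[key] = value if @needed.include?(key)
    ready?
  end

  def ready?
    @needed.all? { |name| @seen.key?(name) }
  end
end

router  = EarlyRouter.new('/articles' => %w[Accept If-None-Match])
missing = router.request_line('GET', '/nope')      # no route, so 404 before any header is read
pending = router.request_line('GET', '/articles')
pending.header('Host', 'example.org')              # not declared, so ignored
pending.header('Accept', 'text/html')              # still waiting on If-None-Match
pending.header('If-None-Match', '"abc"')           # every dependency is now met
```

Because undeclared headers are never stored, the set of declared names could also double as the list used to build the Vary response header.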

Now, this is the ideal. In doing a code spike I found http_parser.rb doesn't really allow you to specify event handlers for "on_url" and "on_header"; you can only really specify a handler for after the headers are read. While the other approach I specified would be faster in theory, it would also require writing my own HTTP parser lib, and I doubt I would be able to match http_parser.rb for speed and robustness (it's the same parser node.js uses, so it's also better tested than any one-off I could write).

Now the state machine is a bit more fuzzy since I don't have an implementation. I was looking at it because it has a few properties I like: state changes are explicit and easy to reason about, it's efficient in terms of memory, and it should allow me to extend it with less effort than something where all the logic is inlined. I will probably be able to explain more once I have something spiked out (if I even do it for the spike, which I'm not sure is necessary to prove the overall approach yet).
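
To make the "explicit state changes" property concrete, here is a minimal table-driven sketch. The class name, states and events are all invented for illustration and are not taken from the spike; the only point is that every legal transition sits in one table and the per-request state is a single symbol.

```ruby
# Hypothetical sketch: the states and events are invented for illustration.
class RequestStateMachine
  InvalidTransition = Class.new(StandardError)

  # Every legal transition lives in one table: state => { event => next state }
  TRANSITIONS = {
    :waiting_for_request_line => { :request_line     => :routing },
    :routing                  => { :route_found      => :reading_headers,
                                   :route_missing    => :responding },
    :reading_headers          => { :headers_complete => :reading_body },
    :reading_body             => { :body_complete    => :responding },
    :responding               => { :response_sent    => :done }
  }.freeze

  attr_reader :state

  def initialize
    @state = :waiting_for_request_line
  end

  # Move to the next state, or fail loudly if the event is not legal here
  def fire(event)
    next_state = TRANSITIONS.fetch(@state, {})[event]
    raise InvalidTransition, "#{event} is not valid in #{@state}" unless next_state
    @state = next_state
    self
  end
end

machine = RequestStateMachine.new
machine.fire(:request_line).fire(:route_missing)
machine.state # :responding, reached without ever reading the headers
```

Extending the machine means adding a row to the table rather than threading another branch through inlined logic.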

@dkubb

dkubb commented Aug 15, 2011

@solnic one other thing I forgot to mention is that a state machine where the events and state changes are explicit makes for natural places to have hooks. So for example, if I had an event handler like "on_read_cookie", then I could have a before/after hook for that specific event.
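
A rough sketch of what per-event hooks could look like, assuming a hypothetical `HookedEvents` registry (nothing like it exists in the spike). Only the `on_read_cookie` event name comes from the comment above; the rest is illustrative.

```ruby
# Hypothetical sketch: HookedEvents is invented for illustration.
class HookedEvents
  def initialize
    # event name => lists of before and after callbacks
    @hooks = Hash.new { |hash, key| hash[key] = { :before => [], :after => [] } }
  end

  def before(event, &block)
    @hooks[event][:before] << block
    self
  end

  def after(event, &block)
    @hooks[event][:after] << block
    self
  end

  # Runs the before hooks, then the event handler (the block), then the
  # after hooks, and returns whatever the handler returned.
  def fire(event, *args)
    @hooks[event][:before].each { |hook| hook.call(*args) }
    result = yield(*args)
    @hooks[event][:after].each { |hook| hook.call(*args) }
    result
  end
end

log    = []
events = HookedEvents.new
events.before(:on_read_cookie) { |cookie| log << "before #{cookie}" }
events.after(:on_read_cookie)  { |cookie| log << "after #{cookie}" }
events.fire(:on_read_cookie, 'session=abc') { |cookie| log << "read #{cookie}" }
```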

@dkubb

dkubb commented Aug 18, 2011

OK, I think I'm mostly done with this spike. I think I got to test all the hard stuff out, and the remaining TODOs are mostly trivial to implement.

If I continue with this I'll probably TDD it properly, and build it up in a real repo. If I decide to go further, I'll post the link to the new repo here.

@solnic

solnic commented Aug 18, 2011

@dkubb btw have you seen the Cramp framework?

@dkubb

dkubb commented Aug 18, 2011

@solnic yeah I have, and I think it's an awesome idea. I do think that some frameworks try too hard to emulate Rails though, especially the ideas that aren't very good.

One thing I really dislike is the mapping between resources and routes. I would rather work at the resource level, and not hide the fact that I'm doing REST stuff. This may seem funny because it's the exact opposite of how I feel about how ORMs should work. I don't think introducing indirection into how requests are handled helps much.

I also don't like the idea of the resource method actually knowing too much about the request and response. One key difference I would have is that they are decoupled from the action the client asked to be performed. The input and outputs aren't something that the request handler should know about, directly anyway. It can receive some input that has been parsed and normalized into an object that hides the original format (if possible). The output can be performed by something presenter-like that knows how to render the view given the model. An object coordinates the input processing, resource method execution and presenter rendering.
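
The split described above might look roughly like this. It is a sketch under assumptions, not how the spike is structured: `JsonInput`, `Article`, `PlainTextPresenter` and `Coordinator` are all hypothetical, and JSON is used only because it ships with Ruby.

```ruby
require 'json'

# Hypothetical sketch: every class below is invented to illustrate the split.

# Input side: hides the wire format behind a plain Hash
class JsonInput
  def parse(raw)
    JSON.parse(raw)
  end
end

# Resource: carries out the state change and knows nothing about formats
class Article
  attr_reader :title

  def initialize(title)
    @title = title
  end

  def put(attributes)
    @title = attributes.fetch('title')
    '200 OK'
  end
end

# Output side: presenter-like object that renders the resource state
class PlainTextPresenter
  def render(article)
    "Article: #{article.title}"
  end
end

# Coordinates input processing, resource method execution and rendering
class Coordinator
  def initialize(input, presenter)
    @input     = input
    @presenter = presenter
  end

  def call(resource, method, raw_body)
    status = resource.public_send(method, @input.parse(raw_body))
    [status, @presenter.render(resource)]
  end
end

coordinator  = Coordinator.new(JsonInput.new, PlainTextPresenter.new)
status, body = coordinator.call(Article.new('Draft'), :put, '{"title":"Final"}')
```

Swapping `JsonInput` or `PlainTextPresenter` for another format would not touch `Article#put`, which is the decoupling being argued for.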

In fact thinking about this more, without really intending it, what I've described is how MVC is supposed to work. :)
