@t0yv0
Created June 30, 2011 01:07
An attempt to improve on OWIN <http://owin.org> in F#.
(*
// # FSCGI - F# Common Gateway Interface
//
// This gist is a response to OWIN <http://owin.org/>.
// This gist is public domain.
//
// See also the discussion stating OWIN rationale and how it is better than FSCGI:
// http://groups.google.com/group/net-http-abstractions/browse_thread/thread/ac3d7c1e3d43c1d4
//
// ## Problem Summary
//
// The goal of OWIN is to provide a low-level .NET standard for web apps to
// communicate with their environments, filling a role similar to that of the
// Java Servlet Specification, FastCGI, SCGI (Python) and the like.
//
// OWIN as of Mar 13, 2011 is problematic in the following respects:
//
// * poor use of static typing, for example uses Dictionary<String,Object>
// * gratuitous complexity of the definition (nested Func<_> callbacks)
// * reliance on prose to communicate invariants that can be type-enforced
//
// ## Solution Summary
//
// * use more explicit typing
// * use an iteratee-based representation for the IO process
// * simplify exception handling by the rule: application code MUST NOT throw
// any exceptions; doing so indicates a programming error and will be
// treated in a host-dependent way.
//
*)
/// Defines the common gateway interface protocol for F#.
namespace FSCGI

type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string

type Request =
    {
        Headers : Headers
        Method : string
        Path : string
        PathBase : string
        QueryString : string
        Scheme : string
    }

type State =
    | Closed
    | Open

type Writer =
    | Done
    | Write of (State -> Data * Writer)

type Response =
    | Read of (option<Data> -> Response)
    | Respond of StatusCode * Status * Headers * Writer

type Application =
    Request -> Response

/// Converts structural encodings to proper FSCGI.* types.
module FSCGI.Structural.Converter

type private Encodings<'R,'W> =
    ('W -> Writer<'W>) *
    ('R -> Response<'R,'W>)

let private ConvertRequest (r : FSCGI.Request) : Request =
    (
        r.Headers,
        r.Method,
        r.Path,
        r.PathBase,
        r.QueryString,
        r.Scheme
    )

let rec private ConvertWriter ((eW, _) as enc : Encodings<'R,'W>)
                              (w: Writer<'W>) : FSCGI.Writer =
    match w with
    | Choice1Of2 () ->
        FSCGI.Done
    | Choice2Of2 f ->
        FSCGI.Write (fun state ->
            let isOpen =
                match state with
                | FSCGI.Closed -> false
                | FSCGI.Open -> true
            let (data, writer) = f isOpen
            (data, ConvertWriter enc (eW writer)))

let rec private ConvertResponse ((eW, eR) as enc : Encodings<'R,'W>)
                                (r: Response<'R,'W>) : FSCGI.Response =
    match r with
    | Choice1Of2 f ->
        FSCGI.Read (fun d -> ConvertResponse enc (eR (f d)))
    | Choice2Of2 (a, b, c, d) ->
        FSCGI.Respond (a, b, c, ConvertWriter enc (eW d))

let Convert ((eW, eR, run): Application<'R,'W>) : FSCGI.Application =
    ConvertResponse (eW, eR) << run << ConvertRequest

/// Provides structural encodings for the FSCGI.* types.
namespace FSCGI.Structural

type Data = System.ArraySegment<byte>
type HeaderName = string
type HeaderValue = string
type Headers = Map<HeaderName,HeaderValue>
type Method = string
type Path = string
type PathBase = string
type QueryString = string
type Scheme = string
type StatusCode = int
type Status = string

type Request =
    Headers * Method * Path * PathBase * QueryString * Scheme

type Writer<'W> =
    Choice<
        unit,
        bool -> Data * 'W
    >

type Response<'R,'W> =
    Choice<
        option<Data> -> 'R,
        StatusCode * Status * Headers * 'W
    >

type Application<'R,'W> =
    ('W -> Writer<'W>) *
    ('R -> Response<'R,'W>) *
    (Request -> Response<'R,'W>)

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Thanks for the input!

So, how exactly is WSGI better? Python simply does not have a typed option. On the OWIN list people are saying that one of the goals of OWIN is not to have an assembly; maybe that is why it is what it is. It still beats me why people would want to avoid an assembly.

Here's the rationale behind the Input variant of the response: for certain requests the response headers are dependent on the request input (Data). Therefore, we need to give the response IO process a chance to read before emitting the headers. In this signature, unlike in OWIN, the headers dictionary is immutable, which makes sequencing explicit.

I kept the Read case in the Writer type thinking that perhaps reading and writing might be interleaved. I am not sure this is ever useful or practical. If it is not the case, we could drop the Read case and keep just Input, then the sequence of the automaton would be fixed: read the request, emit headers, write the response.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Summary of the OWIN position from a Twitter discussion with @grumpydev:

OWIN avoids declaring named types because that leads to the need of having an assembly with those types, and the cost of shipping this assembly is deemed prohibitive.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Issue from @grumpydev: the container MUST iterate through the complete Response/Writer sequence because it has no way to communicate early termination to, say, make the application close a file handle.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Accounted for cancellation by adding the State parameter to the Write thunk. Now the container may send Closed to kindly ask the iteratee to stop producing data, close any pending handles, and return Done. If the iteratee does not comply, iteration continues.
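
To make the cancellation protocol concrete, here is a minimal sketch of a writer that honors `Closed`, together with a host loop that stops early. The `Data`, `State` and `Writer` types are copied from the gist; `streamWriter`, `drive` and the chunk limit are hypothetical names introduced only for this illustration, and a `MemoryStream` stands in for a real file handle.

```fsharp
open System
open System.IO

// Types copied from the gist so that the sketch stands alone.
type Data = ArraySegment<byte>
type State =
    | Closed
    | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)

/// Hypothetical writer: streams `source` in blocks of `blockSize` bytes and
/// disposes it either at end of data or when the host sends Closed.
let rec streamWriter (source: Stream) (blockSize: int) : Writer =
    Write (fun state ->
        match state with
        | Closed ->
            // The host asked us to stop: release the handle and finish.
            source.Dispose()
            (ArraySegment<byte>(Array.empty<byte>), Done)
        | Open ->
            let buffer = Array.zeroCreate<byte> blockSize
            let n = source.Read(buffer, 0, blockSize)
            if n = 0 then
                // End of data: release the handle and finish.
                source.Dispose()
                (ArraySegment<byte>(buffer, 0, 0), Done)
            else
                (ArraySegment<byte>(buffer, 0, n), streamWriter source blockSize))

/// Hypothetical host loop: pulls at most `limit` chunks with Open, then keeps
/// sending Closed until the writer returns Done. Returns the bytes received.
let drive (limit: int) (writer: Writer) : int =
    let rec loop pulled total w =
        match w with
        | Done -> total
        | Write next ->
            let state = if pulled < limit then Open else Closed
            let (data, rest) = next state
            loop (pulled + 1) (total + data.Count) rest
    loop 0 0 writer

// Usage: the host gives up after one chunk and the writer releases its stream.
let source = new MemoryStream(Array.zeroCreate<byte> 10)
let sent = drive 1 (streamWriter source 4)
printfn "sent %d bytes, stream still readable: %b" sent source.CanRead
```

Note that `drive` never abandons a writer: a writer that ignores `Closed` simply keeps being iterated, which matches the "if the iteratee does not comply, iteration continues" rule.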

@panesofglass

Re: avoiding an assembly - we originally had a number of authors who had existing implementations and wanted to conform to an open spec. The easiest, meet-in-the-middle approach was to use delegate structures and no required type definitions. I think most understood that they would wrap the dictionary in a type within their implementation, much as Gate does. (Interesting that OWIN now has an assembly.) Further, the current spec is also directly implementable in the Iron languages.

@panesofglass

Re: Writer | Read - my opinion is similar to yours wrt monads. I like the forced sequence of read the request, emit headers, write the response. I find this more approachable, even if you might lose a bit of speed reading the request. Of course, something in the request body may be malformed, which might alter your response. Also, why call it Writer if it is Iteratee?

@panesofglass

Re: Response - I think I understand the Input portion, though I would have to see it. In your structural snippet, did you mean to write:

type Response<'W> =
    Choice<
        option<Data> -> Response,
        StatusCode * Status * Headers * Writer<'W>
    >

? I don't quite follow the Application definition in the structural snippet, either. It's a function that takes ??? and a request, then returns a response? I think that's an Iteratee, but I would think that should be a part of the request. On further review, I see that it is not. So then the application is a RequestBody -> Request -> Response, correct?

By the way, I like this. Do you mind if I start incorporating this into Frank and Fracture?

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Quite fascinating considerations, I would never have thought of Iron languages (F# bigotry) :)

Feel free to edit out the Read case then. I do not see how this would affect performance; it will simply disallow reading after writing has started (maybe a good thing). Performance-wise, my intuition is that an iteratee-based approach will be fast, very fast. But we need to try.

Just looked at Gate - oh yes, something like this I expected to find alongside OWIN itself. Looks good! I am still a bit suspicious of the complexity of the protocol.

I have no strong opinions at all on naming. Iteratee is totally fine with me if you prefer. Just note that Response<_> is also, in a sense, an iteratee; maybe that was my reasoning.

@panesofglass

I also gladly accept pull requests. ;)

@panesofglass

By the way, I agree about the complexity of the OWIN protocol. It's an attempt at emulating the same complexity used by node.js, which allows back pressure to be placed on the connection so that the server doesn't block or stack overflow. It's interesting, but I think there are other, potentially better ways of managing this, Lazy being quite an interesting option.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Please feel free to use code and/or ideas.

I made some mistakes in the above definition, let me fix them and then it will be clearer.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Thanks for the questions; they made me check the definition and spot that it did not compile.

As you spotted, the application type was wrong, and so was the Response type.

Basically, structural types do not allow us to construct recursive types directly, so we need to encode them with generics and ask the user to provide the bijection between 'W and Writer and between 'R and Response. Luckily, we only seem to need the projection part ('W -> Writer) to be able to reconstruct FSCGI.Application from FSCGI.Structural.Application.

I attach the conversion code to make it explicit.

The encoding makes 'W and Writer<'W> interchangeable. Turns out we also need 'R for Response encoding.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

To clarify structural encodings even more, what we expect the user to do is to define:

type W = W of Writer<W>
type R = R of Response<R,W>

let App : Application<R,W> =
  ((fun (W x) -> x), (fun (R x) -> x), ...)
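
For illustration, here is one hypothetical way to complete such a definition end to end. The structural types are copied from the gist (with the single-use string aliases inlined); `body`, `finished`, `helloWriter`, `respond` and `drain` are made-up names, and the application logic (ignore the request body, then answer "hello") is only an assumption chosen to keep the sketch short.

```fsharp
open System

// Structural types copied from the gist so that the sketch stands alone.
type Data = ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string
type Request = Headers * string * string * string * string * string
type Writer<'W> = Choice<unit, bool -> Data * 'W>
type Response<'R,'W> = Choice<option<Data> -> 'R, StatusCode * Status * Headers * 'W>
type Application<'R,'W> =
    ('W -> Writer<'W>) *
    ('R -> Response<'R,'W>) *
    (Request -> Response<'R,'W>)

// User-supplied nominal wrappers that tie the recursive knot.
type W = W of Writer<W>
type R = R of Response<R,W>

let body = System.Text.Encoding.UTF8.GetBytes("hello")

// A writer that has nothing more to send.
let finished : W = W (Choice1Of2 ())

// A writer that emits the body once and then finishes.
let helloWriter : W =
    W (Choice2Of2 (fun (_isOpen: bool) -> (ArraySegment<byte>(body), finished)))

// The final response: status line, headers, and the writer above.
let respond : Response<R,W> =
    Choice2Of2 (200, "OK", Map.ofList [ ("Content-Type", "text/plain") ], helloWriter)

// Keeps asking for input until the host signals end of input with None.
let rec drain () : Response<R,W> =
    Choice1Of2 (fun (chunk: option<Data>) ->
        match chunk with
        | Some _ -> R (drain ())
        | None -> R respond)

let App : Application<R,W> =
    ((fun (W x) -> x), (fun (R x) -> x), (fun (_: Request) -> drain ()))
```

The first two components are just the unwrapping projections; all of the behavior lives in the third component and in the values it returns.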

@t0yv0
Author

t0yv0 commented Jun 30, 2011

I am amazed at the amount of notice Node.js gets, to us F# bigots it is totally unwarranted! :) I am not familiar with its workings, but I strongly suspect that what they are doing falls into the general framework of becoming more explicit about scheduling and doing it at the language level.

The trend is towards tackling async programming with language-level cooperative threads. State of the art for me is OCaml LWT "light-weight threading" - by now old and proven. OCaml's single-process model dictated this kind of model, where processes are encoded as 'a Lwt.t values with a monadic interface. Process interleaving is done by the library itself during the bind of the monad (hence the cooperative behavior). OCaml users then found out that it actually performs great!

With WebSharper we are in the same boat on top of the JavaScript runtime - we have no threads available to the language. For the upcoming release we just did the LWT trick - Joel Bjornson re-implemented Async support to make use of a round-robin scheduler.

Iteratees do the same; they are just specialized for IO. If you think about it, what they actually do is encode processes as explicit state machines, with every state a value and every transition a function. Essentially the same as LWT, except the states are more constrained.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

The last commit eliminates the flexibility to start responding before the reading is complete, fixing the sequence of events: read-request, emit-headers, write-response - making the definition a bit simpler.
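
As a rough sketch of what an application looks like under this fixed sequence, the hypothetical `countBytes` below consumes the whole request body, then emits headers that depend on what it read, then writes the response. The types are copied from the gist; the application itself and its choice of headers are assumptions made for illustration.

```fsharp
open System

// Types copied from the gist so that the sketch stands alone.
type Data = ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string
type Request =
    { Headers : Headers
      Method : string
      Path : string
      PathBase : string
      QueryString : string
      Scheme : string }
type State =
    | Closed
    | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of StatusCode * Status * Headers * Writer
type Application = Request -> Response

/// Hypothetical application: reports the size of the request body as text.
let countBytes : Application =
    fun (_request: Request) ->
        // Each Read is one state of the machine; `total` is its only memory.
        let rec read (total: int) : Response =
            Read (fun chunk ->
                match chunk with
                | Some data -> read (total + data.Count)
                | None ->
                    // End of input: only now can the headers be computed.
                    let text = string total
                    let payload = System.Text.Encoding.ASCII.GetBytes(text)
                    let headers =
                        Map.ofList [ ("Content-Type", "text/plain")
                                     ("Content-Length", string payload.Length) ]
                    let writer = Write (fun _ -> (ArraySegment<byte>(payload), Done))
                    Respond (200, "OK", headers, writer))
        read 0
```

Because the headers are an immutable value produced together with `Respond`, there is no point at which the application could change them after writing has begun.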

@panesofglass

Funny, I was looking at the various Iteratee implementations, looked at some state machine code I had written for comparison, and was going to ask you whether Iteratee was essentially a constrained state machine, which you have already noted. :) I need to look into OCaml LWT. I'm not familiar with that.

@panesofglass

Before I throw this into Frank, let me make sure I understand correctly. In this implementation, there is no Async, correct? This is all Iteratee-based? In other words, it uses a lazy state machine approach to construct the next chunk as needed rather than asynchronously accessing the underlying Stream? Would the underlying Stream access use Async? Is AsyncSeq a potentially appropriate mechanism for reading each chunk?

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Yes, this is completely synchronous on purpose. The application wouldn't be able to use async in any way other than Async.RunSynchronously. The host might use Async or something else to multiplex serving several requests / application instances. I was thinking that after all that's the only difference in functionality to OWIN - in OWIN the application can do asynchronous stuff because OWIN gives it result-expecting callbacks.

Also, to clear things up a bit, if we had a mutable interface for the application, here's what would happen to it:

app.Request <- ...
while (input.DataAvailable && app.AcceptsData) do
    app.Accept(input.Read())
output.Write(app.GetResponse())
while (app.HasMoreData()) do
    output.Write(app.Read())

The F# hackery above is just trying to guarantee that these things happen in the order listed. The host also has a good chance to multiplex several apps:

app.Request <- ...
while (input.DataAvailable && app.AcceptsData) do
    (* here we might switch between applications *)
    app.Accept(input.Read())
output.Write(app.GetResponse())
while (app.HasMoreData()) do
    (* here we might switch between applications *)
    output.Write(app.Read())
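
The same loops can be sketched against the immutable types. This is a hedged illustration rather than a real host: `feed`, `drain`, `interleave` and `ofChunks` are hypothetical helpers, the request body is modelled as an in-memory list of chunks, and the output is an arbitrary `sink` callback. The types are copied from the gist.

```fsharp
open System

// Types copied from the gist so that the sketch stands alone.
type Data = ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string
type State =
    | Closed
    | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of StatusCode * Status * Headers * Writer

/// First loop: feed body chunks until the application responds.
/// Assumes the application responds once it has been given None.
let rec feed (chunks: Data list) (response: Response) =
    match response, chunks with
    | Respond (code, status, headers, writer), _ -> (code, status, headers, writer)
    | Read k, chunk :: rest -> feed rest (k (Some chunk))
    | Read k, [] -> feed [] (k None)

/// Second loop: pull chunks from a single writer until it is Done.
let rec drain (sink: Data -> unit) (writer: Writer) : unit =
    match writer with
    | Done -> ()
    | Write next ->
        let (data, rest) = next Open
        sink data
        drain sink rest

/// Multiplexed variant of the second loop: takes one chunk from each
/// writer in turn, switching between applications after every chunk.
let rec interleave (sink: int -> Data -> unit) (writers: (int * Writer) list) : unit =
    match writers with
    | [] -> ()
    | (_, Done) :: rest -> interleave sink rest
    | (key, Write next) :: rest ->
        let (data, w) = next Open
        sink key data
        interleave sink (rest @ [ (key, w) ])

/// Helper for trying the loops out: a writer over in-memory chunks.
let rec ofChunks (chunks: byte[] list) : Writer =
    match chunks with
    | [] -> Done
    | c :: rest -> Write (fun _ -> (ArraySegment<byte>(c), ofChunks rest))
```

Since every state is an ordinary value, the host is free to park a `Response` or `Writer` in a queue and resume it later, which is all that `interleave` does.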

@panesofglass

The reason I added AsyncBody versions of the Body types in the new Frank signatures was to provide applications the ability to say, "I need to do some look-ups on my own; check back with me in a bit." Your examples would be useful in the case that an application is producing immediate results, but I don't think they address the common need to call out to a database or web service from the server. Or am I missing that aspect?

@panesofglass

Hmm... I suppose the application could use Async.StartWithContinuations and supply the Response write mechanism as a callback. That should still facilitate what you've described above. The application would then control any internal asynchronicity.

@t0yv0
Author

t0yv0 commented Jun 30, 2011

Spot on, the "check back in a bit" scenario is prohibited. The application will be stuck with blocking (doing Async.RunSynchronously) if it wants to talk to the network or the database.
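
A small sketch of what that blocking looks like in practice. The types are copied from the gist; `lookup` is a hypothetical stand-in for a database or network call (simulated here with `Async.Sleep`), and `blockingApp` is a made-up application that waits for it on the calling thread.

```fsharp
open System

// Types copied from the gist so that the sketch stands alone.
type Data = ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string
type Request =
    { Headers : Headers
      Method : string
      Path : string
      PathBase : string
      QueryString : string
      Scheme : string }
type State =
    | Closed
    | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of StatusCode * Status * Headers * Writer
type Application = Request -> Response

/// Hypothetical asynchronous lookup; the sleep stands in for real IO.
let lookup (key: string) : Async<string> =
    async {
        do! Async.Sleep 10
        return "value for " + key
    }

/// The application cannot hand the host a continuation, so it has to
/// block the calling thread until the lookup completes.
let blockingApp : Application =
    fun (request: Request) ->
        let value = Async.RunSynchronously (lookup request.Path)
        let payload = System.Text.Encoding.UTF8.GetBytes(value)
        let writer = Write (fun _ -> (ArraySegment<byte>(payload), Done))
        Respond (200, "OK", Map.ofList [ ("Content-Type", "text/plain") ], writer)
```

The host thread that called `blockingApp` is occupied for the whole duration of the lookup; whether that is acceptable depends on how the host schedules requests.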

If we allow this "check back in a bit", I think the interface quickly becomes isomorphic to OWIN.

What I am struggling to grasp right now is: are these "check back in a bit" scenarios really safe? Or, how do we write apps that are safe?

It is just so easy to trip. Consider Petricek's AsyncSeq: readInBlocks is broken because it does not close the file descriptor. If it were to close it, when would it? And how can the reader signal lack of interest in the rest of the sequence?

@panesofglass

Lots to think about. With Fracture, we have the pipeline model, which uses agents to progress things along. We could always leverage that within Frank so that delayed or long-running work does not block the current work. Of course, if we have a number of agents running, they are not blocking the primary server anyway, so we're probably safe. Another aspect we are planning is to have load management be able to spin up new agents as necessary, so again, blocking within a given application shouldn't affect the overall system.
