(*
// # FSCGI - F# Common Gateway Interface
//
// This gist is a response to OWIN <http://owin.org/>.
// This gist is public domain.
//
// See also the discussion stating OWIN rationale and how it is better than FSCGI:
// http://groups.google.com/group/net-http-abstractions/browse_thread/thread/ac3d7c1e3d43c1d4
//
// ## Problem Summary
//
// The goal of OWIN is to provide a low-level .NET standard for web apps to
// communicate with their environments, filling the similar role as Java
// Servlet Specification, FastCGI, SCGI (Python) and the like.
//
// OWIN as of Mar 13, 2011 is problematic in the following respects:
//
// * poor use of static typing, for example uses Dictionary<String,Object>
// * gratuitous complexity of the definition (nested Func<_> callbacks)
// * reliance on prose to communicate invariants that can be type-enforced
//
// ## Solution Summary
//
// * use more explicit typing
// * use an iteratee-based representation for the IO process
// * simplify exception handling by the rule: application code MUST NOT throw
//   any exceptions; doing so is indicating a programming error and will be
//   treated in a host-dependent way.
//
*)

/// Defines the common gateway interface protocol for F#.
namespace FSCGI

type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type StatusCode = int
type Status = string

type Request =
    {
        Headers : Headers
        Method : string
        Path : string
        PathBase : string
        QueryString : string
        Scheme : string
    }

type State =
    | Closed
    | Open

type Writer =
    | Done
    | Write of (State -> Data * Writer)

type Response =
    | Read of (option<Data> -> Response)
    | Respond of StatusCode * Status * Headers * Writer

type Application =
    Request -> Response

/// Converts structural encodings to proper FSCGI.* types.
module FSCGI.Structural.Converter

type private Encodings<'R,'W> =
    ('W -> Writer<'W>) *
    ('R -> Response<'R,'W>)

let private ConvertRequest (r : FSCGI.Request) : Request =
    (
        r.Headers,
        r.Method,
        r.Path,
        r.PathBase,
        r.QueryString,
        r.Scheme
    )

let rec private ConvertWriter ((eW, _) as enc : Encodings<'R,'W>)
                              (w: Writer<'W>) : FSCGI.Writer =
    match w with
    | Choice1Of2 () ->
        FSCGI.Done
    | Choice2Of2 f ->
        FSCGI.Write (fun state ->
            let isOpen =
                match state with
                | FSCGI.Closed -> false
                | FSCGI.Open -> true
            let (data, writer) = f isOpen
            (data, ConvertWriter enc (eW writer)))

let rec private ConvertResponse ((eW, eR) as enc : Encodings<'R,'W>)
                                (r: Response<'R,'W>) : FSCGI.Response =
    match r with
    | Choice1Of2 f ->
        FSCGI.Read (fun d -> ConvertResponse enc (eR (f d)))
    | Choice2Of2 (a, b, c, d) ->
        FSCGI.Respond (a, b, c, ConvertWriter enc (eW d))

let Convert ((eW, eR, run): Application<'R,'W>) : FSCGI.Application =
    ConvertResponse (eW, eR) << run << ConvertRequest

/// Provides structural encodings for the FSCGI.* types.
namespace FSCGI.Structural

type Data = System.ArraySegment<byte>
type HeaderName = string
type HeaderValue = string
type Headers = Map<HeaderName,HeaderValue>
type Method = string
type Path = string
type PathBase = string
type QueryString = string
type Scheme = string
type StatusCode = int
type Status = string

type Request =
    Headers * Method * Path * PathBase * QueryString * Scheme

type Writer<'W> =
    Choice<
        unit,
        bool -> Data * 'W
    >

type Response<'R,'W> =
    Choice<
        option<Data> -> 'R,
        StatusCode * Status * Headers * 'W
    >

type Application<'R,'W> =
    ('W -> Writer<'W>) *
    ('R -> Response<'R,'W>) *
    (Request -> Response<'R,'W>)

Summary of the OWIN position from a twitter discussion with @grumpydev:
OWIN avoids declaring named types because that leads to the need of having an assembly with those types, and the cost of shipping this assembly is deemed prohibitive.
Issue from @grumpydev: the container MUST iterate through the complete Response/Writer sequence because it has no way to communicate early termination to the application, for example to make it close a file handle.
Accounted for cancellation by adding the State parameter to the Write thunk. Now the container may send Closed to kindly ask the iteratee to stop producing data, close any pending handles, and return Done. If the iteratee does not comply, iteration continues.
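A minimal sketch of a writer that honors the Closed signal, assuming the Writer and State types from the listing above (copied locally so the snippet compiles on its own). The Resource record and the stream function are hypothetical names used only for illustration; Resource stands in for a file handle.

```fsharp
// Local copies of the gist's FSCGI types so the sketch compiles on its own.
type Data = System.ArraySegment<byte>
type State = Closed | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)

// Hypothetical stand-in for a file handle; not part of FSCGI.
type Resource = { mutable Released : bool }

let empty : Data = System.ArraySegment<byte>(Array.empty<byte>)

/// Emits one chunk per step and releases the resource either when the
/// chunks run out or as soon as the host sends Closed.
let rec stream (res: Resource) (chunks: list<byte[]>) : Writer =
    match chunks with
    | [] ->
        res.Released <- true
        Done
    | chunk :: rest ->
        Write (fun state ->
            match state with
            | Closed ->
                // The host asked us to stop: clean up and finish early.
                res.Released <- true
                (empty, Done)
            | Open ->
                (System.ArraySegment<byte>(chunk), stream res rest))
```

A host that loses its client would call the next thunk with Closed once and expect Done back; nothing in the types forces the writer to comply, which matches the "iteration continues" caveat.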
Re: avoiding an assembly - we originally had a number of authors who had existing implementations and wanted to conform to an open spec. The easiest, meet-in-the-middle approach was to use delegate structures and no required type definitions. I think most understood that they would wrap the dictionary in a type within their implementation, much as Gate does. (Interesting that OWIN now has an assembly.) Further, the current spec is also directly implementable in Iron languages.
Re: Writer | Read - my opinion is similar to yours wrt monads. I like the forced sequence of read the request, emit headers, write the response. I find this more approachable, even if you might lose a bit of speed reading the request. Of course, something in the request body may be malformed, which might alter your response. Also, why call it Writer if it is Iteratee?
Re: Response - I think I understand the Input portion, though I would have to see it. In your structural snippet, did you mean to write:
type Response<'W> =
    Choice<
        option<Data> -> Response,
        StatusCode * Status * Headers * Writer<'W>
    >
? I don't quite follow the Application definition in the structural snippet, either. It's a function that takes ??? and a request, then returns a response? I think that's an Iteratee, but I would think that should be a part of the request. On further review, I see that it is not. So then the application is a RequestBody -> Request -> Response, correct?
By the way, I like this. Do you mind if I start incorporating this into Frank and Fracture?
Quite fascinating considerations, I would never have thought of Iron languages (F# bigotry) :)
Feel free to edit out the Read case then. I do not see how this would affect performance; it simply will disallow reading after writing has started (maybe a good thing). Performance-wise, my intuition is that iteratee-based approach will be fast, very fast. But we need to try.
Just looked at Gate - oh yes, something like this I expected to find alongside OWIN itself. Looks good! I am still a bit suspicious of the complexity of the protocol.
I have no strong opinions at all on naming. Iteratee is totally fine with me if you prefer. Just note that Response<_> is also, in a sense, an iteratee; maybe that was my reasoning.
I also gladly accept pull requests. ;)
By the way, I agree about the complexity of the OWIN protocol. It's an attempt at emulating the same complexity used by node.js, which allows back pressure to be placed on the connection so that the server doesn't block or stack overflow. It's interesting, but I think there are other, potentially better ways of managing this, Lazy being quite an interesting option.
Please feel free to use code and/or ideas.
I made some mistakes in the above definition, let me fix them and then it will be clearer.
Thanks for the questions, it made me check the definition and spot that it did not compile.
As you spotted, the application type was wrong, and so was the Response type.
Basically, structural types do not allow us to construct recursive types directly, therefore we need to encode them in generics and ask the user to provide the bijection between 'W and Writer and between 'R and Response. Luckily, we only seem to need the projection part ('W -> Writer) to be able to reconstruct FSCGI.Application from FSCGI.Structural.Application.
I attach the conversion code to make it explicit.
The encoding makes 'W and Writer<'W> interchangeable. Turns out we also need 'R for Response encoding.
To clarify structural encodings even more, what we expect the user to do is to define:
type W = W of Writer<W>
type R = R of Response<R,W>
let App : Application<R,W> =
    ((fun (W x) -> x), (fun (R x) -> x), ...)
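For illustration, here is one way a complete structurally encoded application might look. This is only a sketch: the type abbreviations are trimmed local copies of FSCGI.Structural, and the hello application with its fixed body is hypothetical, not something from the gist.

```fsharp
// Trimmed local copies of the FSCGI.Structural abbreviations.
type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type Request = Headers * string * string * string * string * string
type Writer<'W> = Choice<unit, (bool -> Data * 'W)>
type Response<'R,'W> =
    Choice<(option<Data> -> 'R), (int * string * Headers * 'W)>
type Application<'R,'W> =
    ('W -> Writer<'W>) * ('R -> Response<'R,'W>) * (Request -> Response<'R,'W>)

// The user ties the recursive knot with two single-case unions.
type W = W of Writer<W>
type R = R of Response<R,W>

/// Hypothetical application: ignores the request and writes "hello".
let hello : Application<R,W> =
    let body = System.ArraySegment<byte>("hello"B)
    let finished = W (Choice1Of2 ())
    // A single write step that ignores the open/closed flag.
    let writer = W (Choice2Of2 (fun (_: bool) -> (body, finished)))
    let run (_: Request) : Response<R,W> =
        Choice2Of2 (200, "OK", Map.empty, writer)
    ((fun (W x) -> x), (fun (R x) -> x), run)
```

The two unwrapping functions are the projections the converter needs; the third component is the application proper.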
I am amazed at the amount of notice Node.js gets, to us F# bigots it is totally unwarranted! :) I am not familiar with its workings, but I strongly suspect that what they are doing falls into the general framework of becoming more explicit about scheduling and doing it at the language level.
The trend is towards tackling async programming with language-level cooperative threads. State of the art for me is OCaml LWT "light-weight threading" - by now old and proven. OCaml's single-process model dictated this kind of a model where processes are encoded as 'a Lwt.t values with a monadic interface. Process interleaving is done by the library itself during the bind of the monad (hence cooperative behavior). OCaml users then found out that it actually performs great!
With WebSharper we are in the same boat on top of the JavaScript runtime - we have no threads available to the language. For the upcoming release we just did the LWT trick - Joel Bjornson re-implemented Async support to make use of a round-robin scheduler.
Iteratees do the same, they are just specialized for IO. What they actually do, if you think about it, is encode processes as explicit state machines, with every state a value and every transition a function. Essentially the same as LWT, except the states are more constrained.
The last commit eliminates the flexibility to start responding before the reading is complete, fixing the sequence of events: read-request, emit-headers, write-response - making the definition a bit simpler.
Funny, I was looking at the various Iteratee implementations, looked at some state machine code I had written for comparison, and was going to ask you whether Iteratee was essentially a constrained state machine, which you have already noted. :) I need to look into OCaml LWT. I'm not familiar with that.
Before I throw this into Frank, let me make sure I understand correctly. In this implementation, there is no Async, correct? This is all Iteratee-based? In other words, it uses a lazy state machine approach to construct the next chunk as needed rather than asynchronously accessing the underlying Stream? Would the underlying Stream access use Async? Is AsyncSeq a potentially appropriate mechanism for reading each chunk?
Yes, this is completely synchronous on purpose. The application wouldn't be able to use async in any other way than Async.RunSynchronously. The host might use Async or something else to multiplex serving several requests / application instances. I was thinking that after all that's the only difference in functionality to OWIN - in OWIN the application can do asynchronous stuff because OWIN gives it result-expecting callbacks.
Also to clear things up a bit, if we would have a mutable interface for the application, here's what would happen to it:
app.Request <- ...
while (input.DataAvailable && app.AcceptsData) do
    app.Accept(input.Read())
output.Write(app.GetResponse())
while (app.HasMoreData()) do
    output.Write(app.Read())
The F# hackery above is just trying to guarantee that these things happen in the order listed. The host also has a good chance to multiplex several apps:
app.Request <- ...
while (input.DataAvailable && app.AcceptsData) do
    (* here we might switch between applications *)
    app.Accept(input.Read())
output.Write(app.GetResponse())
while (app.HasMoreData()) do
    (* here we might switch between applications *)
    output.Write(app.Read())
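The same loop can be sketched against the immutable FSCGI types. This is an illustrative, single-application host, not the gist's own code: run and echo are hypothetical names, the request body is modeled as a list of byte arrays, and the driver assumes the application answers with Respond once it has been handed None (end of input).

```fsharp
// Local copies of the gist's FSCGI types so the sketch compiles on its own.
type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type Request =
    {
        Headers : Headers
        Method : string
        Path : string
        PathBase : string
        QueryString : string
        Scheme : string
    }
type State = Closed | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of int * string * Headers * Writer
type Application = Request -> Response

/// Hypothetical host: feeds the body, then drains the writer into one array.
let run (app: Application) (req: Request) (input: list<byte[]>) =
    // Phase 1: hand over chunks while the application keeps asking to read.
    let rec feed (resp: Response) (pending: list<byte[]>) =
        match resp, pending with
        | Respond (code, status, headers, writer), _ ->
            (code, status, headers, writer)
        | Read accept, chunk :: rest ->
            feed (accept (Some (System.ArraySegment<byte>(chunk)))) rest
        | Read accept, [] ->
            feed (accept None) []
    // Phase 2: pull output chunks until the writer says Done.
    let rec drain (writer: Writer) (acc: list<byte[]>) =
        match writer with
        | Done -> Array.concat (List.rev acc)
        | Write next ->
            let (data, rest) = next Open
            drain rest (Array.sub data.Array data.Offset data.Count :: acc)
    let (code, status, headers, writer) = feed (app req) input
    (code, status, headers, drain writer [])

/// Hypothetical application: collects the body and writes it back.
let echo : Application =
    fun _ ->
        // For brevity this writer ignores the Open/Closed state.
        let rec emit (chunks: list<Data>) : Writer =
            match chunks with
            | [] -> Done
            | c :: rest -> Write (fun _ -> (c, emit rest))
        let rec collect (acc: list<Data>) : Response =
            Read (fun chunk ->
                match chunk with
                | Some d -> collect (d :: acc)
                | None -> Respond (200, "OK", Map.empty, emit (List.rev acc)))
        collect []
```

Every recursive call in feed and drain is a point where a multiplexing host could suspend this application and resume another one.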
The reason I added AsyncBody versions of the Body types in the new Frank signatures was to provide applications the ability to say, "I need to do some look-ups on my own; check back with me in a bit." Your examples would be useful in the case that an application is doing some immediate results, but I don't think it addresses the common need to call out to a database or web service from the server. Or am I missing that aspect?
Hmm... I suppose the application could use Async.RunWithContinuations and supply the Response write mechanism as a callback. That should still facilitate what you've described above. The application would then control any internal asynchronicity.
Spot on, the "check back in a bit" scenario is prohibited. The application will be stuck to blocking (doing Async.RunSynchronously) if it wants to talk to the network or the database.
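To make the blocking concrete, here is a small sketch of an application that has to wait on an asynchronous lookup. The lookup function is hypothetical and merely simulates latency with Async.Sleep; the types are local copies of the FSCGI definitions.

```fsharp
// Local copies of the gist's FSCGI types so the sketch compiles on its own.
type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type Request =
    {
        Headers : Headers
        Method : string
        Path : string
        PathBase : string
        QueryString : string
        Scheme : string
    }
type State = Closed | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of int * string * Headers * Writer
type Application = Request -> Response

/// Hypothetical lookup; the sleep stands in for network or database latency.
let lookup (key: string) : Async<string> =
    async {
        do! Async.Sleep 10
        return "value-for-" + key
    }

let app : Application =
    fun req ->
        // The only option under this protocol: block until the result arrives.
        let value = Async.RunSynchronously (lookup req.Path)
        let body = System.Text.Encoding.UTF8.GetBytes value
        let headers = Map.ofList [ "Content-Type", "text/plain" ]
        let writer = Write (fun _ -> (System.ArraySegment<byte>(body), Done))
        Respond (200, "OK", headers, writer)
```

The thread running app is held for the whole duration of the lookup, so any concurrency has to come from the host running several applications side by side.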
If we allow this "check back in a bit", I think the interface quickly becomes isomorphic to OWIN..
What I am struggling to grasp right now is this: are these "check back in a bit" scenarios really safe? Or, how do we write apps that are safe?
It is just so easy to trip. Consider Petricek's AsyncSeq: readInBlocks is broken because it does not close the file descriptor. If it were to close it, when would it? And how can the reader signal lack of interest in the rest of the sequence?
Lots to think about. With Fracture, we have the pipeline model which uses agents to progress things along. We could always leverage that within Frank to allow for delayed or long-running work to not block the current stuff. Of course, if we have a number of agents running, they are not preventing the primary server from blocking anyway, so we're probably safe. Another aspect we are planning is to have load management be able to spin up new agents as necessary, so again, blocking within a given application shouldn't affect the overall system.
Thanks for the input!
So, how exactly is WSGI better? Python simply does not have a typed option. On the OWIN list people are saying that one of the goals of OWIN is not to have an assembly; maybe that is why it is what it is. Still beats me why people would want to avoid an assembly.
Here's the rationale behind the Input variant of the response: for certain requests the response headers are dependent on the request input (Data). Therefore, we need to give the response IO process a chance to read before emitting the headers. In this signature, unlike in OWIN, the headers dictionary is immutable, which makes sequencing explicit.
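A small sketch of such a request, assuming the Read case of Response in the current listing plays the role of the Input variant described here. The lengthApp name and the X-Body-Length header are made up for the example.

```fsharp
// Local copies of the gist's FSCGI types so the sketch compiles on its own.
type Data = System.ArraySegment<byte>
type Headers = Map<string,string>
type Request =
    {
        Headers : Headers
        Method : string
        Path : string
        PathBase : string
        QueryString : string
        Scheme : string
    }
type State = Closed | Open
type Writer =
    | Done
    | Write of (State -> Data * Writer)
type Response =
    | Read of (option<Data> -> Response)
    | Respond of int * string * Headers * Writer
type Application = Request -> Response

/// Hypothetical application: reads the whole body, then reports its
/// length in a response header and sends no body of its own.
let lengthApp : Application =
    fun _ ->
        let rec count (total: int) : Response =
            Read (fun chunk ->
                match chunk with
                | Some d -> count (total + d.Count)
                | None ->
                    // Headers are built only now, after the input is known.
                    let headers = Map.ofList [ "X-Body-Length", string total ]
                    Respond (200, "OK", headers, Done))
        count 0
```

Because the headers map is an immutable value constructed after the last Read, there is no way to emit them before the body has been consumed.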
I kept the Read case in the Writer type thinking that perhaps reading and writing might be interleaved. I am not sure this is ever useful or practical. If it is not the case, we could drop the Read case and keep just Input, then the sequence of the automaton would be fixed: read the request, emit headers, write the response.