@sandfox
Created April 17, 2013 15:19
tus.io IRC conversation about content checksums and verifying data transmitted over the wire.
[2:20pm] felixge: ^--- would love if you could explain ideas on how to do it here
[2:20pm] felixge: but ideally it'd be an optional part of the protocol
[2:21pm] felixge: Baughn: <3 thanks!
[2:22pm] Baughn: felixge: The problem is, TCP uses a 16-bit checksum. The algorithm is resilient against single-bit errors - the checksum should always change - but for multiple-bit errors, there is a 2^-16 chance you get the /same/ checksum.
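The 2^-16 collision Baughn mentions is easy to demonstrate. TCP uses the Internet checksum (RFC 1071), a ones'-complement sum of 16-bit words, so swapping any two aligned 16-bit words is a multi-bit error that leaves the checksum unchanged. A minimal sketch:

```python
def inet_checksum(data: bytes) -> int:
    """Ones'-complement sum of 16-bit words (RFC 1071), as used by TCP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

original = b"\x12\x34\x56\x78"
reordered = b"\x56\x78\x12\x34"  # two words swapped: a multi-bit error
assert original != reordered
assert inet_checksum(original) == inet_checksum(reordered)  # same checksum
```

Word reordering is only the most convenient example; any pair of multi-bit errors whose ones'-complement sums cancel will also collide, which is why an application-level checksum is worth having on noisy links.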
[2:22pm] felixge: yes, we need to figure it out
[2:22pm] Baughn: And you're aiming this protocol squarely for noisy networks.
[2:22pm] felixge: a few thoughts: clients may not always be able to generate a checksum, and doing this for large files can be problematic (especially in JS)
[2:22pm] Baughn: Right, I'd mark that part of the protocol with SHOULD.
[2:22pm] felixge: so a good proposal should allow checksums for each PUT, rather than just the entire file
[2:23pm] Baughn: "Entire file" is easy to implement, and covers most use cases. Per-put has some simplicity advantages, but I don't think it's very useful overall..
[2:23pm] felixge: another use case that was brought up before is de-duplication: A server should be able to say it already has a certain chunk if the checksum matches
[2:23pm] Baughn: There's also the checksum tree (aka. Bittorrent) option, which is /great/
[2:23pm] felixge: Baughn: fair enough, I'm looking forward to the comments
[2:23pm] felixge: not aware of checksum trees
[2:23pm] Baughn: Ah yes, this would also allow for de-duplication. 
[2:23pm] felixge: so I'd <3 your help coming up with a good solution
[2:24pm] felixge: and I agree that it's probably important enough to be a SHOULD in the protocol
[2:24pm] Baughn: I'd recommend CRC32 or CRC64. Preferably the latter.
[2:24pm] Baughn: There's no point in specifying a cryptographic checksum.
[2:25pm] Baughn: Those two are fast enough to work nicely in JS, too. 
[2:25pm] Baughn: (Or asm.js, heh)
[2:25pm] sandfox: what about content-md5 header?
[2:25pm] Baughn: Oh, huh. That exists.
[2:25pm] sandfox: clients can send that with an upload and then server can verify on completion
[2:25pm] sandfox: i think amazon s3 uses it
[2:25pm] Baughn: Content-MD5 sounds good
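For reference, Content-MD5 as defined in RFC 1864 is the base64 encoding of the *binary* MD5 digest, not the hex string. A sketch of how a client might produce the header and a server verify it on completion:

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """RFC 1864: base64 of the binary MD5 digest (not the hex digest)."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

body = b"hello, tus"
header_value = content_md5(body)  # what the client sends with the upload

# Server side: recompute over the received bytes and compare.
assert content_md5(body) == header_value
```

This is the same scheme S3 uses for its Content-MD5 check: mismatch means the body was corrupted in transit and the request should be rejected.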
[2:26pm] Baughn: Checksum trees are a potential future enhancement, then..
[2:27pm] Baughn: felixge: Also, have a look at how Bittorrent works. Obviously we don't want to clone it wholesale, but its checksumming features are nice.
[2:27pm] sandfox: dammit you edit issues quickly Baughn:
[2:27pm] Baughn: It's that or finish this privacy training. 
[2:28pm] sandfox: understandable… 
[2:30pm] sandfox: Baughn: were you thinking some kind of scenario where once an 'upload' is created, multiple (distributed) clients could begin uploading segments at the same time or at least in a fairly un-coordinated fashion?
[2:30pm] Baughn: No
[2:30pm] Baughn: I just meant to point at bittorrent's implementation of checksum trees
[2:30pm] Baughn: http://en.wikipedia.org/wiki/Hash_tree <- Though it's pretty simple, really.
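A minimal hash-tree sketch along the lines of that article: hash each fixed-size chunk to get the leaves, then repeatedly hash pairs of child digests until a single root remains. MD5 and the duplicate-last-node convention for odd levels are illustrative choices here, not anything from the discussion; a real spec would have to pin both down.

```python
import hashlib

CHUNK = 64 * 1024  # 64KiB leaves, the size suggested later in the conversation

def hash_tree_root(data: bytes, chunk_size: int = CHUNK) -> bytes:
    """Return the Merkle root of `data`: hash leaves, then hash pairs upward."""
    level = [hashlib.md5(data[i:i + chunk_size]).digest()
             for i in range(0, len(data), chunk_size)]
    if not level:                       # empty input: single empty leaf
        level = [hashlib.md5(b"").digest()]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])     # duplicate the odd node out (one convention)
        level = [hashlib.md5(level[i] + level[i + 1]).digest()
                 for i in range(0, len(level), 2)]
    return level[0]
```

Two uploads of the same bytes produce the same root (the de-duplication case), while flipping a single bit anywhere changes it; and because each chunk has its own leaf hash, corruption can be localised to a chunk without rehashing the whole file.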
[2:31pm] sandfox: Just me then… already have some mad science ideas brewing
[2:31pm] Baughn: Well, I'd rather extend bittorrent for that purpose than make anything new.
[2:32pm] sandfox: that's probably more sensible, but I have never let that be a barrier
[2:33pm] • Baughn has seen some *hilarious* postmortems caused by random corruption, including on-wire corruption. TCP and all.
[2:33pm] Baughn: felixge: To be quite blunt, including checksumming makes this protocol far more likely to be accepted.
[2:39pm] sandfox: As far as I can tell Content-MD5 is only correct in the context of an individual HTTP request, so it would make sense for PUTs but we'd need some other header for the initial POST. thoughts?
[2:39pm] felixge: Baughn: sure, lets do it 
[2:41pm] Baughn: Make up our own header. 'Resource-MD5' sounds about right.
[2:41pm] Baughn: Which is added by the client on POST, and the server on HEAD/GET
[2:42pm] Baughn: ..thus allowing for de-duplication both ways
[2:43pm] Baughn: Oh, right. Default implementations should set ETags to the same value.
[2:43pm] Baughn: ..but this isn't really a file /server/, so that's a separate issue.
[2:44pm] Baughn: ..tim-kos. Where did I see that name before...
[2:44pm] tim-kos: not sure, nodecopter?
[2:45pm] Baughn: Nope
[2:45pm] Baughn: Ever been to #bay12games, by any chance?
[2:45pm] sandfox: ETags always seem a little wishy-washy and vague, but that's just imho. Also - good reading about content-md5 :: http://trac.tools.ietf.org/wg/httpbis/trac/ticket/178
[2:46pm] Baughn: Right, Resource-MD5 it is.
[2:47pm] felixge: Baughn: Resource-MD5: Should we use an X- Header?
[2:47pm] felixge: or is that no longer recommended
[2:47pm] felixge: I remember reading sth about that
[2:48pm] Baughn: I'm not terribly familiar with the HTTP process, sorry..
[2:48pm] Baughn: I'd prefer to not use an X- prefix, just so if it gets to be an official RFC we don't have to change anything
[2:49pm] Baughn: ..and yeah, Resource-MD5 should probably be a separate RFC from tus.
[2:49pm] Baughn: ' On June 2011, the first IETF draft was posted to deprecate the use of the "X-" prefix for non-standard headers. The reason is that when non-standard headers prefixed with "X-" become standard, removing the "X-" prefix breaks backwards compatibility, forcing application protocols to support both names (E.g, x-gzip & gzip are now equivalent). So, the recommendation is to just name them sensibly without the "X-" prefix.'
[2:49pm] Baughn: IETF agrees. 
[2:49pm] Baughn: http://tools.ietf.org/html/rfc6648
[2:50pm] felixge: Baughn: awesome
[2:50pm] felixge: I think we need to restructure the doc a little
[2:50pm] felixge: into the "Core" protocol
[2:50pm] felixge: and the various optional parts
[2:50pm] felixge: so that implementors can get a compatible implementation quickly
[2:50pm] Baughn: Have you done this before?
[2:50pm] felixge: and add the other features as needed
[2:50pm] felixge: Baughn: done what?
[2:50pm] Baughn: Written an rfc. 
[2:50pm] felixge: no
[2:51pm] felixge: this isn't an RFC so
[2:51pm] Baughn: Eh, yes it is.
[2:51pm] felixge: but maybe we'll submit it one day
[2:51pm] Baughn: Or it should be.
[2:51pm] felixge: but for now that'd be a distraction
[2:51pm] felixge: for now we need to figure out what we want/need
[2:51pm] Baughn: It just means "request for comment", and you're doing /that/ fine. 
[2:51pm] felixge: in order to bring resumable file uploading to the web
[2:51pm] felixge: Baughn: there are actually a lot of editorial rules for writing RFCs
[2:51pm] felixge: I've researched it
[2:51pm] Baughn: Point is, there's a tendency for first-time RFC writers to try following the form, rather than the spirit, of existing ones. The only thing that /really/ matters is that the document is clear and easy to follow.
[2:52pm] Baughn: ..I am, of course, talking about golden-age RFCs. Not the mess we have today.
[2:52pm] felixge: : )
[2:55pm] sandfox: Is there any mileage in supporting the 'Range' header for HEAD requests so partial chunks can be verified by a client before attempting a re-upload
[2:56pm] Baughn: Define 'verified'
[2:58pm] sandfox: good point, client sends HEAD request with byte range in "Range" header, server responds with content-MD5 for that range so client can see if it matches what it thinks should be there.
[2:58pm] Baughn: I'd call that a kluge. Hash trees are the correct solution..
[2:59pm] Baughn: Oh, it'd /work/, but it'd be a lot more expensive
[2:59pm] Baughn: Although..
[3:00pm] Baughn: If it's just for resuming then, yeah, it's reasonable. The question is what to do if the range doesn't match.
[3:00pm] Baughn: And with hash trees, you'd get all of this for free including how to fix damage.
[3:01pm] sandfox: How easy/difficult are hash trees to do from a client side perspective
[3:01pm] Baughn: (Plus the server would need a hash tree anyway, to efficiently compute the range MD5. You do not want to actually compute an MD5 from scratch on every HEAD, that'd be hilariously DoS-vulnerable.)
[3:01pm] Baughn: Hash trees? Easy-peasy.
[3:03pm] Baughn: You can even do things like leave out the topmost levels (compute only the bottom N levels of the tree), which lets you stream data to the server without computing the hash in advance
[3:03pm] Baughn: Which would, in turn, allow for live video uploads and whatnot
[3:04pm] Baughn: Start with 64KiB chunks, compute 4 levels, and the client only needs to buffer 1MiB at a time
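The arithmetic behind that buffering claim, assuming "compute 4 levels" means subtrees of 2^4 leaves: 16 chunks of 64KiB is exactly 1MiB, so a streaming client only ever holds one subtree's worth of data before emitting that subtree's root and moving on. A sketch (MD5 and the odd-node convention again illustrative):

```python
import hashlib
import io

CHUNK = 64 * 1024                 # 64KiB leaf size
LEVELS = 4                        # client computes only the bottom 4 levels
SUBTREE = CHUNK * 2 ** LEVELS     # 16 chunks = 1MiB buffered at a time

def subtree_roots(stream):
    """Yield one subtree root per 1MiB of input; the server can fill in the
    upper levels of the tree from these roots at its leisure."""
    while True:
        buf = stream.read(SUBTREE)
        if not buf:
            break
        level = [hashlib.md5(buf[i:i + CHUNK]).digest()
                 for i in range(0, len(buf), CHUNK)]
        while len(level) > 1:
            if len(level) % 2:
                level.append(level[-1])   # duplicate the odd node out
            level = [hashlib.md5(level[i] + level[i + 1]).digest()
                     for i in range(0, len(level), 2)]
        yield level[0]

# A 3MiB upload yields three subtree roots, never buffering more than 1MiB.
roots = list(subtree_roots(io.BytesIO(b"x" * (3 * SUBTREE))))
assert len(roots) == 3 and SUBTREE == 1024 * 1024
```

Because the full hash is never needed in advance, this is what makes the live-video use case workable.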
[3:04pm] sandfox: I suppose the only thing in my mind is that "Range" + HEAD supports arbitrary byte ranges (I admit it is overkill in lots of ways); what block sizes can/do hash trees sensibly use?
[3:04pm] Baughn: While the server can fill in the higher tree at will
[3:05pm] Baughn: Well, 64KiB chunks are pretty common - that gives you ~0.3% overhead.
[3:06pm] sandfox: Hmmm, I'm feeling that hash trees represent a much more solid approach overall and that at best Range + HEAD could be an optional thing, if not left out altogether for people to implement on top if they so choose
[3:06pm] Baughn: You'd use Range+HEAD to implement it, though..
[3:07pm] Baughn: I mean, okay. Scenario:
[3:07pm] sandfox: is it possible to restrict Range requests to block sizes?
[3:07pm] Baughn: It's not necessary. So. Scenario:
[3:07pm] Baughn: 1MiB file. 64KiB chunk size.
[3:08pm] Baughn: The client starts by transferring the entire hash tree to the server. As I said, it /could/ stream, but this is a simple scenario.
[3:08pm] Baughn: Then it transfers, say, 700 bytes before the connection fails.
[3:08pm] Baughn: ..er, 700000. 
[3:09pm] Baughn: That's 10 full chunks, plus a bit.
[3:09pm] Baughn: The server can immediately verify those chunks, and see that they're correct. The "plus a bit", it can either discard or keep.
[3:10pm] Baughn: When the client reconnects, it gets a range request from the server, asking it to start at byte 700,001. (If we keep everything)
[3:10pm] Baughn: Assuming the connection doesn't break a second time, the server can then go ahead and compute the hashes for chunks 11 through 16
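The scenario above, sketched from the server side: with 64KiB chunks, 700,000 received bytes cover 10 complete chunks (655,360 bytes) plus a 44,640-byte tail. The server checks the complete chunks against the leaf hashes the client sent up front and picks the resume offset (0-based here, where the chat counts from byte 1). The function name and `keep_tail` flag are illustrative, not part of any spec:

```python
import hashlib

CHUNK = 64 * 1024

def resume_offset(received: bytes, leaf_hashes: list, keep_tail: bool = True) -> int:
    """Return the 0-based byte offset the client should resume from.

    Each complete 64KiB chunk is verified against the hash-tree leaves
    transferred up front; an incomplete tail can be kept or discarded."""
    full = len(received) // CHUNK
    for i in range(full):
        chunk = received[i * CHUNK:(i + 1) * CHUNK]
        if hashlib.md5(chunk).digest() != leaf_hashes[i]:
            return i * CHUNK              # first corrupt chunk: resume there
    return len(received) if keep_tail else full * CHUNK

data = b"x" * 700_000                     # the 700,000 bytes from the scenario
leaves = [hashlib.md5(data[i:i + CHUNK]).digest()
          for i in range(0, len(data), CHUNK)]
assert resume_offset(data, leaves) == 700_000                    # keep the tail
assert resume_offset(data, leaves, keep_tail=False) == 10 * CHUNK  # 655,360
```

A corrupt chunk simply pulls the resume point back to its start, which is the "re-attempt until it isn't corrupted" loop described below.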
[3:11pm] sandfox: ok, that makes sense
[3:12pm] Baughn: The server *may* need to finish the conversation by returning a 4XX Transmission Corrupted, in which case the client would re-attempt its upload until it /isn't/ corrupted, sending different chunks each time
[3:12pm] sandfox: ah-ha, I was thinking in my examples of the client sending a range header. but this isn't needed because the client gets the necessary info on the HEAD request
[3:12pm] Baughn: Now, there's nothing in that that required a *tree*; it's all just 64KiB chunks
[3:13pm] Baughn: The 'tree' part of the specification is there to make it easy to change the chunk size, essentially
[3:13pm] sandfox: sounds legit
[3:14pm] Baughn: There are all kinds of clever things you /can/ do, like starting by just transferring the top-most hash level (of the entire file), and requesting more levels if corruption is noticed so you can find the exact spot, but those all make things more complex
[3:15pm] Baughn: Still, if you start with the 'hash tree' abstraction, all of the things you /might/ want to do suddenly have simple descriptions in terms of the hash tree structure
[3:15pm] sandfox: would there need to be negotiation between client and server over the tree details?
[3:15pm] Baughn: Not if the protocol is specified correctly
[3:16pm] Baughn: It's not hard to write a client and server that just work with every possible tree
[3:16pm] Baughn: ..though of course there should be limits, e.g. 1-byte chunk sizes probably represent a DoS attempt
[3:18pm] Baughn: My main worry is that hash trees are best serialized as heaps, and /not/ ascii
[3:18pm] Baughn: The text blowup factor for doing this the recommended HTTP way could be hilariously large
[3:19pm] sandfox: does HTTP have any views at all on that?
[3:19pm] Baughn: It's traditional to make everything human-readable. There are good reasons to do so.
[3:20pm] Baughn: But it also increases data usage, under already tight circumstances
[3:21pm] Baughn: That seems to be changing, though..
[3:21pm] Baughn: http://tools.ietf.org/id/draft-snell-httpbis-bohe-01.html
[3:23pm] Baughn: Most uploads will not be corrupted..
[3:23pm] Baughn: I think it'd make sense to (when not streaming) transfer just the root node, then negotiate if corruption happens
[3:23pm] Baughn: There isn't really any reason the server needs to know the location of corruption before the entire file is done.
[3:23pm] Baughn: ...
[3:24pm] Baughn: Well, except one - it'd reduce roundtrips, we could pipeline patch requests.
[3:24pm] Baughn: And, oh yeah.
[3:24pm] Baughn: tus should use the PATCH verb instead of PUT.
[3:25pm] sandfox: whats browser support like for PATCH?
[3:25pm] Baughn: Supported in everything newer than IE8
[3:25pm] Baughn: (And lots of things older, but eh)
[3:26pm] Baughn: It's a bit pedantic, given that things would work fine either way.
[3:27pm] sandfox: make it optional to support PUT as a fallback?
[3:27pm] Baughn: I was thinking the same.
[3:27pm] Baughn: I'm not 100% sure what the practical consequences of PUT vs. PATCH are
[3:28pm] Baughn: PATCH has the right /semantic/ meaning, but there might be things that assume more than we'd like. About either method, really.
[3:30pm] sandfox: do you want to put the summary of this in the github issue?
[4:00pm] felixge: Baughn: bah, got to finish some slides right now, but will read up on all the stuff you wrote later!
[4:00pm] felixge: thx for thinking about this!