annevk/gist:5496188

## gistfile1.irclog
[12:47] <annevk> bz: So now I've got http://fetch.spec.whatwg.org/#requests as architecture for fetching, mostly based on Hixie's work in HTML and my work in CORS. However, I've gotten the impression that browsers have a slightly different architecture. One that involves at least a global object of sorts.
[12:48] <annevk> bz: I'm wondering if I should attempt to reconcile the two. E.g. once a global object dies (removal of an <iframe>) we might want to kill all associated fetches as happens now.
[12:52] <annevk> The global object (or browsing context or whatever we use) can also be used to find out some information, such as the origin, CSP policy, and maybe referrer?
[13:45] <bz> annevk: good morning
[13:45] * bz reads up
[13:45] <annevk> bz: good afternoon ;)
[13:48] <bz> hmm
[13:48] <bz> So I can describe the setup in Gecko if you want
[13:48] <bz> Note that I'm not sure it's a great setup
[13:48] <bz> In fact, it has some crappy bits.  ;)
[13:49] <bz> annevk: But that's not the question, right?
[13:49] <bz> annevk: What _is_ the question?
[13:51] <annevk> I think it might be helpful to know the setup and then figure out why that is the setup and it cannot be something else. (E.g. killing all connections once a global object goes away might make sense, even if that also kills window2.XMLHttpRequest's connection used by window1...)
[13:51] <annevk> bz: The question is "what's the best possible architecture for fetching resources on the web given our silly legacy constraints as a hard lower bound?"
[13:52] <annevk> (or some such)
[13:53] <bz> ok
[13:54] <bz> So the Gecko setup is as follows
[13:54] <bz> Each navigation context has an associated object called a "load group"
[13:54] <bz> Each necko request can be optionally associated with a load group
[13:55] <bz> Necko channels are requests.  Load groups themselves are requests.
[13:55] <bz> The load group of an iframe is placed in the load group of its parent navigation context.
[13:56] <bz> The load group is used to implement things like onload tracking (by seeing whether there are any live requests in it) and stop() (by issuing a cancel() on the load group, which issues a cancel() on everything inside it)
[13:56] <bz> Note that for onload this setup is ... crappy, since the load group is associated with the navigation context, not the document.
[13:58] <bz> This last is why, if you start some sort of subresource loads after a new pageload is kicked off but before the new page is switched to, those subresource loads can continue even after the new page is switched to. That happens because the new document load is in the same loadgroup as the subresources, so we have to do the stop() right before we start the new document load, and can't do it when switching to a new document.
[13:58] <bz> I _think_ pages commonly abuse this by doing <img> loads from unload and whatnot.  :(
[13:58] <bz> When a navigation context is torn down (e.g. iframe removed from the document), the corresponding load group gets canceled, canceling all the requests in it.
[13:59] <bz> For some cases (e.g. close window or close tab) this is in fact highly desirable.
[13:59] <bz> Another consequence of this setup is that a given request can, via the load group, be associated to a particular navigation context.
[14:00] <bz> Which we certainly use internally for various stuff....
[14:00] <bz> Now there are some interesting special-cases
[14:00] <bz> Specifically, imagelib will coalesce loads across documents in Gecko
[14:01] <bz> So if you do an <img src> what goes in the loadgroup is NOT the actual HTTP load of the image.
[14:01] <bz> It's instead a proxy request; the actual HTTP load happens outside of load groups
[14:01] <bz> But canceling all the proxies will generally also cancel the underlying load, iirc.
[14:01] <bz> "proxy" here does not mean HTTP proxy.  ;)
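The coalescing bz describes can be sketched roughly as below. This is an illustration only, not Gecko's imagelib: the names `ImageLoader`, `ImageProxy`, and `UnderlyingLoad` are hypothetical, and the rule that the underlying load is canceled once its last proxy is canceled is an assumption based on bz's "generally ... iirc".

```python
class UnderlyingLoad:
    """Hypothetical stand-in for the real HTTP load, which lives outside any load group."""

    def __init__(self, url):
        self.url = url
        self.proxies = set()
        self.canceled = False


class ImageProxy:
    """Hypothetical per-document proxy request; this is what would sit in a load group."""

    def __init__(self, load):
        self.load = load
        self.canceled = False
        load.proxies.add(self)

    def cancel(self):
        if self.canceled:
            return
        self.canceled = True
        self.load.proxies.discard(self)
        # Assumption: once no proxies remain, the shared load is canceled too.
        if not self.load.proxies:
            self.load.canceled = True


class ImageLoader:
    """Hands out proxies and coalesces loads of the same URL across documents."""

    def __init__(self):
        self.loads = {}

    def request(self, url):
        load = self.loads.get(url)
        if load is None or load.canceled:
            load = UnderlyingLoad(url)
            self.loads[url] = load
        return ImageProxy(load)


loader = ImageLoader()
a = loader.request("http://example.test/logo.png")  # document 1
b = loader.request("http://example.test/logo.png")  # document 2
print(a.load is b.load)   # True: one shared underlying load
a.cancel()
print(a.load.canceled)    # False: document 2 still wants the image
b.cancel()
print(b.load.canceled)    # True: all proxies are gone
```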
[14:03] <bz> Make sense so far?
[14:03] <bz> Now a lot of this stuff is not really web-visible.
[14:06] <bz> annevk: Does that help?
[14:07] <annevk> yeah, the bit about <img> load from unload in particular is interesting
[14:08] <annevk> bz: what is a necko channel?
[14:09] <bz> oh, sorry
[14:09] <bz> A necko channel is something that actually gets data from somewhere
[14:09] <bz> So necko has this concept of a "request" which is basically "something that is in progress
[14:10] <bz> or at least can be in progress"
[14:10] <bz> And a concept of "channel", which is a subclass of "request" and can produce data, generally speaking
[14:10] <bz> So for example, an http:// or file:// load would be a channel
[14:11] <bz> There are several non-channel requests around: loadgroups, image load proxies, the onload blocker request
[14:11] <annevk> makes sense
[14:11] <bz> (This last is used to block onload by claiming to be in progress while some counter is nonzero)
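As a rough model of the setup described in this log, here is a minimal sketch of the request, channel, and load group relationship. It is not Necko's actual interface: the class and method names (`Request`, `Channel`, `LoadGroup`, `OnloadBlocker`, `is_pending`, `finish`, `block`, `unblock`) are made up for illustration, and the example URLs are placeholders.

```python
class Request:
    """Something that is, or at least can be, in progress."""

    def __init__(self, name):
        self.name = name
        self.canceled = False
        self.done = False

    def is_pending(self):
        return not (self.canceled or self.done)

    def cancel(self):
        self.canceled = True


class Channel(Request):
    """A request that can also produce data, e.g. an http:// or file:// load."""

    def finish(self):
        self.done = True


class OnloadBlocker(Request):
    """Non-channel request that claims to be in progress while a counter is nonzero."""

    def __init__(self, name="onload-blocker"):
        super().__init__(name)
        self.count = 0

    def block(self):
        self.count += 1

    def unblock(self):
        self.count = max(0, self.count - 1)

    def is_pending(self):
        return not self.canceled and self.count > 0


class LoadGroup(Request):
    """A request that contains other requests, including child load groups."""

    def __init__(self, name, parent=None):
        super().__init__(name)
        self.requests = []
        if parent is not None:
            parent.add(self)  # an iframe's group sits inside its parent's group

    def add(self, request):
        self.requests.append(request)

    def is_pending(self):
        # Onload tracking: are there any live requests in the group?
        return not self.canceled and any(r.is_pending() for r in self.requests)

    def cancel(self):
        # stop() or teardown: cancel everything inside the group.
        super().cancel()
        for r in self.requests:
            r.cancel()


top = LoadGroup("top-level")
frame = LoadGroup("iframe", parent=top)
page = Channel("http://example.test/page")
img = Channel("http://example.test/frame.png")
top.add(page)
frame.add(img)

page.finish()
print(top.is_pending())  # True: the iframe's image is still live
frame.cancel()           # iframe removed from the document
print(img.canceled)      # True
print(top.is_pending())  # False: nothing left to hold back onload
```

Because a load group is itself a request, nesting and cancel propagation need no special casing in this sketch; whether Gecko gets the same benefit from that design is not stated in the log.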