Skip to content

Instantly share code, notes, and snippets.

@eevee
Last active March 9, 2016 17:00
Show Gist options
  • Save eevee/4e4301f9b6e674d46181 to your computer and use it in GitHub Desktop.
Save eevee/4e4301f9b6e674d46181 to your computer and use it in GitHub Desktop.
Please solve my mysterious web problem

Symptoms

  • Certain pages will stop loading at a certain point and sit idle for anywhere from 3s to a full minute. The time is often suspiciously close to a multiple of 10s.

  • This is largely a problem because a bunch of JS that actually makes the damn page work doesn't fire until after the delay.

  • While idle, Firefox is responsive, but the throbber (swirly loading animation on the tab) is active, and the "status bar" says "Waiting for" the name of some CDN. Sometimes it says "Read" or "Reading data from". I suspect this might be a red herring, and Firefox is just showing the last relevant network message for some reason.

  • It looks like something is delaying script execution, which delays DOMContentLoaded, which delays a bunch of stuff from running. In the case of GitHub, some profiling revealed that the only script on the page doesn't actually run (sometimes!) until 20 seconds have passed. The script downloads almost instantly, but it doesn't actually execute.

  • The Network tab in Firefox's dev tools does not attribute the long delay to any kind of network problem. It claims that the CDN request finishes almost instantly.

  • So far I've only seen this happen on Twitter, GitHub, Dropbox, and YouTube.

This problem is intermittent, so it's hard to tell for sure if I've fixed it or just gotten lucky. It seems to get worse the longer I have my browser open, and a force-refresh makes it much more likely.

I do have a lot of tabs? I don't know how that would directly relate.

Seemingly ruled out

It can't be the network, because I can't reproduce in Chromium, and I haven't witnessed any similar problem with any other program or on any other computer in our house.

It can't be the browser, because I can't reproduce in a clean profile.

It can't be addons, because I disabled all of them and still had the problem.

It can't be the cache, because force-refreshing is how I've been reproducing in the first place.

It seems unlikely to be a fundamental problem with the pages themselves, since it happens on several sites, and apparently just for me.

Debugging attempted

Network panel

As previously mentioned, the Network panel in the Firefox dev tools claims that all of these requests finish in under 100ms, even though the status 'bar' would seem to disagree.

I've been hammering on this page, and the results are:

  1. Requests to avatars#.githubusercontent.com are sent out, and complete successfully.
  2. ABSOLUTELY NOTHING HAPPENS FOR TWENTY SECONDS. During this time, Firefox says "Waiting for avatars#.githubusercontent.com..."
  3. A request to collector.githubapp.com sends and completes successfully.
  4. Then same for www.google-analytics.com, live.github.com, and api.github.com.
  5. Finally, the page becomes usable, by which I mean the ... expando buttons on commit messages respond to clicks.

Profiling

Running the Firefox profiler during a "broken" page load reveals nothing. There's a lot of incremental GC, but the page itself seems to be doing nothing whatsoever. Then inline scripts execute, seemingly for no particular reason, and that allows DOM ready to fire.

Wireshark

I don't know what I'm doing here, but here's a rough timeline from a request I captured, as organized into TCP conversations by Wireshark. (The times don't start at 0 because I had to alt-tab and hit refresh, of course. I trimmed everything from before the first request I made to the main github.com IP.)

Port A Address B Port B Bytes Rel Start Duration
48152 192.30.252.128 443 27340 0.583454737 10.375929519
48154 192.30.252.128 443 16379 0.584616081 6.1118591030
47060 199.27.79.133 443 44721 1.165613135 20.131691976
47062 199.27.79.133 443 85084 1.168825549 20.191475131
47064 199.27.79.133 443 76664 1.173414933 20.146767788
47066 199.27.79.133 443 143186 1.176510361 20.208427464
47068 199.27.79.133 443 1768 1.188077705 5.4528093880
47070 199.27.79.133 443 1768 1.188125779 5.4527716260
47072 199.27.79.133 443 7852 1.204606029 20.147728577
47074 199.27.79.133 443 13142 1.217547203 20.108879500
47076 199.27.79.133 443 11790 1.227242183 20.154673147
47078 199.27.79.133 443 4556 1.230630696 20.095215418

The 20s durations are a little misleading; that's just when I stopped the packet capture, because the page had finished loading.

192.30.252.128 is github.com; 199.27.79.133 is their CDN, including assets-cdn and avatars#.

I don't understand the second connection to 192.30.252.128. The first one transfers roughly the size of the page, and the Network panel doesn't show anything else being downloaded from that IP.

Looking at the actual packets, what happens is:

  • 1.33s: CDN data appears to finish downloading. Nothing else happens until...
  • 1.97s: DNS lookups for all the CDN IPs again (?). Again, nothing more until...
  • 5.60s: I send several TLS alerts to the CDN, and receive alerts back.
  • 6.60s: I send a few more alerts, and receive some back.
  • 6.70s: I go back and forth with the main github IP a few times, receiving a single packet and sending back a RST. This all happens on the second connection.
  • 10.89s: I receive a single TLS alert from the main github IP, on the first connection. A few ACKs go back and forth, and then that connection closes.
  • 11.26s – 11.35s: I send a bunch of TCP keep-alives to the CDN. Nothing more happens until:
  • 21.30s: I send a bunch of TCP keep-alives to the CDN, again.
  • 23.65s: DNS lookup for collector.githubapp.com, and the page loading resumes.

I stress that there was no network activity whatsoever with GitHub addresses in those large gaps.

@nivomi
Copy link

nivomi commented Mar 5, 2016

Does this only occur on TLS/SSL webpages? If it resets with a clean profile, could it be a SSL-related browser setting that's broken for some reason?

@mattprintz
Copy link

Check out the url about:debugging and check for workers that might be running.

@fluffy-critter
Copy link

Does Firefox maintain a global connection pool with a per-domain limit?

@trs80
Copy link

trs80 commented Mar 9, 2016

Damn, I see this too, I'd put it down to being in Australia. How many tabs do you have? I've got 1000+ at the moment. Can you see if sessionstore is doing anything? Since it's a giant pile of json it's pretty susceptible to memory fragmentation so maybe that's blocking some thread. Is there any javascript that fiddles the URI?

Hmm, dtruss while loading twitter.com has a suspicious open of /Users/xx/Library/Application Support/Firefox/Profiles/rnstz830.default/storage/default/https+++twitter.com/idb/4185313131nsortoisfriucca_tnio.sqlite{,-shm} and lots of madvise(address, 0x10000, 5) which I believe is MADV_FREE. There's also a lot of psynch_{cv{signal,wait,broad},mutex{wait,drop}} which is OS X pthreads I think.

If I had more time tonight I'd install Xcode and the debugger tools to try and get some timestamps on these events, but that will have to wait for another day. Well, I could have a quick look at gettimeofday .... there's no multisecond jumps, just a second increase at a time:

gettimeofday(0x70000164CC30, 0x0, 0x8909D600) = 1457538973 0
getrusage(0x0, 0x7FFF5FBEF140, 0x8909D600) = 0 0
psynch_cvwait(0x1007BE748, 0x8909D7018909D800, 0x8909D700) = -1 Err#316
gettimeofday(0x70000164CC30, 0x0, 0x8909D700) = 1457538974 0

gettimeofday(0x70000164CC30, 0x0, 0x890A4A00) = 1457538974 0
getrusage(0x0, 0x7FFF5FBEF140, 0x890A4A00) = 0 0
getrusage(0x0, 0x7FFF5FBEFD90, 0x890A4A00) = 0 0
psynch_cvsignal(0x1007BE748, 0x890A4C00890A4D00, 0x890A4C00) = 257 0
psynch_cvwait(0x1007BE748, 0x890A4C01890A4D00, 0x890A4C00) = 0 0
gettimeofday(0x70000164CC30, 0x0, 0x890A4C00) = 1457538975 0

gettimeofday(0x70000164CC30, 0x0, 0x890AB100) = 1457538975 0
getrusage(0x0, 0x7FFF5FBEF140, 0x890AB100) = 0 0
getrusage(0x0, 0x7FFF5FBEFD90, 0x890AB100) = 0 0
psynch_cvsignal(0x1007BE748, 0x890AB200890AB300, 0x890AB200) = 257 0
psynch_cvwait(0x1007BE748, 0x890AB201890AB300, 0x890AB200) = 0 0
gettimeofday(0x70000164CC30, 0x0, 0x890AB200) = 1457538976 0

gettimeofday(0x70000164CC30, 0x0, 0x890B1A00) = 1457538976 0
psynch_cvwait(0x118BF3878, 0x94A2E50194A2E600, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_mutexdrop(0x7000017D5E60, 0xDB66AD00DB66AE03, 0xDB66AD00) = 0 0
psynch_mutexwait(0x7000017D5E60, 0xDB66AC00DB66AE03, 0xDB66AC00) = -614027773 0
psynch_mutexdrop(0x259FEF0C0, 0x2787F0002788003, 0x2787F00) = 0 0
psynch_mutexwait(0x259FEF0C0, 0x2787E0002788003, 0x2787E00) = 41451523 0
psynch_cvsignal(0x1007BE748, 0x890B1B00890B1C00, 0x890B1B00) = 257 0
psynch_cvwait(0x1007BE748, 0x890B1B01890B1C00, 0x890B1B00) = 0 0
gettimeofday(0x70000164CC30, 0x0, 0x890B1B00) = 1457538977 0

gettimeofday(0x70000164CC30, 0x0, 0x890B8800) = 1457538977 0
psynch_cvwait(0x118BF3878, 0x94A31E0194A31F00, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvwait(0x118BF3878, 0x94A31F0194A32000, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3200194A32100, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3210194A32200, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvsignal(0x1007BEF48, 0x290D5900290D6100, 0x290D5900) = 256 0
psynch_cvwait(0x1007BEF48, 0x290D5902290D5A00, 0x290D5900) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3220194A32300, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3230194A32400, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3240194A32500, 0x0) = -1 Err#316
psynch_cvbroad(0x7000017D5BB8, 0x100, 0x100) = 257 0
psynch_cvwait(0x7000017D5BB8, 0x100000100, 0x0) = 0 0
madvise(0x28BB00000, 0x100000, 0x5) = 0 0
psynch_cvwait(0x118BF3878, 0x94A3250194A32600, 0x0) = -1 Err#316
psynch_cvwait(0x1007BE748, 0x890B8A01890B8B00, 0x890B8800) = -1 Err#316
gettimeofday(0x70000164CC30, 0x0, 0x890B8800) = 1457538978 0

getrusage or pthreads I guess, not actually that helpful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment