@sranso
Last active August 29, 2015 14:08

#anatomy of a web request

Web Request

  • let's make a request to https://www.etsy.com
  • what does the browser do?
    1. DNS
    2. check to see if it's already looked up the IP address for this host
    3. check to see if the address is the local host itself
    4. look in /etc/hosts to see if there is a matching entry
    5. browser (via the OS resolver) sends a UDP packet to the resolving name server (RNS), asks "do you know where www.etsy.com is?"
       a. you can configure your machine (or browser) to use any RNS you want
       b. UDP packets have no guarantee of delivery
       c. smallest
    6. RNS could have cached the location of www.etsy.com. if NOT...
    7. RNS will query the root zone server (RZS) asking the same question ("do you know?")
    8. RZS will say "nah man, but i do know where .com is located! good luck!"
    9. RNS will say "okay .com do you know where www.etsy.com is located?"
    10. .com will say "nah brah, but i bet etsy.com knows where to find it!"
    11. RNS will ask etsy.com the same question who will be like "OMG i know!"
    12. RNS caches the IP address and returns it to the browser
    • can see this by dig www.etsy.com
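The walk from root to .com to etsy.com can be sketched as a toy simulation. Nothing here touches the network: the `ZONES` table, the server labels, the `resolve` function, and the address 192.0.2.10 (a documentation-only address) are all made up for illustration, and a real resolver also deals with TTLs, retries, and record types.

```python
# Toy model of iterative DNS resolution with a resolver-side cache.
# All zone data is hypothetical; 192.0.2.10 is a documentation address.

ZONES = {
    ".": {"referral": {"com.": "com-servers"}},
    "com-servers": {"referral": {"etsy.com.": "etsy-servers"}},
    "etsy-servers": {"answer": {"www.etsy.com.": "192.0.2.10"}},
}

cache = {}


def resolve(name):
    """Resolve `name` the way a resolving name server would: root first."""
    log = []
    if name in cache:
        log.append("cache hit")
        return cache[name], log

    server = "."  # always start at the root zone
    while True:
        zone = ZONES[server]
        log.append(f"asked {server}")
        answers = zone.get("answer", {})
        if name in answers:
            cache[name] = answers[name]  # remember it for next time
            return cache[name], log
        # No answer here: follow the referral whose suffix matches the name.
        for suffix, next_server in zone.get("referral", {}).items():
            if name.endswith(suffix):
                server = next_server
                break
        else:
            raise LookupError(f"no server knows about {name}")


ip, steps = resolve("www.etsy.com.")
print(ip, steps)
ip_again, steps_again = resolve("www.etsy.com.")
print(ip_again, steps_again)
```

The second call never leaves the cache, which is the whole point of step 6 above.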
    1. TCP
    2. browser sends SYN (seq=x, where x=rand()) to www.etsy.com server
    3. server sends SYN-ACK (ack=x+1, seq=y, where y=rand())
    4. browser sends ACK (seq=x+1, ack=y+1) back. these three steps establish the TCP connection; the handshake is complete
    • can see this by tcpdump -c3 host www.etsy.com
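The sequence/ack arithmetic in the handshake can be sketched like this. It is only a model of the numbers exchanged, not real networking; the `handshake` function and the dict layout are assumptions made for the example.

```python
import random

# Toy model of the TCP three-way handshake's sequence/ack arithmetic.
# Real sequence numbers are 32-bit and wrap around, hence the modulo.
MOD = 2 ** 32


def handshake(rng):
    x = rng.randrange(MOD)  # client's initial sequence number
    syn = {"flags": "SYN", "seq": x}

    y = rng.randrange(MOD)  # server's initial sequence number
    syn_ack = {"flags": "SYN-ACK", "seq": y, "ack": (syn["seq"] + 1) % MOD}

    # Client acknowledges the server's sequence number plus one.
    ack = {
        "flags": "ACK",
        "seq": syn_ack["ack"],
        "ack": (syn_ack["seq"] + 1) % MOD,
    }
    return syn, syn_ack, ack


syn, syn_ack, ack = handshake(random.Random(0))
for segment in (syn, syn_ack, ack):
    print(segment)
```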
    1. TLS
    2. right after that final ACK, browser sends ClientHello to the server, listing the TLS versions it supports and the cipher suites and extensions it offers
    3. server picks a TLS protocol version, decides on a cipher suite, attaches its certificate, and sends the browser ServerHello, Certificate
    4. client initiates the key exchange, which establishes the keys used for the rest of the connection; sends ClientKeyExchange, ChangeCipherSpec, Finished
    5. server processes the key exchange, tests for integrity, and returns ChangeCipherSpec, Finished
    1. HTTP
    2. browser sends a GET request for the path. includes cookies, the HTTP version it supports, and the Host header
    3. server responds to request. lots happens: load balancer gets the req and passes it to an app server, which queries databases and caches and builds the html doc, then returns it to the load balancer, which returns it to the browser
    4. browser gets the response. status code 200 means it worked, yay! the headers include Content-Length, which specifies the total # of bytes the browser can expect in the response body (the html), which arrives as a stream of packets
    • can see this by curl -v www.etsy.com
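A minimal sketch of what goes over the wire in the HTTP step, assuming HTTP/1.1. The `build_request` and `parse_response` helpers, the `session` cookie, and the canned response bytes are made up for the example; no network access is involved.

```python
# Build a minimal HTTP/1.1 GET request and parse a canned response.

def build_request(host, path="/", cookies=None):
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookies:
        cookie_value = "; ".join(f"{k}={v}" for k, v in cookies.items())
        lines.append(f"Cookie: {cookie_value}")
    # A blank line ends the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    # Content-Length says how many body bytes to expect.
    length = int(headers["content-length"])
    return int(status), headers, body[:length]


# Hypothetical cookie, only to show where cookies travel.
request = build_request("www.etsy.com", cookies={"session": "abc123"})
canned = (b"HTTP/1.1 200 OK\r\n"
          b"Content-Type: text/html\r\n"
          b"Content-Length: 13\r\n\r\n"
          b"<html></html>")
status, headers, body = parse_response(canned)
print(request.decode("ascii"))
print(status, headers["content-length"], body)
```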
    1. HTML (now we're looking at the browser!)
    2. as soon as the browser begins to receive bytes of html, it starts processing them. incoming bytes are passed through a lookahead preparser (aka speculative parser), which looks for external resource urls that it can start fetching immediately (e.g. css or js files). as long as these assets are on the exact same host as the original html, they can reuse the existing TCP connection instead of opening a new one.
    3. meanwhile the same bytes are being fed into the primary parser (aka main parser), whose job is to take html and turn it into the DOM. the html tokenizer is forgiving: it always returns a valid tree of some kind. the render tree is very similar to the DOM but only has visible elements (e.g. it won't have the head tag or style elements).
    4. HTML is parsed linearly. when the parser reaches a script, parsing is blocked until that script has been fetched and executed.
    5. css is downloaded and parsed in parallel. js execution is blocked until all pending css is done (firefox and chrome). if you use the async attribute (at least in chrome), the script is fetched in parallel and doesn't block the parser.
    6. when the DOM tree has been built, the page is marked as interactive. js can go!
    7. browser paints to screen as soon as it is able to do so
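The speculative parser's job (scan incoming html for external resources before the main parser gets there) can be roughly imitated with Python's standard `html.parser`. This is a sketch, not how any browser implements it; `ResourceScanner` and the sample markup are invented for the example.

```python
from html.parser import HTMLParser


class ResourceScanner(HTMLParser):
    """Collects URLs of external scripts and stylesheets, in document order."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("href"):
            rel = (attrs.get("rel") or "").lower().split()
            if "stylesheet" in rel:
                self.urls.append(attrs["href"])


scanner = ResourceScanner()
# Feed the document in chunks, the way bytes would arrive off the wire.
chunks = [
    '<html><head><link rel="stylesheet" href="/site.c',
    'ss"><script src="/app.js"></script></head>',
    '<body><p>hi</p><script>inline()</script></body></html>',
]
for chunk in chunks:
    scanner.feed(chunk)
    print(scanner.urls)  # URLs that could be fetched so far
scanner.close()
```

The inline script has no `src`, so it never shows up in the list: there is nothing to fetch for it.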

more info on services mentioned above...

  • Domain Name System (DNS)
    • maps human-friendly domains to IP addresses
    • heavily cached based on configured TTL
    • communication is predominately via UDP
    • server you query may ask n other servers before it gives you an answer
    • stores resolution info in zone files
    • resolution happens in parts, www.etsy.com has three levels: 1) com Top Level Domain, 2) etsy second level, 3) www third, and so on
  • Transmission Control Protocol (TCP)
    • delivers a reliable, ordered, & error-checked stream of packets across a network
    • protocol underlying HTTP(S), TLS, FTP, email, SSH
    • optimized for accurate delivery
    • includes congestion control measures
    • web performance is mostly latency-bound: round trips, not bandwidth, dominate load time
  • Transport Layer Security (TLS)
    • means by which traffic over a TCP connection can be encrypted
    • protocol for secure transmission of data over TCP, used for HTTPS requests
    • also referred to as SSL, which is an earlier version of the same protocol
    • uses public key cryptography to authenticate the server and agree on a shared key; that symmetric key then encrypts the data
    • client provides list of supported crypto, server picks one
    • third-party observers can infer the connection endpoints, type of encryption, and frequency and amt of data transmitted, but can't read the data itself or modify it without detection
  • Hypertext Transfer Protocol 1.1 (HTTP)
    • req/res model
    • plain-text (human-readable, not a binary encoding)
    • stateless
    • req are distinguished by method
    • res are distinguished by status codes
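The "client provides list of supported crypto, server picks one" step from the TLS notes can be sketched as below. The suite names are real IANA cipher suite names, but the two lists, the `pick_suite` helper, and the choice to honor the server's preference order are assumptions for the example (some servers follow the client's order instead).

```python
# Toy cipher-suite negotiation: the server walks its own preference list
# and picks the first suite the client also offered.

def pick_suite(client_offers, server_prefs):
    offered = set(client_offers)
    for suite in server_prefs:
        if suite in offered:
            return suite
    return None  # no overlap: a real handshake would fail here


client_offers = [
    "TLS_RSA_WITH_AES_128_CBC_SHA",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
]
server_prefs = [
    "TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384",
    "TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256",
    "TLS_RSA_WITH_AES_128_CBC_SHA",
]

chosen = pick_suite(client_offers, server_prefs)
print(chosen)
```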

How to Optimize

  • cache! then you dont have to make a request.
  • use gzip!
  • optimize external asset load order - put all your external <script> tags just before your </body>
  • optimize for render performance
  • compress your assets
  • move the data closer to the user (CDNs)
  • minimize total round trips
  • remember -- majority of time is spent in the browser
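To get a feel for what gzip buys you, here is a small sketch using Python's standard `gzip` module. The markup is invented and deliberately repetitive, so the ratio it prints is not representative of any real page.

```python
import gzip

# Repetitive markup, like a long product listing, compresses very well.
html = ('<li class="listing"><a href="/listing">item</a></li>\n' * 200).encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

print(len(html), "bytes before,", len(compressed), "bytes after")
```

Fewer bytes means fewer packets, which feeds straight into the "minimize total round trips" point.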

What about HTTP/2?

  • multiplexing
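  • header compression (HPACK: both sides keep a table of headers already sent, so repeats go over the wire as short index references; effectively you only send what changed)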
  • server push
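The idea behind header compression can be sketched with a toy encoder that remembers headers it has already sent. This is not real HPACK (no static table, no Huffman coding, no table size limits or eviction); `ToyHeaderEncoder` and the header values are made up to show why a second request is cheaper than the first.

```python
class ToyHeaderEncoder:
    """Sends a header in full the first time, then only its table index."""

    def __init__(self):
        self.table = {}  # (name, value) -> index

    def encode(self, headers):
        out = []
        for pair in headers:
            if pair in self.table:
                out.append(("index", self.table[pair]))
            else:
                self.table[pair] = len(self.table)
                out.append(("literal", pair))
        return out


encoder = ToyHeaderEncoder()
first = encoder.encode([
    (":method", "GET"),
    (":path", "/"),
    ("user-agent", "toy-browser/1.0"),
])
second = encoder.encode([
    (":method", "GET"),
    (":path", "/cart"),
    ("user-agent", "toy-browser/1.0"),
])
print(first)
print(second)
```

Only the changed `:path` is sent as a literal in the second request; the rest are references to entries both sides already hold.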

Tools

  • Browser Dev Tools (Chrome/Firefox)
  • Web Page Test (webpagetest.org)
  • PageSpeed