- let's make a request to https://www.etsy.com
- what does the browser do?
- DNS
- check to see if it's already looked up the IP address for this host
- check to see if the address is the local host itself
- look in `/etc/hosts` to see if there is a matching entry
- browser sends a UDP packet to the resolving name server (RNS), asks "do you know where www.etsy.com is?"
  a. can configure your browser to use any RNS you want
  b. packets have no guarantee of delivery
  c. smallest
- RNS could have cached the location of www.etsy.com. if NOT...
- RNS will query the root zone server (RZS) asking the same question ("do you know?")
- RZS will say "nah man, but i do know where `.com` is located! good luck!"
- RNS will say "okay `.com`, do you know where www.etsy.com is located?"
- `.com` will say "nah brah, but i bet `etsy.com` knows where to find it!"
- RNS will ask `etsy.com` the same question, who will be like "OMG i know!"
- RNS caches the IP address and returns it to the browser
- can see this by running `dig www.etsy.com`
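The referral chain above can be sketched as a toy simulation. This is only a rough model, not how a real resolver is written: every "server" is a plain dict, and the server names (`tld-com`, `ns-etsy`) and the IP address (from the reserved documentation range) are made up for illustration.

```python
# Toy model of DNS delegation. All names and the IP below are hypothetical.
ROOT = {"com.": "tld-com"}  # root zone only knows who serves .com
SERVERS = {
    "tld-com": {"etsy.com.": "ns-etsy"},            # .com refers us to etsy.com's server
    "ns-etsy": {"www.etsy.com.": "203.0.113.10"},   # authoritative answer (made-up IP)
}

cache = {}  # the resolving name server's cache: name -> IP


def resolve(name):
    """Walk root -> TLD -> authoritative server, returning (ip, servers_asked)."""
    if name in cache:
        return cache[name], []  # cached: no upstream queries needed
    labels = name.rstrip(".").split(".")  # ["www", "etsy", "com"]
    asked = ["root"]
    referral = ROOT[labels[-1] + "."]  # root says "ask the .com servers"
    # Ask each server about a progressively longer suffix of the name.
    for depth in range(2, len(labels) + 1):
        suffix = ".".join(labels[-depth:]) + "."
        asked.append(referral)
        referral = SERVERS[referral][suffix]  # next referral, or the final IP
    cache[name] = referral
    return referral, asked


ip, asked = resolve("www.etsy.com.")
print(ip, asked)

ip_again, asked_again = resolve("www.etsy.com.")
print(ip_again, asked_again)  # second lookup is answered from the cache
```

The second call shows why caching matters: the RNS answers without asking anyone.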
- TCP
- browser sends `SYN` (`seq=x`, where `x=rand()`) to www.etsy.com server
- server sends `SYN ACK` (`ack=x+1`, `seq=y`, where `y=rand()`)
- browser sends `ACK` (`seq=x+1`, `ack=y+1`) back
- ^^ these three steps establish the TCP connection, the handshake is complete
- can see this by running `tcpdump -c3 host www.etsy.com`
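A small sketch of the sequence/acknowledgement arithmetic in the handshake. Nothing is sent on the wire here; the function name and the dict layout are made up for illustration, and a real TCP stack tracks far more state.

```python
import random

MOD = 2**32  # sequence numbers are 32-bit and wrap around


def three_way_handshake(rng=random):
    """Return the three handshake segments as plain dicts (illustrative only)."""
    x = rng.randrange(MOD)  # client's initial sequence number
    y = rng.randrange(MOD)  # server's initial sequence number
    syn = {"flags": "SYN", "seq": x}
    syn_ack = {"flags": "SYN ACK", "seq": y, "ack": (x + 1) % MOD}
    ack = {"flags": "ACK", "seq": (x + 1) % MOD, "ack": (y + 1) % MOD}
    return [syn, syn_ack, ack]


segments = three_way_handshake()
for segment in segments:
    print(segment)
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm they heard each other.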
- TLS
- along with that final `ACK`, browser sends the server a `ClientHello`, which includes the version of TLS that's running and the cipher suites it supports
- server picks a TLS protocol version, decides on a cipher suite, attaches its certificate, sends browser `ServerHello`, `Certificate`
- client initiates key exchange, used to set up the keys for the rest of the session, sends `ClientKeyExchange`, `ChangeCipherSpec`
- server processes the key exchange, tests for integrity, and returns `ChangeCipherSpec`, `Finished`
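One way to watch the outcome of this handshake from code is Python's standard `ssl` module. This is a sketch, assuming network access when the function is actually called; `tls_summary` is a made-up helper name. Note that a modern server will probably negotiate TLS 1.3, whose message flow differs from the TLS 1.2 style messages listed above.

```python
import socket
import ssl


def tls_summary(host, port=443, timeout=5.0):
    """Open a TCP connection, run the TLS handshake, return (version, cipher name)."""
    context = ssl.create_default_context()  # verifies the server's certificate
    with socket.create_connection((host, port), timeout=timeout) as tcp:
        # wrap_socket performs the whole handshake before returning
        with context.wrap_socket(tcp, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


# The default context refuses unverified certificates and checks the hostname.
default_context = ssl.create_default_context()
print(default_context.verify_mode == ssl.CERT_REQUIRED, default_context.check_hostname)
```

Calling `tls_summary("www.etsy.com")` on a machine with network access would return the negotiated protocol version and cipher suite name.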
- HTTP
- browser sends GET request for route. includes cookies, HTTP version it supports, the host
- server responds to request. lots happens. load balancer gets the req and hands it to an app server, which queries databases and caches, creates the html doc, and returns it to the load balancer, who returns it to the browser
- browser gets response, status code 200 means it worked yay! headers include `Content-Length`, which specifies total # of bytes the browser can expect to receive in the response body. the body is html, which will be sent in a stream of packets
- can see this by running `curl -v www.etsy.com`
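To make the request/response shape concrete, here is a minimal sketch that builds the bytes of a GET request and parses a canned response. Nothing touches the network; the cookie name `uaid` and the sample response are made up for illustration, and a real client handles many cases this ignores (chunked encoding, repeated headers, and so on).

```python
def build_get(host, path="/", cookies=None):
    """Build the raw bytes of a simple HTTP/1.1 GET request."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}"]
    if cookies:
        lines.append("Cookie: " + "; ".join(f"{k}={v}" for k, v in cookies.items()))
    lines.append("Connection: close")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


def parse_response(raw):
    """Split a raw response into (status code, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, _reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), headers, body


request = build_get("www.etsy.com", "/", {"uaid": "example"})
print(request.decode("ascii"))

# Canned response, standing in for what the server would send back.
sample = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 13\r\n"
    b"\r\n"
    b"<html></html>"
)
status, headers, body = parse_response(sample)
print(status, headers["content-length"], len(body))
```

The `Content-Length` value matches the number of body bytes, which is how the browser knows when the response is complete.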
- HTML (now we're looking at the browser!)
- as soon as browser begins to receive bytes of html, starts processing them. incoming bytes are passed thru a lookahead preparser aka speculative parser, looking for external resource urls that it can start fetching immediately (e.g. css or js files). as long as these assets are from the exact same host as the original html, they can reuse the existing TCP connection instead of opening a new one.
- meanwhile same bytes are being fed into the primary parser aka main parser, whose job is to take html and turn it into the DOM. html tokenizer -- always returns a valid tree of some kind. the render tree is very similar to the DOM but different in that it only has visible elements (e.g. won't have the `head` tag, no styles.)
- HTML is parsed linearly. the parser blocks on each `<script>` until that script has executed.
- css is downloaded and parsed in parallel. js execution is blocked entirely until all css is done (firefox and chrome). if you use the `async` attribute (at least in chrome), the script downloads in parallel and runs as soon as it arrives, without blocking the parser.
- when DOM tree has been built, the page will be marked as interactive. js can go!
- browser paints to screen as soon as it is able to do so
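The speculative parser idea can be sketched with Python's built-in `html.parser`: scan incoming chunks of html and collect the external resource urls as soon as they show up. This is an illustration only; real browsers use their own tokenizers, and `ResourceScanner` and the sample markup are made up.

```python
from html.parser import HTMLParser


class ResourceScanner(HTMLParser):
    """Collect urls of external resources while html streams in."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("src"):
            self.urls.append(attrs["src"])
        elif tag == "link" and attrs.get("rel") == "stylesheet" and attrs.get("href"):
            self.urls.append(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self.urls.append(attrs["src"])


# Feed the document in pieces, the way bytes arrive off the network.
chunks = [
    '<html><head><link rel="style',
    'sheet" href="/site.css"><script src="/app.js"></script></head>',
    '<body><img src="/logo.png"></body></html>',
]
scanner = ResourceScanner()
for chunk in chunks:
    scanner.feed(chunk)  # incomplete tags are buffered until the rest arrives
scanner.close()
print(scanner.urls)
```

A browser would kick off a fetch for each url the moment it is discovered, rather than collecting them in a list.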
- Domain Name System (DNS)
- maps human-friendly domains to IP addresses
- heavily cached based on configured TTL
- communication is predominately via UDP
- server you query may ask n other servers before it gives you an answer
- stores resolution info in `zone` files
- resolution happens in parts, www.etsy.com has three levels: 1) `com` Top Level Domain, 2) `etsy` second level, 3) `www` third, and so on
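A quick sketch of how a hostname breaks into levels, read right to left. The helper names are made up for illustration.

```python
def domain_levels(hostname):
    """Return labels ordered from the top level domain down."""
    labels = hostname.rstrip(".").split(".")
    return list(reversed(labels))


def lookup_order(hostname):
    """Return the zones a resolver walks through, most general first."""
    labels = hostname.rstrip(".").split(".")
    return [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]


print(domain_levels("www.etsy.com"))
print(lookup_order("www.etsy.com"))
```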
- Transmission Control Protocol (TCP)
- delivers a reliable, ordered, & error-checked stream of packets across a network
- protocol underlying HTTP(S), TLS, FTP, email, SSH
- optimized for accurate delivery
- includes congestion control measures
- page loads are mostly latency-bound: round trips tend to cost more than raw bandwidth
- Transport Layer Security (TLS)
- means by which TCP connections can be encrypted by using public key cryptography
- protocol for secure transmission of data over TCP, used for HTTPS requests
- also referred to as SSL, which is an earlier version of the same protocol
- uses public key cryptography
- client provides list of supported crypto, server picks one
- third-party observers can infer the connection endpoints, type of encryption, frequency and amt of data transmitted but can't read or modify any of the data itself
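The "client provides a list, server picks one" step can be sketched as below. The suite names are real TLS 1.3 cipher suites, but the selection rule (first match in the server's preference order) is an assumption for illustration; real servers can be configured to honor the client's order instead.

```python
def pick_cipher(client_offers, server_preferences):
    """Return the first suite in the server's preference order that the client offers."""
    for suite in server_preferences:
        if suite in client_offers:
            return suite
    return None  # no overlap: the handshake would fail


client_offers = ["TLS_CHACHA20_POLY1305_SHA256", "TLS_AES_128_GCM_SHA256"]
server_preferences = [
    "TLS_AES_256_GCM_SHA384",
    "TLS_AES_128_GCM_SHA256",
    "TLS_CHACHA20_POLY1305_SHA256",
]

chosen = pick_cipher(client_offers, server_preferences)
print(chosen)
```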
- Hypertext Transfer Protocol 1.1 (HTTP)
- req/res model
- plain-text (not encoded)
- stateless
- req are distinguished by method
- res are distinguished by status codes
- cache! then you dont have to make a request.
- use gzip!
- optimize external asset load order
  - put all your external `<script>` tags just before your `</body>`
- optimize for render performance
- compress your assets
- move the data closer to the user (CDNs)
- minimize total round trips
- remember -- majority of time is spent in the browser
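To get a feel for the "use gzip" and "compress your assets" tips, here is a small sketch using Python's standard `gzip` module on some made-up, repetitive markup. Actual savings depend heavily on the content, so no specific ratio is claimed.

```python
import gzip

# Made-up, repetitive html; real pages compress less predictably.
html = ("<li class='listing'>handmade thing</li>\n" * 200).encode("utf-8")

compressed = gzip.compress(html)
restored = gzip.decompress(compressed)

print(len(html), "bytes before,", len(compressed), "bytes after")
```

Fewer bytes on the wire means fewer packets, which ties back to minimizing round trips.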
- multiplexing
- header compression (basically just sends diffs)
- server push
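The "basically just sends diffs" idea behind header compression can be illustrated with a toy encoder that only transmits headers that changed since the previous request. This is not the actual HPACK algorithm (which uses indexed tables and Huffman coding); the function names and header values are made up for illustration.

```python
def encode_headers(previous, current):
    """Describe `current` as a delta against `previous`."""
    changed = {k: v for k, v in current.items() if previous.get(k) != v}
    removed = [k for k in previous if k not in current]
    return {"set": changed, "remove": removed}


def decode_headers(previous, delta):
    """Rebuild the full header set from the previous one plus a delta."""
    headers = {k: v for k, v in previous.items() if k not in delta["remove"]}
    headers.update(delta["set"])
    return headers


first = {
    ":method": "GET",
    ":path": "/",
    "user-agent": "example-browser/1.0",
    "cookie": "uaid=example",
}
second = {**first, ":path": "/site.css"}  # same headers, different path

delta = encode_headers(first, second)
print(delta)
```

Only the changed `:path` travels with the second request; the receiver rebuilds the rest from what it already has.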
- Browser Dev Tools (Chrome/Firefox)
- Web Page Test (webpagetest.org)
- PageSpeed