@alyssaq
Last active February 13, 2024 07:59
Web HTTP fundamentals - What happens when you type a url in the address bar?

HTTP Fundamentals

HTTP Resources and Messages

What happens when you type a url in the address bar?

HTTP Request: URL

http://www.google.com represents a particular resource on the web.
Resources are anything you want to interact with on the web: images, videos, pages, services.
Each resource has a uniform resource locator (URL) to find it.

The http part is the URL scheme, which describes how to access a particular resource. HTTP (Hypertext Transfer Protocol) is a request and response protocol. Other schemes include ftp for files, mailto for email and https for secure HTTP.
www.google.com is the host. It tells the browser which computer on the Internet is hosting the resource. My computer will do a domain name system (DNS) lookup to translate the human-readable domain name into a network IP address to send the request to. It will request the resource on the default port 80 (443 for HTTPS).

Anything after a ? is the query string. (/user?fname=alyssa&lname=quek has 2 parameters: fname and lname.) The parameters are extra information that tells the server which resource you want.

Anything after a # is a fragment and is only used on the client; it is not sent to the server. (/user#experience jumps to the experience section of the page.) It identifies a particular section of a resource that the client should navigate to or focus on.
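The URL parts described in these notes can be pulled apart with Python's standard library. This is a minimal sketch: describe_url and DEFAULT_PORTS are names made up for illustration, and the sample URL simply combines the host and path examples used in these notes.

```python
from urllib.parse import urlsplit, parse_qs

# Default ports mentioned in the notes: 80 for HTTP, 443 for HTTPS.
DEFAULT_PORTS = {"http": 80, "https": 443}


def describe_url(url):
    """Split a URL into scheme, host, port, path, query and fragment."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        # Fall back to the scheme's default port when the URL gives none.
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path,
        "query": parse_qs(parts.query),
        "fragment": parts.fragment,
    }


info = describe_url("http://www.google.com/user?fname=alyssa&lname=quek#experience")
print(info)
```

Note that parse_qs returns a list for each parameter because a query string may repeat a key.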

Host Responds

The last part is the URL path (/user). It tells the host which resource is being requested so that it can respond appropriately. The resource could be a file on the host's file system or be dynamic: the host may have to take the request and build a resource using content from a database, returning HTML for the browser to display.

When a host responds to an HTTP request, it returns a resource and also specifies the content type (media type) of the resource. The server labels the content in its HTTP response message with a MIME type. MIME associations can be modified at the server such that .pdf files map to text/html instead of the default application/pdf. In that case, a user who requests /blah.pdf will see the PDF's bytes rendered as gibberish HTML text.
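As a rough illustration of MIME associations, Python's mimetypes module maps file extensions to media types in much the same way a server's configuration does. The private registry below only stands in for a server's settings; it is an assumption for the sake of the example, not how any particular web server is configured.

```python
import mimetypes

# Default association: the .pdf extension maps to application/pdf.
default_type, _ = mimetypes.guess_type("blah.pdf")

# A private registry standing in for a (mis)configured server.
registry = mimetypes.MimeTypes()
registry.add_type("text/html", ".pdf")
overridden_type, _ = registry.guess_type("blah.pdf")

print(default_type)     # the default label for a PDF
print(overridden_type)  # the label after the override
```

A browser trusting the second label would try to render the PDF bytes as HTML.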

If a resource doesn't exist or something goes wrong, the HTTP response carries an error status instead (404: page does not exist. 500: server error). See HTTP Status Codes.

Content Negotiation

A resource at a single URL can have multiple representations. E.g. the same resource at google.com can be represented in English or German. Content negotiation is a mechanism defined in the HTTP specification (HTTP/1.1) that makes it possible to serve different versions of a resource. While the host can tag outgoing resources with a content type, the client can specify the media types it will accept in the Accept request header. Think APIs: if I have a web service client in JS, I'll request a JSON representation at that URL. If it's in C++, I'll request an XML representation since it's easier to parse.
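A simplified sketch of how a server might pick a representation from the client's Accept header follows. The function names and the AVAILABLE list are hypothetical, and the matching is deliberately naive: it handles q-values and the */* wildcard but not partial wildcards such as text/*.

```python
def parse_accept(header):
    """Return the media types in an Accept header, most preferred first."""
    prefs = []
    for position, item in enumerate(header.split(",")):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0]
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        if q > 0:  # q=0 marks a type as not acceptable
            prefs.append((-q, position, media_type))
    # Highest q first; ties keep the order used in the header.
    return [media_type for _, _, media_type in sorted(prefs)]


def choose_representation(accept_header, available):
    """Pick the first acceptable media type the server can produce."""
    for media_type in parse_accept(accept_header):
        if media_type in available:
            return media_type
        if media_type == "*/*":
            return available[0]
    return None


# Hypothetical representations a web service could offer at one URL.
AVAILABLE = ["application/json", "application/xml"]
print(choose_representation("application/xml;q=0.9, application/json;q=0.5", AVAILABLE))
```

When nothing matches, the function returns None; a real server would typically answer with 406 Not Acceptable or fall back to a default representation.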

Other Stuff

URL Encoding

Beware of using unsafe characters such as space, pound sign (#) and caret (^).
RFC 3986, the Internet standard (the "law") for URLs, defines the characters that are always safe (unreserved) as a subset of the printable US-ASCII characters: letters, digits, hyphen, period, underscore and tilde.
You may transmit unsafe characters so long as they are percent-encoded (URL-encoded): place a % in front of the two-digit hexadecimal value of the character in the US-ASCII character set. E.g. space = %20, ! = %21, # = %23
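Percent-encoding can be tried out with urllib.parse from the standard library. The helper name percent_encode is made up for this sketch, and passing safe="" is a choice made here so that every character outside the always-safe set gets encoded.

```python
from urllib.parse import quote, unquote


def percent_encode(text):
    """Percent-encode everything except letters, digits and - . _ ~"""
    # safe="" means no extra characters (such as "/") are left unencoded.
    return quote(text, safe="")


encoded = percent_encode("a b!#^")
decoded = unquote(encoded)
print(encoded)  # a%20b%21%23%5E
print(decoded)  # a b!#^
```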

HTTP Messages

Request Message:
[method] [URL] [version]
[headers]
[body]

Example Request:
GET https://gist.github.com/alyssaq/6377540 HTTP/1.1
Host: gist.github.com
Connection: keep-alive
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en, zh-Hans
Date: Fri, 30 Aug 2013 21:12:00 GMT
...

Response Message:
[version] [status] [reason]
[headers]
[body]

Example Response:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
...
Headers prefixed with X- are, by convention, non-standard headers (e.g. X-Powered-By: ASP.NET)
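Because the messages are plain text, the response layout [version] [status] [reason], [headers], [body] can be parsed in a few lines. This is only a sketch: parse_response is a hypothetical helper, the sample body is invented, and a real client would use a library such as http.client, which also handles repeated headers, chunked bodies and so on.

```python
def parse_response(raw):
    """Split a raw HTTP/1.1 response into status line, headers and body."""
    # A blank line (CRLF CRLF) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    lines = head.split("\r\n")
    version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return {
        "version": version,
        "status": int(status),
        "reason": reason,
        "headers": headers,
        "body": body,
    }


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Cache-Control: private\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<html></html>"
)
parsed = parse_response(raw)
print(parsed["status"], parsed["headers"]["content-type"])
```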

Common Request Headers

Referer: URL of the referring page
User-Agent: information about the browser (Chrome, IE)
Accept: preferred media types
Accept-Language: preferred language
Cookie: Cookie information
If-Modified-Since: when the user-agent last retrieved this resource. If it hasn't changed since then, the server can reply 304 Not Modified and the browser reuses its cached copy (e.g. an image) to improve performance.
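The If-Modified-Since check could look roughly like this on the server side. conditional_status is a hypothetical helper, and it assumes both timestamps arrive as well-formed HTTP dates in GMT.

```python
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        return 200  # no conditional header: send the full resource
    client_time = parsedate_to_datetime(if_modified_since)
    resource_time = parsedate_to_datetime(last_modified)
    # Unchanged since the client's copy: no body needs to be sent.
    return 304 if resource_time <= client_time else 200


print(conditional_status("Fri, 30 Aug 2013 21:12:00 GMT",
                         "Thu, 29 Aug 2013 10:00:00 GMT"))
```

With a 304 the response has no body, so only the headers travel over the network.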

HTTP Methods

GET: Request a specific representation of a resource
POST: Send data to be processed by the identified resource
HTML forms only support GET and POST.
PUT (create/update), DELETE, HEAD and OPTIONS are also defined by HTTP/1.1 but have to be sent by other clients, such as JavaScript (XMLHttpRequest) or command-line tools.
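Outside of a browser form, an HTTP client library can attach any of these methods to a request. The sketch below only constructs request objects with urllib.request and sends nothing; the example.com URL and the form data are placeholders.

```python
from urllib.request import Request

url = "http://example.com/user/42"  # placeholder URL, never contacted here

get_req = Request(url)                                       # no body: GET
post_req = Request(url, data=b"fname=alyssa")                # body present: POST
put_req = Request(url, data=b"fname=alyssa", method="PUT")   # explicit method
delete_req = Request(url, method="DELETE")

methods = [r.get_method() for r in (get_req, post_req, put_req, delete_req)]
print(methods)  # ['GET', 'POST', 'PUT', 'DELETE']
```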

Manual Request

Using telnet. It defaults to port 23, so pass the HTTP port 80 explicitly, then type the request message followed by a blank line.

telnet github.com 80
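The same manual request can be sketched with a plain TCP socket. build_request and fetch_raw are hypothetical helper names; the block only defines them, so nothing is sent over the network until fetch_raw is called.

```python
import socket


def build_request(host, path="/"):
    """Build the bytes of a minimal GET request message."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",  # ask the server to close when it is done
        "",                   # blank line ends the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host, path="/", port=80, timeout=5.0):
    """Send the request over TCP and return the raw response bytes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


print(build_request("github.com").decode("ascii"))
```

Calling fetch_raw("github.com") would perform the request for real; what comes back depends on the server.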

HTTP Status Codes

100-199: Informational
200-299: Successful
200: OK Success!
300-399: Redirection
301: Moved Permanently
304: Not Modified
400-499: Client error
402: Payment required
404: Resource not found
500-599: Server error
500: Internal server error. Something went wrong during processing
501: Not Implemented
503: Service unavailable. The server cannot service the request right now, perhaps due to load.
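The status classes are determined by the first digit of the code, which makes them easy to look up in code. A small sketch using the standard http.HTTPStatus enum follows; describe_status and CLASSES are names invented here.

```python
from http import HTTPStatus

# The first digit of a status code gives its class.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe_status(code):
    """Return the code with its standard reason phrase and class."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unregistered"  # code not known to the standard library
    return f"{code} {phrase} ({category})"


for code in (200, 301, 304, 404, 503):
    print(describe_status(code))
```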

HTTP Architecture

Benefits: scalability, simplicity, reliability, and loose coupling (components have little to no knowledge of other components).
A URL acts as a pointer between a client and a server application. The client takes that URL and expresses its intention and desired representation in an HTTP message. These messages are simple, plain text, fully self-describing, easy to parse and standardized. The receiver can look at an HTTP request message and know what the client wants to do: the method, the path to the resource and, from the headers, additional information about the representation. The response includes a status code, cache instructions, the content type and length of the resource, and other valuable metadata.
Since all the information in these messages is visible and easy to parse, HTTP applications can rely on intermediary services that process messages as they move between the client and server. E.g. a client can tell the server that it supports gzip compression by sending "Accept-Encoding: compress, gzip" in its request headers. The server can then compress a 100 KB resource into, say, a 25 KB one and transmit it back to the client much faster. The server may also record messages to a log file as they come through.
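To get a feel for the compression step, the standard gzip module can compress a repetitive HTML body. The body here is invented for the example, and the ratio achieved depends entirely on the content.

```python
import gzip

# An invented, repetitive HTML body; repetitive text compresses well.
body = ("<p>HTTP messages are plain text and often repetitive.</p>\n" * 500).encode("utf-8")

compressed = gzip.compress(body)        # what the server would transmit
restored = gzip.decompress(compressed)  # what the client recovers

print(len(body), "bytes before,", len(compressed), "bytes after")
```

A server that compresses the body this way would also send "Content-Encoding: gzip" so the client knows to decompress it.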

Proxy Services

Proxy services (proxy servers or web proxies) act as intermediaries for requests from clients. They can be used to keep machines anonymous, balance load, speed up resource access (caching), log traffic, scan inbound/outbound content and apply access policies (block undesired sites).
Gateways or tunnelling proxies pass requests and responses through unmodified.

Fiddler can install itself as a proxy on the machine and log all HTTP traffic. It listens on the loopback IP address (localhost, 127.0.0.1) on port 8888.
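A client can also be pointed at such a proxy explicitly. The sketch below configures urllib to route requests through 127.0.0.1:8888, assuming a proxy like Fiddler is listening there; it only builds the opener and sends nothing.

```python
from urllib.request import ProxyHandler, build_opener

# Assumes a local proxy (e.g. Fiddler) is listening on 127.0.0.1:8888.
PROXY_URL = "http://127.0.0.1:8888"

proxy = ProxyHandler({"http": PROXY_URL, "https": PROXY_URL})
opener = build_opener(proxy)

# opener.open("http://example.com/") would now travel through the proxy,
# where it could be logged, cached or blocked before reaching the server.
print(proxy.proxies)
```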

Caching

Instead of sending the same bytes over the network repeatedly, browsers can cache resources locally. A public cache on a forward proxy can cache resources that are popular amongst a community of users, while a cache on a reverse proxy can hold resources that are popular on specific websites. The rules on what to cache, when to cache and when to invalidate are controlled in the HTTP response message. They are set in the "Cache-Control" header with values such as public, private or no-cache.

self.response.cache_control.public = True
self.response.cache_control.max_age = 300
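Settings like the two above end up as a single Cache-Control response header. A framework-independent sketch of building that header value follows; the cache_control function is a hypothetical helper, not part of any web framework.

```python
def cache_control(visibility="private", max_age=None, no_cache=False):
    """Build a Cache-Control header value from a few common directives."""
    if visibility not in ("public", "private"):
        raise ValueError("visibility must be 'public' or 'private'")
    directives = [visibility]
    if no_cache:
        # no-cache: caches must revalidate with the server before reuse.
        directives.append("no-cache")
    if max_age is not None:
        # max-age: number of seconds the response may be reused.
        directives.append(f"max-age={int(max_age)}")
    return ", ".join(directives)


header_value = cache_control("public", max_age=300)
print("Cache-Control:", header_value)  # Cache-Control: public, max-age=300
```

Here any cache, including a shared proxy, may reuse the response for five minutes.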