What happens when you type a URL in the address bar?
http://www.google.com represents a particular resource on the web.
Resources are anything you want to interact with on the web: images, videos, pages, services.
Each resource has a uniform resource locator (URL) to find it.
The http part is the URL scheme: it describes how to access a particular resource. HTTP (Hypertext Transfer Protocol) is a request and response protocol. Other schemes include ftp for files, mailto for email, and https for secure HTTP.
www.google.com is the host. It tells the browser which computer on the Internet is hosting the resource. My computer does a Domain Name System (DNS) lookup to translate the human-readable domain name into a network IP address to send the request to. It requests the resource on the default port 80 (443 for HTTPS).
Anything after a ? is the query string. (/user?fname=alyssa&lname=quek has 2 parameters: fname and lname.) It carries extra information to tell the server which resource you want.
Anything after a # is a fragment and is only used on the client; it is not sent to the server. (/user#experience jumps to the experience section of the page.) It identifies a particular section of a resource that the client should navigate to or focus on.
The part after the host is the URL path (/user). It tells the host which resource is being requested so that it can respond appropriately. It could be a file on the host's file system or be dynamic: the host may have to take the request and build a resource using content from a database, returning HTML for the browser to display.
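The URL parts described above can be pulled apart with Python's standard library. This is a minimal sketch: the combined URL is a made-up example stitched together from the fragments in these notes, and the DEFAULT_PORTS table is a hypothetical helper, since urlsplit does not fill in default ports itself.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL combining the pieces discussed in the notes.
url = "http://www.google.com/user?fname=alyssa&lname=quek#experience"
parts = urlsplit(url)

print(parts.scheme)    # scheme: how to access the resource
print(parts.hostname)  # host: which computer serves the resource
print(parts.path)      # path: which resource on that host
print(parts.query)     # raw query string
print(parts.fragment)  # fragment: used only by the client

# parse_qs turns the query string into a dict mapping names to lists of values.
params = parse_qs(parts.query)

# parts.port is None when the URL names no port, so fall back by scheme.
DEFAULT_PORTS = {"http": 80, "https": 443}  # assumed helper table
port = parts.port or DEFAULT_PORTS[parts.scheme]
```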
When a host responds to an HTTP request, it returns a resource and also specifies the content type (media type) of the resource. The server labels the content in its HTTP response message with a MIME type in the Content-Type header. MIME associations can be modified at the server, so .pdf files could map to text/html instead of the default application/pdf. In that case, a user who requests /blah.pdf will see the PDF rendered as gibberish HTML text.
If a resource doesn't exist, the HTTP response carries an error status code instead (404: resource not found. 500: server error). See HTTP Status Codes.
A resource at a single URL can have multiple representations, e.g. the same resource at google.com can be represented in English or German. Content negotiation is a mechanism defined in the HTTP specification (HTTP/1.1) that makes it possible to serve different versions of a resource. While the host can tag outgoing resources, the client can specify the media types that it will accept. Think APIs: if I have a web service in JS, I'll request a JSON representation at that URL. If it's in C++, I'll request an XML representation since it's easier to parse.
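To illustrate the extension-to-MIME-type mapping, here is a small sketch using Python's mimetypes module. It only mimics what a server's configuration does; the private table with .pdf remapped to text/html is an assumption that mirrors the example in the paragraph above, not how any particular web server is configured.

```python
import mimetypes

# The standard library ships a default extension -> MIME type table.
default_type, _ = mimetypes.guess_type("blah.pdf")

# A private table, so the override does not touch the global defaults.
custom = mimetypes.MimeTypes()
custom.add_type("text/html", ".pdf")
overridden_type, _ = custom.guess_type("blah.pdf")

print(default_type)     # the usual label for a PDF
print(overridden_type)  # the mislabelled type a browser would try to render
```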
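Content negotiation on the server side might look roughly like the sketch below. It is a simplified illustration, not the full algorithm from the HTTP specification: it handles q-values and the */* wildcard but ignores partial wildcards such as text/* and specificity rules. The function names parse_accept and negotiate are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, best first."""
    choices = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0]
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        choices.append((media_type, q))
    # sorted() is stable, so ties keep the order the client sent.
    return sorted(choices, key=lambda c: c[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q == 0:
            continue  # q=0 means the client refuses this type
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None
```

For example, a JS client sending `Accept: application/json` against a service offering both JSON and XML would get the JSON representation, while a client that prefers XML with a higher q-value would get XML.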
Beware of using unsafe characters: space, pound sign, caret.
RFC 3986: the Internet standard for URIs. It defines the unreserved characters, which are always safe, as the US-ASCII letters and digits plus - . _ ~
You may transmit unsafe characters so long as they are percent-encoded (URL-encoded).
Place a % in front of the hexadecimal value of the character in the US-ASCII character set. E.g. space = %20, ! = %21, # = %23
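Percent-encoding can be tried out with urllib.parse.quote and unquote from Python's standard library. A minimal sketch; the sample string is made up for illustration.

```python
from urllib.parse import quote, unquote

# quote() percent-encodes everything except letters, digits, "_.-~" and "/".
encoded_space = quote(" ")
encoded_bang = quote("!")
encoded_hash = quote("#")
encoded_caret = quote("^")

# Round trip on a made-up string containing unsafe characters.
original = "a b#c^d"
encoded = quote(original)
decoded = unquote(encoded)

print(encoded_space, encoded_bang, encoded_hash, encoded_caret)
print(encoded, decoded)
```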
Request Message:
[method] [URL] [version]
[headers]
[body]
Example Request:
GET https://gist.github.com/alyssaq/6377540 HTTP/1.1
Host: gist.github.com
Connection: keep-alive
Accept-Encoding: gzip,deflate,sdch
Accept-Language: en, zh-Hans
Date: Fri, 30 Aug 2013 21:12:00 GMT
...
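The request layout above can be assembled by hand as text. This is a sketch under assumptions: build_request is a hypothetical helper, and it puts only the path in the request line with the host in the Host header, which is the form clients usually send directly to a server.

```python
def build_request(method, target, headers, body=""):
    """Assemble an HTTP/1.1 request message as text."""
    lines = [f"{method} {target} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    # A blank line separates the headers from the (possibly empty) body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/alyssaq/6377540",
    {
        "Host": "gist.github.com",
        "Connection": "keep-alive",
        "Accept-Language": "en, zh-Hans",
    },
)
print(request)
```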
Response Message:
[version] [status] [reason]
[headers]
[body]
Example Response:
HTTP/1.1 200 OK
Cache-Control: private
Content-Type: text/html; charset=utf-8
...
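Going the other way, a raw response can be split back into its status line, headers and body. A minimal sketch: parse_response is a hypothetical helper, the HTML body is invented for the example, and real responses need more care (repeated headers, chunked bodies, bytes rather than text).

```python
def parse_response(raw):
    """Split a raw HTTP response into version, status, reason, headers, body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so normalise to lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(status), reason, headers, body


raw = (
    "HTTP/1.1 200 OK\r\n"
    "Cache-Control: private\r\n"
    "Content-Type: text/html; charset=utf-8\r\n"
    "\r\n"
    "<html>hello</html>"  # made-up body for illustration
)
version, status, reason, headers, body = parse_response(raw)
```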
X- headers are, by convention, non-standard headers (e.g. X-Powered-By: ASP.NET)
Referer: URL of the referring page
User-Agent: information about the browser (Chrome, IE)
Accept: preferred media types
Accept-Language: preferred language
Cookie: Cookie information
If-Modified-Since: when the user agent last retrieved this resource. If it hasn't changed since then, the server can reply 304 Not Modified and the browser reuses its cached copy (e.g. an image) to improve performance.
GET: Request a specific representation of a resource
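The If-Modified-Since check on the server side can be sketched as follows. This is an illustration only: conditional_status is a hypothetical function, and it assumes the server knows the resource's last modification time as a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200  # no conditional header: send the full resource
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    return 304 if last_modified <= client_time else 200


# Assumed modification time of the resource, for illustration.
last_modified = datetime(2013, 8, 30, 20, 0, 0, tzinfo=timezone.utc)
print(conditional_status("Fri, 30 Aug 2013 21:12:00 GMT", last_modified))
```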
POST: Send data to be processed by the identified resource
HTML forms only support GET and POST.
PUT (create/update), DELETE, HEAD and OPTIONS are HTTP/1.1 methods that HTML forms cannot send directly.
You can send a raw HTTP request with telnet. Telnet defaults to port 23, so pass port 80 for HTTP:
telnet github.com 80
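The same experiment can be done without telnet by opening a TCP socket and typing the request in code. A rough sketch, assuming a plain HTTP server is listening on the given host and port; raw_http_get is a hypothetical helper, and it sends Connection: close so that the server ends the stream after one response.

```python
import socket


def raw_http_get(host, port=80, path="/", timeout=5.0):
    """Send a minimal GET by hand and return the raw response bytes."""
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"  # ask the server to close so recv() ends
        "\r\n"
    )
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling `raw_http_get("github.com", 80)` and printing the first line of the result shows the status line the server sent back, just as it would appear in a telnet session.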
100-199: Informational
200-299: Successful
200: OK Success!
300-399: Redirection
301: Moved Permanently
304: Not Modified
400-499: Client error
402: Payment required
404: Resource not found
500-599: Server error
500: Internal server error. Something went wrong during processing
501: Not Implemented
503: Service unavailable. Server will not service the request, maybe due to load.
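The status code ranges above map onto a simple lookup by the first digit. A small sketch using Python's http.HTTPStatus for the reason phrases; the CLASSES table and describe function are hypothetical names introduced for the example.

```python
from http import HTTPStatus

# First digit of the status code -> class, as listed in the notes.
CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}


def describe(code):
    """Return 'code phrase (class)' for a status code."""
    category = CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unregistered"  # in a valid range but not a known code
    return f"{code} {phrase} ({category})"


for code in (200, 301, 304, 404, 500, 503):
    print(describe(code))
```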