sangupta/CDN.md

## CDN.md

      
What are CNAME, A records?

The way the Internet works is that we first type a domain name in the browser, say google.com. The browser then asks the DNS servers to resolve it to an IP address, so that the browser knows which machine to send the request to. The DNS servers return an IP (v4 or v6) for google.com, and that IP is then used for all requests going forward. There is a TTL (time to live) associated with every DNS resolution, after which the browser is required to ask the DNS servers for the IP again in case it has changed. This ensures that if machines fail or IPs change, the website can be reached again once the TTL has expired.
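The TTL behaviour described above can be sketched as a small cache. This is only an illustration: the `TtlResolverCache` class, the `lookup` callback and the injectable `clock` are hypothetical names of mine, not how any particular browser implements it.

```python
import time
from typing import Callable, Dict, Tuple

# Hypothetical lookup function type: hostname -> (ip, ttl_seconds).
Lookup = Callable[[str], Tuple[str, float]]


class TtlResolverCache:
    """Caches DNS answers and re-asks the upstream only after the TTL expires."""

    def __init__(self, lookup: Lookup, clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup
        self._clock = clock
        self._cache: Dict[str, Tuple[str, float]] = {}  # host -> (ip, expires_at)

    def resolve(self, host: str) -> str:
        now = self._clock()
        cached = self._cache.get(host)
        if cached is not None and now < cached[1]:
            return cached[0]  # still fresh, no DNS query needed
        ip, ttl = self._lookup(host)  # expired or never seen: ask DNS again
        self._cache[host] = (ip, now + ttl)
        return ip
```

Passing the clock in as a parameter is just a convenience so the expiry can be exercised without actually waiting for the TTL.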
Now there are multiple types of DNS records:
1. A Record

Also called address mapping records. These map a domain/subdomain to a given IPv4 address. For example (the addresses here are only illustrative):
google.com 8.8.8.8
google.com 8.8.4.4

With two records for the same name, the browser is free to use either address, or the DNS server can hand each IP to roughly 50% of the users (this depends on the DNS server).
2. CNAME Record

These are also called canonical name records - or alias records. So I can say,
sangupta.com		pages.github.io

This means that any resolution for sangupta.com should continue by querying the pages.github.io domain. It is sort of a redirect, but at the DNS level: the address shown in the browser does not change. CNAME records are usually used for multi-tenancy. Some people also put an IP address in a CNAME record, but a CNAME target must be a hostname. If you have an IP, then an A record must be used.
3. MX Record

These tell senders where to deliver email. So if I send an email to sangupta@adobe.com, the sending mail server asks DNS for the MX record of adobe.com to find the mail server that accepts mail for that domain.
4. AAAA: for IPv6 resolution - works like an A record

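5. HINFO - host info record - describes the host's hardware (CPU) type and operating system. Registration details such as who registered the domain and when come from a who-is (WHOIS) query, which is a separate lookup and not a DNS record.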
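To make the A and CNAME behaviour concrete, here is a minimal sketch of a resolver that follows aliases and rotates between multiple addresses. The zone data, the `resolve` function and the `max_hops` limit are all made up for illustration; real DNS servers are free to order their answers differently.

```python
import itertools
from typing import Dict, Iterator, List

# Illustrative zone data only; the names and addresses are made up
# (192.0.2.0/24 is reserved for documentation).
A_RECORDS: Dict[str, List[str]] = {
    "pages.example.net": ["192.0.2.10", "192.0.2.11"],
}
CNAME_RECORDS: Dict[str, str] = {
    "www.example.com": "pages.example.net",
}

_rotations: Dict[str, Iterator[str]] = {}


def resolve(name: str, max_hops: int = 8) -> str:
    """Follow CNAME aliases until a name with A records is found."""
    for _ in range(max_hops):
        if name in CNAME_RECORDS:
            name = CNAME_RECORDS[name]  # alias: restart the lookup with the target
            continue
        addresses = A_RECORDS.get(name)
        if not addresses:
            raise LookupError(f"no A record for {name}")
        # Round-robin: hand out the addresses in turn across calls.
        rotation = _rotations.setdefault(name, itertools.cycle(addresses))
        return next(rotation)
    raise LookupError("too many CNAME hops")
```

Calling `resolve("www.example.com")` twice returns the two addresses of `pages.example.net` in turn, which is the "each IP to some share of users" behaviour in its simplest form.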

Read more at http://dns-record-viewer.online-domain-tools.com/

How is the static content of a website cached (stale check) and served by a CDN? If there is no CDN, then what is the problem?

Let us first see how a CDN works. With a CDN the query flow is as under
Browser   --> 	CDN	  --> 	Actual server (called Origin server or just origin)
Browser   --> 	CDN (asia)
Browser   --> 	CDN (europe)
Browser   --> 	CDN (australia)
Browser   --> 	CDN (india) - each location specific server of a CDN system is called an edge server

There are two modes of working of a CDN server:
1. Push CDN

In a push CDN the origin server knows the CDN system and wants its files to be cached eagerly, like Apple OS updates, Android OS updates, or Adobe Creative Cloud binary images. Here we pay up front and make sure that there is no bottleneck for the user. So the origin itself pushes the content ahead of time to the CDN system, which in turn propagates it to multiple edge servers. This propagation usually takes a few minutes to replicate around the world (also depending on the file size).
The files in this case are cached indefinitely, until the origin explicitly updates or deletes them. The origin can also ask the CDN to refresh a file, which is nothing but pushing the file again to all edge servers around the world.
2. Pull CDN

In a pull CDN, the content is lazily fetched. In this case the origin only pushes the configuration to the CDN edge servers. When a request lands on an edge server for the very first time, the edge server fetches the response from the origin server, caches it locally and serves it to the browser. There is a TTL (via caching headers such as Cache-Control max-age or Expires, CDN config or otherwise) associated with each file (or CDN account, depending on the plan purchased). After this TTL has expired, the edge server is mandated to fetch the file again from the origin server.
However, an edge server may evict the file before the TTL expires if no request for it has come to this edge server for a while and it needs to free up space for other content. In such a case, the next request for the file makes the edge server fetch it from the origin again, even though the TTL had not actually expired.
This mode is cheaper because the same edge server storage is reused for other requests, keeping disk usage low.
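The pull behaviour (fetch on first request, serve from cache until the TTL, evict early when space is needed) can be sketched roughly as below. The class name, the `fetch_from_origin` callback and the least-recently-used eviction policy are my assumptions for illustration; real CDNs use their own policies.

```python
import time
from collections import OrderedDict
from typing import Callable, Tuple


class PullEdgeCache:
    """Toy edge server: fetch from origin on a miss, honour a TTL, evict when full."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], ttl: float,
                 capacity: int, clock: Callable[[], float] = time.monotonic):
        self._fetch = fetch_from_origin   # hypothetical origin client
        self._ttl = ttl
        self._capacity = capacity
        self._clock = clock
        self._store: "OrderedDict[str, Tuple[bytes, float]]" = OrderedDict()

    def get(self, path: str) -> bytes:
        now = self._clock()
        entry = self._store.get(path)
        if entry is not None and now < entry[1]:
            self._store.move_to_end(path)      # mark as recently used
            return entry[0]                    # cache hit: origin not contacted
        body = self._fetch(path)               # miss or expired: go to origin
        self._store[path] = (body, now + self._ttl)
        self._store.move_to_end(path)
        while len(self._store) > self._capacity:
            self._store.popitem(last=False)    # evict least recently used, even if fresh
        return body
```

With `capacity=2`, requesting three different files pushes the oldest one out, so the next request for it goes back to the origin even though its TTL has not expired.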

How does a pull CDN work?

Explained above :)

How does security work between CDN and Adobe server? Is there any SSL handshake?

Whether and how SSL handshakes happen usually depends on the CDN implementation. It is almost standard practice to have both the CDN and the origin server on SSL. Some accounts (read: CDN users) may terminate SSL at the CDN layer itself, with the origin exposed over HTTP only.
Security of content can be implemented in multiple ways.

The origin servers may be configured to allow requests from certain IPs only via firewall - using the egress IP range of CDN servers
The content on origin may be protected using certificates that are trusted only between CDN and Origin servers
There may be Basic/OAuth authentication between CDN and Origin servers
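The first option above, allowing only the CDN's egress IP range, would normally live in a firewall rule. As a rough sketch of the same check in application code (the ranges below are documentation placeholders, not any real CDN's egress list, and the function names are my own):

```python
import ipaddress
from typing import Iterable, List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def parse_allowlist(cidrs: Iterable[str]) -> List[Network]:
    """Turn CIDR strings into network objects once, at startup."""
    return [ipaddress.ip_network(cidr) for cidr in cidrs]


def is_from_cdn(client_ip: str, allowlist: List[Network]) -> bool:
    """True when the connecting address falls inside an allowed range."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in allowlist)


# Placeholder ranges from the documentation blocks, not a real CDN's egress list.
CDN_EGRESS = parse_allowlist(["198.51.100.0/24", "2001:db8::/32"])
```

The origin would reject any request where `is_from_cdn(remote_address, CDN_EGRESS)` is false. A real setup would load the ranges published by the CDN provider.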

Today, CDN servers can also serve protected content. Say a purchased, licensed binary is to be delivered via the CDN. When the download URL is requested, the origin server works with the CDN to generate a unique URL that is bound to the user (via cookies, request params or otherwise) and passes it to the user to click and download. Such URLs may also have an expiry associated with them.
In some cases the edge servers make use of the Authorization header: the incoming value is checked against a web service on the origin server, and when the origin server gives a go-ahead, the file is served from the edge server.
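One common way to build such user-bound, expiring URLs is an HMAC signature over the path, the user and the expiry time. The sketch below assumes a secret shared between origin and CDN. The parameter names (`user`, `expires`, `sig`) and the message layout are made up for illustration, as each CDN defines its own signed-URL format.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

SECRET = b"shared-between-origin-and-cdn"  # placeholder secret


def _signature(path: str, user: str, expires: int) -> str:
    message = f"{path}|{user}|{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def sign_url(path: str, user: str, lifetime: int, now: Optional[float] = None) -> str:
    """Origin side: build a download URL bound to one user with an expiry."""
    issued = time.time() if now is None else now
    expires = int(issued) + lifetime
    query = urlencode({"user": user, "expires": expires,
                       "sig": _signature(path, user, expires)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Edge side: recompute the signature and reject expired or tampered URLs."""
    current = time.time() if now is None else now
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        user = params["user"][0]
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    if current > expires:
        return False
    expected = _signature(parts.path, user, expires)
    return hmac.compare_digest(sig.encode(), expected.encode())
```

Because the user and the expiry are part of the signed message, changing either one in the URL invalidates the signature, so the edge server can check the link without calling the origin.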

Static content includes images, css files, js file or html files also? How does it work when we use html templates?

For CDN servers, files are just binary streams. Whether they are HTML, images, CSS or anything else does not matter: they are just byte streams. How they are used by downstream clients is up to those clients. Thus, HTML templates can be loaded by a client and then compiled by the client for use. That part is up to the client.

Now that browsers also cache static content, do we still need a CDN – except for caching media files and large zip files?

Browsers may cache the content only for a shorter period of time, and the cache is per machine. Say the same Google home page image is requested by everyone inside a company. Without a CDN, all ten thousand users would hit the origin server. Browser caching alone would reduce the load on the origin from a request every few seconds to about ten thousand requests per day (one per user). With a CDN, that drops to about 1 request per day.
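The numbers above can be worked out with a few lines of arithmetic. This assumes that both the browser cache and the CDN keep the image for a full day and that all users sit behind one edge server. The figure of 20 views per user per day is my own placeholder.

```python
def origin_requests_per_day(users: int, views_per_user: int,
                            browser_cache: bool, cdn: bool,
                            edge_servers: int = 1) -> int:
    """Requests for one image reaching the origin in a day.

    Assumes browser and CDN caches both keep the image for a full day.
    """
    if cdn:
        return edge_servers            # each edge fetches once per TTL
    if browser_cache:
        return users                   # each browser fetches once per day
    return users * views_per_user      # every page view hits the origin


no_cache = origin_requests_per_day(10_000, 20, browser_cache=False, cdn=False)
browser_only = origin_requests_per_day(10_000, 20, browser_cache=True, cdn=False)
with_cdn = origin_requests_per_day(10_000, 20, browser_cache=True, cdn=True)
```

Under these assumptions the three values come out to 200,000, 10,000 and 1 requests per day respectively.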
Also, because edge servers are physically much nearer to the browser, latencies are very low. This reduces the load time and improves the page/client performance.

Does CDN impact search engine crawling?

A CDN, if implemented in the right way, does not impact search engine crawling. However, it is said that the speed of a website (like its mobile compatibility) plays some role in the ranking of a page. Not confirmed though.

How much does a CDN matter as usage patterns change and everything moves towards mobile?

The faster the site/client loads, the higher the user engagement. Let's compare two image hosting sites. If one site displays results in a split second and the other takes minutes, the site with split-second results will have more users. Users will abandon a site that is too slow, unless its content is truly unique and needed. That will rarely happen, as there are multiple alternatives to every site/application.
Desktop machines are usually hard-wired till the last mile, and thus network latencies do not matter much for normal websites.
However, the majority of mobile users are still on 2G or 3G. With speeds that low on mobile devices (aggravated by less CPU and less RAM), latencies matter a lot. The faster the site is delivered, the better its site or application performs on mobile.
Hence, CDN becomes a must-have for mobile-based experiences.