Rendering Engines Post

The Road to Quantum Part 1: What are rendering engines?

If we’re going to start from somewhere, we should start from the beginning. A web browser is a piece of software that loads files (usually from a remote server) and displays them locally, allowing for user interaction.

Quantum is the codename for a project we’ve undertaken at Mozilla to massively upgrade the part of Firefox that figures out what to display to users based on those remote files[1]. The industry term for that part is “rendering engine”, and without one, you would just be reading code instead of actually seeing a website. Firefox’s rendering engine is called Gecko, and it’s been around since the late 1990s.

It’s pretty easy, for the most part, to see the rendering engine as a single black box, sort of like a TV: data goes in, and the black box figures out what to display on the screen to represent that data. The question today is: how? What are the steps to turning data into what users see?

The data that makes up a web page comes in many forms, but it mostly breaks down into three parts:

  • code that represents the structure of a web page
  • code that provides style: the visual appearance of the structure
  • code that acts as a script of actions for the browser to take: computing, reacting to user actions, and modifying the structure and style beyond what was loaded initially

The rendering engine combines the structure and style together to figure out how to draw the web page on your screen, and which bits of it are interactive.

It all starts with structure. When a browser is asked to load a website, it’s given an address. At this address is another computer which, when contacted, will send data back to the browser. The particulars of how that happens are a whole separate article in themselves, but in the end the browser has the data. This data is sent back in a format called HTML, and it describes the structure of the web page. How does a browser understand HTML?

Rendering engines contain special pieces of code called parsers that convert data from one format into another that the browser holds in its memory[2]. The HTML parser takes the HTML, something like:

<section>
  <h1 class="main-title">Hello!</h1>
  <img src="http://example.com/image.png">
</section>

And parses it, understanding:

Okay, there’s a section. Inside the section is a heading of level 1, which itself contains the text “Hello!” Also inside the section is an image. I can find the data of the image at the location “http://example.com/image.png”.

The in-memory structure of the web page is called the Document Object Model, or DOM. Rather than a long piece of text, the DOM is more like a diagram of the elements of the final web page: the properties of the individual elements, and which elements sit inside other elements:

(Will be a diagram)

name: section
children:
  name: h1
  class: main-title
  children:
    text: Hello!
  ---
  name: img
  src: http://example.com/image.png
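
To make that a little more concrete, here’s a rough sketch of the kind of tree a parser might build, written in TypeScript purely for illustration. The names and shapes are made up for this post; real engines like Gecko use much richer (and much more compact) internal structures.

// A hypothetical, simplified picture of an in-memory DOM tree.
type DOMNode =
  | { kind: "element"; name: string; attributes: Record<string, string>; children: DOMNode[] }
  | { kind: "text"; text: string };

const page: DOMNode = {
  kind: "element",
  name: "section",
  attributes: {},
  children: [
    {
      kind: "element",
      name: "h1",
      attributes: { class: "main-title" },
      children: [{ kind: "text", text: "Hello!" }],
    },
    {
      kind: "element",
      name: "img",
      attributes: { src: "http://example.com/image.png" },
      children: [],
    },
  ],
};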

In addition to describing the structure of the page, the HTML also includes addresses where styles and scripts can be found. When the browser finds these, it contacts those addresses and loads their data. That data is fed to other parsers that specialize in those data formats. The style format, CSS, plays the next role in our rendering engine.

With Style

CSS is a programming language that lets developers describe the appearance of particular elements on a page. CSS stands for “Cascading Style Sheets”, so named because it allows for multiple sets of style instructions, where later or more specific instructions can override earlier or more general ones (the “cascade”). A bit of CSS could look like the following:

section {
  font-size: 15px;
  color: #333;
  border: 1px solid blue;
}
h1 {
  font-size: 2em;
}
.main-title {
  font-size: 3em;
}
img {
  width: 100%;
}

CSS is largely broken up into groupings called rules, which themselves consist of two parts. The first part is a selector, which describes which elements of the DOM (remember those from above?) the rule is styling. The second is a list of declarations that specify the styles to be applied to elements that match the selector. The rendering engine contains a sub-system called a style engine whose job it is to take the CSS code and apply it to the DOM that was created by the HTML parser.

(Will be a diagram)

==================NETWORK==================
 |         ↑ CSS               |
 ↓         |                   ↓ CSS
HTML → HTML Parser → DOM → Style Engine

For example, in the above CSS, we have a rule that targets the selector “section”, which will match any element in the DOM with that name. Style annotations are then made for each element in the DOM. Eventually every element in the DOM has been styled, and we call this state the computed style for that element. Naively, the engine could attach that style information to every element, but this would be wasteful, as many elements share nearly all the same style. Instead, most rendering engines store the style information in a separate, more compact space and link the DOM elements to their corresponding computed style entries. This information includes the “intrinsic style” of the elements, such as the length of a piece of text or the dimensions of an image.

When multiple competing styles are applied to the same element, those which come later or are more specific win. Think of stylesheets as layers of thin tracing paper: each layer can cover the previous layers, but also let them show through. In the CSS above, for instance, the heading matches both the h1 rule and the .main-title rule; the class selector is more specific, so the heading ends up with a font-size of 3em.
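
As a very rough sketch of what the style engine is doing, here’s some illustrative TypeScript. It only handles element-name and class selectors and breaks ties purely by rule order, whereas the real cascade also weighs specificity and where a rule came from, so treat it as a cartoon rather than a description of how Gecko actually works.

// Hypothetical types for a cartoon style engine.
interface Rule {
  selector: string;                      // e.g. "section", "h1", ".main-title"
  declarations: Record<string, string>;  // e.g. { "font-size": "2em" }
}

interface ElementNode {
  name: string;
  attributes: Record<string, string>;
}

// Does this rule's selector match this element?
function matches(el: ElementNode, selector: string): boolean {
  if (selector.startsWith(".")) {
    const classes = (el.attributes["class"] ?? "").split(" ");
    return classes.includes(selector.slice(1));
  }
  return el.name === selector;
}

// Walk the rules in order; later matching rules override earlier ones.
function computeStyle(el: ElementNode, rules: Rule[]): Record<string, string> {
  const computed: Record<string, string> = {};
  for (const rule of rules) {
    if (matches(el, rule.selector)) {
      Object.assign(computed, rule.declarations);
    }
  }
  return computed;
}

// The rules from the CSS example above.
const rules: Rule[] = [
  { selector: "section", declarations: { "font-size": "15px", "color": "#333", "border": "1px solid blue" } },
  { selector: "h1", declarations: { "font-size": "2em" } },
  { selector: ".main-title", declarations: { "font-size": "3em" } },
  { selector: "img", declarations: { "width": "100%" } },
];

// The heading matches both "h1" and ".main-title", so it ends up at 3em.
const heading: ElementNode = { name: "h1", attributes: { class: "main-title" } };
computeStyle(heading, rules); // => { "font-size": "3em" }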

Once the rendering engine has computed styles, it’s time to put them to use! The DOM and the computed styles are fed into a layout engine that takes into account the size of the window being drawn into, and uses various algorithms to take each element and draw a box that will hold its content, fit in the window, and respect all the styles applied to it. This process largely proceeds from left to right and from the top of the page to the bottom. Sometimes, however, the engine discovers that it made an error and has to go back up the page and try again. This backtracking is called “reflow”, and it can be a very time-consuming mistake! The algorithms and the style rules themselves are designed to minimize the number of reflows necessary. Even so, the author of a site’s code needs to be on the lookout to avoid writing code that causes too many reflows.

(Diagram with layout engine included)
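
As an illustration of the kind of script that causes trouble, here’s a sketch in TypeScript (the ".item" class is made up for this example). Reading a layout value like offsetHeight right after changing a style forces the engine to redo layout on the spot, so interleaving reads and writes in a loop can trigger reflow after reflow; batching the writes and the reads lets layout settle far fewer times.

const items = Array.from(document.querySelectorAll<HTMLElement>(".item"));

// Slow: write a style, then immediately ask a layout question. The engine
// has to reflow before it can answer, once per iteration.
for (const item of items) {
  item.style.width = "50%";
  console.log(item.offsetHeight);
}

// Faster: do all the writes, then all the reads, so layout runs far less often.
for (const item of items) {
  item.style.width = "50%";
}
for (const item of items) {
  console.log(item.offsetHeight);
}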

When layout is complete, it’s time to turn the blueprint of the page into the part you see! This process is known as painting, and it is the final combination of all the previous steps. Every box that was defined by layout gets drawn, full of the content from the DOM and with the styles from the CSS. The user now sees the page, re-constituted from the code that defines it.

That used to be all that happened!

When the user scrolled the page, we would re-paint, to show the new parts of the page that were previously outside the window. It turns out, however, that users love to scroll! The rendering engine can be fairly certain it will be asked to show content outside of the initial window it draws (called the viewport). More modern browsers take advantage of this fact and paint more of the webpage than is visible. That way, when the user scrolls, the parts of the page they want to see are already drawn and ready. As a result, scrolling can be faster and smoother. This technique is the basis of compositing, a general term for techniques that reduce the amount of painting necessary.

Additionally, sometimes we need to re-draw parts of the screen. Maybe the user is watching a video that plays at 60 frames per second. Or maybe there’s a slideshow or animated list on the page. Browsers can detect that parts of the page will move or update, and instead of re-painting the whole page, they create a layer to hold that content. A page can be made of many layers that overlap one another. A layer can move, scroll, change its transparency, or slide behind or in front of other layers without having to re-paint anything! Pretty convenient.
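
Here’s a sketch of what that means for a script moving something around (again TypeScript, with a made-up ".slide" element): moving content with transform lets the engine reposition an already-painted layer, while moving it with a layout property like left generally means reflowing and repainting on every update.

const slide = document.querySelector<HTMLElement>(".slide")!;

// Usually handled by shifting an existing layer -- no re-paint needed.
slide.style.transform = "translateX(300px)";

// Usually forces layout (and then paint) to run again on every change:
// slide.style.left = "300px";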

Sometimes a script or an animation changes an element. If the size of the element remains the same, we can re-compute the style, skip layout, and simply re-paint. If the size of the element changes, we need to re-compute the element’s style (and potentially the style of many more elements on the page), re-calculate the layout (do a reflow), and re-paint the page. This takes a lot of time as computer-speed things go, but so long as it only happens occasionally, it won’t negatively affect the user’s experience.
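
For example (TypeScript, using the heading from earlier; the exact behavior varies between engines):

const title = document.querySelector<HTMLElement>(".main-title")!;

// Same size, new look: the engine can recompute style and re-paint,
// skipping layout.
title.style.color = "#900";

// New size: the engine recomputes style, reflows the affected elements,
// and then re-paints.
title.style.fontSize = "4em";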

In modern web applications, the structure of the document itself is frequently changed by scripts. This can require the entire rendering process to start more-or-less from scratch, with HTML being parsed into DOM, style calculation, reflow, and paint. Developers must be careful when they do this to avoid making the web browsing experience unbearably slow.
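
One common way developers keep that cost down is to batch structural changes, as in this TypeScript sketch (the "#results" list is made up for this example): new elements are built up in a detached DocumentFragment and inserted in one operation, so the engine sees a single structural change rather than one per item.

const list = document.querySelector<HTMLElement>("#results")!;
const fragment = document.createDocumentFragment();

for (const label of ["one", "two", "three"]) {
  const item = document.createElement("li");
  item.textContent = label;
  fragment.appendChild(item);
}

// A single insertion into the live DOM instead of three.
list.appendChild(fragment);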

Standards

You may find that not every browser interprets HTML, CSS, and JavaScript the same way. The effect can vary from small visual differences all the way up to a web site that works in one browser and not at all in another. However, on the modern Web, most websites seem to work regardless of which browser you choose. How do browsers achieve this level of consistency?

The formats of web site code, as well as the rules that govern how the code is interpreted and turned into an interactive visual page, are defined by mutually-agreed-upon documents called standards. These documents are developed by committees consisting of representatives from browser makers, web developers, designers, and other members of industry. Together they determine the precise behavior a rendering engine should exhibit given a specific piece of code. There are standards for HTML, CSS, and JavaScript as well as the data format of images, video, and audio and many, many more.

Why is this important? It’s possible to make a whole new browser engine and, so long as you made sure that your engine followed the standards, the engine would draw web pages in a way that matched all the other browsers, for all the billions of web pages on the Web. This means that the “secret sauce” of making web sites work isn’t a secret that belongs to any one browser, which lets users choose the browser that meets their needs.


Some sort of shipping container metaphor

What if we were to change big parts of how a browser worked?

We could use standards to verify that the new parts worked like the old parts and that users would get the proper experience they expect.

Moore's no more!

People don't just buy computers to be fast!

Portable, lightweight, inexpensive, energy efficient: all trade away some aspect of traditional CPU power.

So, how do you make a browser faster?

We have some ideas.

Well, we spend a lot of time thinking in terms of one CPU core. And sure, some parts of compositing or video can use that fancy GPU over there that just keeps getting better at highly specialized forms of math, but most of the browser's work is still done by the old workhorse CPU.

For now.

[1]: I mean, in physics, a quantum transition happens all at once, so like it’s not a super apt name for a long-term multi-phase thing? Anyway.

[2]: Your brain can do things that are like parsing: the word “eight” is a bunch of letters that spell a word, but you convert them to the number 8 in your head, not the letters e-i-g-h-t.
