@jdanyow
Last active September 10, 2021 10:13
Templating - Docs Composition Service

Writing down some thoughts about server-side templating solutions for Docs.

Today

Today, Docs.Rendering and docs-render follow roughly the same process. This is by design: for the initial release of docs-render we're prioritizing backwards compatibility, template reuse, and time to market.

The heavy parts of handling a request look roughly like this:

  1. Fetch metadata from DHS (JSON, small)
  2. Fetch and compile the liquid template(s) (one-time, not every request, may happen during app startup, or lazily).
  3. Fetch content from DHS (HTML, small to large depending on the page)
  4. Parse the content HTML into a tree structure.
  5. Walk and transform the tree to apply versioning params to `<a>`/`<img>` links. For isolated we also transform URLs from absolute to relative and re-target Azure portal links.
  6. Render the tree back to a string.
  7. Create a model object comprised of metadata, content HTML, context object, and other information.
  8. Bind the template to the model, producing the full HTML page.
  9. Send the response status, headers, and body.
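Steps 4-6 are the memory-hungry part of this flow. A hedged sketch of what they amount to, assuming a simple parsed-tree shape (`{ tag, attrs, children }` nodes with string leaves — the real pipeline uses a full HTML parser, and `moniker` is an illustrative versioning parameter):

```javascript
// Illustrative sketch of steps 4-6: walk a parsed content tree, append a
// versioning parameter to <a>/<img> URLs, then serialize back to a string.
// The node shape ({ tag, attrs, children }) is an assumption for this sketch.
function transformTree(node, moniker) {
  if (typeof node === 'string') return; // text leaf, nothing to rewrite
  const urlAttr = node.tag === 'a' ? 'href' : node.tag === 'img' ? 'src' : null;
  if (urlAttr && node.attrs[urlAttr]) {
    const sep = node.attrs[urlAttr].includes('?') ? '&' : '?';
    node.attrs[urlAttr] += `${sep}view=${moniker}`;
  }
  for (const child of node.children) transformTree(child, moniker);
}

// Step 6: render the whole tree back to one long string.
function serialize(node) {
  if (typeof node === 'string') return node;
  const attrs = Object.entries(node.attrs)
    .map(([k, v]) => ` ${k}="${v}"`).join('');
  return `<${node.tag}${attrs}>${node.children.map(serialize).join('')}</${node.tag}>`;
}
```

Note that the entire content tree and the serialized output string are both held in memory at once, which is the cost called out under "Room for improvement" below.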

Room for improvement

Without changing DHS or existing content, the current solution could be improved in several ways.

  1. Templating developer experience.
    1. The template language, Liquid, does not have good editor support. Syntax highlighting is poor, intellisense for the model properties does not exist.
    2. Liquid's handling of logical expressions is very counter-intuitive.
  2. Performance.
    1. The current process uses a lot of memory because we parse a long HTML content string into a tree object, then serialize it back into another long string.
    2. The current process has high latency, the response is fully composed before the client receives any HTML.

Further room for improvement

Rather than template at build time, only to parse and re-render at composition time, we could take a more traditional approach where composition takes data from a variety of services and produces HTML.

  1. Fetch metadata from DHS (JSON, small)
- 2. Fetch and compile the liquid template(s) (one-time, not every request, may happen during app startup, or lazily).
+ 2. Templates are "just JavaScript" template literals, built into the application.
- 3. Fetch content from DHS (HTML, small to large depending on the page)
+ 3. Fetch the model (JSON) from the content proxy service, or other service depending on the route.
- 4. Parse the content HTML into a tree structure.
- 5. Walk and transform the tree to apply versioning params to `<a>`/`<img>` links.
-    For isolated we also transform URLs from absolute to relative and re-target Azure portal links.
- 6. Render the tree back to a string.
- 7. Create a model object comprised of metadata, content HTML, context object, and other information.
  8. Bind the template to the model, producing the full HTML page.
  9. Send the response status, headers, and body.

Templating with JavaScript template strings

When the time comes to introduce a new templating architecture, let's test whether a streaming approach using tagged template literals would provide better ergonomics and scalability.

  1. Templates are written in TypeScript/JavaScript as tagged template literals... the same format as the client-side templates:

    function pageTemplate(model: PageModel): ReadableStream {
      return html`
         <html>
           <head>
             <title>${model.title}</title>
             ...
           </head>
           <body>
              ${headerTemplate(model)}
              ...
              <main id="main">
                ${model.content}
              </main>
              ...
           </body>
         </html>`
    } 

    Key takeaways here are:

    1. Familiar, powerful syntax
    2. Template functions return a stream rather than a composed string
    3. Safe by default: we can HTML-encode all interpolated values and expose an `unsafeHTML` escape hatch, just like we use in lit-html client-side.

    In the example above, the server never concatenates `<html>`, `<head>`, `<title>`, and `model.title` into one string. Each chunk is streamed independently in the response. This is surprisingly easy to implement; check out the stream-template library for examples.

  2. Eagerly begin streaming the response. The moment we receive the metadata from DHS/content-proxy we know the status code and the template. We can start streaming the response immediately, giving the CDN/browser low-latency access to the all-important `<head>`, which contains the page dependencies (script/style refs, etc.) the browser needs to begin downloading. TTFB!

  3. Use a streaming parser like htmlparser2 or parse5-sax-parser to transform the content HTML and pipe it into the model's content stream. Rather than parse the full page, walk and transform the tree, and render the HTML, we'll process and transform tags as they're encountered: `onOpenTag = (name, attributes) => { /* is it an anchor href? rewrite it */ }`
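The streaming `html` tag described in point 1 can be sketched in a few lines, in the spirit of the stream-template library mentioned above (the names `html` and `unsafeHTML` match the proposal; the implementation details here are illustrative, using an async generator as the "stream"):

```javascript
// Minimal sketch of a streaming `html` tagged template literal.
const UNSAFE = Symbol('unsafe-html');

function escapeHtml(s) {
  return String(s).replace(/[&<>"']/g, c => (
    { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' }[c]
  ));
}

// Opt out of encoding, like lit-html's unsafeHTML directive.
function unsafeHTML(value) {
  return { [UNSAFE]: true, value };
}

// Returns an async iterable of chunks -- the literal parts and interpolated
// values are never concatenated into one big string.
async function* html(strings, ...values) {
  for (let i = 0; i < strings.length; i++) {
    yield strings[i];
    if (i < values.length) {
      const v = values[i];
      if (v && v[UNSAFE]) {
        yield String(v.value);      // raw HTML, caller's responsibility
      } else if (v && typeof v[Symbol.asyncIterator] === 'function') {
        yield* v;                   // nested template or content stream
      } else {
        yield escapeHtml(v);        // safe by default
      }
    }
  }
}
```

A server would simply `for await (const chunk of pageTemplate(model)) res.write(chunk);`, so each chunk reaches the wire as soon as it's produced.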

In combination, I think these types of changes will vastly reduce our memory footprint while providing an intuitive and consistent developer experience across the client and server.
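The per-tag transform in point 3 can be sketched as a pure handler whose signature mirrors htmlparser2's `onopentag(name, attributes)`; the `moniker` parameter, the `write` callback, and the attribute serialization are illustrative assumptions:

```javascript
// Sketch of the streaming transform handler: rewrite URLs tag-by-tag as the
// parser encounters them, never holding the full document in memory.
function onOpenTag(name, attributes, moniker, write) {
  const urlAttr = name === 'a' ? 'href' : name === 'img' ? 'src' : null;
  if (urlAttr && attributes[urlAttr]) {
    const sep = attributes[urlAttr].includes('?') ? '&' : '?';
    attributes[urlAttr] += `${sep}view=${moniker}`;
  }
  // Re-emit the (possibly rewritten) opening tag downstream.
  const attrs = Object.entries(attributes)
    .map(([k, v]) => ` ${k}="${v}"`).join('');
  write(`<${name}${attrs}>`);
}
```

Matching `ontext` and `onclosetag` handlers would pass their input straight through, so only tags with URL attributes pay any rewrite cost.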

Some pseudo code to illustrate this approach:

async function handleRequest(req, res) {
  // Fetch content metadata and context object in parallel
  const [metadata, context] = await Promise.all([
    fetchContentMetadata(req.url.pathname),
    fetchContext(req.url.searchParams.get('context'))
  ]);
  
  // Create the transform stream that will use a streaming html parser to 
  // apply contextual transformations to the html like appending the moniker to urls
  const contentTransformStream = createTransformStream(metadata, context);
  
  // Create the model which the page template will be bound to.
  const model = { metadata, context, environment };
  
  // In practice there would be several template functions, broken down into
  // partial templates, to implement each layout.
  // This example is focusing on a conceptual page.
  // A similar approach would be used for structured content although we'd need to parse the JSON.
  const template = model => html`<html><head><title>${model.metadata.title}</title></head><body>${contentTransformStream}</body></html>`;
  
  // Bind the template to the model, producing a stream.
  const stream = template(model);
  
  // Start streaming the response...
  res.writeHead(200, { 'Content-Type': 'text/html' });
  stream.pipe(res);
  
  // Fetch the content from the content proxy.
  const contentStream = await fetchContent(metadata.blobUrl);
  
  // Stream the content proxy response into the transformer.  
  contentStream.pipe(contentTransformStream);
}
@heskew
heskew commented Jun 17, 2020

looks generally like what I've been expecting. some additional rough thoughts:

  • would be nice to know if we need to parse any content html
  • would want to optimize response headers (esp. related to cache)
  • would need to expect & handle failures even in the content fetch
  • (how) could we share templates between server and client cleanly?
  • server push deps? 😈
  • any optimizations we can make with localized content and 'template strings'?

@jdanyow

jdanyow commented Dec 2, 2020

Added "further room for improvement" section
