@KyleAMathews
Last active September 3, 2016 02:02

Hi folks, Gatsby is now 1.25 years old — seems high time we push for a 1.0 :-)

I've been thinking about what Gatsby could/should be for many months now and more recently have been prototyping these ideas. This issue is a description of where my thinking is at. It's intended to kickstart a discussion of what Gatsby 1.0 can be. There are still a lot of open questions and I'd love feedback on my ideas to this point.

Once we've settled on a rough direction, my plan is to break out work into sub-issues for more detailed discussion and work.

Goals for Gatsby 1.0

High performance by default

Performance is king. Gatsby gives you the fastest possible frontend performance by default, no compromises.

  • Statically render everything.
  • Provide a very fast initial render by loading only the critical HTML, CSS and JS.
  • Prefetch data and code for the remainder of the site so subsequent route changes render instantly.
  • Use service workers to intelligently cache assets and provide offline support.

I love Facebook engineering's "pit of success" mantra. Incredibly fast websites should be the default, not a monumental engineering challenge.

Study after study has shown that faster websites improve user experience and improve business metrics.

Anything that prevents Gatsby from generating the fastest possible website is a bug.

Rich webapp-like feel

  • The line between web apps and web sites has become blurry.
  • Javascript enhancements have gone from a nice-to-have to a must-have for many sites.
  • Adding rich web-app functionality is an awkward tack-on for older web tools.
  • With Gatsby, adding any sort of JS-driven experience is trivial as it uses React.js for its view layer.
  • With React & Webpack you have instant access to 1000s of open source React components and modern Javascript and CSS technologies.

Source all the data

A site's value ultimately comes from its data. Whether that's copy, images, or numerical data — if your site building tool can't get the right data in the right format into your site... then it's not very useful.

I'm working on a new data layer for Gatsby based on GraphQL. Using this and the coming plugin system, you'll be able to add "source plugins" that will let you easily pull in data from any number of sources e.g. a directory of markdown files or an external API.

The GraphQL data layer will let you treat your markdown files as a database.

Each route component can query your site's schema for just the data it needs. This gives you complete flexibility to turn your data into whatever type of web experience you'd like.

Current status / how to help

Note: the following three sections will be kept updated as we move towards a 1.0.

Steps on road to 1.0

  • 9/1 — alpha 1 released. Barely sorta works. POC release. Check out Bricolage.io for a mostly working live site.

How you can help

Many of you have asked how to help!

  • Feedback! Read through this page and other issues and ask questions, point out potential problems, bring up all your website unicorns you hope Gatsby can give you.
  • The code is very prototype-quality right now so I don't suggest trying to build a site just yet. I'll be doing a major refactor over the next week or so and will release alpha 2 with the beginnings of the plugin system, etc. Once alpha 2 is released, please build sites and plugins and tell us how it went.
  • Also once alpha 2 is released, write PRs! There's lots to be done so if one of the new sub-systems really interests you, I'd love for you to take it on. Let me know and I'll help get you started.
  • If you're in bay area — I'd love to pair program a site — let's hang out and build a site together.
  • If you're not in the bay area — I'd love for you to pay me to come fly to wherever you live and code a site together :-)
  • Directly sponsor Gatsby's development: fund me or someone on your team or in the community to work on Gatsby. If you have R&D budget or open source sponsorship budget, anything would be helpful. Code ain't cheap, and the more people working on code, documentation, examples, and tutorials, the faster we can move.

Many of the changes in 1.0 are intended to make it easier for people to contribute to Gatsby. The plugin system (and post-1.0, the theme system) will mean you can create and publish additional behaviors for Gatsby through plugins. Let's make the core smaller to increase the surface area where people can contribute.

How you can test

Alpha 1 is kinda hard to use. It has a lot of assumptions baked in to make my blog work. You can possibly get Gatsby working with my blog but I'd wait until I refactor out more of the sharp edges for the second alpha.

New additions and breaking changes

To make these goals a reality there are both some breaking changes and major new additions that will be needed.

Pull data into components instead of pushing

Data in Gatsby currently is pushed into templates to be rendered into HTML (like pretty much every static site generator). This is a simple pattern and works great for many use cases. But when you start working on more complex sites, you really start to miss the flexibility of building a database-driven site. With a database, all your data is available to query against in any fashion that you'd like. Whatever bits of data you need to assemble a page, you can pull in. You want to create author pages showing their bio & last 5 posts? It's just a query away. I want this same flexibility for Gatsby. I want to be able to query my markdown (or picture or data, etc) files and treat them as a database of sorts.

This is especially important for Gatsby because, unlike traditional static site generators, it loads the data used to build pages into the client, and currently it loads all data for the entire site. This is both wasteful (a given page doesn't use all that data) and costly. Time-to-interactivity is an important web performance metric. The larger your Javascript bundle, the longer it takes to download and evaluate. This is especially noticeable on low-end phones on poor networks.

With this change in Gatsby 1.0 both code and data will be split on a per-route basis. When a user visits a page, they will load just the javascript & data it needs and then lazy-load more once the first page is initialized.

Now a site can easily have "heavy" pages (in terms of data and/or code) without affecting other parts of the site. E.g. a search page or a page with data visualizations.
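
As a rough illustration of per-route data splitting (a sketch under assumptions, not how Gatsby actually does it), the function below turns each route's query result into its own small JSON payload. The name `splitDataByRoute` and the `{ path, data }` route shape are made up for this example.

```javascript
// Hypothetical sketch: none of these names come from Gatsby itself.
// Given a list of routes, each carrying the result of its own query,
// produce one JSON payload per route instead of one site-wide blob.
function splitDataByRoute(routes) {
  const bundles = {}
  routes.forEach((route) => {
    // Each bundle holds only the data that route asked for.
    bundles[route.path] = JSON.stringify(route.data)
  })
  return bundles
}

const bundles = splitDataByRoute([
  { path: '/', data: { posts: [{ title: 'Hello' }, { title: 'World' }] } },
  { path: '/search/', data: { index: ['hello', 'world', 'gatsby'] } },
])

// A "heavy" page such as /search/ no longer adds weight to the home page.
Object.keys(bundles).forEach((path) => {
  console.log(path, bundles[path].length, 'bytes')
})
```

The point of the design is that the home page's payload never contains the search index, so its size is independent of how heavy other pages become.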

I've been prototyping how this will work and will provide more details further down the page.

High performance === PRPL

Many of the changes in Gatsby 1.0 are inspired by the fine work of engineers at Google (and elsewhere) who've been researching patterns for improving web performance and building these into the web platform.

Particularly helpful is the PRPL pattern.

PRPL stands for:

  • Push critical resources for the initial route.
  • Render initial route.
  • Pre-cache remaining routes.
  • Lazy-load and create remaining routes on demand.

The less work you do up front, the faster your app boots up. Code and data splitting ensure that only the critical resources needed for the initial route are loaded.

Note, "push" refers to HTTP/2 Server Push which very few hosts support yet. I've been researching this and talking to people about it. Support for Server Push in Gatsby will probably come as host-specific plugins.
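
The render, pre-cache, and lazy-load steps can be sketched in a few lines. This is only an illustration under assumptions: `createRouteLoader` and `fetchBundle` are hypothetical names, and the push step is left out because it depends on the host.

```javascript
// Hypothetical sketch of the render / pre-cache / lazy-load steps.
// `fetchBundle` stands in for whatever actually downloads a route's
// code and data; it is an assumption, not a Gatsby or browser API.
function createRouteLoader(fetchBundle) {
  const cache = new Map()

  // Lazy-load: fetch a route's bundle at most once, on demand.
  function load(path) {
    if (!cache.has(path)) {
      cache.set(path, fetchBundle(path))
    }
    return cache.get(path)
  }

  // Pre-cache: warm the cache for the other routes after first render.
  function precache(paths) {
    return Promise.all(paths.map(load))
  }

  return { load, precache }
}

// Usage: load only the initial route first, then warm the rest.
const requested = []
const loader = createRouteLoader((path) => {
  requested.push(path)
  return Promise.resolve({ path })
})

const ready = loader
  .load('/')
  .then(() => loader.precache(['/', '/about/', '/blog/']))
```

Because the cache stores promises, the initial route is fetched once even though it also appears in the pre-cache list.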

New GraphQL data layer

Gatsby uses Webpack right now for everything. Javascript, CSS, images, Markdown, JSON, YAML, etc. are all handled using Webpack's rather brilliant system of treating everything as JS modules.

Using Webpack has worked out really really well for Gatsby. It gives us a ton out of the box. A lovely hot-reloading development experience. Easy interoperability with all the latest and greatest web tools. And fast, optimized production builds. It's truly a swiss-army knife of tools.

But Webpack has some problems with data.

First it only understands files. If you want to integrate data from any other source e.g. external APIs you have to first convert that data into files.

Webpack can get weird if you try to reference files from outside of web root. I've been bitten by this several times as have others.

Another big problem is you can't use just some data from a file. What if you wanted to use data in your site from a 1 gigabyte CSV file? There's no way to get around loading the entire file unless again you first preprocess the file.

The last problem is data splitting. Ideally each route can load only the data it needs. But how? Often a route will want a bit of data from a number of files or other data sources. How can a route both easily specify what data it needs and tell Webpack to package that minimal data set together to be shipped to the browser to power the React component(s) for that route?

I've thought through a number of different possibilities (this issue explores one of those) but could never quite figure out how to make Webpack do what I wanted it to.

So eventually I concluded the simplest thing would be to split the data layer off and remove it from Webpack's control. Let Webpack do what it does best and build a data system tailor-made for Gatsby's needs.

I've been prototyping this new data layer the past few weeks with GraphQL and am really really pleased with how well it's working.

How it'll work

When you set up a site, you'll add one or more source plugins. These source plugins can be file-based e.g. a markdown source plugin which you point at a directory of markdown files, or network-based e.g. for consuming an internal API or a 3rd-party API like Github.

Each source plugin defines types which get composed together to form a schema for your site.

This combined schema is consumed by GraphQL and made available to query against.
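
To make the composition step concrete, here is a minimal sketch. The plugin shape (a `types` map per plugin) and the name `composeSchema` are assumptions for illustration; real source plugins would presumably produce GraphQL types rather than plain objects.

```javascript
// Hypothetical plugin shape: each source plugin exposes a `types` map.
// Plain objects are used here only to show the composition step.
function composeSchema(plugins) {
  const schema = {}
  plugins.forEach((plugin) => {
    Object.keys(plugin.types).forEach((typeName) => {
      // Two plugins claiming the same type name is treated as an error.
      if (schema[typeName]) {
        throw new Error(
          `Type "${typeName}" is defined by more than one plugin`
        )
      }
      schema[typeName] = plugin.types[typeName]
    })
  })
  return schema
}

const markdownSource = {
  name: 'source-markdown',
  types: { Markdown: { path: 'String', frontmatter: 'Frontmatter' } },
}
const githubSource = {
  name: 'source-github',
  types: { Repository: { name: 'String', stars: 'Int' } },
}

const schema = composeSchema([markdownSource, githubSource])
console.log(Object.keys(schema)) // [ 'Markdown', 'Repository' ]
```

Failing loudly on a name clash is one possible policy; merging fields from both plugins would be another.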

That's fairly straightforward. What was tricky though was figuring out how to integrate the new data layer with React components. The pattern which I eventually settled on for my initial prototype is pleasingly simple.

All routes are powered by React.js components. A route component can either power one path e.g. about.js or can power many paths e.g. for all blog posts blog-post.js. Route components need data. To get data, they can export a GraphQL query. This query is run during bootstrap and the result is written out as a JSON file which is inserted into the route component as props. During development the "query runner" watches both route components and source files for changes and re-runs queries overwriting the JSON files which then Webpack hot-reloads.

So a very minimal example. Say you have a blog and you want to create an index page listing your blog posts. In your /pages directory you'd create an index.js which would look something like:

import React from 'react'
import get from 'lodash/get'
import Link from 'react-router/lib/Link'

const BlogIndex = ({ data }) => {
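  // Default to an empty list so the page still renders when there are no posts.
  const blogPosts = get(data, 'allMarkdown.edges', [])
  // Each list item gets a unique key (its path) so React can track it.
  const postList = blogPosts.map((post) => {
    return (
      <li key={post.node.path}>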
        <Link
          to={post.node.path}
        >
          {post.node.frontmatter.title}
        </Link>
      </li>
    )
  })
  return (
    <div>
      <h1>Blog posts</h1>
      <ul>{postList}</ul>
    </div>
  )
}

export default BlogIndex

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        path
        frontmatter {
          title
        }
      }
    }
  }
}
`
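
The bootstrap step that runs these exported queries could look roughly like the sketch below. It is an assumption-laden illustration, not the prototype's code: `runRouteQueries`, `runQuery`, and `writeJson` are hypothetical names, and the GraphQL engine and file system are replaced with in-memory fakes.

```javascript
// Hypothetical sketch of a "query runner". `runQuery` and `writeJson`
// are injected stand-ins, not real Gatsby functions.
function runRouteQueries(routeComponents, runQuery, writeJson) {
  return Promise.all(
    routeComponents.map((component) =>
      // Run the query the component exported...
      runQuery(component.routeQuery).then((result) =>
        // ...and write the result out as JSON next to the component.
        writeJson(`${component.file}.json`, JSON.stringify(result.data))
      )
    )
  )
}

// Usage with in-memory fakes instead of GraphQL and the file system.
const written = {}
const done = runRouteQueries(
  [
    {
      file: 'pages/index.js',
      routeQuery: '{ allMarkdown { edges { node { path } } } }',
    },
  ],
  () => Promise.resolve({ data: { allMarkdown: { edges: [] } } }),
  (file, json) => {
    written[file] = json
  }
)
```

During development, a file watcher could call the same function again whenever a route component or source file changes.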

You can now think of the various content/data files you have as a "database" to query against however you want. E.g. to create a page listing tags you could export this query.

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        frontmatter {
          tags
        }
      }
    }
  }
}
`

I created a page like this on my blog (which is running Gatsby-1.0-alpha1) https://www.bricolage.io/tags/

Stuff like pagination, tag pages, and other "meta" pages are now pretty straightforward.

Going with GraphQL also gives us access to fantastic tooling. Facebook uses GraphQL heavily and one of the most useful internal GraphQL tools they've released is GraphiQL, an IDE for GraphQL.

Here's a gif of me exploring my blog's GraphQL schema.

[Animated GIF: exploring the blog's GraphQL schema in GraphiQL]

I'm super duper excited about all the possibilities the new GraphQL layer opens up. Here's a sampling of some ideas I've had.

  • Use React Docgen to make PropType or Flow information from your React components queryable. Create a living styleguide.
  • Do something similar for other JS docs systems e.g. JSDocs. Imagine writing code documentation while the documentation hot-reloads your changes.
  • Programmable data. GraphQL fields can take arguments. Query for images and pass a width value as an argument and have the image source plugin resize the image on the fly. Pass a format string to a date field and get back a formatted date (no more loading moment.js into the client).
  • Connect to 3rd party APIs e.g. Github, Twitter, Facebook, etc.
  • Build sites using hosted CMSs e.g. Contentful, DatoCMS, or Prismic.
  • Validate data e.g. require that all Markdown files have a title field that's of a minimum length.
  • Connect data. GraphQL lets you easily connect types together e.g. the author field in the frontmatter of a markdown file can be connected to data from an authors.yaml file, which lets you write queries like:
{
  markdown {
    frontmatter {
      author {
        firstName
        lastName
      }
    }
  }
}
  • Query Markdown AST for advanced use cases. E.g. custom footnote rendering.
  • Pulling data from legacy systems e.g. use a Wordpress source plugin and rebuild an old site on Gatsby while still maintaining content in Wordpress.
  • Extend source plugin schemas with custom fields for your site.
  • Add standard query operators to schema so you can easily sort, filter, search, glob, regex, groupBy, sum, etc. data.
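
Two of the ideas above, connected data and programmable data, come down to small resolver functions. The sketch below is hypothetical: the function names, the `format` argument, and the sample records are assumptions chosen for illustration.

```javascript
// Hypothetical resolvers; the names and data shapes are assumptions.
// Sample records standing in for the contents of an authors.yaml file.
const authors = {
  jane: { firstName: 'Jane', lastName: 'Doe' },
}

// "Connect data": swap the author id in frontmatter for the full record.
function resolveAuthor(frontmatter, authorsById) {
  return authorsById[frontmatter.author] || null
}

// "Programmable data": a date field that accepts a format argument,
// so formatting happens at build time rather than in the browser.
function resolveDate(frontmatter, args) {
  const date = new Date(frontmatter.date)
  if (args.format === 'year') {
    return String(date.getUTCFullYear())
  }
  // Default: ISO date without the time portion.
  return date.toISOString().slice(0, 10)
}

const frontmatter = { title: 'Hello', author: 'jane', date: '2016-09-01' }
console.log(resolveAuthor(frontmatter, authors).lastName)
console.log(resolveDate(frontmatter, { format: 'year' }))
```

Since the formatting runs while the site is built, the client only receives the finished string.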

With the coming source plugin architecture, getting data into your site will soon be straightforward. Identify the sources of data, compose source plugins, play in GraphiQL to create queries, drop queries in route components, write components.

Programmatic path creation

Gatsby currently is too magical with creating paths. It tries to auto-generate paths based on files' positions on the file system. So a file named my-sweet-blog-post.md becomes /my-sweet-blog-post/. Which is fun and works but often you want more control.

So with Gatsby 1.0, I'm planning that all paths will be created programmatically.

Within plugins and at the site level you can create a function called createRoutes. This gets called with a graphql function. With that you write queries to get data and then return an array of route objects with paths and the component responsible for that path.

Gatsby takes this route information and auto-generates a React Router config.
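
One way such a config could be generated is sketched below. It targets the plain route object shape (`path`, `getComponent`, `childRoutes`) that React Router 2/3 accepts; `buildRouterConfig` and `loadComponent` are hypothetical names, and the real generator may work quite differently.

```javascript
// Hypothetical sketch. `loadComponent` is an assumed helper that resolves
// a component file path to a component. The output uses the plain route
// object shape (path / getComponent / childRoutes) from React Router 2/3.
function buildRouterConfig(routes, loadComponent) {
  return {
    path: '/',
    childRoutes: routes.map((route) => ({
      path: route.path,
      // Resolve the component lazily so each route can be code-split.
      getComponent(nextState, cb) {
        loadComponent(route.component).then(
          (component) => cb(null, component),
          cb
        )
      },
    })),
  }
}

// Usage with a fake loader that returns a string instead of a component.
const config = buildRouterConfig(
  [{ path: '/my-sweet-blog-post/', component: './pages/article-route.js' }],
  (file) => Promise.resolve(`component from ${file}`)
)
console.log(config.childRoutes.map((route) => route.path))
```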

The beauty of static site generators is you know everything at build time. So you can calculate exactly what paths are needed.

I want to add support for purely client-side routes as well if you're loading data dynamically from an API but for server rendered stuff, we can calculate all routes ahead of time.

This means you're no longer limited to file-based routes and can easily do stuff like pagination or tag pages or a 1000 other things.

A simple example for a blog.

const _ = require('lodash')

exports.createRoutes = (graphql, cb) => {
  graphql(`
    {
      allMarkdown(first: 1000) {
        edges {
          node {
            path
          }
        }
      }
    }
  `)
  .then(result => {
    const blogComponent = './pages/article-route.js'
    const routes = []
    // Create blog post routes.
    _.each(result.data.allMarkdown.edges, (edge) => {
      routes.push({
        path: edge.node.path,
        component: blogComponent,
      })
    })
    return routes
  })
  // Hand the routes back to Gatsby, or report a failed query.
  .then(routes => cb(null, routes), cb)
}

This is extra setup compared to what we have now but is still fairly straightforward and combined with the new GraphQL data layer, 1000x more flexible.

Also plugins and themes can provide default route creation for you so you can just install a blog theme and just start dropping markdown files in a content directory. Or install a pagination plugin and tell it to create /page/1, /page/2 (as many as needed) with 10 blog posts per page.
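
A pagination plugin along these lines might look roughly like the following. This is a sketch under assumptions: it reuses the createRoutes shape from the example above, while the `context` key, the `./pages/blog-index.js` component path, and the fake `graphql` function are invented for illustration.

```javascript
// Hypothetical pagination sketch that follows the createRoutes shape
// from the example above. The `context` key and the blog-index
// component path are assumptions, not part of any Gatsby API.
const POSTS_PER_PAGE = 10

function createPaginationRoutes(graphql, cb) {
  graphql(`
    {
      allMarkdown(first: 1000) {
        edges {
          node {
            path
          }
        }
      }
    }
  `)
    .then((result) => {
      const edges = result.data.allMarkdown.edges
      const pageCount = Math.ceil(edges.length / POSTS_PER_PAGE)
      const routes = []
      for (let page = 1; page <= pageCount; page++) {
        const start = (page - 1) * POSTS_PER_PAGE
        routes.push({
          path: `/page/${page}`,
          component: './pages/blog-index.js',
          // Paths of the posts that belong on this page.
          context: edges
            .slice(start, start + POSTS_PER_PAGE)
            .map((edge) => edge.node.path),
        })
      }
      return routes
    })
    .then((routes) => cb(null, routes), cb)
}

// Usage with a fake `graphql` function that returns 25 posts.
const fakeGraphql = () =>
  Promise.resolve({
    data: {
      allMarkdown: {
        edges: Array.from({ length: 25 }, (_, i) => ({
          node: { path: `/post-${i + 1}/` },
        })),
      },
    },
  })

createPaginationRoutes(fakeGraphql, (err, routes) => {
  if (err) throw err
  console.log(routes.map((route) => route.path))
})
```

With 25 posts and 10 per page this yields three routes, the last of which carries the remaining five posts.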

Details on code splitting and new Webpack configuration

  • BLOG post with details *

Normalize code.

bundles:

  • commons.js — React, React Router, utility modules you use across the site
  • route components — e.g. for a blog you might have a BlogIndex.js and BlogPost.js route component
  • Per route data bundles. Each page has its data in own module.

details about Webpack config

Plugin system

alluded to this earlier — everything built on this. Easy to compose different functionality.

code examples

Move asset handling to plugins?

Examples

  • Google Analytics
  • RSS feed
  • Markdown spell checker (in dev)

New config file

gatsby-config.js

New APIs

New APIs will be necessary for this to work.

Propose API hardening pattern.

level 1: experimental only. Underscored. Unstable, can be broken on minor release. New APIs must incubate in level 1 for 3 months.

level 2: At least 3 plugins use the API, including at least 1 core plugin. Can only be broken on major release. Documented and tested. Alias underscore function to non-underscored.

level 3: At least 6 plugins use API. Been in core for ~6 months. Thoroughly documented and tested.

So will release 1.0 with a number of underscored APIs. In subsequent minor releases will improve them and then move them to level 2 and eventually level 3.

Existing APIs will get grandfathered in at level 2.

At major releases, evaluate all level 1 & 2 APIs and remove or consolidate APIs that overlap or don't have sufficient power.
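
The proposed levels can be written down as a small check. This is only one reading of the proposal: the field names are assumptions, the documentation and testing requirements are not modeled, and level 3 is assumed to include the level 2 criteria.

```javascript
// Hypothetical helper that encodes the proposed hardening levels.
// Field names (pluginCount, corePluginCount, monthsInCore) are assumptions,
// and the documentation/testing criteria are deliberately left out.
function apiLevel(api) {
  const meetsLevel2 =
    api.monthsInCore >= 3 && api.pluginCount >= 3 && api.corePluginCount >= 1
  const meetsLevel3 =
    meetsLevel2 && api.pluginCount >= 6 && api.monthsInCore >= 6
  if (meetsLevel3) return 3
  if (meetsLevel2) return 2
  return 1
}

// Level 1 APIs are exposed with a leading underscore; from level 2 on,
// the underscored function is also aliased to a non-underscored name.
function exposeApi(target, name, fn, level) {
  target[`_${name}`] = fn
  if (level >= 2) {
    target[name] = fn
  }
  return target
}

const level = apiLevel({ pluginCount: 4, corePluginCount: 1, monthsInCore: 4 })
const api = exposeApi({}, 'createRoutes', () => [], level)
console.log(level, Object.keys(api))
```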

Copy create-react-app

The create-react-app folks have done amazing work around improving the UX of Webpack based CLI apps. We should copy them liberally. Probably most improvements we could borrow from them aren't breaking changes so feel free to start cherry-picking now!

Things to figure out

  • store data in database?
  • extend source plugins
  • Sync data locally from remote APIs (gatsby sync my-slow-api?)
  • animations on route transitions

Future ideas...?

  • Themes — really powerful, really want. Experience, NPM install, set in config, instant blog, instant documentation site, instant marketing website, etc. Can include plugins with configuration, route components, and styling.