KyleAMathews/gatsby-1-rfc.md

## gatsby-1-rfc.md

      
Hi folks, Gatsby is now 1.25 years old, so it seems high time we
push for a 1.0 :-)
I've been thinking about what Gatsby could/should be for many months now
and more recently have
been prototyping these ideas. This issue is a description of where my
thinking is at. It's intended to kickstart a discussion of what Gatsby
1.0 can be. There are still a lot of open questions and I'd love feedback
on my ideas to this point.
Once we've settled on a rough direction, my plan is to break out work
into sub-issues for more detailed discussion and work.
Goals for Gatsby 1.0

High performance by default

Performance is king. Gatsby gives you the fastest possible frontend
performance by default, no compromises.

Statically render everything.
Provide a very fast initial render by loading only the critical HTML,
CSS and JS.
Prefetch data and code for the remainder of the site so subsequent
route changes render instantly.
Use service workers to intelligently cache assets and provide offline
support.

I love Facebook engineering's "pit of success" mantra. Incredibly fast
websites should be the default, not a monumental engineering challenge.
Study after study has shown that faster websites improve user
experience and improve business metrics.
Anything that prevents Gatsby from generating the fastest possible
website is a bug.
Rich webapp-like feel


The line between web apps and web sites has become blurry.
Javascript enhancements have gone from a nice-to-have to a must-have
for many sites.
Adding rich web-app functionality is an awkward tack-on for older web
tools.
With Gatsby, adding any sort of JS-driven experience is trivial as it
uses React.js for its view layer.
With React & Webpack you have instant access to 1000s of open source
React components and modern Javascript and CSS technologies.

Source all the data

A site's value ultimately comes from its data. Whether that's copy,
images, or numerical data — if your site building tool can't get the
right data in the right format into your site... then it's not very
useful.
I'm working on a new data layer for Gatsby based on GraphQL. Using this
and the coming plugin system, you'll be able to add "source plugins"
that will let you easily pull in data from any number of sources e.g. a
directory of markdown files or an external API.
The GraphQL data layer will let you treat your markdown files as a
database.
Each route component can query your site's schema for just the data it
needs. This gives you complete flexibility to turn your data into
whatever type of web experience you'd like.
Current status / how to help

Note: the following three sections will be kept updated as we move
towards a 1.0.
Steps on road to 1.0


9/1 — alpha 1 released. Barely sorta works. POC release. Check out
Bricolage.io for a mostly working live site.

How you can help

Many of you have asked how to help!

Feedback! Read through this page and other issues and ask questions,
point out potential problems, bring up all your website unicorns you
hope Gatsby can give you.
Code is very prototype-quality right now so I don't suggest trying to
build a site just yet. I'll be doing a major refactor over the next week
or so and will release alpha 2 with the beginnings of the plugin system,
etc. Once alpha 2 is released, please build sites and plugins and tell us
how it went.
Also once alpha 2 is released, write PRs! There's lots to be done so
if one of the new sub-systems really interests you, I'd love for you to
take it on. Let me know and I'll help get you started.
If you're in bay area — I'd love to pair program a site — let's hang
out and build a site together.
If you're not in the bay area — I'd love for you to pay me to come fly
to wherever you live and code a site together :-)
Directly sponsor Gatsby's development — fund me or someone on your
team or in the community to work on Gatsby. If you have R&D budget or
open source sponsorship budget, anything would be helpful. Code ain't
cheap, and the more people working on code, documentation, examples,
and tutorials, the faster we can move.

Many of the changes in 1.0 are intended to make it easier for people to
contribute to Gatsby. The plugin system (and post-1.0, the theme system)
will mean you can create and publish additional behaviors for Gatsby
through plugins. Let's make the core smaller to increase the surface
area where people can contribute.
How you can test

Alpha 1 is kinda hard to use. It has a lot of assumptions baked in to
make my blog work. You can possibly get Gatsby working with my blog but
I'd wait until I refactor out more of the sharp edges for the second
alpha.
New additions and breaking changes

To make these goals a reality there are both some breaking changes and
major new additions that will be needed.
Pull data into components instead of pushing

Data in Gatsby currently is pushed into templates to be rendered into
HTML (like pretty much every static site generator). This is a simple
pattern and works great for many use cases. But when you start working
on more complex sites, you really start to miss the flexibility of
building a database-driven site. With a database, all your data is
available to query against in any fashion that you'd like. Whatever bits
of data you need to assemble a page, you can pull in. You want to
create author pages showing their bio & last 5 posts? It's just a query
away. I want this same flexibility for Gatsby. I want to be able to
query my markdown (or picture or data, etc) files and treat them as a
database of sorts.
This is especially important for Gatsby as unlike traditional
static-site-generators, all data used to build a page is loaded into the
client. Currently Gatsby loads all data for the site into the client.
This is both wasteful (your site doesn't use all that data) as well as
costly. Time-to-interactivity is an important web performance metric.
The larger your Javascript bundle, the longer it takes to download and
evaluate. This is especially noticeable on low-end phones
on poor networks.
With this change in Gatsby 1.0 both code and data will be split on a
per-route basis.  When a user visits a page, they will load just the
javascript & data it needs and then lazy-load more once the first page
is initialized.
Now a site can easily have "heavy" pages (in terms of data and/or code)
without affecting other parts of the site. E.g. a search page or
a page with data visualizations.
I've been prototyping how this will work and will provide more details
further down the page.
High performance === PRPL

Many of the changes in Gatsby 1.0 are inspired by the fine work of
engineers at Google (and elsewhere) who've been researching patterns for
improving web performance and building these into the web platform.
Particularly helpful is the PRPL
pattern.
PRPL stands for:

Push critical resources for the initial route.
Render initial route.
Pre-cache remaining routes.
Lazy-load and create remaining routes on demand.

The less work you do up front, the faster your app boots up. Code
and data splitting ensure that only the critical resources needed for
the initial route are loaded.
Note, "push" refers to HTTP/2 Server Push which very few hosts support
yet. I've been researching this and talking to people about it. Support
for Server Push in Gatsby will probably come as host-specific plugins.
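As a rough illustration of the split between "critical" and "pre-cache" resources, here is a minimal sketch. The manifest shape, the file names, and the `planLoading` helper are all assumptions made up for this example; they are not Gatsby APIs.

```javascript
// Hypothetical route manifest: each path lists the bundles it needs.
const manifest = {
  '/': ['commons.js', 'blog-index.js', 'data-index.json'],
  '/hello-world/': ['commons.js', 'blog-post.js', 'data-hello-world.json'],
  '/tags/': ['commons.js', 'tags.js', 'data-tags.json'],
}

// Split resources into what the initial route needs right away (push and
// render) and what can be fetched in the background afterwards (pre-cache).
function planLoading(routeManifest, initialPath) {
  const critical = routeManifest[initialPath] || []
  const seen = new Set(critical)
  const precache = []
  Object.keys(routeManifest).forEach((routePath) => {
    if (routePath === initialPath) return
    routeManifest[routePath].forEach((resource) => {
      // Skip anything already loaded or already queued.
      if (!seen.has(resource)) {
        seen.add(resource)
        precache.push(resource)
      }
    })
  })
  return { critical, precache }
}

const plan = planLoading(manifest, '/')
console.log(plan.critical) // resources for the first render
console.log(plan.precache) // everything else, deduplicated
```

The shared `commons.js` bundle only appears in the critical list, so later routes cost just their own component and data.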
New GraphQL data layer

Gatsby uses Webpack right now for everything. Javascript, CSS, images,
Markdown, JSON, YAML, etc. are all handled using Webpack's rather
brilliant system of treating everything as JS modules.
Using Webpack has worked out really really well for Gatsby. It gives us
a ton out of the box. A lovely hot-reloading development experience.
Easy interoperability with all the latest and greatest web tools. And
fast, optimized production builds. It's truly a swiss-army knife of
tools.
But Webpack has some problems with data.
First it only understands files. If you want to integrate data from any
other source e.g. external APIs you have to first convert that data into
files.
Webpack can get weird if you try to reference files from outside of
web root. I've been bitten by this several times as have
others.
Another big problem is you can't use just some data from a file. What
if you wanted to use data in your site from a 1 gigabyte CSV file?
There's no way to get around loading the entire file unless again you
first preprocess the file.
The last problem is data splitting. Ideally each route can load only the
data it needs. But how? Often a route will want a bit of data from a
number of files or other data sources. How can a route both easily
specify what data it needs and tell Webpack to package that
minimal data set together to be shipped to the browser to power the
React component(s) for that route?
I've thought through a number of different possibilities (this
issue explores one of
those) but could never quite figure out how to make Webpack do what I
wanted it to.
So eventually I concluded the simplest thing would be to split the data
layer off and remove it from Webpack's control. Let Webpack do what it
does best and build a data system tailor-made for Gatsby's needs.
I've been prototyping this new data layer the past few weeks with
GraphQL and am really really pleased with how well it's working.
How it'll work

When you set up a site, you'll add one or more source plugins. These
source plugins can be file-based, e.g. a markdown source plugin which you
point at a directory of markdown files, or network-based, e.g. for
consuming an internal API or a 3rd-party API like Github.
Each source plugin defines types which get composed together to form a
schema for your site.
This combined schema is consumed by GraphQL and made available to query
against.
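To make the composition step concrete, here is a minimal sketch in plain Javascript. The plugin shape (`name` plus a `types` map) and the `composeSchema` helper are hypothetical, and a real implementation would build GraphQL types rather than plain objects.

```javascript
// Hypothetical source plugins; each contributes named types with fields.
const markdownSource = {
  name: 'source-markdown',
  types: {
    Markdown: { path: 'String', frontmatter: 'Frontmatter' },
    Frontmatter: { title: 'String', tags: '[String]' },
  },
}
const githubSource = {
  name: 'source-github',
  types: {
    Repository: { name: 'String', stars: 'Int' },
  },
}

// Merge the types from every plugin into one site-wide schema object,
// refusing to continue if two plugins define the same type name.
function composeSchema(plugins) {
  const schema = {}
  plugins.forEach((plugin) => {
    Object.keys(plugin.types).forEach((typeName) => {
      if (Object.prototype.hasOwnProperty.call(schema, typeName)) {
        throw new Error(
          `Type "${typeName}" from ${plugin.name} is already defined by ${schema[typeName].source}`
        )
      }
      schema[typeName] = { source: plugin.name, fields: plugin.types[typeName] }
    })
  })
  return schema
}

const schema = composeSchema([markdownSource, githubSource])
console.log(Object.keys(schema))
```

Failing loudly on a name collision is one possible design choice; namespacing types per plugin would be another.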
That's fairly straightforward. What was tricky though was figuring out
how to integrate the new data layer with React components. The pattern
which I eventually settled on for my initial prototype is pleasingly
simple.
All routes are powered by React.js components. A route component can
either power one path (e.g. about.js) or many paths (e.g. blog-post.js
for all blog posts). Route components need data. To get data,
they can export a GraphQL query. This query is run during bootstrap and
the result is written out as a JSON file which is inserted into the route
component as props. During development the "query runner" watches both
route components and source files for changes and re-runs queries,
overwriting the JSON files, which Webpack then hot-reloads.
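The bootstrap half of the "query runner" could look roughly like the sketch below. It is not the prototype's code: `runQuery` is a stand-in that returns canned data instead of executing GraphQL, and the component objects and output file naming are assumptions for illustration.

```javascript
const fs = require('fs')
const os = require('os')
const path = require('path')

// Hypothetical stand-in for the real GraphQL executor: it ignores the query
// text and returns canned data so that the sketch runs on its own.
function runQuery(query) {
  return { allMarkdown: { edges: [{ node: { path: '/hello-world/' } }] } }
}

// Run each route component's exported query and write the result to
// <outDir>/<name>.json. Returns the list of files written.
function runRouteQueries(routeComponents, outDir) {
  return routeComponents
    .filter((component) => component.routeQuery) // no query means no data file
    .map((component) => {
      const result = runQuery(component.routeQuery)
      const file = path.join(outDir, `${component.name}.json`)
      fs.writeFileSync(file, JSON.stringify(result))
      return file
    })
}

const outDir = fs.mkdtempSync(path.join(os.tmpdir(), 'route-data-'))
const written = runRouteQueries(
  [
    { name: 'index', routeQuery: '{ allMarkdown { edges { node { path } } } }' },
    { name: 'about' }, // exports no query
  ],
  outDir
)
console.log(written)
```

In development, the same function could be re-run from a file watcher so the JSON files are overwritten whenever a component or source file changes.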
So a very minimal example. Say you have a blog and you want to create an
index page listing your blog posts. In your /pages directory you'd
create an index.js which would look something like:
import React from 'react'
import get from 'lodash/get'
import Link from 'react-router/lib/Link'

// `data` holds the result of the `routeQuery` exported below.
const BlogIndex = ({ data }) => {
  // Fall back to an empty list so the page still renders without posts.
  const blogPosts = get(data, 'allMarkdown.edges', [])
  const postList = blogPosts.map((post) => {
    return (
      <li key={post.node.path}>
        <Link
          to={post.node.path}
        >
          {post.node.frontmatter.title}
        </Link>
      </li>
    )
  })
  return (
    <div>
      <h1>Blog posts</h1>
      <ul>{postList}</ul>
    </div>
  )
}

export default BlogIndex

export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        path
        frontmatter {
          title
        }
      }
    }
  }
}
`
You can now think of the various content/data files you have as a
"database" to query against however you want. E.g. to create a page
listing tags you could export this query.
export const routeQuery = `
{
  allMarkdown {
    edges {
      node {
        frontmatter {
          tags
        }
      }
    }
  }
}
`
I created a page like this on my blog (which is running
Gatsby-1.0-alpha1) https://www.bricolage.io/tags/
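For illustration, the result of the tags query above could be reduced to a list of tags with counts along these lines. The sample data and the `countTags` helper are made up for this sketch; only the result shape follows the query.

```javascript
// Sample shaped like the result of the tags query (contents are made up).
const data = {
  allMarkdown: {
    edges: [
      { node: { frontmatter: { tags: ['gatsby', 'react'] } } },
      { node: { frontmatter: { tags: ['react'] } } },
      { node: { frontmatter: { tags: null } } }, // a post without tags
    ],
  },
}

// Count how many posts use each tag.
function countTags(queryResult) {
  const counts = new Map()
  queryResult.allMarkdown.edges.forEach((edge) => {
    const tags = edge.node.frontmatter.tags || []
    tags.forEach((tag) => {
      counts.set(tag, (counts.get(tag) || 0) + 1)
    })
  })
  // Most-used tags first, ties broken alphabetically.
  return Array.from(counts, ([tag, count]) => ({ tag, count })).sort(
    (a, b) => b.count - a.count || a.tag.localeCompare(b.tag)
  )
}

const tagList = countTags(data)
console.log(tagList)
```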
Stuff like pagination, tag pages, and other "meta" pages is now pretty
straightforward.
Going with GraphQL also gives us access to fantastic tooling. Facebook
uses GraphQL heavily and one of the most useful internal GraphQL tools they've
released is GraphiQL, an IDE for GraphQL.
Here's a gif of me exploring my blog's GraphQL schema.

I'm super duper excited about all the possibilities the new GraphQL
layer opens up. Here's a sampling of some ideas I've had.

Use React Docgen to make
PropType or Flow information from your React components queryable.
Create a living styleguide.
Do something similar for other JS docs systems e.g. JSDocs. Imagine
writing code documentation while the documentation hot-reloads your
changes.
Programmable data. GraphQL fields can take arguments. Query for images
and pass a width value as an argument and have the image source plugin
resize the image on the fly. Pass a format string to a date field and
get back a formatted date (no more loading moment.js into the client).
Connect to 3rd party APIs e.g. Github, Twitter, Facebook, etc.
Build sites using hosted CMSs e.g. Contentful, DatoCMS, or Prismic.
Validate data e.g. require that all Markdown files have a title
field that's of a minimum length.
Connect data. GraphQL lets you easily connect types together e.g. the
author field in the frontmatter of a markdown file can be connected to
data from an authors.yaml file which lets you write queries like:

{
  markdown {
    frontmatter {
      author {
        firstName
        lastName
      }
    }
  }
}


Query Markdown AST for advanced use cases. E.g. custom footnote
rendering.
Pull data from legacy systems e.g. use a WordPress source plugin
and rebuild an old site on Gatsby while still maintaining content in
WordPress.
Extend source plugin schemas with custom fields for your site.
Add standard query operators to schema so you can easily sort, filter,
search, glob, regex, groupBy, sum, etc. data.
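The "connect data" idea from the list above can be sketched without any GraphQL machinery. In this example the author records, the id-based lookup, and the `resolveAuthor` helper are all hypothetical; a real source plugin would expose this as a field resolver in the schema.

```javascript
// Hypothetical contents of authors.yaml after parsing, keyed by author id.
const authors = {
  ada: { firstName: 'Ada', lastName: 'Lovelace' },
}

// Hypothetical markdown nodes whose frontmatter stores only the author id.
const markdownNodes = [
  { frontmatter: { title: 'Hello', author: 'ada' } },
  { frontmatter: { title: 'Guest post', author: 'unknown-id' } },
]

// Resolver for `frontmatter.author`: swap the id for the linked record,
// or return null when no such author exists.
function resolveAuthor(frontmatter, authorsById) {
  const id = frontmatter.author
  return Object.prototype.hasOwnProperty.call(authorsById, id)
    ? authorsById[id]
    : null
}

const resolved = markdownNodes.map((node) =>
  resolveAuthor(node.frontmatter, authors)
)
console.log(resolved)
```

Returning null for an unknown id keeps the query working; the "validate data" idea from the same list could instead turn that case into a build error.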

With the coming source plugin architecture, getting data into your site
will soon be straightforward. Identify the sources of data, compose
source plugins, play in GraphiQL to create queries, drop queries in
route components, write components.
Programmatic path creation

Gatsby currently is too magical with creating paths. It tries to
auto-generate paths based on files' positions on the file system, so a
file named my-sweet-blog-post.md becomes /my-sweet-blog-post/. That
is fun and works, but often you want more control.
So with Gatsby 1.0, I'm planning that all paths will be created
programmatically.
Within plugins and at the site level you can create a function
called createRoutes. This gets called with a graphql function. With
that you write queries to get data and then return an array of route
objects with paths and the component responsible for that path.
Gatsby takes this route information and auto-generates a React Router
config.
The beauty of static site generators is you know everything at build
time. So you can calculate exactly what paths are needed.
I want to add support for purely client-side routes as well if you're
loading data dynamically from an API but for server rendered stuff, we
can calculate all routes ahead of time.
This means you're no longer limited to file-based routes and can easily
do stuff like pagination or tag pages or a 1000 other things.
A simple example for a blog.
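const _ = require('lodash')

// Called with a `graphql` function for querying the site's data and a
// Node-style callback that receives the route objects.
exports.createRoutes = (graphql, cb) => {
  graphql(`
    {
      allMarkdown(first: 1000) {
        edges {
          node {
            path
          }
        }
      }
    }
  `)
  .then(result => {
    const blogComponent = './pages/article-route.js'
    const routes = []
    // Create blog post routes.
    _.each(result.data.allMarkdown.edges, (edge) => {
      routes.push({
        path: edge.node.path,
        component: blogComponent,
      })
    })
    return routes
  })
  // Pass the routes on success, or the error if the query failed.
  .then(routes => cb(null, routes), cb)
}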
This is extra setup compared to what we have now but is still fairly
straightforward and combined with the new GraphQL data layer, 1000x more
flexible.
Also plugins and themes can provide default route creation for you so
you can just install a blog theme and just start dropping markdown files
in a content directory. Or install a pagination plugin and tell it to
create /page/1, /page/2 (as many as needed) with 10 blog posts per
page.
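A pagination plugin along those lines might compute its routes roughly as follows. This is only a sketch: `createPaginationRoutes` and the extra `posts` field on each route object are assumptions, not part of the proposed API.

```javascript
// Build one route per page of posts: /page/1, /page/2, ...
function createPaginationRoutes(postPaths, perPage, component) {
  const pageCount = Math.ceil(postPaths.length / perPage)
  const routes = []
  for (let page = 1; page <= pageCount; page++) {
    const start = (page - 1) * perPage
    routes.push({
      path: `/page/${page}`,
      component,
      // Hypothetical extra field: the posts listed on this page.
      posts: postPaths.slice(start, start + perPage),
    })
  }
  return routes
}

// 25 made-up post paths, 10 per page, gives 3 pages.
const postPaths = Array.from({ length: 25 }, (_, i) => `/post-${i + 1}/`)
const routes = createPaginationRoutes(postPaths, 10, './pages/blog-index.js')
console.log(routes.map((route) => route.path))
```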
Details on code splitting and new Webpack configuration


BLOG post with details *

Normalize code.
bundles:

commons.js — React, React Router, utility modules you use across the
site
route components — e.g. for a blog you might have a BlogIndex.js and
BlogPost.js route component
Per route data bundles. Each page has its data in own module.
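To picture how the three kinds of bundles relate to a route, here is a small sketch that lists the bundles one route would load. The route objects, the `slugify` helper, and the `data-<slug>.json` naming are hypothetical and say nothing about the eventual Webpack configuration.

```javascript
// Hypothetical route list, shaped like the objects returned by createRoutes.
const blogRoutes = [
  { path: '/', component: './pages/BlogIndex.js' },
  { path: '/hello-world/', component: './pages/BlogPost.js' },
  { path: '/second-post/', component: './pages/BlogPost.js' },
]

// Turn a path into a file-name-safe slug, e.g. '/hello-world/' -> 'hello-world'.
function slugify(routePath) {
  const slug = routePath.replace(/^\/+|\/+$/g, '').replace(/\//g, '--')
  return slug || 'index'
}

// List the bundles a route needs: shared code, its component, its data.
function bundlesForRoute(route) {
  const componentBundle = route.component.split('/').pop()
  return ['commons.js', componentBundle, `data-${slugify(route.path)}.json`]
}

blogRoutes.forEach((route) => {
  console.log(route.path, bundlesForRoute(route))
})
```

Both blog posts share `BlogPost.js`, so visiting a second post only costs its data bundle.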

details about Webpack config
Plugin system

alluded to this earlier — everything built on this. Easy to compose
different functionality.
code examples
Move asset handling to plugins?
Examples


Google Analytics
RSS feed
Markdown spell checker (in dev)
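One possible shape for composable plugins is sketched below, purely as an assumption since the plugin API is not designed yet. The hook names (`onRouteUpdate`, `onPostBuild`) and the `runHook` helper are invented for this example.

```javascript
// Hypothetical plugins: plain objects exposing optional hook functions.
const plugins = [
  {
    name: 'google-analytics',
    onRouteUpdate: (routePath) => `track pageview for ${routePath}`,
  },
  {
    name: 'rss-feed',
    onPostBuild: (site) => `write feed with ${site.posts.length} items`,
  },
]

// Call one hook on every plugin that implements it and collect the results.
function runHook(pluginList, hookName, ...args) {
  return pluginList
    .filter((plugin) => typeof plugin[hookName] === 'function')
    .map((plugin) => plugin[hookName](...args))
}

console.log(runHook(plugins, 'onRouteUpdate', '/about/'))
console.log(runHook(plugins, 'onPostBuild', { posts: ['a', 'b'] }))
```

Because plugins that do not implement a hook are simply skipped, new hooks can be added to core without touching existing plugins.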

New config file

gatsby-config.js
New APIs

New APIs will be necessary for this to work.
Propose API hardening pattern.
level 1: experimental only. Underscored. Unstable, can be broken on
minor release. New APIs must incubate in level 1 for 3 months.
level 2: At least 3 plugins use the API, including at least 1 core
plugin. Can only be broken on major release. Documented and tested.
Alias underscore function to non-underscored.
level 3: At least 6 plugins use API. Been in core for ~6 months.
Thoroughly documented and tested.
So will release 1.0 with a number of underscored APIs. In subsequent
minor releases will improve them and then move them to level 2 and
eventually level 3.
Existing APIs will get grandfathered in at level 2.
At major releases, evaluate all level 1 & 2 APIs and remove or
consolidate APIs that overlap or don't have sufficient power.
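The hardening levels could be encoded roughly like this. The sketch assumes the levels are cumulative (level 3 also requires everything level 2 does), which the proposal does not state; `apiLevel` and `promote` are hypothetical helpers.

```javascript
// Work out which hardening level an API qualifies for.
// Assumption: levels are cumulative, so level 3 implies the level 2 criteria.
function apiLevel({ pluginCount, corePluginCount, monthsInCore }) {
  const level2 = pluginCount >= 3 && corePluginCount >= 1 && monthsInCore >= 3
  const level3 = level2 && pluginCount >= 6 && monthsInCore >= 6
  if (level3) return 3
  if (level2) return 2
  return 1
}

// On promotion to level 2, alias the underscored function to a
// non-underscored name while keeping the old name working.
function promote(api, underscoredName) {
  const publicName = underscoredName.replace(/^_/, '')
  return Object.assign({}, api, { [publicName]: api[underscoredName] })
}

const api = { _createRoutes: () => 'routes' }
const promoted = promote(api, '_createRoutes')
console.log(apiLevel({ pluginCount: 4, corePluginCount: 1, monthsInCore: 3 }))
console.log(Object.keys(promoted))
```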
Copy create-react-app

The create-react-app folks have done amazing work around improving the
UX of Webpack based CLI apps. We should copy them liberally. Probably
most improvements we could borrow from them aren't breaking changes so
feel free to start cherry-picking now!
Things to figure out


store data in database?
extend source plugins
Sync data locally from remote APIs (gatsby sync my-slow-api?)
animations on route transitions

Future ideas...?


Themes — really powerful, really want. Experience, NPM install, set in
config, instant blog, instant documentation site, instant marketing
website, etc. Can include plugins with configuration, route components,
and styling.