Hi folks, Gatsby is now 1.25 years old — seems high time we push for a 1.0 :-)
I've been thinking about what Gatsby could/should be for many months now and more recently have been prototyping these ideas. This issue is a description of where my thinking is at. It's intended to kickstart a discussion of what Gatsby 1.0 can be. There are still a lot of open questions and I'd love feedback on my ideas to this point.
Once we've settled on a rough direction, my plan is to break out work into sub-issues for more detailed discussion and work.
Performance is king. Gatsby gives you the fastest possible frontend performance by default, no compromises.
- Statically render everything.
- Provide a very fast initial render by loading only the critical HTML, CSS and JS.
- Prefetch data and code for the remainder of the site so subsequent route changes render instantly.
- Use service workers to intelligently cache assets and provide offline support.
I love Facebook engineering's "pit of success" mantra. Incredibly fast websites should be the default, not a monumental engineering challenge.
Study after study has shown that faster websites improve user experience and improve business metrics.
Anything that prevents Gatsby from generating the fastest possible website is a bug.
- The line between web apps and web sites has become blurry.
- Javascript enhancements have gone from a nice-to-have to a must-have for many sites.
- Adding rich web-app functionality is an awkward tack-on for older web tools.
- With Gatsby, adding any sort of JS-driven experience is trivial as it uses React.js for its view layer.
- With React & Webpack you have instant access to 1000s of open source React components and modern Javascript and CSS technologies.
A site's value ultimately comes from its data. Whether that's copy, images, or numerical data — if your site building tool can't get the right data in the right format into your site... then it's not very useful.
I'm working on a new data layer for Gatsby based on GraphQL. Using this and the coming plugin system, you'll be able to add "source plugins" that will let you easily pull in data from any number of sources e.g. a directory of markdown files or an external API.
The GraphQL data layer will let you treat your markdown files as a database.
Each route component can query your site's schema for just the data it needs. This gives you complete flexibility to turn your data into whatever type of web experience you'd like.
Note: the following three sections will be kept updated as we move towards a 1.0.
- 9/1 — alpha 1 released. Barely sorta works. POC release. Check out Bricolage.io for a mostly working live site.
Many of you have asked how to help!
- Feedback! Read through this page and other issues and ask questions, point out potential problems, bring up all your website unicorns you hope Gatsby can give you.
- Code is very prototype-quality right now so I don't suggest trying to build a site just yet. I'll be doing a major refactor over the next week or so and will release alpha 2 with the beginnings of the plugin system, etc. Once alpha 2 is released, please build sites and plugins and tell us how it went.
- Also once alpha 2 is released, write PRs! There's lots to be done so if one of the new sub-systems really interests you, I'd love for you to take it on. Let me know and I'll help get you started.
- If you're in bay area — I'd love to pair program a site — let's hang out and build a site together.
- If you're not in the bay area — I'd love for you to pay me to come fly to wherever you live and code a site together :-)
- Directly sponsor Gatsby's development: fund me or someone on your team or in the community to work on Gatsby. If you have R&D budget or open source sponsorship budget, anything would be helpful. Code ain't cheap, and the more people working on code, documentation, examples, and tutorials, the faster we can move.
Many of the changes in 1.0 are intended to make it easier for people to contribute to Gatsby. The plugin system (and post-1.0, the theme system) will mean you can create and publish additional behaviors for Gatsby through plugins. Let's make the core smaller to increase the surface area where people can contribute.
Alpha 1 is kinda hard to use. It has a lot of assumptions baked in to make my blog work. You can possibly get gatsby working with my blog but I'd wait until I refactor out more of the sharp edges for the second alpha.
To make these goals a reality there are both some breaking changes and major new additions that will be needed.
Data in Gatsby currently is pushed into templates to be rendered into HTML (like pretty much every static site generator). This is a simple pattern and works great for many use cases. But when you start working on more complex sites, you really start to miss the flexibility of building a database-driven site. With a database, all your data is available to query against in any fashion that you'd like. Whatever bits of data you need to assemble a page, you can pull in. You want to create author pages showing their bio & last 5 posts? It's just a query away. I want this same flexibility for Gatsby. I want to be able to query my markdown (or picture or data, etc) files and treat them as a database of sorts.
This is especially important for Gatsby as unlike traditional static-site-generators, all data used to build a page is loaded into the client. Currently Gatsby loads all data for the site into the client. This is both wasteful (a given page doesn't use all that data) and costly. Time-to-interactivity is an important web performance metric. The larger your javascript bundle, the longer it takes to download and evaluate the Javascript. This is especially noticeable on low-end phones on poor networks.
With this change in Gatsby 1.0 both code and data will be split on a per-route basis. When a user visits a page, they will load just the javascript & data it needs and then lazy-load more once the first page is initialized.
Now a site can easily have "heavy" pages (in terms of data and/or code) without affecting other parts of the site. E.g. a search page or a page with data visualizations.
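To make the per-route split a bit more concrete, here's a minimal sketch in plain JavaScript. The `site` data, the route list, and the `selectData` functions are all hypothetical stand-ins (in Gatsby 1.0 the selection would come from a route's GraphQL query); the only point is that each route gets its own small JSON payload instead of the whole site's data.

```javascript
// Hypothetical site data; in Gatsby this would come from source plugins.
const site = {
  posts: [
    { path: '/hello/', title: 'Hello', body: 'A long post body...' },
    { path: '/bye/', title: 'Bye', body: 'Another long post body...' },
  ],
  searchIndex: { hello: ['/hello/'], bye: ['/bye/'] },
}

// Each route declares which slice of the data it needs. `selectData` is an
// assumption standing in for a route's GraphQL query.
const routes = [
  {
    path: '/',
    selectData: s => ({
      posts: s.posts.map(p => ({ path: p.path, title: p.title })),
    }),
  },
  {
    path: '/search/',
    selectData: s => ({ searchIndex: s.searchIndex }),
  },
]

// Build one JSON payload per route so a page only downloads its own data.
function buildDataBundles(siteData, routeList) {
  const bundles = {}
  for (const route of routeList) {
    bundles[route.path] = JSON.stringify(route.selectData(siteData))
  }
  return bundles
}

const bundles = buildDataBundles(site, routes)
console.log(Object.keys(bundles))
```

The index page never ships the post bodies and only the "heavy" search page pays for the search index.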
I've been prototyping how this will work and will provide more details further down the page.
Many of the changes in Gatsby 1.0 are inspired by the fine work of engineers at Google (and elsewhere) who've been researching patterns for improving web performance and building these into the web platform.
Particularly helpful is the PPRL pattern.
PPRL stands for:
- Push critical resources for the initial route.
- Render initial route.
- Pre-cache remaining routes.
- Lazy-load and create remaining routes on demand.
The less work you do up front, the faster your app boots up. Code and data splitting ensure that only the critical resources needed for the initial route are loaded.
Note, "push" refers to HTTP/2 Server Push which very few hosts support yet. I've been researching this and talking to people about it. Support for Server Push in Gatsby will probably come as host-specific plugins.
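As a rough illustration of the first three PPRL steps, the sketch below splits a site's assets into what the initial route needs immediately and what can be pre-cached in the background. The `manifest` shape and the `planLoading` helper are hypothetical, not part of Gatsby; the lazy-load step would happen later, when the router asks for an asset that isn't cached yet.

```javascript
// Hypothetical route manifest: each route lists the assets it needs.
const manifest = {
  '/': ['commons.js', 'index.js', 'index.json'],
  '/about/': ['commons.js', 'about.js', 'about.json'],
  '/blog/': ['commons.js', 'blog.js', 'blog.json'],
}

// Split assets into what the initial route needs right away (push + render)
// and what can be fetched in the background afterwards (pre-cache).
function planLoading(routeManifest, initialPath) {
  const critical = routeManifest[initialPath] || []
  const seen = new Set(critical)
  const precache = []
  for (const [path, assets] of Object.entries(routeManifest)) {
    if (path === initialPath) continue
    for (const asset of assets) {
      // Skip anything already loaded for the initial route or queued earlier.
      if (!seen.has(asset)) {
        seen.add(asset)
        precache.push(asset)
      }
    }
  }
  return { critical, precache }
}

const plan = planLoading(manifest, '/about/')
console.log(plan)
```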
Gatsby uses Webpack right now for everything. Javascript, CSS, images, Markdown, JSON, YAML, etc. are all handled using Webpack's rather brilliant system of treating everything as JS modules.
Using Webpack has worked out really really well for Gatsby. It gives us a ton out of the box: a lovely hot-reloading development experience, easy interoperability with all the latest and greatest web tools, and fast, optimized production builds. It's truly a swiss-army knife of tools.
But Webpack has some problems with data.
First it only understands files. If you want to integrate data from any other source e.g. external APIs you have to first convert that data into files.
Webpack can get weird if you try to reference files from outside of web root. I've been bitten by this several times as have others.
Another big problem is you can't use just some data from a file. What if you wanted to use data in your site from a 1 gigabyte CSV file? There's no way to get around loading the entire file unless again you first preprocess the file.
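For comparison, a dedicated data layer could read a large file row by row and keep only the fields a page asks for. The generator below is a small sketch of that idea; `selectColumns` is a hypothetical helper, the CSV parsing is deliberately naive (no quoted commas), and an array stands in for what would really be a file stream.

```javascript
// Lazily pick a few columns from CSV lines, one row at a time, so the whole
// file never has to be held in memory. Naive parsing: assumes no quoted commas.
function* selectColumns(lines, wanted) {
  let header = null
  for (const line of lines) {
    const cells = line.split(',')
    if (header === null) {
      // The first line names the columns.
      header = cells
      continue
    }
    const row = {}
    for (const name of wanted) {
      row[name] = cells[header.indexOf(name)]
    }
    yield row
  }
}

// In practice `lines` could come from a file stream; an array stands in here.
const lines = ['city,population,area', 'Oslo,700000,454', 'Bergen,285000,465']
const rows = Array.from(selectColumns(lines, ['city', 'area']))
console.log(rows)
```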
The last problem is data splitting. Ideally each route can load only the data it needs. But how? Often a route will want a bit of data from a number of files or other data sources. How can a route both easily specify what data it needs and tell Webpack to package that minimal data set together to be shipped to the browser to power the React component(s) for that route?
I've thought through a number of different possibilities (this issue explores one of those) but could never quite figure out how to make Webpack do what I wanted it to.
So eventually I concluded the simplest thing would be to split the data layer off and remove it from Webpack's control. Let Webpack do what it does best and build a data system tailor-made for Gatsby's needs.
I've been prototyping this new data layer the past few weeks with GraphQL and am really really pleased with how well it's working.
When you set up a site, you'll add one or more source plugins. These source plugins can be file-based, e.g. a markdown source plugin which you point at a directory of markdown files, or network-based, e.g. for consuming an internal API or a 3rd-party API like Github.
Each source plugin defines types which get composed together to form a schema for your site.
This combined schema is consumed by GraphQL and made available to query against.
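Here's a sketch of what composing plugin types into one schema could look like. The plugin shape (`name` plus a `types` map) and the `composeSchema` helper are assumptions made for illustration; the real plugin API hasn't been designed yet, and a real implementation would build GraphQL types rather than plain objects.

```javascript
// Hypothetical source plugins; the `name`/`types` shape is an assumption.
const markdownPlugin = {
  name: 'source-markdown',
  types: {
    Markdown: { path: 'String', frontmatter: 'Frontmatter' },
    Frontmatter: { title: 'String' },
  },
}
const githubPlugin = {
  name: 'source-github',
  types: {
    Repo: { name: 'String', stars: 'Int' },
  },
}

// Merge every plugin's types into one site-wide schema, refusing to let two
// plugins silently define the same type name.
function composeSchema(plugins) {
  const composed = {}
  for (const plugin of plugins) {
    for (const [typeName, fields] of Object.entries(plugin.types)) {
      if (composed[typeName]) {
        throw new Error(
          `Type "${typeName}" from ${plugin.name} is already defined by ${composed[typeName].plugin}`
        )
      }
      composed[typeName] = { plugin: plugin.name, fields }
    }
  }
  return composed
}

const schema = composeSchema([markdownPlugin, githubPlugin])
console.log(Object.keys(schema))
```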
That's fairly straightforward. What was tricky though was figuring out how to integrate the new data layer with React components. The pattern which I eventually settled on for my initial prototype is pleasingly simple.
All routes are powered by React.js components. A route component can either power one path, e.g. `about.js`, or can power many paths, e.g. `blog-post.js` for all blog posts. Route components need data. To get data, they can export a GraphQL query. This query is run during bootstrap and the result is written out as a JSON file which is inserted into the route component as props. During development the "query runner" watches both route components and source files for changes and re-runs queries, overwriting the JSON files, which Webpack then hot-reloads.
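The bootstrap step can be sketched roughly like this. Everything here is a simplified assumption: `runQuery` is a stand-in that returns canned data instead of executing GraphQL, and the results are collected in an object rather than written to disk.

```javascript
// Route components and the queries they export (hypothetical example values).
const routeComponents = [
  { file: 'pages/index.js', routeQuery: '{ allMarkdown { edges { node { path } } } }' },
  { file: 'pages/about.js', routeQuery: null }, // exports no query
]

// Stand-in for a real GraphQL executor. It ignores the query text and returns
// canned data so the sketch runs without dependencies.
const runQuery = query => ({
  data: { allMarkdown: { edges: [{ node: { path: '/hello/' } }] } },
})

// Run every exported query once and produce the JSON that would be written
// out for each component and later handed to it as props.
function runRouteQueries(components, execute) {
  const output = {}
  for (const component of components) {
    if (!component.routeQuery) continue
    const result = execute(component.routeQuery)
    output[component.file.replace(/\.js$/, '.json')] = JSON.stringify(result.data)
  }
  return output
}

const files = runRouteQueries(routeComponents, runQuery)
console.log(files)
```

In development, the same function would simply be re-run whenever a route component or source file changes.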
So a very minimal example. Say you have a blog and you want to create an index page listing your blog posts. In your `/pages` directory you'd create an `index.js` which would look something like:

    import React from 'react'
    import get from 'lodash/get'
    import Link from 'react-router/lib/Link'

    const BlogIndex = ({ data }) => {
      // Fall back to an empty list so the page still renders without posts.
      const blogPosts = get(data, 'allMarkdown.edges', [])
      // Each list item needs a stable key; the path is unique per post.
      const postList = blogPosts.map((post) => {
        return (
          <li key={post.node.path}>
            <Link
              to={post.node.path}
            >
              {post.node.frontmatter.title}
            </Link>
          </li>
        )
      })

      return (
        <div>
          <h1>Blog posts</h1>
          <ul>{postList}</ul>
        </div>
      )
    }

    export default BlogIndex

    // The query result is passed to the component above as the `data` prop.
    export const routeQuery = `
    {
      allMarkdown {
        edges {
          node {
            path
            frontmatter {
              title
            }
          }
        }
      }
    }
    `

You can now think of the various content/data files you have as a "database" to query against however you want. E.g. to create a page listing tags you could export this query.

    export const routeQuery = `
    {
      allMarkdown {
        edges {
          node {
            frontmatter {
              tags
            }
          }
        }
      }
    }
    `

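Inside the route component, the result of that query could then be turned into a tag list with a few lines of plain JavaScript. This is only a sketch: the sample `data` object is made up, and `countTags` is a hypothetical helper, not something Gatsby provides.

```javascript
// Example of the data a route component might receive for the tags query.
const data = {
  allMarkdown: {
    edges: [
      { node: { frontmatter: { tags: ['react', 'gatsby'] } } },
      { node: { frontmatter: { tags: ['gatsby'] } } },
      { node: { frontmatter: {} } }, // a post without tags
    ],
  },
}

// Count how many posts use each tag.
function countTags(queryData) {
  const counts = {}
  for (const edge of queryData.allMarkdown.edges) {
    for (const tag of edge.node.frontmatter.tags || []) {
      counts[tag] = (counts[tag] || 0) + 1
    }
  }
  return counts
}

console.log(countTags(data)) // { react: 1, gatsby: 2 }
```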
I created a page like this on my blog (which is running Gatsby-1.0-alpha1): https://www.bricolage.io/tags/
Stuff like pagination, tag pages, and other "meta" pages is now pretty straightforward.
Going with GraphQL also gives us access to fantastic tooling. Facebook uses GraphQL heavily and one of the most useful internal GraphQL tools they've released is GraphiQL, an IDE for GraphQL.
Here's a gif of me exploring my blog's GraphQL schema.
I'm super duper excited about all the possibilities the new GraphQL layer opens up. Here's a sampling of some ideas I've had.
- Use React Docgen to make PropType or Flow information from your React components queryable. Create a living styleguide.
- Do something similar for other JS docs systems e.g. JSDocs. Imagine writing code documentation while the documentation hot-reloads your changes.
- Programmable data. GraphQL fields can take arguments. Query for images and pass a `width` value as an argument and have the image source plugin resize the image on the fly. Pass a format string to a date field and get back a formatted date (no more loading moment.js into the client).
- Connect to 3rd party APIs e.g. Github, Twitter, Facebook, etc.
- Build sites using hosted CMSs e.g. Contentful, DatoCMS, or Prismic.
- Validate data e.g. require that all Markdown files have a title field that's of a minimum length.
- Connect data. GraphQL lets you easily connect types together e.g. the `author` field in the frontmatter of a markdown file can be connected to data from an `authors.yaml` file which lets you write queries like:

      {
        markdown {
          frontmatter {
            author {
              firstName
              lastName
            }
          }
        }
      }

- Query Markdown AST for advanced use cases. E.g. custom footnote rendering.
- Pulling data from legacy systems e.g. use a Wordpress source plugin and rebuild an old site on Gatsby while still maintaining content in Wordpress.
- Extend source plugin schemas with custom fields for your site.
- Add standard query operators to schema so you can easily sort, filter, search, glob, regex, groupBy, sum, etc. data.
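Two of the ideas above, programmable fields and connected data, can be sketched with plain resolver functions in the `(source, args)` style that GraphQL implementations use. The `formatString` argument name, the sample records, and the tiny formatter (which only understands `YYYY`, `MM`, and `DD`) are all assumptions for illustration.

```javascript
// Hypothetical records loaded by source plugins.
const authors = { ada: { firstName: 'Ada', lastName: 'Lovelace' } }
const post = { frontmatter: { title: 'Hello', date: '2016-09-01', author: 'ada' } }

// Field resolvers for the frontmatter type.
const frontmatterResolvers = {
  // Programmable data: format the date on the server so the client never
  // needs a date library. `formatString` is a hypothetical argument name.
  date(source, args) {
    const d = new Date(source.date)
    const pad = n => String(n).padStart(2, '0')
    const tokens = {
      YYYY: String(d.getUTCFullYear()),
      MM: pad(d.getUTCMonth() + 1),
      DD: pad(d.getUTCDate()),
    }
    return (args.formatString || 'YYYY-MM-DD').replace(/YYYY|MM|DD/g, t => tokens[t])
  },
  // Connected data: swap the author id in the frontmatter for the full record.
  author(source) {
    return authors[source.author] || null
  },
}

console.log(frontmatterResolvers.date(post.frontmatter, { formatString: 'DD/MM/YYYY' }))
console.log(frontmatterResolvers.author(post.frontmatter))
```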
With the coming source plugin architecture, getting data into your site will soon be straightforward. Identify the sources of data, compose source plugins, play in GraphiQL to create queries, drop queries in route components, write components.
Gatsby currently is too magical with creating paths. It tries to auto-generate paths based on files' positions on the file system. So a file named `my-sweet-blog-post.md` becomes `/my-sweet-blog-post/`. This is fun and works, but often you want more control.
So with Gatsby 1.0, I'm planning that all paths will be created programmatically.
Within plugins and at the site level you can create a function called `createRoutes`. This gets called with a `graphql` function. With that you write queries to get data and then return an array of route objects with paths and the component responsible for that path.
Gatsby takes this route information and auto-generates a React Router config.
The beauty of static site generators is you know everything at build time. So you can calculate exactly what paths are needed.
I want to add support for purely client-side routes as well if you're loading data dynamically from an API but for server rendered stuff, we can calculate all routes ahead of time.
This means you're no longer limited to file-based routes and can easily do stuff like pagination or tag pages or a thousand other things.
A simple example for a blog.

    const _ = require('lodash')

    exports.createRoutes = (graphql, cb) => {
      graphql(`
        {
          allMarkdown(first: 1000) {
            edges {
              node {
                path
              }
            }
          }
        }
      `)
        .then(result => {
          const blogComponent = './pages/article-route.js'
          const routes = []
          // Create blog post routes.
          _.each(result.data.allMarkdown.edges, (edge) => {
            routes.push({
              path: edge.node.path,
              component: blogComponent,
            })
          })
          cb(null, routes)
        })
        // Report a failed query instead of leaving the callback uncalled.
        .catch(cb)
    }

This is extra setup compared to what we have now but is still fairly straightforward and combined with the new GraphQL data layer, 1000x more flexible.
Also plugins and themes can provide default route creation for you so you can just install a blog theme and just start dropping markdown files in a `content` directory. Or install a pagination plugin and tell it to create `/page/1`, `/page/2` (as many as needed) with 10 blog posts per page.
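The route-creation part of such a pagination plugin might look something like the sketch below. `createPaginationRoutes`, the component path, and the extra `posts` field on each route are hypothetical; no such plugin exists yet.

```javascript
// Create one route per page of posts: /page/1, /page/2, ...
function createPaginationRoutes(posts, perPage, component) {
  const routes = []
  const pageCount = Math.ceil(posts.length / perPage)
  for (let page = 1; page <= pageCount; page++) {
    routes.push({
      path: `/page/${page}`,
      component,
      // The posts shown on this page (a hypothetical extra field).
      posts: posts.slice((page - 1) * perPage, page * perPage),
    })
  }
  return routes
}

// 23 made-up post paths at 10 per page need three pages.
const posts = Array.from({ length: 23 }, (_, i) => `/post-${i + 1}/`)
const routes = createPaginationRoutes(posts, 10, './pages/blog-index.js')
console.log(routes.map(r => r.path)) // [ '/page/1', '/page/2', '/page/3' ]
```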
- BLOG post with details *
Normalize code.
bundles:
- commons.js — React, React Router, utility modules you use across the site
- route components — e.g. for a blog you might have a BlogIndex.js and BlogPost.js route component
- Per route data bundles. Each page has its data in own module.
details about Webpack config
alluded to this earlier — everything built on this. Easy to compose different functionality.
code examples
Move asset handling to plugins?
- Google Analytics
- RSS feed
- Markdown spell checker (in dev)
gatsby-config.js
New APIs will be necessary for this to work.
Propose API hardening pattern.
Level 1: Experimental only. Underscored. Unstable, can be broken on a minor release. New APIs must incubate in level 1 for 3 months.
Level 2: At least 3 plugins use the API, including at least 1 core plugin. Can only be broken on a major release. Documented and tested. Alias the underscored function to a non-underscored name.
Level 3: At least 6 plugins use the API. Been in core for ~6 months. Thoroughly documented and tested.
So we'll release 1.0 with a number of underscored APIs. In subsequent minor releases we'll improve them and then move them to level 2 and eventually level 3.
Existing APIs will get grandfathered in at level 2.
At major releases, evaluate all level 1 & 2 APIs and remove or consolidate APIs that overlap or don't have sufficient power.
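To show how mechanical this could be, here's a sketch of the level rules and the underscore aliasing. The thresholds come from the levels above, but the record shape (`pluginCount`, `corePluginCount`, `monthsInCore`), both helper names, and the assumption that level 3 also requires everything level 2 does are mine.

```javascript
// Work out which hardening level an API qualifies for.
// Assumption: level 3 requires everything level 2 requires.
function apiLevel({ pluginCount, corePluginCount, monthsInCore }) {
  const level2 = monthsInCore >= 3 && pluginCount >= 3 && corePluginCount >= 1
  const level3 = level2 && pluginCount >= 6 && monthsInCore >= 6
  return level3 ? 3 : level2 ? 2 : 1
}

// Promote a level 1 API by aliasing the underscored function to a
// non-underscored name, keeping the old name working.
function promote(apis, name) {
  const underscored = `_${name}`
  if (typeof apis[underscored] !== 'function') {
    throw new Error(`No experimental API named ${underscored}`)
  }
  return Object.assign({}, apis, { [name]: apis[underscored] })
}

const apis = { _createRoutes: () => [] }
const promoted = promote(apis, 'createRoutes')

console.log(apiLevel({ pluginCount: 4, corePluginCount: 1, monthsInCore: 4 })) // 2
console.log(promoted.createRoutes === promoted._createRoutes) // true
```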
The create-react-app folks have done amazing work around improving the UX of Webpack-based CLI apps. We should copy them liberally. Most of the improvements we could borrow from them probably aren't breaking changes, so feel free to start cherry-picking now!
- store data in database?
- extend source plugins
- Sync data locally from remote APIs (`gatsby sync my-slow-api`?)
- animations on route transitions
- Themes — really powerful, really want. Experience, NPM install, set in config, instant blog, instant documentation site, instant marketing website, etc. Can include plugins with configuration, route components, and styling.