@dustinlacewell
Last active November 29, 2020 12:53

Generalizing Styx

Motivation

Currently Styx bakes in a number of concepts and abstractions typical of “static site generators”. Some of these ideas are:

  • The main build target is a “site”
  • The main assets of the build are “pages”
  • The ideas of “layout”, “templates” and “themes” are central

There are a number of advantages to following these standard conventions:

  • User familiarity with the ideas
  • Association with existing tools

However, being based on Nix, Styx has room to be much more flexible and general than that. This document describes a plan to simplify the Styx core and move it towards something that could be described as a “programmable content pipeline”.

Programmable Content Pipeline

The phrase “programmable content pipeline” attempts to denote a system that is useful for any kind of static content processing:

  • Arbitrary data comes in as assets
  • Assets are processed arbitrarily
    • transforming asset content
    • deriving new assets from existing ones
    • aggregating assets into taxonomies
    • associating assets to each other
  • Assets are targeted to disk

By narrowing Styx’s core to just this kind of system we both simplify its implementation and open it up to more kinds of content processing, such as audio, video, text analysis, and so on. Anything, really.
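
To make the three stages concrete, here is a rough sketch in plain Nix. It is only an illustration of the idea, not part of Styx: every name in it (assets, shout, byTag, target) is hypothetical, and it uses nothing beyond Nix builtins.

```nix
# Illustrative only: none of these names exist in Styx.
let
  # 1. Arbitrary data comes in as assets.
  assets = [
    { name = "intro"; tags = [ "docs" ]; content = "hello"; }
    { name = "usage"; tags = [ "docs" "howto" ]; content = "world"; }
  ];

  # 2a. Transform asset content.
  shout = asset: asset // { content = asset.content + "!"; };
  processed = map shout assets;

  # 2b. Aggregate assets into a taxonomy keyed by tag.
  byTag = tag: builtins.filter (a: builtins.elem tag a.tags) processed;

  # 3. Target each asset to a path on disk.
  target = asset: asset // { path = "/${asset.name}.txt"; };
in {
  outputs = map target processed;
  howto = map (a: a.name) (byTag "howto");
}
```

Nothing here is specific to web pages; the same shape would apply if the content were audio metadata or analysis results instead of strings.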

Nomenclature

This document will use revised nomenclature for the parts of Styx to reflect its aims. The prominent changes are:

  • Site -> Task: The build target is now a more generalized “task”
  • Page -> Output: The artifacts of the build are now “outputs”
  • Data -> Input: The structure feeding the outputs is now the “inputs”
  • Environment -> Context: The structure made available to library functions
  • Layout -> Renderer: The function that finalizes output content is now the “renderer”
  • Template -> removed: It is expected that an output’s input and renderer attributes are sufficient to produce the final content
  • Theme -> Module: Themes become modules, which are packages passed to the styx import. These modules can contribute their own library functions, inputs and outputs

Mock tasks.nix (minimal)

The following is a minimal example showing the new pipeline of tasks.nix:

{ pkgs ? import <nixpkgs> { }, extraConf ? { } }:

let
  styx = import pkgs.styx { config = [ ./conf.nix extraConf ]; };

  inputs.name = "ldlework";

  outputs.index = {
    path = "/example.txt";
    renderer = o: "Hello ${inputs.name}!";
  };

in styx.mkTask { inherit inputs outputs; }

Mock tasks.nix (simple)

{ pkgs ? import <nixpkgs> { }, extraConf ? { } }:

let
  styx = import pkgs.styx {
    config = [ ./conf.nix extraConf ];
    context = { inherit inputs outputs; };
    modules = [
      pkgs.styx-templates
      pkgs.styx-sass
    ];
  };

  inherit (styx.lib) templates sass;

  inputs.message = name: "Hello ${name}!";

  outputs.index = rec {
    path = "/index.html";
    name = "ldlework";
    content = ''<div class="message">${inputs.message name}</div>'';
    renderer = templates.render ./templates/layout.html;
  };

  outputs.css = sass.load ./sass/site.sass // {
    path = "/site.css";
  };

in styx.mkTask { inherit inputs outputs; }

Mock tasks.nix

Given that the following example only produces a single output, about.html, it is true that it could be much simpler.

However, it has been written in a “refactored” style to demonstrate that the user is now free to create their own abstractions in order to produce their inputs and outputs however they desire.

These helper functions could be tucked away in a local utils.nix, a local Styx module, or even a published Styx module package.

We don’t have to keep putting things into Styx to support more and more. We should just make it easy to grab those things and use them.

{ pkgs ? import <nixpkgs> { }, extraConf ? { } }:

let
  styx = import pkgs.styx {                     # (1)
    config = [ ./conf.nix extraConf ];
    context = { inherit inputs outputs; };      # (2)
    modules = [                                 # (3)
      pkgs.styx-markdown                        # (4)
      pkgs.styx-theme-generic                   # (5)
    ];
  };

  inherit (styx.lib) markdown templates;        # (6)

  htmlRenderer = template: output:              # (7)
    templates.layout (template output);

  pageDefaults = {                              # (8)
    renderer = htmlRenderer templates.page.full;
  };

  page = attrs: pageDefaults // attrs;          # (9)

  markdownPage = { file, ... }@args:            # (A)
    let
      data = markdown.load ({ inherit file; }); # (B)
      attrs = data // args;                     # (C)
    in page attrs;                              # (D)

  inputs = {
    navbar = with outputs; [ about ];           # (E)
  };

  outputs = rec {
    about = markdownPage rec {                  # (F)
      file = ./data/sample/pages/about.md;
      title = "About";
      path = "/about.html";
      navbarTitle = title;
    };
  };

in styx.mkTask { inherit inputs outputs; }      # (G)

Description

  • 1. The pkgs.styx package must be called to be initialized. This performs configuration loading and type-checking, and initializes the library.
  • 2. The context is exposed to all library functions upon initialization.
  • 3. Module packages are passed to Styx so that their configuration can be loaded and their library initialized.
  • 4. Specific data transformation functionality is now externalized to modules. Modules can contain configuration, library functions, and their own outputs.
  • 5. Themes are now just normal modules. Modules can provide templates as library functions with a convention of using the shared styx.lib.templates namespace.
  • 6. Raising the styx.lib.markdown and styx.lib.templates namespaces for convenience.
  • 7. htmlRenderer is a helper function that takes a template function and an output attrset and applies template to output and then applies templates.layout to the result. This produces the final output content.
  • 8. pageDefaults is an attrset which sets the renderer to the htmlRenderer function so every “page-like” output works the same.
  • 9. page is a helper function that takes an attribute set attrs and merges them over the pageDefaults.
  • A. markdownPage is a helper function to create outputs based on markdown sources.
  • B. The markdown file is loaded and returned as an attrset. Internally, markdown.load uses styx.core.load to load the text file, its metadata, and apply nix-lang interpolations.
  • C. Merge in any custom attribute overrides.
  • D. Call page attrs to produce an output with the pageDefaults. In other words, this sets the output’s renderer.
  • E. Our templates use inputs.navbar to generate a navigation bar.
  • F. outputs.about is set to calling markdownPage to produce the final output attrset.
  • G. styx.mkTask takes in the inputs and outputs. It will flatten the outputs, call the renderer for each, and write the result to the location in the path attribute.

pkgs.styx { …} startup

When the user calls pkgs.styx a number of things happen:

  • Each module’s option-declarations are loaded and merged
  • The merged option-declarations are resolved to defaults and type-checked
  • The configuration sources are loaded and merged
  • The merged configuration is applied to the merged option-declarations and type-checked
  • Every module library is loaded with the following arguments:
    • configuration set
    • user supplied context
    • the fixed-point of the library itself, the merged result of every module library

The returned attrset

An attribute set is returned to the user containing:

  • core: styx.core
  • conf: the loaded configuration
  • lib: the merged module libraries
  • context: the library context
  • modules: the actual modules
  • mkTask: the function that will perform the build

styx.mkTask function

The styx.mkTask function is much like mkSite with some differences:

  • It takes both inputs and outputs
  • It flattens outputs for the user
  • It stores its full argument set to the returned derivation’s passthru attribute

passthru Attribute

The derivation’s passthru attribute allows us to store arbitrary Nix data on the derivation without affecting the derivation’s build environment.

This can be used for the documentation generator to get the information it needs without having to have the user return an attrset with the attributes we need.
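
As a rough illustration of the mechanism, the sketch below attaches task data to a derivation built with pkgs.runCommand. The args value and the taskArgs key are hypothetical, not a proposed API; the data is nested under its own key here on the assumption that a top-level passthru attribute named outputs would collide with the derivation's own outputs attribute.

```nix
{ pkgs ? import <nixpkgs> { } }:

let
  # Hypothetical stand-in for the argument set a user would pass to mkTask.
  args = {
    inputs = { name = "ldlework"; };
    outputs = [ { path = "example.txt"; content = "Hello ldlework!"; } ];
  };
in
pkgs.runCommand "styx-task" {
  # passthru stays on the resulting attrset but is not handed to the builder,
  # so changing it does not change the build environment.
  # taskArgs is a made-up key used to keep the data away from the
  # derivation's own `outputs` attribute.
  passthru = { taskArgs = args; };
} ''
  mkdir $out
  echo "placeholder" > $out/example.txt
''
```

A documentation generator could then read the data back with an expression such as task.passthru.taskArgs.inputs, without building anything.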

styx.core

styx.core will contain much of what the built-in library contains today. The only difference is that importing it no longer depends on going through the startup process.

styx.lib

styx.lib will contain an attrset of functions, merged from all of the loaded module libraries. The libraries themselves have access to:

  • the configuration
  • the context
  • the fixed-point lib itself

To make the library itself available to module libraries on loading, we use pkgs.lib.fix in order to produce a fixed-point version of the library.

Styx Modules

Styx modules are packages which can contribute to a task. Their primary advantage is that they can provide their own configuration interfaces which can change their internal behaviors.

Modules can contribute the following things:

  • Option Declarations
  • Outputs which get merged with task outputs
  • Library functions

Module initialization

There is a bit of a chicken and egg problem with module initialization. While modules can provide both configuration and library functions, we need a complete configuration in order to load their library functions.

Another chicken and egg problem is that module libraries should be able to access functions in the library from other modules. But how can we already have the library on hand to pass to modules, before we’ve loaded them?

Configuration Initialization

Modules will be loaded in two phases. When calling styx.core.modules.load you will get back an attrset containing decls and partial.

The decls can be used to merge with the declarations of other modules and the user configuration to produce the complete configuration.

The complete configuration can then be used to call partial, which will load the rest of the module.
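
The two phases might look roughly like the sketch below. It is a simplification under stated assumptions: loadModule stands in for styx.core.modules.load, the modules are made-up attrsets, and decls are reduced to plain default values rather than typed option declarations.

```nix
let
  # Hypothetical stand-in for styx.core.modules.load: phase one exposes
  # decls, while partial waits for the complete configuration.
  loadModule = module: {
    decls = module.decls;
    partial = conf: module.build conf;
  };

  # Two made-up modules. Note that greeter reads an option declared by site.
  greeter = {
    decls = { greeting = "Hello"; };
    build = conf: { banner = "${conf.greeting}, ${conf.siteName}!"; };
  };
  site = {
    decls = { siteName = "Styx"; };
    build = conf: { title = conf.siteName; };
  };

  loaded = map loadModule [ greeter site ];

  # Phase one: merge every module's declarations, then apply the user
  # configuration on top.
  userConf = { greeting = "Welcome"; };
  conf = (builtins.foldl' (acc: m: acc // m.decls) { } loaded) // userConf;

  # Phase two: hand the complete configuration to each partial.
  modules = map (m: m.partial conf) loaded;
in {
  inherit conf modules;
}
```

The point of the split is visible in greeter: it can only be finished once site's declarations and the user's overrides have both been merged.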

Library Initialization

Calling the module partial does not, however, fully load its library. After all, we have to pass the merged library to it in order to load it.

Therefore we load each module’s library as a function taking the following arguments:

  • context: the user context attrset
  • lib: the merged module libraries

To solve the infinite recursion we utilize nixpkgs.lib.fix in order to calculate the fixed-point of the merged library.
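
A minimal sketch of that fixed-point merge follows. The fix function is written out with the same definition nixpkgs uses for lib.fix so that the example needs no imports; the two module libraries and the context are hypothetical.

```nix
let
  # Same definition as lib.fix in nixpkgs, inlined to keep this self-contained.
  fix = f: let x = f x; in x;

  # Hypothetical user context.
  context = { siteName = "Styx"; };

  # Hypothetical module libraries: each is a function of the context and
  # the merged lib.
  modlibs = [
    ({ context, lib }: { title = "Welcome to ${context.siteName}"; })
    # This one calls into the merged lib, reaching the other module's function.
    ({ context, lib }: { heading = "<h1>${lib.title}</h1>"; })
  ];

  # Merge all module libraries, feeding the merged result back in as `lib`.
  lib = fix (self:
    builtins.foldl'
      (acc: modlib: acc // modlib { inherit context; lib = self; })
      { }
      modlibs);
in lib.heading
```

This works because Nix is lazy: each module library returns its attrset without forcing lib, so the merged set exists before any function inside it looks up another. A module that inspected lib while constructing its attribute names would still recurse forever.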

Styx CLI

The CLI will be updated to allow the user to build any task defined by their tasks.nix. By returning an attrset of styx.mkTask calls, the user can provide named tasks. The CLI should be able to determine whether tasks.nix returns a single task or multiple tasks.

In the case that the user defines multiple tasks, but no task name is provided to the CLI, the CLI should look for a task named ‘default’. It should provide an error if it can’t be found.
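
The selection rule could be expressed on the Nix side roughly as follows. This is only a sketch of the logic: selectTask and isTask are hypothetical helpers, the fake values merely imitate derivations, and the actual CLI might implement the check differently.

```nix
let
  # A derivation is an attrset whose `type` attribute is "derivation".
  isTask = x: builtins.isAttrs x && (x.type or null) == "derivation";

  # `result` is whatever tasks.nix evaluates to; `name` is the task name
  # given on the command line, or null when none was given.
  selectTask = result: name:
    if isTask result then result
    else if name != null then
      result.${name} or (throw "no task named '${name}'")
    else
      result.default or (throw "multiple tasks defined, but none is named 'default'");

  # Stand-ins for derivations, so the sketch evaluates without nixpkgs.
  fake = name: { type = "derivation"; inherit name; };
  tasks = { docs = fake "docs"; default = fake "site"; };
in {
  single = (selectTask (fake "site") null).name;
  named = (selectTask tasks "docs").name;
  fallback = (selectTask tasks null).name;
}
```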

Local Dev, src/nixpkgs and themes/versions.nix

Currently in order to facilitate local development there is a src/nixpkgs/default.nix. There is also a themes/versions.nix which contains the github details and hashes for pinned versions of various themes that we want to build documentation for in the official docs.

Both of these mechanisms are dropped in favor of normal overlays. With an overlay like the one below at ~/.config/nixpkgs/overlays/styx.nix, tools like nix-build will find the desired versions of the packages. At this point, Styx can just refer to the normal package names, and the correct local ones, or even GitHub-pinned ones, will be found and used (and documented).

self: super: {
  styx = super.callPackage /home/ldlework/src/styx/styx {};
  styx-theme-generic = super.callPackage /home/ldlework/src/styx/themes/generic-templates {};
  styx-theme-agency = super.callPackage /home/ldlework/src/styx/themes/agency {};
  styx-theme-showcase = super.callPackage /home/ldlework/src/styx/themes/showcase {};
}
@ericsagnes

I think the general idea is very good.

To avoid confusion, I will use styx for the current mkSite based, and stix for the mkTask based one.

A little reminder to what Styx is really:

  • styx = import pkgs.styx {} is just a function returning a (messy) attrset with everything (needed or not)
  • mkSite is just a customized runCommand
  • the cli is just a thin wrapper for nix-build and some bash/git black magic for deployment

So Styx is just a specialized mkDerivation with a set of helpers (lib, templates, themes, ...) to build websites with a bash wrapper as a cli.

The only thing required for Styx is a list of pages attrsets and a call to mkSite to turn the attrset into a site. (cf hello world)

Stix would sit between styx and mkDerivation, allow easy site generation through modules but also possibilities and flexibility that styx does not have.
The question being where to put the cursor. As I understand it, mkSite is just putting outputs in the result derivation (as shown in the example below), but I might misunderstand.

In this case, context, renderers and anything beside mkTask is optional, as they can be done in plain nix in site.nix. (pushing this logic one could say that mkTask can also be done in plain nix, so also is optional)
So my guess, is that they (anything beside mkTask) should be more a guideline on what kind of functionalities modules can provide than anything else.

Styx - the bad

  • the styx attrset is a mess
    • putting everything in a clean namespace should improve
  • bootstrap is over-complicated
    • one of reason being that we want to pass the full environment to templates
  • mkSite contains lot of hardcoded logic limiting the possibilities
    • sass, substitutions, ...

Stix - what to improve

If we want to maximize flexibility, every output item should be a StorePath that mkTask will copy or link in the result derivation.
(in a similar way to what styx mkSite does with pages)
That would allow, as you mentioned, to generate any kind of content.

For that, an output should provide two pieces of information:

  • source: the storePath to use
  • path: the path to copy or link to the source to

These two pieces of information should be generated by a renderer in the pipeline; an HTML renderer would use pkgs.writeText.

NOTE: every styx page is a storePath created with pkgs.writeText and linked to the pageAttr path in the result package (source).

So mkTask could be something as simple as:

{ pkgs ? import <nixpkgs> {} }:

let
  mkTask = { outputs }: 
    let
      name = "styx-product";
      env = {
        meta = { platforms = pkgs.lib.platforms.all; };
        preferLocalBuild = true;
        allowSubstitutes = false;
      };
    in
    pkgs.runCommand name env ''
      mkdir $out
      ${pkgs.lib.concatMapStringsSep "\n" (output: ''
        ln -s "${output.source}" "$out/${output.path}"
      '') outputs}
    '';
in {
  result = mkTask {
    outputs = [{
      source = pkgs.writeText "foo" "bar";
      path = "foo.txt";
    }];
  };
}

NOTE: can be run with nix-build --no-out-link -A result.
NOTE: for simplicity, outputs is a list, but it could be an AttrSet or whatever fits best.

But it really depends on the desired result of mkTask.
So I think that this should be the first thing to fix in the spec.

Following the renderer pipeline idea, an example pipeline would be:

  1. load some file or use some data structure, for example markdown.load
  2. apply a set of transformations, for example htmlRenderer
  3. return an OutputAttrset, for example markdownPage

The context, renderers, modules and others will just be syntactic sugar making it easier to transform some data to a set of outputs for dedicated purposes, eg: a website in styx case.
inputs would be in-pipeline contents "pinned" to allow references in context so they can be used in renderers (eg: internal site links).

Possible issues would be:

  • renderer pipeline type errors: there is no way to ensure that the output of one renderer matches the input of the next one in the pipeline.
    • that could be solved by having modules providing ready to use pipelines
  • higher entry barrier: flexibility and a lack of conventions could make site.nix much harder to understand than it already is
    • that could also be solved by having modules providing syntactic sugar & set of conventions and a nice documentation

Regarding the spec

It is great and well written, good work.
I agree with almost everything.

Using overlays should be a perfect fit for themes (styx was made before overlays were introduced).

bootstrap is really an area that should be improved, and the module initialization proposition should really help that.
Maybe the process could be simplified if we limit the information shared?

  • Does lib really need access to config or context?
    • it might have been the case in styx, but maybe for wrong reasons (I honestly don't remember)

Styx cli should be in something other than bash for better maintenance, or just be a thin wrapper to the nix commands.
The deploy site logic is close to very dirty.
Just a crazy idea, but it would be nice if modules could extend the cli functionalities, so a module could provide github-pages deployment functionality and so on.

@dustinlacewell
Author

Hey, thanks for reading through and responding!

I think the biggest idea in your reply is to have each individual output specified as a store path. It is a very interesting idea that I considered at the very start but I wasn't sure about some things:

  • Is it actually scalable within Nix? I can imagine a site potentially having many many many javascript files. There's no rule on the web that you have to concatenate and minimize your static assets, even though it is a good idea.
  • While it does simplify mkTask somewhat, is there really an advantage to shifting the burden onto the user to produce "already-in-store" paths? What is the difference between that, and having the outputs simply contain the text content. We can pretty easily write that content to file on behalf of the user, and this simplifies things on their end. They are just assigning strings, and working with functions that return strings, etc.

I feel like if we are going to generalize styx and reduce its own API and functionality, then what we do keep as part of the "very core experience" should at least do what we can on the user's behalf. That might mean taking care of writing their output data to disk for them.

To anticipate one of your answers, I suppose the way you have construed mkTask would allow not only for outputting simple text files but also:

  • files of any type, including binary files
  • whole directories (as it seems your implementation just uses ln -s, which should also work on whole directories)

Those are actually some nice advantages. So I guess I would say then, if we did go that way, then it would mean there are really two kinds of functions. Functions which operate on/produce output attrsets, and functions which are doing data-munging and producing store-paths to use as output sources. I can anticipate the future desire when designing module libraries to want to create functions that the user points at some, say, markdown file, and produces a totally complete output attrset with source and metadata set, but also other times when you'll want to just produce the store-path so that you can further process it with some other module. Well you might say, those second-kind of functions, that do the data munging, they wont produce store paths, they'll produce the data, as strings and other nix data types. Now you can easily pass that data to further data-munging functions in the pipeline. I hear that, but now we'll need a step at the end to put the final content into a store path, so that it can be set on the output.source attr. So now I'm back at, wouldn't it be nice if the user could just put the data on the output, and mkTask does the store-juggling internally. But then we lose the ability to just blindly ln -s the output.source, allowing for binary outputs and whole directories.

I'm sorry for the long paragraph. One intermediate I can see is that we expand the option attrset API. We include different attributes that are treated differently by mkSite. Or make mkSite polymorphic, and have it do type checks on the source attribute or something. The right answer isn't very clear to me anymore, there seem to be trade offs here.

Concerning Bootstrap

The existing bootstrap implementation did take some serious study to understand. But honestly, I don't think it's your fault, I just think it is the crazy lazy-pure nature of Nix which makes some things more brain-melting than they otherwise would be.

That said I think my study has paid off. I have a pretty good idea now of what needs what and at what time. I think that we can get away with keeping some of the attributes that we're passing to the modules. Let me try to explain.

Does lib really need access to config or context?

This question actually needs some clarification. There are really two things to talk about when we mention styx.lib. There is the core lib, which contains only the code from src/lib/ in Styx. Let's call that the corelib. Then there are the individual module libraries. Let's call those the modlibs. Then there is the merged result. Let's call this the lib. So now we actually have 4 questions:

  • does corelib need access to config?
  • does corelib need access to context?
  • do the modlibs need access to config?
  • do the modlibs need access to context?

The answer to these questions basically comes down to what roles are config and context serving? Here are my answers:

  • config are the static configuration variants
  • context are the per-task runtime configuration variants

When the user writes their conf.nix, it doesn't change. It just is. However, the context is computed at runtime. This opens the door for logic to depend on either of these and change its behavior based on them. The question becomes: "does corelib or the modlibs need to vary their behavior based on static or runtime configuration?"

The answer for the corelib used to be "yes" for static configuration, as it varied the behavior of markupToHtml and other functions based on that configuration. This is why the corelib is so tricky to import, because it is really acting as a modlib. You had to construct some fakes in order to pass into the corelib before you could load it. By moving all markup and other non-core functionality to proper modules, we change the answer for corelib to "no" for both static and runtime configuration. Now, corelib becomes much easier to import during bootstrap. Yay.

However the modlibs are a different story. Modlibs definitely need access to conf as it will not only contain everyone else's configuration, but that module's configuration. So modlibs at least need access to static configuration. When it comes to context, I don't currently see any difficulty passing the context to the modlibs on import. The context is passed to the bootstrap, and so there is no logical dependency problems there. We just pass it along.

The biggest complication when it comes to bootstrapping is bootstrapping the modules as a whole. By the end of bootstrapping, we need to have each module initialized such that we have its configuration, library, examples, metadata, etc. all loaded. We then need to merge all the modlibs with each other and the corelib to produce the final bootstrap result. The problem as described in the spec is that the module libraries need both the complete configuration and the complete lib as an input.

Thankfully, I've kind of already worked that out and built a proof of concept locally. I described the strategy in the spec above, but it essentially comes down to loading the modules in phases. The very first thing we require in bootstrap is a complete configuration. So we load all of the modules, but we only get back their configuration + a partial. Once we've gone through the configuration loading process, and have a complete configuration, we can then pass the config into the partials to load the next stage of the modules. This includes everything but their libraries. We then use nixpkgs.lib.fix to load the module libraries and merge them in one step, producing not just the final merged lib but the bootstrap result as a whole.

Concerning Type Errors

At one level, this kind of comes with the territory of a dynamically typed language like Nix. We could try to go way out of our way to hold the user's hand here, but it seems to me like great documentation will really make up for this.

If we really wanted to tackle this problem somehow, then we may want to keep a real distinction between module library functions and module template functions, the former of which can take and return anything, the latter of which returns strings.

Instead of retaining that as a real division in the modules, I've instead sort of tried to imply that modules are free to use the lib.templates namespace to store such functions (and override each other thanks to modlib merging).

Concerning what's passed to mkTask

You got me thinking about this. Why do we pass inputs to mkTask? Just for documentation purposes? One thing I realized is that if we are just interested in showing the "data" that went into building the site, then theoretically we could just utilize what's in the user's context. The user basically already passes inputs and outputs to the context, and the context is available to the bootstrap, so the bootstrap is able to make it available to mkTask.

The only problem I see with this, is that the context will contain the outputs, but we don't really care about the outputs when showing the site data, as those are already shown as the site outputs. We could filter the attr out at mkTask time, but the name "outputs" is only a convention on the user side. It's only API for mkTask. Hmm.

Summary

I think the biggest questions we're left with are:

  • What is the API of pkgs.styx { ... } ?
  • What is the data returned by pkgs.styx { ... } ?
  • What is the format of output attrsets?
  • What is the API of mkTask ?

We should be confident that we don't need exact answers as we are only aiming at 0.8

@ericsagnes

Well, it doesn't have to be a store path, it really depends on the mkTask implementation.
For example, this implementation links store paths and creates a file if source is text (the polymorphic approach):

{ pkgs ? import <nixpkgs> {} }:

let
  mkTask = { outputs }: 
    let
      name = "styx-product";
      env = {
        meta = { platforms = pkgs.lib.platforms.all; };
        preferLocalBuild = true;
        allowSubstitutes = false;
      };
    in
    pkgs.runCommand name env ''
      mkdir $out
      ${pkgs.lib.concatMapStringsSep "\n" (output: 
        if builtins.isString output.source then ''
          echo "${output.source}" > "$out/${output.path}"
        '' else ''
          ln -s "${output.source}" "$out/${output.path}"
        ''
      ) outputs}
    '';
in {
  result = mkTask {
    outputs = [{
      source = pkgs.writeText "foo" "foo";
      path = "foo.txt";
    } {
      source = "bar";
      path = "bar.txt";
    }];
  };
}

That is a reason why I suggested to fix mkTask API first, as it will determine many other things.

Quite unrelated, but I had a few other nix experiments that seem a good match for stix:

  • mkPresentation: generating HTML slides from markdown files
    • this is more or less just site generation
  • machine learning: training a machine learning model and running it
    • learning configuration (epochs and backend) would be in a conf module, and there would be a task for the model and another for the frontend
