## static_site_generator_proposal.md

      
My Static Site Generator Proposal

© Waylan Limberg 2024-04-11
Over the course of six years (2015-2021) I contributed regularly to MkDocs (253 commits according to GitHub's stats). In that time I learned a lot and formed some opinions about how I would like a static site generator to work. This is a brief summary of my ideas. Some of these ideas follow MkDocs closely and others are so different that I would not expect to ever find them in a future version of MkDocs; perhaps in a competing project instead. And these ideas only exist as a vague overview; the technical aspects of them have not been fleshed out. For example, my ideas make multiple references to a configuration; however, I have no idea what format that configuration should take and how the various options should be defined.
For the most part, I was happy with how we got MkDocs' page collection process working (stepping through a file structure collecting text documents). There might be a few subtle changes I would make, but for this discussion, assume a very similar approach. It's what is contained in those collected pages and how that content is processed that is different.
First, I would abandon the use of file extensions to indicate the markup language used (.md, etc.). As we will get to, this is paramount to how the entire page processing system works. That said, the system needs to differentiate between page files and other content such as media, styles, and scripts. Therefore, all pages would be required to use a specific unique file extension. Perhaps the ubiquitous .txt or something specific like .page or maybe even .[name of tool]. Each page would then be required to contain meta-data or front-matter, which would define the format and processing instructions for that specific page. Of course, it could get repetitive to redefine the same set of processing instructions for each page, so a default set of processing instructions could be defined in a global configuration file, and the default set could be referenced and modified for each individual page. Alternatively, an individual page could have its own unique set of instructions specific to that page only.
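To make the idea of per-page meta-data concrete, here is a minimal sketch of splitting a page file into meta-data and content. The proposal does not settle on a front-matter format, so everything specific here is an assumption made for illustration: the `---` delimiter lines, the flat `key: value` syntax, and the function name `split_front_matter` are all hypothetical.

```python
def split_front_matter(text):
    """Split page text into (meta, content).

    Assumes a hypothetical format: the file opens with a '---' line,
    followed by 'key: value' lines, closed by another '---' line.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        # No front-matter at all: the whole file is content.
        return {}, text

    meta = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is page content.
            return meta, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()

    # The closing delimiter never appeared: treat the file as plain content.
    return {}, text


page = "---\nprocessors: default\ntitle: Home\n---\n# Hello\n"
meta, content = split_front_matter(page)
```

Note that nothing in the file name says the content is Markdown. Under this sketch, only the meta-data (here a hypothetical `processors` key) would tell the system how to treat the page.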
The global configuration would consist of at least three parts: (1) the settings for defining the page collection (perhaps a local path or similar); (2) the definitions of the various processing instructions available for use; and (3) the default set of processing instructions. For the purposes of this discussion, we can assume the first part would work very similarly to how MkDocs does today. The rest, however, would be very different.
Each processing instruction would consist of some code, or a pointer to some code, which accepts page content and meta-data and returns page content and meta-data. Each processing instruction would be expected to accomplish a single task as it modifies the page and/or meta-data. A processing instruction might be as simple as altering a page's title in some way. Or it could pass the content through a markup parser and return HTML. Why, a page could contain no content at all, only meta-data which includes configuration options for a specific processing instruction which would generate markup. Or a processing instruction could take the generated HTML of a previous processing instruction and modify it in some way.
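The contract described above (content and meta-data in, content and meta-data out) could look something like the following. This is only a sketch: the `(content, meta)` argument order, the dictionary used for meta-data, and all three function names are assumptions, and `paragraphs_to_html` is a toy stand-in for a real markup parser.

```python
import html


def title_case_title(content, meta):
    """Alter the page's title and leave the content untouched."""
    new_meta = dict(meta)
    new_meta["title"] = meta.get("title", "").title()
    return content, new_meta


def paragraphs_to_html(content, meta):
    """Toy markup step: wrap blank-line-separated blocks in <p> tags."""
    blocks = [block.strip() for block in content.split("\n\n") if block.strip()]
    body = "\n".join(f"<p>{html.escape(block)}</p>" for block in blocks)
    return body, meta


def add_heading(content, meta):
    """Modify the HTML produced by an earlier step, using the meta-data."""
    return f"<h1>{html.escape(meta['title'])}</h1>\n{content}", meta


content, meta = "hello world\n\nsecond paragraph", {"title": "my first page"}
# Each processor receives the output of the one before it.
for processor in (title_case_title, paragraphs_to_html, add_heading):
    content, meta = processor(content, meta)
```

Because every step shares one signature, each function does a single job and can be reordered, replaced, or dropped without the others knowing.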
Presumably, the library would contain a few common processing instructions, such as a Markdown-to-HTML processor or a processor which passes the generated content through a theme template system. This would also make it easy for a user to specify use of their preferred Markdown parser, rather than locking everyone into a single implementation. If a user wanted to mix multiple markup languages, they could, by simply specifying a different processor for the language used on that page.
There would not need to be any formal plugin API as any callable which accepts the required parameters and returns usable data would work. Within the configuration, the user could assign a unique name to each processor. I am undecided about whether it makes sense for this configuration to be a code file (Python file), or a text-based configuration file (YAML, TOML, etc.) which points to code using a common syntax. Processors could be distributed as third-party libraries, maintained within the file system of the documentation, or anywhere else that could be pointed to from the configuration.
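If the configuration were a text file that points to code, one possible "common syntax" is a `module:attribute` string resolved at load time. The sketch below assumes that syntax and a hypothetical `resolve` helper; the proposal itself leaves the choice open. The two standard-library functions are used purely as importable stand-ins so the example runs, and they do not follow the content-and-meta-data signature a real processor would need.

```python
from importlib import import_module


def resolve(pointer):
    """Load a callable from a 'package.module:attribute' string."""
    module_name, sep, attr_path = pointer.partition(":")
    if not sep:
        raise ValueError(f"expected 'module:attribute', got {pointer!r}")
    obj = import_module(module_name)
    # Walk dotted attribute paths such as 'SomeClass.method'.
    for name in attr_path.split("."):
        obj = getattr(obj, name)
    if not callable(obj):
        raise TypeError(f"{pointer!r} does not point to a callable")
    return obj


# Hypothetical configuration: user-chosen names mapped to code pointers.
PROCESSORS = {
    "escape": "html:escape",
    "dedent": "textwrap:dedent",
}
registry = {name: resolve(pointer) for name, pointer in PROCESSORS.items()}
```

With a Python configuration file the same registry could simply hold the callables directly, which is one reason the code-file option is attractive: no lookup syntax is needed at all.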
The final piece of the puzzle would be processing instruction sets. Specifically, users would be able to define a default set of processors, although there is no reason why they should be limited to only one set. The global configuration could contain multiple sets and each page would then reference the specific set(s) that should be run against that page. If nothing is specified by a page, then the default set would be used.
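A page's choice of set, with a fallback to the default, might be resolved as in the sketch below. The configuration layout, the `sets` meta-data key, the processor names, and the rule that multiple sets are concatenated in the order listed are all assumptions for the sake of illustration.

```python
# Hypothetical global configuration: named sets of processor names.
CONFIG = {
    "sets": {
        "markdown": ["markdown_to_html", "theme"],
        "rst": ["rst_to_html", "theme"],
        "raw": [],
    },
    "default_set": "markdown",
}


def processors_for(meta, config=CONFIG):
    """Return the ordered processor names that apply to one page."""
    # Fall back to the default set when the page specifies nothing.
    requested = meta.get("sets") or [config["default_set"]]
    if isinstance(requested, str):
        requested = [requested]

    names = []
    for set_name in requested:
        if set_name not in config["sets"]:
            raise KeyError(f"unknown processor set: {set_name!r}")
        names.extend(config["sets"][set_name])
    return names
```

Under these assumptions a page with no `sets` key gets the `markdown` pipeline, while a page written in another markup language only has to name a different set.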
A processing instruction set would consist of a list of processors by name. The processors would then be run in the order listed, passing the output of one as input to the next. Each processor in the set could optionally have a nested collection of default configuration options which would be passed to the processor. Each set would be given a unique name, which a page could reference to get all of the processors within that set. Of course, individual pages could define various specified meta-data values which could override the defaults on that page. The processor would just need to define its own meta-data keys and check them before running.
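Running a set in order, with per-processor defaults that a page can override, could be sketched as follows. Several things here are assumptions not fixed by the proposal: passing the options as a third argument, storing page overrides in the meta-data under a key matching the processor's name, and the toy processors `shout` and `wrap`.

```python
def shout(content, meta, options):
    """Toy processor: upper-case the content unless disabled."""
    if options.get("enabled", True):
        content = content.upper()
    return content, meta


def wrap(content, meta, options):
    """Toy processor: wrap the content in a configurable HTML tag."""
    tag = options.get("tag", "p")
    return f"<{tag}>{content}</{tag}>", meta


PROCESSORS = {"shout": shout, "wrap": wrap}

# A set is an ordered list of (processor name, default options) pairs.
SETS = {"default": [("shout", {}), ("wrap", {"tag": "div"})]}


def run_set(set_name, content, meta):
    """Run every processor in the named set, feeding each the previous output."""
    for name, defaults in SETS[set_name]:
        # Page meta-data stored under the processor's name overrides the defaults.
        options = {**defaults, **meta.get(name, {})}
        content, meta = PROCESSORS[name](content, meta, options)
    return content, meta


plain, _ = run_set("default", "hi", {})
custom, _ = run_set("default", "hi", {"wrap": {"tag": "section"}, "shout": {"enabled": False}})
```

An alternative that stays closer to the two-argument contract would be to merge the defaults into the meta-data before each call and let the processor read its own keys from there.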
That's it. The entire library would consist of some page gathering code, and the code to call and manage processors and processor sets. Except for a few basic processors included in the base library, most page handling code would be user or third-party provided.
There are a few challenges with this proposal. For one, it is reasonable to expect users to want the freedom to use different methods to build a collection of pages. This has not been addressed. Presumably, something like MkDocs' existing Plugin API could work for that, despite my previously suggesting that no such API is needed. And, by having theming be one of the processors, we could end up with multiple competing theme engines, which would be confusing for new users. It might make sense for the processors to stop short of theming and then use something like MkDocs' existing theming mechanism. These issues need to be explored further.
Postscript

While the text of this document is my work alone, the ideas expressed above are in the public domain and may be borrowed, modified, and otherwise used as anyone sees fit. Even if others were to build a very similar system, it is likely that many of the specific implementation details would be different than how I would do it. However, the basic concept is so powerful that I could easily adopt someone else's implementation and build my own set of processors to work how I wanted. And that is why I like this idea.