@rplacd
Created March 27, 2016 10:28
By “world generation” we mean natural landscapes - not because we’re uninterested in urban landscapes, but because we have a larger margin of error with the former. This isn’t the “cities aren’t actually that interesting” effect - raw nature is just as repetitive. The real problem is simulating, in a way that isn’t trivial, all the ways humanity both inhabits an environment and shapes it to its structures - determinism in both directions.
A high-level view. We’ll use Dwarf Fortress, Minecraft, Sir, You Are Being Hunted, and The Witcher 3 (that is, the REDengine 3) as our prototypical examples. Be familiar with the first two. (TW3 isn’t meaningfully procedural - but it is substantially procedural enough to be worth studying, and high-detail.)
We also refer to Morrowind as a source world we intend to model.
We begin with concrete analyses, work out the design space, and then try to locate ourselves within it - although you can read it backwards if you want to see choices before justifications.
A concrete example: Minecraft - low-detail, online, global
----------------------------------------------------------
Some of this is from [here][], and some of this is from the Minecraft Coder Pack patched decompilation of version 9.1.8; we choose to describe only Overworld generation, which begins at ChunkProviderGenerate.provideChunk(int x, int z). The version quoted here is heavily edited: generation of the skylight map, strongholds, temples, and monuments is removed. Understand, from bottom to top, BiomeGenBase, then ChunkProviderGenerate.provideChunk.
[here]: https://www.evernote.com/shard/s9/sh/2170e56b-2858-45f7-8fd7-591ac54d780e/610a77215134bc87e101a4bb9a66dae4
1. A blank “chunk primer” (ChunkPrimer) is created. Read “chunk primer” as “chunk” if you so wish.
It supports a subset of all chunk operations, including setting and getting blocks.
1. A list of biomes for the chunk is generated.
ChunkProviderGenerate.setBlocksInChunks() -> WorldChunkManager.getBiomesForGeneration() -> BiomeGenBase.getBiomesForBiomeList()
Each biome in this list (a BiomeGenBase) is described as Biome(name, temperature ratio, rainfall ratio, minimum available height, maximum available height). Given this list:
1. “Feature generators” then modify the world. (MapGen*.generate(…, x, z, chunkPrimer))
For every block in the chunk:
1. If X_1 and X_2 are nextLong()s drawn from the world seed, set the seed to (x * X_1) ^ (z * X_2) ^ (the world seed);
2. Call an inner function MapGen*.recursiveGenerate(…, block x, block y, chunk origin x, chunk origin y, …):
```java
public Chunk provideChunk(int x, int z)
{
    this.rand.setSeed((long)x * 341873128712L + (long)z * 132897987541L);
    ChunkPrimer chunkprimer = new ChunkPrimer();
    this.setBlocksInChunk(x, z, chunkprimer);
    this.biomesForGeneration = this.worldObj.getWorldChunkManager().loadBlockGeneratorData(this.biomesForGeneration, x * 16, z * 16, 16, 16);
    this.replaceBlocksForBiome(x, z, chunkprimer, this.biomesForGeneration);

    if (this.settings.useCaves)
    {
        this.caveGenerator.generate(this, this.worldObj, x, z, chunkprimer);
    }

    if (this.settings.useRavines)
    {
        this.ravineGenerator.generate(this, this.worldObj, x, z, chunkprimer);
    }

    if (this.settings.useMineShafts && this.mapFeaturesEnabled)
    {
        this.mineshaftGenerator.generate(this, this.worldObj, x, z, chunkprimer);
    }

    if (this.settings.useVillages && this.mapFeaturesEnabled)
    {
        this.villageGenerator.generate(this, this.worldObj, x, z, chunkprimer);
    }

    // ...items removed...

    Chunk chunk = new Chunk(this.worldObj, chunkprimer, x, z);
    byte[] abyte = chunk.getBiomeArray();

    for (int i = 0; i < abyte.length; ++i)
    {
        abyte[i] = (byte)this.biomesForGeneration[i].biomeID;
    }

    // ...items removed...

    return chunk;
}
```
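The per-feature seeding scheme described in step 3 above can be sketched as a minimal, standalone version. The class and method names here are mine, not Minecraft's; only the shape of the computation is taken from the text:

```java
import java.util.Random;

// Sketch of MapGen-style per-chunk seeding: two multipliers are drawn
// from the world seed once, then each chunk's RNG seed is
// (chunkX * m1) ^ (chunkZ * m2) ^ worldSeed. This makes feature
// generation for any chunk reproducible in isolation.
public class ChunkSeed {
    public static long seedFor(long worldSeed, int chunkX, int chunkZ) {
        Random r = new Random(worldSeed);
        long m1 = r.nextLong();
        long m2 = r.nextLong();
        return ((long) chunkX * m1) ^ ((long) chunkZ * m2) ^ worldSeed;
    }

    public static void main(String[] args) {
        // The same (world seed, chunk) pair always yields the same seed.
        System.out.println(seedFor(42L, 3, -7) == seedFor(42L, 3, -7));
    }
}
```

Note that for chunk (0, 0) the XORs cancel and the derived seed is just the world seed.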
A concrete example: Sir, You Are Being Hunted - high-detail, offline, local
---------------------------------------------------------------------------
The detail here is from a [GDC presentation][].
Several constraints - technical ones, design ones - limit the generality of the system:
- World generation must happen in a few minutes.
Most of the generation is based on distributions, and as local as possible; very little simulation-with-emergent features is present.
- The world is a small island.
This makes several other limits plausible: how much region-type variation we can get; what landscape features we have.
- Some regions must be built environments.
It was then chosen to model built-up areas locally, within defined regions; the one exception to this is roads. This gives the next constraint:
- World generation must partition the world.
It is expediently assumed that the world should be partitioned into convex shapes - no U-shaped hills surrounding villages, for one. World partitioning is used to model other features:
- Roads and canals are contained within partitions.
Canals don’t demonstrate erosion.
- Villages *can’t* be split up by roads.
This means that roads are sometimes separate regions, and sometimes simply detail running through other regions.
- Regions are contained within partitions.
- Roads and canals are then regions.
This leads them to the following process.
1. Initial world graph creation.
The world graph defines the partitioning of the world into regions; this partitioning is convex. Future steps can double-back on the world graph and modify it.
This graph is then labeled with region data; the distribution of regions is controlled by an island “theme”.
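The talk doesn't say how the convex partition is built, but a nearest-seed (Voronoi) assignment is one plausible sketch: Voronoi cells under Euclidean distance are always convex, which satisfies the no-U-shaped-hills constraint above. All names, seed points, and grid sizes below are illustrative, not from the game:

```java
// A convex world partition via nearest-seed (Voronoi) assignment:
// each grid point belongs to the region of its nearest seed point.
public class VoronoiPartition {
    public static int regionOf(double x, double y, double[][] seeds) {
        int best = 0;
        double bestD = Double.MAX_VALUE;
        for (int i = 0; i < seeds.length; i++) {
            double dx = x - seeds[i][0], dy = y - seeds[i][1];
            double d = dx * dx + dy * dy; // squared distance suffices
            if (d < bestD) { bestD = d; best = i; }
        }
        return best;
    }

    public static int[][] partition(int w, int h, double[][] seeds) {
        int[][] region = new int[h][w];
        for (int y = 0; y < h; y++)
            for (int x = 0; x < w; x++)
                region[y][x] = regionOf(x, y, seeds);
        return region;
    }

    public static void main(String[] args) {
        double[][] seeds = {{2, 2}, {7, 7}};
        int[][] r = partition(10, 10, seeds);
        System.out.println(r[0][0] + " " + r[9][9]); // opposite corners, opposite cells
    }
}
```

A real implementation would work on the region *graph* (cell adjacency) rather than a raster, but the convexity argument is the same.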
1. Road and canal insertion into the world graph.
Recall that roads are sometimes regions of their own, and sometimes detail within a region.
Some road and canal waypoints are known in advance (village centres, for example) - but not the specific roads that pass through them. A separate road and canal graph is then created.
“Rasterising” this road graph into spatial partitions is done by destructively laying out the roads first, and only then plausibly filling in the gaps around them.
1. Coastline and region boundary generation.
Doing regions before coastlines might seem backwards: why not generate the coastline and then fill it in, instead of having to throw away regions after the fact? This is essentially a coin toss - they've weighted it slightly in favor of this order because they found more reasons for regions to determine coastlines than for coastlines to determine regions.
Coasts are created with a sine function wrapped into a circle, and perturbed. This process favors convex islands, of course.
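A minimal sketch of such a coastline: radius as a function of angle, a base circle plus a few sine perturbations. The amplitudes and frequencies are invented for illustration; integer frequencies keep the curve continuous where it wraps around:

```java
// "A sine function wrapped into a circle, and perturbed": the coast
// is a radius-vs-angle function r(theta) = base + sum of sines.
public class Coastline {
    public static double radius(double theta) {
        double base = 100.0;
        return base
            + 12.0 * Math.sin(3.0 * theta)        // broad lobes
            + 5.0 * Math.sin(7.0 * theta + 1.3)   // medium perturbation
            + 2.0 * Math.sin(17.0 * theta + 0.4); // fine perturbation
    }

    // A point is land if it lies inside the wrapped curve.
    public static boolean isLand(double x, double y) {
        double theta = Math.atan2(y, x);
        return Math.hypot(x, y) <= radius(theta);
    }

    public static void main(String[] args) {
        System.out.println(isLand(0, 0) + " " + isLand(500, 0));
    }
}
```

Since the radius never goes negative and varies smoothly, this construction can only ever yield roughly star-shaped islands - hence the convexity bias noted above.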
1. Height map generation.
Noise at several frequencies is used: one to give the whole island topography; one to give regions topography.
One particular case where a region must use noise to distinguish itself is a “rocky cliff edge” biome (as in the cliffs of Dover) - it needs its own cliff-edge topography.
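The multi-frequency scheme can be sketched as summed octaves of value noise - one low frequency for island-scale topography, higher ones for region-scale detail. The hash constants, octave count, and frequencies below are assumptions, not from the talk:

```java
public class HeightMap {
    // Deterministic per-lattice-point pseudo-random value in [0, 1).
    static double lattice(int x, int y, long seed) {
        long h = seed ^ (x * 374761393L) ^ (y * 668265263L);
        h = (h ^ (h >>> 13)) * 1274126177L;
        return ((h ^ (h >>> 16)) & 0xffffffL) / (double) (1 << 24);
    }

    static double smooth(double t) { return t * t * (3 - 2 * t); }

    // Bilinearly interpolated value noise at one frequency.
    static double noise(double x, double y, long seed) {
        int x0 = (int) Math.floor(x), y0 = (int) Math.floor(y);
        double fx = smooth(x - x0), fy = smooth(y - y0);
        double a = lattice(x0, y0, seed),     b = lattice(x0 + 1, y0, seed);
        double c = lattice(x0, y0 + 1, seed), d = lattice(x0 + 1, y0 + 1, seed);
        return (a + (b - a) * fx) * (1 - fy) + (c + (d - c) * fx) * fy;
    }

    // Sum octaves: the lowest frequency gives the whole island its
    // topography; higher frequencies give regions theirs.
    public static double height(double x, double y, long seed) {
        double h = 0, amp = 1, freq = 1.0 / 64, norm = 0;
        for (int octave = 0; octave < 4; octave++) {
            h += amp * noise(x * freq, y * freq, seed + octave);
            norm += amp;
            amp *= 0.5;
            freq *= 2;
        }
        return h / norm; // normalised into [0, 1)
    }

    public static void main(String[] args) {
        System.out.println(height(10.5, 20.5, 7L));
    }
}
```

Region-specific topography (the cliff-edge case) would amount to swapping amplitudes, frequencies, or the noise function itself per region.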
1. Splat map generation.
That is, ground texturing - we model features like geographical stratification, slope colouring, trails and natural drains, canal and road texturing, and region-local detail.
1. Detail object generation.
Per-region.
1. Prop generation.
Per-region, as per the trend. Perhaps the least generalisable part of the process: villages need roads; castles don’t.
Several patterns of construction have proven helpful - lines of props, as in walls and hedges; filled areas of props, as in graveyards and clusters of trees.
A concrete example: The Witcher 3 - high-detail, offline, global
-----------------------------------------------------------------
Most of the detail here is from [Marcin Gollent][]’s GDC presentation.
[Marcin Gollent]: http://gdcvault.com/play/1020197/Landscape-Creation-and-Rendering-in
1. Offline: height map generation.
World Machine is used; the resulting detail is as fine as 0.5 metres.
1. Offline: material painting.
Manual, up to allowing the slope at a particular point to determine whether an “overlay” texture dominates a “background” texture. (From [Marcin Gollent][]: “Terrain shape generated in World Machine and imported”.)
That being said, the terrain generators out there - Vue, Terragen, World Machine - allow metadata to flow from terrain generation stages to texture generation stages. But this isn’t meaningfully procedural, either!
1. Offline: vegetation - grass optional - generation.
(For our purposes: this can be generalised, but think twice before tightly coupling this stage with terrain generation - the two variables of “water” and “sunlight” might already be enough!)
Procedural - up to a terrain height map and manually created vegetation types.
- Determine the distribution of “water” using some annealing process: begin with an initial distribution, and let it settle to some position of lowest potential energy. A less loaded way to think about this is to call it “growth factor”: trees accumulate in pits, and are entirely absent on slopes. ([Marcin Gollent][] uses “water accumulation” first, and then continues by calling this “resource” - with the quotes.)
- Determine the distribution of “sunlight” over a single cell.
- Using terrain, “water”, “sunlight” information for a cell, plant vegetation of specific varieties, further parametrised.
Arbitrary model placement might not be needed to specify the output of this process.
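A toy stand-in for the “water”/“resource” accumulation above - simple downhill flow routing rather than Gollent's annealing process, but with the same qualitative outcome (pits accumulate, slopes drain). Grid and rainfall amounts are invented:

```java
import java.util.Arrays;

public class WaterAccumulation {
    // Every cell receives one unit of rain and passes its total to its
    // lowest strictly-lower 4-neighbour. Pits end up with the most
    // "resource", matching "trees accumulate in pits" above.
    public static double[][] accumulate(double[][] h) {
        int rows = h.length, cols = h[0].length;
        double[][] water = new double[rows][cols];
        // Visit cells from highest to lowest so donors run first.
        Integer[] order = new Integer[rows * cols];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) ->
            Double.compare(h[b / cols][b % cols], h[a / cols][a % cols]));
        int[][] dirs = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};
        for (int idx : order) {
            int r = idx / cols, c = idx % cols;
            water[r][c] += 1.0; // rainfall
            int br = -1, bc = -1;
            double best = h[r][c];
            for (int[] d : dirs) {
                int nr = r + d[0], nc = c + d[1];
                if (nr >= 0 && nr < rows && nc >= 0 && nc < cols
                        && h[nr][nc] < best) {
                    best = h[nr][nc]; br = nr; bc = nc;
                }
            }
            if (br >= 0) { water[br][bc] += water[r][c]; water[r][c] = 0; }
        }
        return water;
    }

    public static void main(String[] args) {
        double[][] h = {{3, 2, 3}, {2, 1, 2}, {3, 2, 3}};
        System.out.println(accumulate(h)[1][1]); // the central pit collects everything
    }
}
```

The planting pass would then read this field (together with terrain and “sunlight”) to choose vegetation varieties per cell.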
1. Online: grass and grasslike generation.
Because we don’t want to deal with individual instances of grass - neither when generating vegetation, nor on disk.
Grass “brushes” (that include both model and distribution data) are assigned to terrain materials; grass “efficiently” appears on these materials at runtime. Details we can leave for later; some caching is involved as well.
1. Online: rendering!
Not strictly world generation, but a few wodges that are consequences of the world generation pipeline.
Height map detail is tessellated up.
Grass and grasslike models are tinted by a low-resolution “pigment map” taken from the world. This gives us shading (in the graphics programming sense) that the existing model would have otherwise left out: that from the transparency of grass (here, grass is diffusely shaded), and that from diffuse interreflection from grass (the GI solution is likely too coarse-grained to give us colour bleeding from the ground.)
World generation stages - the design space, examples
----------------------------------------------------
Why talk about stages, rather than world generation algorithms? We want to generalise: we should be able to describe pipelines where, for example, some data is generated offline, and the rest is “upscaled” or “interpolated” later - whether by some local, embarrassingly parallel process (like voxel triangulation) or some process that we might cache (like tree generation).
- What is the smallest level of detail we’re working with?
It’s clear that we can work coarsely at an earlier stage, and slowly fill in details.
Dwarf Fortress has a cell the size of a dwarf as its smallest unit: trees and veins of ore are low-detail. Giving them plausible detail is a substantial task of its own. Minecraft is similar.
TW3. Here’s why we talk about stages: offline, several coarse-density maps are used - half-metre, at most. Grass and things of the same height are filled in at runtime with models: now we have polygon-level fidelity.
- What data structure are features recorded onto?
Do we really need something trivial to render - a voxelised world to triangulate - or can we simply output “vector cartography”, “declarative cartography”, that we can process through an “elaborator” to fill in extraneous details?
Is this structure 2D - which makes simulation tractable, but can’t represent ravines or caves - or 3D?
Dwarf Fortress and Minecraft: one sole output - and that’s the voxelised world.
The Witcher 3 is particularly interesting: a height map is generated in World Machine. Successive passes record detail on other maps; but for model detail we use a conventional scene graph (the least we can get away with is some means of planting vegetation on the ground - a two-dimensional quadtree would do.)
- What features are generated by noise? Plausible simulation? Physical simulation? User input?
The methods a pass uses reflect how many inputs flow into it: none of the systems above generate the contours of landmasses via pure simulation - simulation of plate tectonics isn’t quite a plug-in solution just yet.
That being said, we might be able to shift the weights between the three somewhat: this time, in favor of user input.
There are more possibilities for “user input” than the usual single-value switches-and-sliders settings panels that we get in Minecraft and DF. We might be more data-driven.
Think of everything that’s a first-class value in languages: all of which can be potentially data. We might want high-dimensioned forms of input (probability distributions of heights); we might even want first-class functions - train a neural network, for example. This gets us close to the edge of input that we call “static”.
This is a way out of the usual dichotomy of noise functions and simulations (and blends of the two): the first is a surface-level empiricism; the second is a deep empiricism. Being somewhere in the middle lets us work at the level we want.
We might be able to “razor” out a great deal of simulation we couldn’t have effectively modeled otherwise - at least, in cases where machine learning is feasible, and input data “more than underdetermines” output data. (There are probably formalisations of “more than underdetermines” out there.)
What if we trained a neural network to decide, given an initial point in the “space” of biomes, what biomes accompany it? It might even constrain the flora of the region.
What if we trained a neural network as an (incremental) cartographer of the imaginary, extending a mountain range, a road, or a river at each step?
What machine learning methods *proper* don’t seem (“seem”, again, lacking formality) to do is produce a great deal of plausible data at once from single parameters - but machine learning is built on top of distributions, after all, and we might be able to learn those. More concretely: we can fill in gaps in a field (of biomes, of temperatures), but we can’t generate the entire field.
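As one concrete example of distribution-valued user input: sampling terrain heights from a user-supplied histogram by inverse-transform sampling, rather than exposing a single “average height” slider. The histogram values below are illustrative:

```java
import java.util.Random;

// The user supplies a whole probability distribution of heights
// (bin centres plus weights); the generator draws from it.
public class UserDistribution {
    final double[] heights; // bin centres
    final double[] cdf;     // cumulative weights, ending at 1.0

    public UserDistribution(double[] heights, double[] weights) {
        this.heights = heights;
        this.cdf = new double[weights.length];
        double total = 0;
        for (double w : weights) total += w;
        double run = 0;
        for (int i = 0; i < weights.length; i++) {
            run += weights[i] / total;
            cdf[i] = run;
        }
    }

    // Inverse-transform sampling: map a uniform draw through the CDF.
    public double sample(Random rng) {
        double u = rng.nextDouble();
        for (int i = 0; i < cdf.length; i++)
            if (u <= cdf[i]) return heights[i];
        return heights[heights.length - 1];
    }

    public static void main(String[] args) {
        // Mostly lowland, a little highland - a distribution, not a slider.
        UserDistribution d = new UserDistribution(
            new double[]{10, 50, 200}, new double[]{6, 3, 1});
        Random rng = new Random(1);
        double sum = 0;
        int n = 10000;
        for (int i = 0; i < n; i++) sum += d.sample(rng);
        System.out.println(sum / n); // should sit near the mean, 41
    }
}
```

A trained model, as speculated above, is the same idea one level up: a learned distribution instead of a hand-drawn one.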
- How much local (“vista-scale”) variety exists? How is this local variety represented?
Plausible continuous change is meaningfully natural, but if we want to meaningfully inhabit our environments, we need features - clusterings or discontinuities - around which to form senses of “place” and narratives. These are preferably multiple subsystems coinciding: a Simba rock jutting over savanna, an island set on a lake in a grand bowl valley, an old tree atop a hill from which one can look onto rolling fields of wheat.
TW3, of course, is a prime example of letting writers and artists determine the landmarks in a world.
- How much global (“continent-scale”) variety exists? How is this global variety represented?
Try a “biome”. What experience do we want from biomes? Is it just for the sake of being able to march for a simulated day into somewhere sufficiently different?
Do biomes necessarily have to imply “patchwork worlds” with well-defined borders, or can we model continuous variation within the world even while structuring the world into biomes?
DF and MC have biomes; but MC’s biomes aren’t globally distributed: biome selection is dependent on height and the biomes adjacent. (Details from [here].) DF’s biome system is relatively expressive.
TW3 - and pipelines in general that combine different tools with different responsibilities - have no single way to partition up the world into biomes. That’s not to say that proceduralism’s been fully replaced with ad-hoc decisions by artists: artist input is required during terrain generation, but sane defaults are available during vegetation generation. These defaults, though, may be controlled by some global context, which we might call a “biome”…
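A toy version of adjacency-dependent biome selection, in the spirit of Minecraft's scheme described above (its real rules live in its layered generation stack; the adjacency table below is invented for illustration):

```java
import java.util.Random;

// Each new cell's biome is drawn from a table keyed by the
// already-generated cell beside it, so e.g. OCEAN is never directly
// followed by DESERT.
public class BiomeChain {
    enum Biome { OCEAN, PLAINS, FOREST, DESERT }

    static Biome[] allowedNext(Biome b) {
        switch (b) {
            case OCEAN:  return new Biome[]{Biome.OCEAN, Biome.PLAINS};
            case PLAINS: return new Biome[]{Biome.OCEAN, Biome.PLAINS,
                                            Biome.FOREST, Biome.DESERT};
            case FOREST: return new Biome[]{Biome.PLAINS, Biome.FOREST};
            default:     return new Biome[]{Biome.PLAINS, Biome.DESERT};
        }
    }

    // Generate a 1D strip of biomes, each constrained by its neighbour.
    public static Biome[] strip(int length, long seed) {
        Random rng = new Random(seed);
        Biome[] out = new Biome[length];
        out[0] = Biome.OCEAN;
        for (int i = 1; i < length; i++) {
            Biome[] options = allowedNext(out[i - 1]);
            out[i] = options[rng.nextInt(options.length)];
        }
        return out;
    }

    public static void main(String[] args) {
        for (Biome b : strip(12, 7L)) System.out.print(b + " ");
        System.out.println();
    }
}
```

The result is not a global distribution of biomes but a locally consistent one - exactly the distinction drawn above.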
- Local simulation, or whole-world simulation?
Dwarf Fortress is general and whole-world. From the [DFWiki][], after “Erosion Cycle Count”: rivers are “run” from riverheads all over the world, interacting *with the entire world*, and contribute to erosion throughout.
[DFWiki]: http://dwarffortresswiki.org/index.php/DF2014:Advanced_world_generation
TW3 is similar.
Minecraft is an interesting counterexample. From the [Minecraft Wiki][]: how are rivers run through the world? They are treated the same as other “generated structures” - including self-contained items like trees, and “continuously flowing” items like mountains and dungeons.
[Minecraft Wiki]: http://minecraft.gamepedia.com/Generated_structures#Technical_details
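A minimal whole-world pass in this spirit: a river run downhill from a riverhead, carving height off every cell it visits, so distant parts of the map affect each other. The carve amount and step limit are illustrative, not from DF:

```java
public class RiverErosion {
    static final int[][] DIRS = {{1, 0}, {-1, 0}, {0, 1}, {0, -1}};

    // Walk downhill from (r, c), eroding each visited cell; stop at a
    // local minimum (the river ends in a lake) or after maxSteps.
    public static void runRiver(double[][] h, int r, int c,
                                double carve, int maxSteps) {
        for (int step = 0; step < maxSteps; step++) {
            h[r][c] -= carve; // erode the current cell
            int br = -1, bc = -1;
            double best = h[r][c];
            for (int[] d : DIRS) {
                int nr = r + d[0], nc = c + d[1];
                if (nr >= 0 && nr < h.length && nc >= 0 && nc < h[0].length
                        && h[nr][nc] < best) {
                    best = h[nr][nc]; br = nr; bc = nc;
                }
            }
            if (br < 0) return; // nowhere lower to flow
            r = br; c = bc;
        }
    }

    public static void main(String[] args) {
        double[][] h = {{5, 4, 3}, {4, 3, 2}, {3, 2, 1}};
        runRiver(h, 0, 0, 0.5, 100);
        System.out.println(h[0][0] + " " + h[2][2]);
    }
}
```

The point is the non-locality: a chunk-at-a-time generator like Minecraft's cannot run this pass, because the river's path depends on terrain that may not exist yet.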
Desiderata, and a few decisions
-------------------------------
This is here to motivate the analyses above - it’s the real object, but it talks about generalisations that you might not have concrete examples for.
These choices are effectively arbitrary, occasionally ridiculous, and full of compromise.
- We want high-detail worlds.
This means the 90%-10% rule comes into effect: how do we generate convincing looking trees? Ore veins? Muddy river banks?
These decisions might be profitably ignored at initial stages, and we might work at a higher level of fidelity there.
- We want to be able to model worlds with global geographic variety - we don’t want a generate-a-Europe algorithm.
What structure describes at once Vvardenfell, Iceland, Stros M’Kai, and the Amazon?
Hands-off simulation requires that we do the work of representing biomes in advance. At the very least, biomes feed parameters into specific passes. What we want now is a way of correlating or constraining those parameters.
- We’ll use Vvardenfell - and the expansion pack world of Tamriel Rebuilt - as an exemplar world to model.
By “Vvardenfell” I mean a heavily modded Morrowind, rather than the Skywind project: Skywind isn’t near release; it’s also considerably harder to work with TES5 worlds.
This gives us quite a few things that make the world much more easily realisable in one stroke:
- A world in a world description format we can work with: the OpenMW project has world file format readers. We might reinvent a few GIS tools to process it.
- High-quality assets to describe that world. The MGSO (and Tamriel Rebuilt) gives us textures and models that fit a polygon budget a generation or two old.
- A world created without being structured in “biome” or “region” structures. We might use models of continuous varying environments; we might choose to “cluster” world regions into biomes - but we’ll be disappointed if the world generation process gives us an obvious patchwork.
- Most importantly: an exemplar world satisfying the constraint above. And - if you’re unfamiliar with the world of Morrowind - perhaps the ability to test the “worldness” of a generated world, without reference to the world’s specifics.
This isn’t the ideal solution, of course:
- There is no sufficiently global range of regional variation to model. This is in conflict with the desideratum above.
- It’s not clear we can construct a model of the “hidden variables” we can simulate - humidities, temperatures, geographical strata. This is also in conflict with one of the desiderata we have here.
- The world’s height map resolution is low. Finer geographic details are modeled with ad-hoc solutions - models - that we can’t crunch data on.
- When the height map is replaced by models, the height map is meaningless.
- Concave geographical detail - caves, ravines - is determined by gameplay, and also modeled in an ad-hoc manner.
- Much of the gross geographic detail is a product of Vvardenfell’s mythic history, rather than geological history. To be concrete: Red Mountain is Vvardenfell; and Red Mountain also means the Ashlands and the Ascadian Isles that were formed with Red Mountain as the heart of Lorkhan - et cetera, et cetera - and to this day we have a sense of “islander exclusion” that the Vvardenfell worldview must constantly negotiate with - et cetera, et cetera.
- The height map - fortunately, unfortunately - is the main data structure describing terrain.
On one hand, we lose the ability to easily describe ravines and caves; on the other hand, we can state a great deal of simulation in terms of what gets placed where on the height map.
- We want a pipeline with some passes that are global.
Any feature that must be carved out of the earth - erosion, rivers, caves - most likely has no local equivalent: it requires the world to interact with itself. Vegetation, on the other hand, is embarrassingly local.
- We prefer, in order, emergent simulation, user input, noise.
We value user input over noise because of the possibilities that can be encoded in user input - see above: arbitrary distributions, trained neural networks, and so on.
- And what if reality’s just banal? User customisation to experiment with is key.
Give the user the ability to experiment with parameters - from which they can predict possible worlds, possible rhythms of gameplay. (Some predictability - but not total predictability - is desired: we might want world generation alone to display emergent behavior.)
- Static? Fully editable? I’m agnostic.