Skip to content

Instantly share code, notes, and snippets.

@roberth
Last active December 29, 2023 07:47
Show Gist options
  • Star 14 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save roberth/940dff88ca5f5f95949dc309dbe60a65 to your computer and use it in GitHub Desktop.
Save roberth/940dff88ca5f5f95949dc309dbe60a65 to your computer and use it in GitHub Desktop.
Simple and quick module system alternative + thoughts and tasks
/*
minimod: A stripped down module system
TODO Comparison:
- [ ] Come up with a benchmark "logic" using plain old functions and let bindings
- [ ] Write the benchmark for the module system
- [ ] Write the benchmark for POP?
- [ ] Qualitative comparison of extensibility in the context of composable
Nixpkgs packaging logic
TODO Fine-tuning:
- [ ] Try option-driven module merging
- [ ] Try the WIP features listed below
TODO Validation:
- [ ] Write actual packaging logic with this. The examples at the bottom
aren't quite realistic yet.
The decomposition into modules might already suit RFC 92 by decoupling
"package" and "derivation".
Why strip down the module system?
A module system for packaging would be awesome, but the current module system
is too slow. This is ok for configuration management, but hurts packaging
performance too much.
Why is the module system too slow?
- It has many features that we don't really need for packaging. This results
in a system that is _slightly_ too strict (not quite lazy enough) and has
a high constant factor overhead.
- Specifically in NixOS, it imports too many modules. See RFC 22.
A packaging solution based on the existing module system could be designed
to rely on "users" `imports`-ing everything they need, a la RFC 22.
If you want to go fast, you can't bring the kitchen sink.
Which module system features are included?
- mkForce / mkDefault / mkOverride
- types (incompatible for performance reasons)
- value merging
- type merging
- freeform types (module with `_wildcard` field)
Which module system features are intentionally omitted by minimod?
- import resolution (instead, provide a flat list of modules)
- disabledModules (instead, only import what you need; no global module list)
- specialArgs (without import resolution, args don't ever have to be special)
- syntax sugar including
- custom module arguments (no functionArgs quirks, yay!)
- _module.args (instead, use self)
- specialArgs (instead, use the lexical scope)
- shorthand module definitions (instead, value-only modules are the default.
Only module _types_ will be able to carry both "options" and "config" (WIP))
- mkIf (instead, use empty value, e.g. optionalAttrs)
- mkBefore (instead, use attrset if order is important, DAG type?)
- option trees (instead, nest modules)
- checks (instead, rely on testing, which is acceptable because packaging is less end-user than configuration)
- undeclared config value check
- option apply function (instead, add a new option to provide the computed value)
- all options have a value (minimod is config-driven instead of option-driven)
Why did you remove all the good parts?
Well, it's a simplification that tries to only
sacrifice as little as possible while keeping
the useful composition properties of the module
system.
Programming _and maintenance_ should feel the same,
except for the lack of bells and whistles.
I believe some features can be re-added with care. ->
Which module system features are WIP?
- combined options+config (allow module to carry a list of values to always mix in)
- extendModules (to allow exposing an overriding method not unlike `overrideAttrs`,
which can also be used for debugging, exposing internal attrs)
debugModule = moduleArgs: { package.debug.moduleArgs = moduleArgs; };
- optional checking (maybe?)
- first class documentation (seems to be worth adding; does not seem too costly)
Can this object system be rebased onto POP?
Not impossible, but probably not a net benefit.
Comparing the two, they don't seem like a great match.
POP has overlay-style overriding, whereas the module systems use priorities (`mkDefault`)
These are rather distinct solutions to the same problem that aren't really reconcilable.
Allowing both adds both cognitive and machine overhead.
Overriding is inherently about change; not very declarative. A priority system gets out of the way until you use it, whereas `super` is always present.
Can this object system be merged with the existing one?
We can have "submodule" adapters between the two. Maybe the `types` can be merged,
because having two distinct `types` isn't great.
*/
let
# nixpkgs lib
lib = import ./lib;
inherit (lib.modules) defaultPriority;
uniqueMerge = vs:
if builtins.length vs == 1
then builtins.head vs
else
throw "Only a single definition is allowed";
ignorePrio = v: if v._type or null == "override" then v.content else v;
resolvePrio = vs:
if builtins.length vs == 1
then
map ignorePrio vs
else
let
min =
lib.lists.foldl'
(min: v: if v._type or null == "override" then v.priority else defaultPriority) 1000000
vs;
in
if min == defaultPriority then
lib.filter (v: v._type or null != "override" || v.priority == defaultPriority) vs
else
lib.filter (v: v._type or null == "override" && v.priority == min) vs;
types = {
attrs = t: {
name = "attrs";
params = { inherit t; };
merge =
lib.zipAttrsWith
(name: values:
if builtins.length values == 1
then ignorePrio (builtins.head values)
else t.merge (resolvePrio values)
);
typeMerge = tys:
types.attrs (t.typeMerge (map (ty: ty.params.t) tys));
};
list = t: {
name = "list";
params = { inherit t; };
merge = lib.concatLists;
typeMerge = tys:
types.list (t.typeMerge (map (ty: ty.params.t) tys));
};
module = fields: {
name = "module";
params = { inherit fields; };
merge = rawModuleValues:
let
args = { inherit self fields; };
self =
lib.zipAttrsWith
(name: fieldValues:
let fvs = resolvePrio fieldValues;
in
if builtins.length fvs == 1
then builtins.head fvs
else
builtins.addErrorContext "in field ${name}" (
(
fields.${name}.merge or
fields._wildcard.merge or
(throw "Do not know how to merge field ${name}. Perhaps you forgot to declare it in the module, added a value to the wrong module, or mistyped the name ${name}.")
)
fvs
)
)
(map (v: lib.toFunction v args) rawModuleValues);
in
self;
typeMerge = tys:
types.module
(lib.zipAttrsWith
(name: fieldDecls:
builtins.addErrorContext "while merging module field type for ${name}" (
if builtins.length fieldDecls == 1
then builtins.head fieldDecls
else (builtins.head fieldDecls).typeMerge fieldDecls
)
)
(map (ty: ty.params.fields) tys)
);
};
int = {
name = "int";
merge = uniqueMerge;
};
sum = {
name = "sum";
merge = lib.foldl' __add 0;
};
unique = {
name = "unique";
merge = uniqueMerge;
};
package = {
name = "package";
merge = uniqueMerge;
};
};
derivation = with types; module {
derivation = attrs unique;
};
derivationMixIn = { self, ... }:
let
run = builtins.derivationStrict self.derivation;
in
{
derivationPath = run.drvPath;
# By iterating the outputs with genAttrs, we make `attrNames derivationOutputs`
# lazy in all of `derivation.*` except `derivation.outputs`
derivationOutputs = lib.genAttrs (self.derivation.outputs or [ "out" ]) (outputName: run.${outputName});
};
package = with types; module {
package = attrs unique; # freeform type?
};
stdDerivation = with types; module {
buildInputs = list package;
nativeBuildInputs = list package;
n = sum;
meta = module {
timeout = sum;
};
};
stdDerivationMixIn = { self, ... }: {
# set derivation arguments
derivation = {
name = if self?version then self.name + "-" self.version else self.name;
builder = "bash";
args = [ "setup.sh" ];
system = "x86_64-linux";
inherit (self) buildInputs nativeBuildInputs;
};
buildInputs = [ ];
nativeBuildInputs = [ ];
package = self.derivationOutputs // {
name = self.name;
drvPath = self.derivationPath;
};
};
haskellDerivation = with types; module {
haskell = module {
buildTools = list package;
};
};
haskellMixIn = { self, ... }: {
nativeBuildInputs = self.haskell.buildTools or [ ];
};
# NB: partially applied mkPackage memoizes the final fields, so it is worthwhile
# to bind it.
mkPackage = modules:
let inherit (mergeModules modules) merge;
in
mixins: (merge mixins).package;
mergeModules = (types.module { }).typeMerge;
example = mkPackage [ derivation stdDerivation haskellDerivation ] [
haskellMixIn
stdDerivationMixIn
derivationMixIn
{
name = "mypkg";
haskell.buildTools = [ "alex" ];
}
{
buildInputs = [ "libsystemd" ];
}
({ self, ... }: {
nativeBuildInputs = self.buildInputs ++ [ "gcc ${toString self.meta.timeout}" ];
})
{
buildInputs = [ "SDL2" ];
}
{
foo = lib.mkForce "foo";
bar = "bar";
meta.timeout = 1;
two = 2;
n = 1;
}
({ self, ... }: {
n = 2;
meta.timeout = self.two;
foo = "bar";
bar = lib.mkDefault "foo";
})
]
;
in
example
Copy link

ghost commented Apr 22, 2023

https://gist.github.com/roberth/940dff88ca5f5f95949dc309dbe60a65#file-minimod-nix-L270

It appears that packages are referred to with a string rather than a variable... it's not clear that there is a safe way do things like buildPackages.python (to run a python script in a cross-capable build) like this.

The obvious solution looks like it would lead to lexical scope escape bugs ("scope extrusion"): buildPackages means different things in different scopes; replacing it with a string loses that.

But maybe I'm overly pessimistic. Also, splicing might be going away, so maybe the problem solves itself.

Copy link

ghost commented Apr 22, 2023

Copy link

ghost commented Apr 22, 2023

https://gist.github.com/roberth/940dff88ca5f5f95949dc309dbe60a65#file-minimod-nix-L106-L120

Is it possible to have a module system without these integer priorities?

Arbitrary integer priorities work for NixOS because there is one central repository of options, and it dictates what the integers mean. But nixpkgs is a lot more anarchical and decoupled... the only analogous situation we have is the names of the phases in setup.sh, and fortunately Eelco basically got those right a long time ago and we haven't needed to change them.

But now any "thing" that undergoes the merge operation can potentially need this nftables-like choice of integers. So if you have two packages that nobody has ever used before and you're the first person to try using them together, you might find that the authors of those separately-developed packages disagreed on whether "really really important and first but not too early" is 400 or 500.

I guess this, plus performance (which you are clearly focused on) are my two remaining hesitations about moduleifying-all-the-things.

@roberth
Copy link
Author

roberth commented Apr 22, 2023

It appears that packages are referred to with a string rather than a variable... it's not clear that there is a safe way do things like buildPackages.python (to run a python script in a cross-capable build) like this.

This is just for simplicity; this proof of concept was mostly meant to show that a more efficient module system can be achieved, and could therefore potentially be used at Nixpkgs scale. Those packages as strings are just example data. In practice you would reference other packages like you would with drv-parts for example; taking a full package from some package set(s).

Is it possible to have a module system without these integer priorities?

Maybe, but you'll probably want at least some concept of overriding. For example in overlays the priority is implied by //.
In the end overriding is never great, because you're coupling with an implementation whose details should be unknown. Merging is less problematic, as long as the attributes to merge are actually merge. Many are, or to a "good enough" extent. Ideally the modules are designed in such a way that very little overriding is needed.
Integer priorities are a bit more "powerful" than //-like "latest wins". You can recover //-like power by introspecting the priorities of an existing instance and incrementing them for your overrides.

So if you have two packages that nobody has ever used before

This proof of concept is meant to be about the package itself only. Module composition is not suitable for combining "separately developed packages". It is meant to combine build logic though. For instance we could have a module for checking built pkg-config files against declared metadata, or a module that sets up a database during tests. These modules should "communicate" through well designed interfaces (concretely, sets of options) that don't rely on such guesswork.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment