This document contians my (jnthn) input to where implementation work for Perl 6 module/precomp stuff should head. While much has been converged on by existing work, some problems have also been consistently avoided or punted on; robust pre-compilation management is one such issue. Everything here is subject to course corrections as implementation takes place, but it should hopefully set out a good direction to move in. Also, don't expect this to be a complete deign. It's as much as I had time to figure out before I vanish for a week on honeymoon, and I'm sharing it in its current state so that others can carry it forward. Thanks goes to lizmat++ for highly valuable input on an earlier private draft.
To keep discussion precise, here are some definitions of the atoms under consideration:
- A compilation unit, or compunit for short, is a piece of Perl 6 code that is analyzed and compiled as a single unit. Typically, this comes from a source file on disk, but what's inside an EVAL also counts as a compilation unit.
- Compilation is the process by which a compilation unit is parsed and
analyzed, and turned into a set of objects representing its declarations
("meta-objects") and executable code representing its statements. Note that,
thanks to the many meta-programming features of Perl 6, compilation will
involve the execution of code, some of which may come from the compilation
unit being compiled (as happens with
BEGIN
blocks). - A script refers to a compilation unit that is provided to Perl 6 as the
entry point for execution. In an invocation like
perl6 foo.p6
, we say thatfoo.p6
is first compiled and then executed. In Rakudo Perl 6, in this case, the results of the compilation only exist in memory. - A module refers to a compilation unit that is used by a script, or by another module used from a script. A module must also be compiled before it can be made use of. (There is nothing preventing a given source file serving as both a script and a module depending on how it is used.)
- A distribution is a set of zero or more scripts and modules that are distributed together, along with some meta-data and potentially with some resources and tests.
This "usage" of modules by scripts and other modules is complex, and is the focus of much of this document. Some further terminology to enable precise discussion of the area is useful:
- A dependency specification is how one module or script's need for
another module is declared. In source code, it follows a
use
orneed
statement. An example isSereal:auth<cpan:*>:ver(1..*)
. The same syntax is used in thedepends
section of aMETA6.json
. - The dependencies of a script or module are identified during its
compilation by evaluating the dependency specifications that follow its
use
orneed
statements. - The transitive dependencies of a script or module is the union of its
dependencies together with the transitive dependencies of each of those
dependencies (
TRAN-DEP(m) = Union(DEP(m), TRAN-DEP(d) | d in DEP(m))
). - The reverse dependencies of a module are those scripts and modules that have it as a dependency.
- The transitive reverse dependencies of a module is the union of its
reverse dependencies together with the transitive reverse dependencies of
those reverse dependencies (
TRAN-REV(m) = Union(REV(m), TRAN-REV(m) | m in REV(m))
). - The dependent distributions of a distribution are those identified by
evaluating the dependency specifications in a
META6.json
. The transitive, reverse, and reverse transitive relations can also be established.
In this document, the term "dependency" will always refer to a dependency
between compilation units, discovered at compile time of the depending
compilation unit. Those declared in META6.json
are instead known as
distribution dependencies. They're very different concepts, and it's
important to keep them apart. The first is interesting for precompilation
management, and the latter for module installation tooling.
Precompilation is a mechanism for caching the results of compilation on disk, such that the overhead of compilation can be avoided in the future. The precompilation of a compilation unit can only be formed if all of its dependencies have already been precompiled (implying that at the point the precompilation of a module is completed, all of its transitive dependencies must have already been precompiled). Note this does not mean there is any need for the dependencies to be precompiled before precompilation of begins. It is not only reasonable, but also desirable, for dependencies to be precompiled "on demand" and recursively as they are discovered. That is, you start precompiling at the "top" of a dependency graph and by the time you've precompiled that module, the whole graph has been precompiled.
Since resolving dependency specifications to dependencies is a part of the
compilation process, a cached precompilation directly identifies the cached
precompilations it depends on. That is to say, the compiled effect of a
use
or need
statement is not to resolve a dependency specification, but
instead to identify a precise dependency to load. This in turn means that no
module database lookups are required. It also has the consequence that once
you load an existing precompilation, all module resolution from that point
on is "predestined". It's important, therefore, to ensure that precompilations
are tied to the entire set of "repositories" available for consideration at
the point they were formed.
Furthermore, precompilations are statically linked against the precompilations
of their transitive dependencies as well as against a particular compilation
of the Perl 6 compiler and its CORE.setting
. Therefore, the identify of the
Perl 6 compiler - obtained through $*PERL.compiler.id
- should also be
considered part of the environment the precompilation was formed in. This will
also support rakudobrew
style tools, which enable switching between different
versions and backends.
In an ideal world, precompilation would always be possible for all compilation
units. In reality, some things simply won't work out. Stashing a file handle
in a BEGIN block and reading from it later won't work, because file handles
cannot be serialized. And trying to load the precompilation of two modules
that make incompatible augment
ations to the same class can be expected to
fail. While we can culturally encourage writing precompilation-clean
code,
it's also important to provide a mechanism whereby a compilation unit can opt
out. This in turn means that its transitive reverse dependencies can not be
precompiled. A compilation unit can declare it is not elligible for being
precompiled with:
no precompilation;
Something that can locate and load modules, and potentially manage their installation and precompilation, is known as a repository (or, more fully, a compilation unit repository). A repository at its simplest could just map module names to source files on disk. A more complex repository might support installation of a distribution's modules along with associated resources, as well as managing precompilation of modules.
Repositories are structured as a linked list - that is, a repository may refer
to another. A repository can always provide its unique identity, which must
incorporate the identity of any repository it refers to. In normal startup, the
PROCESS::<$REPO>
symbol will be set a default repository that supports the
installation of distributions (a CompUnit::Repository::Installation
). Any
-I
includes, or any paths in a PERL6LIB
environment variable, will cause
PROCESS::<$REPO>
to instead point to a chain of repositories that ends with
the default CompUnit::Repository::Installation
that is normally there.
A use lib
installs a lexical $?REPO
which takes precedence over that in
PROCESS
. Note that for Perl 6.christmas, we will only support the use of
use lib
in scripts, not in modules, as its interaction with precompilation
is more complex than we have time to reasonably consider (and it's better to
wait until we've a good answer than to use a hacky one now).
This linked list replaces @?INC
. There are a number of benefits to this
design:
- Precompilations must factor in the whole set of repositories that could
have been considered when resolving a dependency specification. To see why,
consider an installed module M that contains
use N
. Its precompilation will have been done at installation time, and so only considered other installed modulesN
. Later, the developer ofN
is developing a new version, and uses-Ilib
to ensure this newN
is used. His test script doesuse M
, and the expectation would be that theN
islib
is loaded. However, if we hit the precomp ofM
, it has already committed to an installedN
. For correct behavior, the repository for-Ilib
- assuming it supports precompilation management - must itself manage precompilations for the whole transitive dependency chain. Put another way, only the module precompilations stored by the repository at the head of the chain are valid. This is most easily delivered by the head of the chain passing its precomp repository "down the chain". - You can trivially implement a repository that gives you a "clean room" with no system-wide modules just by it never delegating to the next thing in the chain.
- If you want "parallel" consideration of a set of other repositories with no ambiguities allowed across them, or "sequential" consideration of a set of repositories with the first providing a resolution winning, these can both be provided by a repository that contains an array of other repositories and delegates as needed.
- It's easier to test repository implementations in isolation if they're not
tied up with some global state, like an
@?INC
.
The module management API can be broken up into:
- Guts provided by a Perl 6 implementation (e.g. Rakudo) that do the various low-level tasks.
- Entities that capture the information associated with concepts such as "dependency specification" and "compilation unit".
- Roles for the various aspects of module management. Some of these are pure interfaces (that is, entirely required methods). Others provide default implementations that will usually be sufficient, and just need some details to be filled in.
- Provided implementations of those interfaces for a number of common use cases.
Alternative implementations of the roles to handle use cases beyond what Perl 6 provides direct support for are both expected and encouraged.
The CompUnit::Loader
is responsible for actually loading a compilation unit
into memory, either from source or a precompiled representation. It can work
with both files and in-memory byte buffers in either case. Implementations are
free to efficiently mmap
files into memory, allowing a single copy of a
precompiled module to exist in memory and be shared by many Perl 6 processes.
The methods on this are all expected to be called on the CompUnit::Loader
type object.
class CompUnit::Loader is repr('Uninstantiable') {
# Load a file from source and compile it
method load-source-file(Str $path) returns CompUnit::Handle {
...
}
# Decode the specified byte buffer as source code, and compile it
method load-source(Buf:D $bytes) returns CompUnit::Handle {
...
}
# Load a pre-compiled file
method load-precompilation-file(Str $path) returns CompUnit::Handle {
...
}
# Load the specified byte buffer as if it was the contents of a
# precompiled file
method load-precompilation(Buf:D $bytes) returns CompUnit::Handle {
... # XXX this one needs MoarVM/JVM backends to expose a new API
}
}
The CompUnit::Loader
class only expects to be asked to load a given source
or precompiled file into memory once. Asking it to load the same pre-compiled
file twice is erroneous. Provided this is respected, concurrent calls to the
methods of CompUnit::Loader
are allowed.
The CompUnit::Handle
class is a handle to a loaded compilation unit. Its
exact internals are implementation defined, but it can be expected to have
the following methods:
class CompUnit::Handle {
# If the compilation unit has a callable EXPORT subroutine, it will
# be returned here. A Callable type object otherwise.
method export-sub() returns Callable {
...
}
# The EXPORT package from the UNIT of the compilation unit; a
# Stash type object if none
method export-package() returns Stash {
...
}
# The EXPORTHOW package from the UNIT of the compilation unit;
# a Stash type object if none.
method export-how-package() returns Stash {
...
}
# The GLOBALish package from the UNIT of the compilation unit
# (the module's contributions to GLOBAL, for merging); a Stash
# type object if none.
method globalish-package() returns Stash {
...
}
}
Since these methods are all reads, it is safe for them to be called concurrently on the same instance. (If the internal implementation is not automatically threadsafe, it must do appropriate concurrency control, so consumers of this class don't have to worry.)
We'll likely end up wanting to factor this into some guts-providing class too, to be discovered/defined during implementation.
A dependency specification consists of a short name for a module (for example,
Grammar::Tracer
), and optionally a version matcher and auth matcher (which
will be smart-matched against the version and authority of a potential
dependency to see if it is satisfactory).
class CompUnit::DependencySpecification {
has Str:D $.short-name is required;
has $version-matcher = True;
has $auth-matcher = True;
}
A compilation unit is represented as follows:
class CompUnit {
# The CompUnit::Repository that loaded this CompUnit.
has CompUnit::Repository $.repo is required;
# That repository's identifier for the compilation unit. This is not
# globally unique.
has Str:D $.repo-id is required;
# The low-level handle.
has CompUnit::Handle $.handle is required;
# The short name, version, and auth of the compilation unit, if known.
has $.short-name;
has $.version;
has $.auth;
# Whether the module was loaded from a precompilation or not.
has Bool $.precompiled = False;
# The distribution that this compilation unit was installed as part of
# (if known).
has Distribution $.distribution;
}
To be more fully defined, but consists of the metadata from a META6.info and a way to reach any resources declared within it.
A precompilation store provides storage of pre-compiled things. It is not concerned with precompilation validity, just storage. Most of the time, the Perl 6 built-in implementation that stores precompiled files on disk will be sufficient. However, a tool that wished to bundle a Perl 6 implementation together with a bunch of precompiled scripts/modules for distribution would do an alternative implementation of this role.
subset CompUnit::PrecompilationId of Str:D
where { 2 < .chars < 64 && /^<[A..Za..z0..9_]>$/ };
role CompUnit::PrecompilationStore {
# Load the precompilation identified by the pairing of the specified
# compiler and precompilation ID.
method load(CompUnit::PrecompilationId $compiler-id,
CompUnit::PrecompilationId $precomp-id)
{ ... }
# Store the file at the specified path in the precompilation store,
# under the given compiler ID and precompilation ID.
method store(CompUnit::PrecompilationId $compiler-id,
CompUnit::PrecompilationId $precomp-id,
Str:D $path)
{ ... }
# Delete an individual precompilation.
method delete(CompUnit::PrecompilationId $compiler-id,
CompUnit::PrecompilationId $precomp-id)
{ ... }
# Delete all precompilations for a particular compiler.
method delete-by-compiler(CompUnit::PrecompilationId $compiler-id)
{ ... }
}
The CompUnit::PrecompilationId
subset type restricts the identifiers that
can be passed to things that should be valid on any even slightly modern file
system.
A precompilation manager is responsible the creation, location, and validity
testing of precompiled modules. The only thing it's not concerned with is
their actual storage. Therefore, a precompilation repository will typically
be configured with an implementation of CompUnit::PrecompilationStore
.
A class implementing the CompUnit::Repository
role knows how to resolve a
dependency specification to a concrete dependency and load it. It can also
provide a list of all the compilation units it has loaded. Implementations of
this role should take care to cache dependencies they already loaded, so the
same CompUnit
instance can be handed back each time the same dependency is
requested. Further, implementations of CompUnit::Repository
should cope with
concurrent calls.
The CompUnit::Repository
role is defined as follows:
role CompUnit::Repository {
# Resolves a dependency specification to a concrete dependency. If the
# dependency was not already loaded, loads it. Returns a CompUnit
# object that represents the selected dependency. If there is no
# matching dependency, throws X::CompUnit::UnsatisfiedDependency.
method need(CompUnit::DependencySpecification $spec,
# If we're first in the chain, our precomp repo is
# the chosen one.
CompUnit::PrecompilationRepository $precomp =
self.precomp-repository())
returns CompUnit:D
{ ... }
# Returns the CompUnit objects describing all of the compilation
# units that have been loaded by this repository in the current
# process.
method loaded()
returns Iterable
{ ... }
method precomp-repository()
returns CompUnit::PrecompilationRepository
{ CompUnit::PrecompilationRepository::None }
}
The X::CompUnit::UnsatisfiedDependency
type is defiend as:
class X::CompUnit::UnsatisfiedDependency is Exception {
has DependencySpecification $.specification;
method message() { ... }
}
If there is no unique module that can be considered the "best" given the
specification, an X::CompUnit::AmbiguousDependencySpecification
exception
will be thrown:
class X::CompUnit::AmbiguousDependencySpecification is Exception {
has DependencySpecification $.specification;
has @.ambiguous;
method message() { ... }
}
A compilation unit repository that supports installation of distributions should implement the following role:
role CompUnit::Repository::Installable does CompUnit::Repository {
# Installs a distribution into the repository.
method install(
# A Distribution object
Distribution $dist,
# A hash mapping entries in `provides` to a disk location that
# holds the source files; they will be copied (and may also be
# precompiled by some CompUnit::Repository implementations).
%sources,
# A hash mapping entries in the `resources` to a disk location
# that holds the files; again, these will be copied and stored.
%resources)
{ ... }
# Returns True if we can install modules (this will typically do a
# .w check on the module databaes).
method can-install() returns Bool { ... }
# Returns the Distribution objects for all installed distributions.
method installed() returns Iterable { }
}
An implementation of a precompilation store that stores the files on disk. It expects to be constructed with a prefix.
my $psf = CompUnit::PrecompilationStore::File.new(
prefix => "$path/.precomp"
);
Its directory layout is straightforward: one directory per compiler ID, which contains subdirectories named for the first two letters of the precompilation ID, which in turn contain the files. So:
$psf.store('comp-1', 'deafbeef');
$psf.store('comp-2', 'deafbeef');
$psf.store('comp-1', 'baadf00d');
Would create:
$path/.precomp/comp-1/ba/baadf00d
$path/.precomp/comp-1/de/deadbeef
$path/.precomp/comp-2/de/deadbeef
Those used to Git internals will note this is the same structure as it uses for loose objects.
The installation compilation unit repository supports both installation and precompilation. It expects to be configured with a directory, where it will keep both original sources and precompilations.
class CompUnit::Repository::Installation
does CompUnit::Repository::Installable {
has $.prefix is required;
...
}
This repository expects that any changes to modules installed through it will be done directly using it, and so any changes to module sources done directly will not be taken into account. Under its prefix, an installation repository establishes the following structure:
repo.lock # A lock file
dist/[sha1] # JSON-serialized distribution info (SHA-1 of dist ID)
sources/[index] # Module source files, by ascending ID
resources/[index] # Module resourece files, by ascending ID
short/[sha1] # Short-name quick lookup file by sha1 of the shortname
precomp/... # Precompilation store
dependencies # Pairs of short-name to short-name SHA-1s
HEAD # Current identifier
When we install a distribution, we do the following:
- Create the
repo.lock
file; if it already exists, fail. - Create a SHA-1 of the unique identifier for the distribution, and check it
does not already exist in
dist
; delete lock file and fail if so. - Copy each of the source files into
sources
, allocating each one an ascending ID. - Copy each of the resource files into
resources
, allocating each one an ascending ID. - Update the Distribution object with this logical name => ID mapping.
- Serialize the Distribution object and store it into
dist
, using the SHA-1 of its unique identifier for the file name. - XXX precomp newly added source files
- Load the dependencies file, and compute the transitive closure of short name relations from it.
- Take the provides section of the distribution, SHA-1 all the short names, and then find the the transitive reverse dependencies on those short names. This gives the set of modules whose precompilations will be invalidated by the new modules.
- XXX precompile all of those impacted modules
- Insert any new dependencies into the shortname dependency graph, and save it back to disk.
- Create/update the shortcut file for quick module resolution.
- Delete the
repo.lock
file.
If a CompUnit::Repository::Installation
is not the final repository in a
linked list, then it must also be sure to invalidate its precompilations if
the identity of the next repository in the chain changes. (Since the usual
case is per-user modules, it's reasonable to assume there will be write
access to the precomp directory to update it).
Further, when queried for a module, a CompUnit::Repository::Installation
that is followed in the chain by another CompUnit::Repository::Installation
should consider its set of modules together with its own, such that the best
option across the two of them wins (and if there are ambiguities, the nesting
level does not count as a resolution). (Conjectural: we could have a
CompUnit::Repository::Installation::Override that always considers itself to
know better, and only delegates if it can't do better.)
XXX The exact factoring here may want a little futher explanation.
The file system compilation unit repository is used when the -I
flag is
specified. It is initialized with a prefix. It assumes it will be able to
create and write to a .precomp
directory beneath that path. When it is
asked for a module, it first checks if it has a precompilation for it. If
so, it ensures it is still valid. The validy check involves:
- Checking the identity of the next repository in the chain
- Checking the modification time of the module itself
- Checking the modification times of the transitive dependencies of that module that live within this CompUnitRepo::FileSystem
A CompUnit::Repository::FileSystem
always tries to satisfy a request for a
module first, and only delegates if it is unable. It also doesn't care for
versioning or authority.
class CompUnit::Repository::FileSystem does CompUnit::Repository {
has $.prefix is required;
...
}
System-wide modules go in a path derived from the --prefix
that Rakudo is
built with. The Configure.pl
script can also be given a --module-prefix
,
which will override this. Tools like rakudobrew will likely wish to specify
a single common --module-prefix
so modules are shared between the things
they will switch between. This directory will be managed by an instance of
CompUnitRepo::Installation
.
A user's local module installations go into a .perl6
in their home
directory, with precompilations under it in .perl6/precomp/
. This will be
managed by a CompUnit::Repository::Installation
(which will delegate to
the system-wide CompUnit::Repository::Installation
).
It's gone.
Good question. Perhaps there should be a lookup mechanism of installation repositories by some kind of identifier ("system", "user", etc.)
Probably also gone, though probably also partly covered by whatever we build to satisfy the previous question.
The term "dependent distributions" should probably be something like "distributional dependency" because, well, it connotes to a reverse dependency otherwise.