@lizmat
Last active September 29, 2015
CURLI roadmap ideas, please leave your comments here
The CompUnitRepo::Local::Installation class (or short: CURLI) is a class
that is part of the Perl 6 core for installing modules from any source,
as described in S22. What S22 does not describe is directory and file
layout. In this post I'll try to explain how I think we should do this,
based on the experiences (and problems) we have had with panda and the
current implementation of CURLI.
CompUnitRepo::Local::File
=========================
Technically, the CURLF is not much of a CompUnitRepo, as it doesn't support
installing modules as such. It is basically just a frontend to the
module loader that supports Perl 5-like module loading semantics (that is,
without any from / auth / version / api support) on files that happen to
live at a certain location on a file system. Therefore, a CURLF should
*never* need to handle precompilation of modules. A CURLF is intended for
a development situation, *not* for a production environment.
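As a quick sketch of these semantics in practice (Foo::Bar is a
hypothetical module living under ./lib):

    use lib 'lib';   # prepends a CURLF for ./lib to the include path
    use Foo::Bar;    # plain path-based lookup, no auth/version/api selection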
Pluggability / Composability
============================
S11 / S22 describe an API. That API will need to be further defined as
we get further into implementing CURLI. The idea should always be that
*anybody* should be able to create their own CompUnitRepo module (think
packagers, or companies, or TPF) according to their own wishes. So anything
CURLI can do should be possible for other developers as well.
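To give an idea of the intended shape, here is a minimal sketch of such
a third-party CompUnitRepo. The method names follow this document, but
the exact role / API to implement is still to be finalized, so all of
this is an assumption:

    class CompUnitRepo::Company {
        has $.base-dir;

        # return the CompUnit objects matching the request, if any
        method candidates($name, :$auth, :$ver, :$api) { ... }

        # install a distribution into this repository
        method install(Distribution $dist) { ... }
    }

    # prepend it to the search path so it is consulted first
    @?INC.unshift: CompUnitRepo::Company.new(:base-dir</company/modules>);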
CURLI Prerequisites
===================
- modules are installable from a Distribution object, typically a tar-file
as is already used in the panda ecosystem. A distribution may
contain 0 or more loadable modules (something a "use", "need" or "require"
can find) and associated meta-information and/or data.
- a distribution contains several types of meta-data. Some parts of these
will be necessary to select a module for loading (e.g. the auth / version /
api parts needed for the "candidates" method). Other parts are only needed
when actually loading the module (e.g. the %?RESOURCE hash). Still other
parts are only needed during introspection-like actions at runtime (e.g.
calling .WHY on a code object if it was set up in a non-standard way, or
finding out the actual textual description of the distribution).
- installed modules need to "survive" a rakudo update/upgrade. In the
previous panda implementation, modules needed to be re-installed (over
the internet) after *every* rakudo update (and that includes *any* rakudo
recompile). This becomes very tiresome for core developers and is
therefore one of the reasons why some (bad) changes in rakudo are not seen
in the ecosystem until it is too late.
- precompiled modules may need to continue to exist for different versions
of rakudo. Think switching between different versions of rakudo using
rakudobrew: for a given installed base of modules, you don't want to have
to recompile again and again when switching: you should only need to
precompile *once* for any rakudo compilation.
- precompiling all installed (source) modules should be simple, fast and
possibly asynchronous. This will allow core developers to more easily do
a sanity check on changes in rakudo. It could even become part of
"make install".
- the core libraries (such as Test.pm, NativeCall.pm) should be installed
using the CURLI install process. It's really a matter of eating your own
dogfood, and in the long run it should simplify matters.
- having a large number of modules installed should have *no* effect on
bare startup time, either for starting the REPL or for doing a -e.
- only when a CURLI is asked to look up a module through .candidates
should it load the meta-info necessary for the selection of a module,
and only that. This should probably live in a file with a dedicated
(rakudo version independent) data-format that contains this meta-info for
*all* installed modules in this CURLI (see the sketch after this list).
For performance, this meta-data should probably also live in the form of
a precompiled hash that would be much faster to load, as no parsing would
be needed.
- the CURLI object for a given base directory should be a singleton
object. It is free to install distributions asynchronously if so required
(perhaps the .install method should allow for a number of distributions to
be installed, instead of just one). Since the CURLI object is a
singleton, it can keep all necessary meta-info for selection of a module
in memory, and update both the memory copy and the one on disk when
installing a distribution.
- the meta-information of a given distribution can be considered frozen,
just like the distribution itself (for a given name / auth / version /
api). Therefore any module-related data (other than what is needed for
module selection) should live in a rakudo version independent format,
with possibly a precompiled version for performance.
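To illustrate the meta-info file mentioned above, here is one possible
(purely hypothetical) shape for the rakudo version independent data that
.candidates would consult, keyed on loadable module name:

    # one entry per loadable module name, holding everything needed
    # for selection; "dist" and "file" are the mangled IDs on disk
    my %candidates-meta =
        'Foo::Bar' => [
            { :dist<0>, :auth<github:liz>,     :ver<1.2>, :api<1>, :file<3> },
            { :dist<7>, :auth<cpan:TIMTOWTDI>, :ver<2.0>, :api<2>, :file<5> },
        ];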
CompUnit prerequisites
======================
- there should be only one CompUnit object (a singleton) for a given
compilation unit. Even if two threads are doing a .candidates
simultaneously and return the same compilation unit, both threads should
share the same CompUnit object.
- when rakudo is asked to load a CompUnit object, it will first determine
whether the CompUnit object is already loaded (by checking if its .WHICH is
known). If it is not known to have been loaded already, it will call the
.load method on the object, after which the object should be able to
provide pertinent information to rakudo about namespaces created, etc.
(this needs to be worked out further).
- an implementation of CompUnit only needs to support .load: rakudo
should take care of all related issues, such as preventing circularities
and double loading (see the sketch below).
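A minimal sketch of that double-loading guard, assuming rakudo keeps a
registry keyed on .WHICH (the registry and sub name are hypothetical):

    my %loaded;             # .WHICH => result of .load
    my $loading = Lock.new;

    sub load-compunit(CompUnit $cu) {
        $loading.protect: {
            %loaded{$cu.WHICH} //= $cu.load;   # .load runs at most once
        }
    }

The Lock also covers the case of two threads resolving and loading the
same compilation unit simultaneously.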
Implementation implications
===========================
- all precompiled code is bound to a specific rakudo version (compilation,
actually). To make management of rakudo versions easier for helper
applications such as rakudobrew, all precompiled files that are associated
with a given CURLI should live in a single directory inside the base
directory of the CURLI: removing a rakudo version (compilation, actually)
would then just mean removing the directory for that compilation.
- S11 stipulates that limitations of the file system (most notably,
case-insensitivity nowadays) should not be able to affect the naming of
modules in any way (so if someone would like to give their module a
Japanese name, that should Just Work(TM)). This implies some name
mangling from the language-visible name to the actual file on the file
system. The current CURLI implementation uses numbered files: one could
argue that some SHA might be better. But since all installation of modules
*should* be handled by a single singleton CURLI object (even across
processes, so some file system type locking / semaphoring will be needed),
it would seem that simple numbering is adequate, as having SHAs as
filenames would not add much information from a file system (ls) point of
view anyway. See the sketch below.
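A sketch of what the number-based mangling could look like; the counter
file and sub name are hypothetical, and the file system type locking
mentioned above is elided:

    # hand out the next numeric file ID for a freshly installed unit
    sub next-file-id(IO::Path $base-dir) {
        my $counter = $base-dir.child('.next-id');
        my $id = $counter.e ?? +$counter.slurp !! 0;
        $counter.spurt($id + 1);    # must happen under the lock
        $id
    }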
Directory layout
================
The rakudo install directory, as well as any CURLI base-directory, may live
anywhere on a file system to which the user has access. Only executable
components need to be installed in system directories such as /usr/local/bin.
Only if one wishes to have a global, system-supplied rakudo does it seem
warranted to actually have the rakudo install directories, as well as any
CURLI base-directories, at a system location, under protection of sudo/root.
rakudo install directory
|
\-- .precomp
|
\-- (compilation ID: one for each SHA of rakudo)
|
|-- perl6.moarvm (BOOTSTRAP)
|
|-- CORE.setting.moarvm
|
|-- RESTRICTED.setting.moarvm
|
|-- installed modules meta info (needed for .candidates)
|
|-- (distribution ID: one for each installed distribution)
| |
| \-- (compunit ID: one for each compunit in the dist)
| |
| |-- runtime meta info if any (e.g. %?RESOURCE)
| |
| |-- precompiled file
| |
| \-- other precompilable data
|
\-- lib
|
\-- (compunit ID: one for each of Test, NativeCall, etc.)
|
\-- precompiled file

(base-directory: one for each CURLI)
|
\-- (distribution ID: one for each installed distribution)
|
|-- module meta data for .candidates
|
\-- (compunit ID: one for each compunit in the dist)
|
|-- runtime meta info needed for loading
|
\-- .dist
|
* original distribution files name mangled
Cleanup considerations
======================
- removing support for a compilation ID of rakudo:
    rm -rf (rakudo install directory)/.precomp/(compilation ID)
- uninstalling a distribution ID:
    rm -rf (base-directory)/(distribution ID)
    rm -rf (rakudo install directory)/.precomp/*/(distribution ID)
Rakudo loading process (on MoarVM)
==================================
- "perl6" is a script that loads moarvm in the install/bin directory.
- it passes the name of the script as execname
- it passes a libpath to the nqp/lib
- it passes a libpath to .
- specifies perl6.moarvm (main.nqp) to be run with the given parameters
- this loads nqp modules Perl6::Grammar & Perl6::Actions
- Perl6::Grammar loads Perl6::World
- sets up a Perl6 compiler
- calls it with the given (and generated) parameters
- token comp_unit then loads the settings (CORE.settings.moarvm)
- runs the indicated code
- runs the END blocks
From -use- to actual loading
============================
- the use/need/require statements should simply generate code that will
call the appropriate setting sub (e.g. USE/NEED/REQUIRE). This will move
most of the higher logic of compunit loading to the Perl 6 level, where it
is more easily maintained. Since the actual loading for standard compunits
will still happen at the nqp level, the performance consequences should be
minimal.
- to further facilitate development and maintenance, USE/NEED/REQUIRE
should probably be multis, with at least different candidates for :from.
This should allow loading of compunits to become even more pluggable,
provided a different :from is used. Such a candidate could even be
exported by a module (think auto-generated code from a WSDL template
that you could simply access by saying 'use FooBar:from<wsdl>', where
the FooBar file would contain the indicated template). See the sketch
after this list.
- the default implementation of USE/NEED will simply go through @?INC
and call .candidates on each of the CURs. The first CUR to return
any CompUnit objects will stop the search. If more than one CompUnit
is returned, a tie-breaking mechanism will be employed. If this does
not result in only one remaining CompUnit, an error should occur
(think of a class consuming two roles, where each role supplies a method
with the same name: the class will need to resolve this by supplying its
own method with that name). In the case of more than one CompUnit, the
issue should be resolved by supplying a stricter -use-/-need- statement.
- the tie-breaking logic of USE/NEED can be governed by pragmas in the
future. For now, the tie-breaking logic is as follows: if the CompUnits
are of different auths, the tie-break will fail. If the CompUnits have
different "api" values, the tie-break will fail. If all CompUnits have
the same "api" value, then the CompUnit with the highest version will
be selected.
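Putting the three points above together, a sketch of how the setting
subs and the tie-break could look (all names and signatures here are
assumptions, not final API):

    proto sub USE(Str $name, |) {*}

    # default candidate: ordinary Perl 6 compunit loading via @?INC
    multi sub USE(Str $name, *%adverbs) {
        for @?INC -> $cur {
            if $cur.candidates($name, |%adverbs) -> @candidates {
                return tie-break(@candidates).load;
            }
        }
        die "Could not find $name";
    }

    # a :from candidate, possibly exported by a module
    multi sub USE(Str $name, :$from! where 'wsdl', *%adverbs) { ... }

    sub tie-break(@candidates) {
        return @candidates[0] if @candidates == 1;
        die "ambiguous auths" if @candidates>>.auth.unique > 1;
        die "ambiguous apis"  if @candidates>>.api.unique > 1;
        @candidates.max(*.version)   # highest version wins
    }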
lizmat (Author) commented Sep 26, 2015

ugexe: trying to answer your questions

  • each CUR implementation can decide which hooks it supports, and how they are specified. I agree CURLI should have support for hooks at many steps in the installation process. I have not gotten further than that notion yet. I hope we will be able to finalize this in this round of development. If you have any ideas in that direction, I would like to see them (maybe in a separate gist?)

  • like Perl 5, I think Perl 6 can only install compunits for the currently running process, which implies that a backend is already selected. And that also implies to me that it should only install files needed for that backend. If you want to be able to run it on another backend as well, you would need to install the compunit using that other backend. This could maybe be automated by having one Perl 6 shell out to another Perl 6 using a different backend. But I would think a process can only install for the backend it is running on itself.

  • the complete original distribution, along with test files, is installed on the system, in my view. This should allow for rerunning any tests, should one so desire. But I would definitely not expect that to happen by default when upgrading a rakudo. With regards to upgraded dependencies: if you are stupid enough to run with underqualified -use- statements in production code, you deserve what you get. Perl 6 gives us the unique opportunity to get away from dependency hell. It only requires a little strictness from developers: if you are sure that the code in a given lexical scope runs with version 1.2 of Foo, then specify that in your -use- statement. Then you will never have to worry about upgrading rakudo, as long as that compunit remains installed. And that is a matter of dependency management.

  • CURLI can only precomp for the backend it is currently running under. Technically, it probably could precomp for other backends (by shelling out to a Perl 6 with a different backend). But I would, at least for 6Mas, disallow that for now, and possibly for ever.

  • since CURLI can only precomp for its own backend, it only needs to handle failures in that case.

  • with the proposed directory layout, it should be possible for different backends to use the same CURLI base directory. So a CURLI should probably understand that its sibling in the same base directory, but on a different backend, has already installed the distribution files. It would still need to update its precompiled version of the installed modules meta data (needed for .candidates). But since the rakudo on the other backend will have a different compilation ID, the precompiled meta info will live in a different directory, so the backends will not interfere with each other.

  • user supplied precomped files, even without source, smell like a package manager wanting to control things. This is fine, but I don't think the standard CURLI needs to support this. The whole point of CUR is that anybody can make their own CUR implementation, as long as it provides CompUnit objects that can be .load-ed. So if a package manager wants to just supply precomped files, they would have to make their own CUR (or find someone in the P6 community to make one for them).

  • we will need a way to find out the build order of provides. Suggestions welcome.

  • we will need to look for prior art in this matter. Suggestions welcome.

  • I'm not sure what you mean by Distribution.content.

  • we may want to keep a MANIFEST around. On the other hand, since we try to keep things that belong together, together as much as possible, and prevent files from being copied at all, there is much less to clean up.

  • the parts of META6.json that are needed for .candidates are already kept separately in a fast readable format. The parts of META6.json that are needed to load a compunit are also kept separately in a fast readable format. The original META6.json file will be introspectable on demand: only then should it be opened, so not for selecting the compunit, or for loading it.

  • the location of a non-core CUR is indeed an issue. I think we will need a separate command line parameter (and/or environment variable) for this, along with a specification of its "source". I was thinking of something along the lines of

    perl6 -C /usr/lib/CloudPan.pm -I cloud#cloudpan.org/6pan

The -C would take a path to a CUR (possibly even precompiled), which would get loaded (because it would generate an on-the-fly standard CompUnit object) and run its BEGIN blocks. This would install the "cloud" prefix for -I. With the -I specification, you would then instantiate a CloudPan CUR and unshift it onto @?INC. And then you would be in business. :-)

  • you should not uninstall modules unless you are sure that no other modules need them, and none of your production code needs them. This is no different from the current situation in P5 or other languages. OTOH, I think uninstalling, with the current disk size to distribution size ratio, is something that we will need to worry about less and less. If you really are short on disk space, maybe we need to support some sort of generic precomp-only CUR. Or maybe someone out there will just make one, because they can!

ugexe commented Sep 26, 2015

Some notes on your notes:

Installation of precompiled files only, with no source, was your suggestion; it has nothing to do with a package manager (http://irclog.perlgeek.de/perl6/2015-05-27#i_10665828). Additionally, a user may wish to supply their own precompiled files because precompiling on things like a Raspberry Pi on the JVM is incredibly slow.

Distribution.content: http://design.perl6.org/S22.html#content

Regarding tests, S22 said tests would not be installed, which is why I brought it up: http://design.perl6.org/S22.html#t

The definition of underqualified use statements does not seem as simple as I interpret from your comments, due to supersedes, superseded_by, and excludes from S22, combined with later installation or uninstallation of modules.

lizmat (Author) commented Sep 26, 2015

ugexe:

  • then we may need to have a CUR in the ecosystem for installing precomps only. I don't see that as something that would need to be supplied by the core.
  • ah, that Distribution.content! :-) The idea of that was that if we have support files in a Distribution, the file names will be mangled on the file system. The .content method should provide the programmer with a transparent way to get at those files, by specifying their "logical" location instead of their actual location on the file system (which may be unknown before actual installation of the distribution).
  • I guess the thinking on that has changed: the entire distribution will live unpacked, but name mangled, somewhere under the base-directory of the CURLI. Precompilation of compunits will happen directly from the files in that directory, into the appropriate directory under .precomp. Since precomped files under rakudo refer to each other by absolute paths (afaik), this is another reason why putting them together may actually be a performance win (fewer disk reads necessary).
  • supersedes, superseded_by and excludes influence the selection process in .candidates, nothing else. So suppose you have use Foo:ver<1.2>:auth in your code, and it has been superseded by 1.21: then .candidates will return the CompUnit for 1.21, instead of the one for 1.2. See the sketch below.
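In other words, supersede resolution is just a final fix-up step on the result of .candidates; a sketch (the sub name and %superseded-by are hypothetical):

    # follow the supersede chain until no supersede info remains
    sub resolve-supersedes(CompUnit $selected, %superseded-by) {
        my $cu = $selected;
        while %superseded-by{$cu.WHICH} -> $successor {
            $cu = $successor;    # e.g. Foo 1.2 => Foo 1.21
        }
        $cu
    }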

ugexe commented Sep 27, 2015

Regarding precompiled files only:

No other CUR is needed for installing precompiled-only code. CompUnit is what currently prevents this, with the way it handles initialization using assumptions about the path passed in. The problem is that for it to work at all, you would also need to allow external CompUnit implementations to be used. It seems trivial to just allow a CompUnit to work in this fashion in the core (by looking at the extension, as is already done for the rest of CUR, instead of the current method of assuming $!path is a source file and just tacking an extension onto it to declare the precompiled path, yielding things like File.moarvm.moarvm), versus requiring both an external CUR and CU to be supplied on the command line.

Regarding supersedes:
S22 says it has additional meaning for external packagers, so its effects would go beyond .candidates.

FROGGS commented Sep 27, 2015

    By saying which version of a compunit you need in a lexical scope,
    you are ensuring that your code will always run in the future as long
    as that version of the compunit is installed on the system.

That is also begging for letting it fail later... You forgot that the system you are running on is not something constant. It will change over time, and when that happens you want to pull in a patched dependency, or a patched dependency of a dependency. This is the reason why the v1.2.+ syntax exists, and the compiler is not in charge of disallowing its use.
The problem with the dependency-of-a-dependency situation is that you often cannot manipulate the dependency. So you cannot change it to allow a new, very specific version. This means you want to have dependencies that allow a range of versions, so that bugfixes can get in.
The same goes for your own software. You want to constrain module use to a given auth or auth set, but relax the version to a range that allows bugfix releases.

@lizmat
Copy link
Author

lizmat commented Sep 27, 2015

FROGGS: I realise that the system you are running on is not something constant, and that you want to prevent use of a buggy version of a compunit. For that, we have "supersedes": it will allow you to specify an alternate compunit B to be used whenever something tries to load a faulty compunit A. I think this is a much better mechanism than relying on version numbers / semantic versioning.

And if it is really a minor version bump with bug fixes and a completely identical API, you cannot even remove the lower-versioned compunit from the installation, because you cannot guarantee that there isn't code in your installation that has a fixed dependency on the lower-numbered version.

So I really think the supersede mechanism is a much better choice. Note that this only affects the selection process in .candidates of CURLI: if a compunit is initially selected, but supersede information is available for it, the other compunit will be used instead (unless there is supersede information available for that one as well, of course).
