
@fendor
Created August 31, 2020 10:31
Final submission for Fendor's Google Summer of Code Project "Multiple Home Units For GHC"

GSoC Submission

The goal of my Google Summer of Code project was to introduce support for multiple home units in GHC. The main motivation for this proposal is to help IDEs provide a seamless developer experience. Currently, two of the bigger IDE projects in Haskell are ghcide and Haskell IDE Engine. Both of these projects aim to support a workflow where developers can work on multiple packages at the same time, such as a package's library and its executable. The workhorse behind both is GHC itself, which is responsible for actually compiling a user's source files. While we succeeded in creating a fork that implements the proposal, nothing has been merged upstream yet. This is to be expected, since it is a huge change that will impact a considerable number of consumers of the GHC API. The feature therefore needs to be designed with care, and the code changes need to be verified to have no unintended side-effects. Hopefully, this work will become a crucial part of the Haskell ecosystem and tooling, improving tools such as cabal, stack and IDEs.

I worked on this project on GitLab and GitHub. The accounts are:

We will now outline where important parts of the contributed work can be found:

  • In the wiki entry for multiple packages per session
    • This wiki entry outlines how the implementation of multiple home units was planned and carried out so far, and documents limitations and alternative design spaces.
  • The initial MR was !935, which forms the foundation for multiple home unit support.
    • I rebased the MR and fixed all of the tests that were failing. To make merging easier, the MR was squashed into a single commit; unfortunately, my own contributions are no longer visible as separate commits in the GitLab UI.
    • An unsquashed version of the MR can be found here. The relevant commits are e479c010...ff9acba2.
  • The motivating issue is #10827, which is a long-standing issue.
  • The MR !3950 contains the complete work of my project. It also includes the motivation and current shortcomings, in addition to discussing design decisions. It fully incorporates MR !935, as we decided that it is desirable to merge those foundational changes first, without any new functionality.

During the project, we also contributed to a number of other parts of the infrastructure: work that was required to test our changes, to showcase their usefulness, and as general maintenance to make the affected tools work again.

  • Contributions to head.hackage. Although not all of these patches can be merged right away, they will simplify the migration story once our contribution is merged into GHC.
    • Patching relevant packages to make them compile with GHC HEAD: !108 and !105
    • Patching hie-bios (only the last commit is relevant)
      • This patch cannot be merged as-is because it depends on changes in GHC HEAD that have not been merged yet.
    • Patching ghc-check (only the last commit is relevant)
      • This patch cannot be merged as-is because it depends on changes in GHC HEAD that have not been merged yet.
    • Patching ghcide (only the last two commits are relevant)
      • This patch cannot be merged as-is because it depends on changes in GHC HEAD that have not been merged yet.

For testing, we also patched the well-known build tool cabal:

  • wip/multi-unit-repl (only the last commit is relevant)
    • This patch cannot be merged as-is because it breaks the abstraction between Cabal and cabal-install.

What is working

Producing compiler artefacts

The important first step is to be able to compile modules that belong to different home units in parallel, without relying on information read from the package database. Since the package database is incomplete for these units, we want to generate the compiler artefacts, such as hie files, ourselves.

Producing compiler artefacts can be achieved with a new major CLI mode:

ghc -unit @unitA -unit @unitB ... -unit @unitZ

The argument @unitA uses GHC's response-file mechanism: unitA is a filepath to a file that contains all the compilation options necessary to compile that home unit. Currently, the main limitation is that each unit must supply the -no-link argument to avoid reading information from disk. This limitation is currently unlikely to be lifted, since lifting it would violate the separation between Cabal and GHC.
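For illustration, a response file such as unitA might contain one GHC argument per line, roughly like the following (the unit id, source directory, and module names are hypothetical; in practice the file is generated by the build tool):

-this-unit-id unitA-0.1.0.0-inplace
-isrc-a
-package base
-no-link
UnitA.Module1
UnitA.Module2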

Interactive GHC session

It is possible to load multiple home units into an interactive GHC session. To achieve that, we introduced a new command line interface, namely:

ghc --interactive -unit @unitA -unit @unitB ... -unit @unitZ

The argument @unitA again uses response files, where unitA is a filepath that contains all the compilation options necessary to compile the home unit. By default, the last given unit is the main home unit. Every code evaluation is relative to this main home unit; this is necessary to avoid ambiguity if multiple home units define the same identifier. The main home unit can be changed with the GHCi command :switch <unit-id>. We further introduced other commands that are helpful for defining tests and evaluating code within the interactive session:

  • :setunit <unit-id> <ghc-option>*: Set the compilation options for the given <unit-id>. If the given unit id is unknown, it will be created. This command can be used to define ad-hoc home units and add them to the interactive session.
  • :addunit <unit-id> <target>*: Add targets to the global module graph that are located in the given unit id.
  • :switch <unit-id>: Switch the current main unit to <unit-id>.

This is enough for basic usage of multiple home units within an interactive GHC session.
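As a sketch of how these commands fit together, a session might look roughly like this (the unit id, options, and target below are purely illustrative):

ghci> :setunit unitC -isrc-c -package containers
ghci> :addunit unitC Data.Example
ghci> :switch unitC

After the :switch, expressions entered at the prompt are evaluated in the context of unitC.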

Testing with HEAD.Hackage

With HEAD.Hackage, it is possible to use the MR on real-world applications and test that loading them succeeds.

To make this testing more viable, we also patched the cabal build tool: it was necessary to lift the limitation of loading only a single component at a time. The patch can be seen in the branch wip/multi-unit-repl. It allows opening multiple components in the same ghci session via the syntax cabal repl component1 component2. The changes in wip/multi-unit-repl cannot be merged as-is, since they break the abstraction between Cabal, the library, and cabal-install, the executable; they are only meant to help with testing of multiple home units.
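For example, a package's library and test suite could be loaded into the same session with an invocation such as the following (the component names are hypothetical and follow cabal's usual target syntax):

cabal repl lib:example test:example-tests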

With these changes, it is possible to test our project on a variety of packages on Hackage.

Project Limitations

There are two open issues that cannot be solved within this MR and must be taken care of in subsequent work. Merging does not depend on these issues, though.

Module visibility

Cabal packages usually define private and public modules, and other packages may only depend on the latter. Before this MR, this did not matter to the interactive session: it was only possible to open a single component/package, and a package may import its own private modules. Therefore, there was no need for GHC to have an explicit notion of private/public modules, and it simply assumes that all modules are public. However, with multiple home units, the following situation might arise:

[Figure: Module visibility example]

The important issue is that module D might depend on module C, even though C is private to Unit Y, and it should not be possible for Unit X to depend on the private modules of Unit Y. Therefore, we might accept programs that are not valid for a cabal package. However, we do not expect real-world problems as long as tools such as cabal make sure a package only imports public modules.
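Concretely, the problematic import would look like this (the module and unit names follow the example above):

-- in module D, which lives in Unit X
import C  -- C is a private module of Unit Y; cabal would reject this, but GHC currently accepts it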

A potential way to solve this issue is to make GHC understand module visibility. In particular, we would need to extend the command line interface to specify the visibility of a module, and dependency resolution would need to detect invalid imports. In theory, this should not be difficult, as GHC already detects when a user imports private modules from external packages and provides helpful error messages.

Package Imports

The language extension PackageImports is used to resolve ambiguous imports from different units, in particular to disambiguate importing two modules with the same name from two different units.

Example:

import "foo" Data.Foo
import "bar" Data.Foo

It uses the package name for the disambiguation. The problem is that the package name is normally read from disk, and for home units there is no such information on disk, so this feature cannot work at the moment. A home unit is specified as a unit id on the command line and looks like -package-id <unit-name>-<version>-<hash>. There is currently no way to get the name of the package from this in a standardised way. One way to solve this would be to create a naming scheme for unit ids, but such a proposal is out of scope for this project.
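To illustrate (the unit id below is hypothetical): a home unit might be identified only as

-package-id foo-0.1.0.0-abc123

GHC then only knows this unit id; the package name foo that import "foo" Data.Foo refers to cannot be recovered from it in a standardised way.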

Presentation to the Community

The project has been announced, explained and motivated to the community. Two blog posts have been written:

It was also possible to present the project at this year's Haskell Implementors' Workshop under the title "Multiple Home Units", by Hannes Siebenhandl. The presentation was held on the 28th of August in front of many members of the GHC development team, where we explained design decisions and answered community questions. This was definitely a highlight for me, as it was one of the first times I presented something in front of the Haskell community.

The slides for the presentation can be found here.

Remaining work

There is a lot of code to review, and it stands to reason that there are many small and big issues in the code that need to be taken care of before it can be merged upstream. Therefore, it is hard to estimate how much time must be spent incorporating review feedback into the project. However, we do not expect any major reworking of the internal changes. We expect discussions about the following topics:

  • API design decisions
  • User experience (UX)
    • Especially in the interactive sessions, there will be some bikeshedding about naming.
  • Insufficient test-coverage
  • Documentation
    • Documentation is a big part of such a breaking API change.
    • The user documentation needs to be adapted as well.