Final GSoC19 Report for: A stronger foundation for interactive Haskell tooling
Throughout the summer I've been working on ecosystem improvements to allow Haskell tooling initiatives, mainly Haskell-IDE-Engine, to reach their full potential.
Here we're going to look at what work was done during the summer and how that relates to the original proposal (pdf). The proposal had this to say:
This proposal will substantially improve the reliability, performance and maintainability of tooling efforts.
This proposal consists of three main areas:
- Improvements in GHC to reduce friction for downstream tooling efforts (Task 1 and 2)
- Work on cabal-helper to enable easy new-build support (Task 3)
- Integration of the above into Haskell IDE Engine (Task 4)
Significant progress was made with respect to the first two points. However, not enough time was left over to fully finish cabal-helper and integrate it into HIE during GSoC.
Journey through submitted issues and pull-requests
The first two PRs nicely correspond to Task 1 from the proposal. I had originally allotted 2 weeks for this, but it ended up taking around 4 weeks due to code review and needing to get back into the GHC development process first. The third PR is a backport of the other two onto the 8.8 release branch, which we had just missed. Thankfully that got merged and the API changes will be part of the 8.8.1 release.
- Allow using targetContents for modules needing preprocessing
- Allow API clients to use GhcMake.downsweep directly
- Backport recent changes relevant for HIE to 8.8 branch
At this point, since my schedule was very tight, I figured I'd have to drop at least one of the tasks so I skipped right ahead to Task 2.2 as I knew we could handle multiple components downstream instead.
The original idea of Task 2.2 was to share data (in-process) across multiple GHC sessions in order to reduce the overall memory usage. Since Task 2.1 was scrapped this morphed into simply investigating why and where GHC is using so much memory to begin with.
On the side I was also thinking about whether we could share memory across multiple processes, hence the first PR above. I later scrapped that idea because it's just a bit too much work.
- rts: Fix retainerProfile early return with TREC_CHUNK
- rts: Fix STATIC_INLINE macro
- rts: Fix -hT option with profiling rts
- rts: Divorce init of Heap profiler from CCS profiler / rts: Rename the nondescript initProfiling2 to refreshProfilingCCSs
- Generalise Profiling Heap Traversal code from Retainer profiler
- Introduce heap profiling by user specified roots
About 50 commits and about 5000 lines of RTS C code later, I confirmed my hypothesis that GHC could really use a better (well, any) cache eviction strategy for the ExternalPackageState datatype or, well, more sharing.
As a side-effect of the investigation I had also produced a new heap profiling mode which allows more directed quantitative analysis about the heap memory usage of Haskell code. Please see the "Introduce heap profiling by user specified roots" PR description for more details.
This code is still under code-review and I hope to get it merged in time for GHC 8.10.
Unfortunately this took quite a bit longer than expected, 5 weeks or so, but once I'd started I saw the potential to not just make it easier to debug and reduce GHC's memory consumption but literally any Haskell program's!
I figured that, while it's not what we originally planned for, I should finish this anyway because there likely aren't a lot of people in the Haskell ecosystem who know about RTS implementation details and like to touch gnarly C code.
After finishing with the GHC work I changed gears and finally started looking into how we can best integrate cabal v2-build support via cabal-helper in HIE. This also involved looking through Matthew's work on hie-bios.
Just getting HIE set up for development took much longer than expected, mainly because it currently only works when built via Stack, and that's not my preferred build tool.
I tried getting the tests working with cabal v2-test but there seem to be some problems there that need more debugging.
Meanwhile I was also working on getting cabal-helper ready for release.
I was able to remove some very old ghc-mod-era hacks in the way cabal-helper deals with multiple interdependent components in a package: Remove crusty old helper code (commit).
This is a very important change as it essentially reduces the impedance
mismatch between cabal-helper and Cabal-3.0's new
to zero allowing us to take advantage of that going forward.
Other notable commits:
- Flesh out project discovery API: This will hopefully allow replacing ghc-mod's automatic project discovery with something more principled in the future.
I'm planning on releasing a new version of cabal-helper in the next couple of weeks. There are still a handful of fixes left to do, but it's certainly ready for preliminary integration now.
On a rainy weekend I implemented an idea I had floating around my head for a while and also produced some fixes for cabal-install in the process:
The idea here is to make build output fully sequential, and incidentally reproducible, in the presence of build concurrency but still allow for live output. Turns out this is also useful for HIE since I realised for proper UX we might have to parse and present build output from the build-tool as diagnostics in the future.
This is still not finished but once I get some feedback from the other cabal devs I'll see to getting it merged since I'm already actively using it in Emacs myself.
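To make the idea concrete, here is a minimal sketch of one way to get sequential yet live output from concurrent jobs. It is not the cabal-install patch itself: the names `Job`, `runSequenced` and `collectSequenced` are made up for illustration, and the approach (one unbounded channel per job, drained strictly in job order) is just an assumption about how such a scheme could look. The first unfinished job streams its lines as they arrive, while later jobs keep running and have their output buffered until it is their turn.

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.Chan (Chan, newChan, readChan, writeChan)
import Control.Exception (finally)
import Control.Monad (forM, forM_)
import Data.IORef (modifyIORef, newIORef, readIORef)

-- | Hypothetical job type: a job reports its output line by line
-- through the callback it is handed.
type Job = (String -> IO ()) -> IO ()

-- | Run all jobs concurrently, but pass their output to 'sink'
-- strictly in job order. Each job writes into its own unbounded
-- channel, so jobs never block on each other.
runSequenced :: (String -> IO ()) -> [Job] -> IO ()
runSequenced sink jobs = do
  chans <- forM jobs $ \job -> do
    ch <- newChan
    -- 'Nothing' marks the end of a job's output; 'finally' makes sure
    -- it is sent even if the job dies with an exception.
    _ <- forkIO (job (writeChan ch . Just) `finally` writeChan ch Nothing)
    pure ch
  -- Drain the channels one after another: the current job is shown
  -- live, later jobs are replayed from their buffers.
  forM_ chans (drain sink)

-- | Forward lines from one channel until the end marker shows up.
drain :: (String -> IO ()) -> Chan (Maybe String) -> IO ()
drain sink ch = do
  next <- readChan ch
  case next of
    Nothing   -> pure ()
    Just line -> sink line >> drain sink ch

-- | Same as 'runSequenced' but collects the lines instead of printing.
collectSequenced :: [Job] -> IO [String]
collectSequenced jobs = do
  ref <- newIORef []
  runSequenced (\line -> modifyIORef ref (line :)) jobs
  reverse <$> readIORef ref

main :: IO ()
main = runSequenced putStrLn
  [ \emit -> threadDelay 20000 >> emit "a: compiling" >> emit "a: done"
  , \emit -> emit "b: compiling" >> emit "b: done"
  ]
```

Even though job `b` finishes before job `a` has printed anything, the lines of `a` always come first, so the output no longer depends on scheduling. The price is that the output of later jobs is held in memory until all earlier jobs are done.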
- Fix v2-install ProgramDb confusion #6195
- Fix cabal-install fighting over index cache with old versions #6164
- Solver failure on 3.0 with distsdir from 220.127.116.11 due to outdated "compiler" cache #6163
In conclusion, as always, there's still a lot to do, especially on the Haskell-IDE-Engine front. I will likely keep on working in this space even after GSoC, but getting some hands-on experience with HIE has shown me just how complex it is.
I think the main focus for the future should be making HIE easier to contribute to. We really need more people working on it to tame the inherent complexities of the LSP protocol and downstream tools coming together.
I experienced some major friction in this area: The test coverage leaves something to be desired and the granularity of the tests we do have is simply way too coarse. If any of them fail you really have no idea what's going on at a high level.
Only supporting one build system for building HIE itself is also not helping to encourage contributions. My guess would be that there is a 50/50 split of cabal vs. stack in the Haskell world, so we're just putting off half the possible contributors right there.
Other than that we should continue the current trend of upstream-first development rather than hacking around problems downstream -- that never goes well in the long run.