@wz1000
Created May 4, 2018 07:29
.hie file discussion on #ghc
2018-02-13 23:57:02 wz1000 alanz, alexbiehl, mpickering: https://gist.github.com/wz1000/81e0bb720237e8ae6193c7ac2d28c913
2018-02-13 23:58:15 alanz wz1000, I like what you are aiming for
2018-02-13 23:58:22 alanz well, all of you
2018-02-13 23:58:23 mpickering wz1000: I just skimmed it now but if I were you I would write down something very concrete (answer all the questions at the end)
2018-02-13 23:58:33 alanz I guess I should actually pay some attention
2018-02-13 23:59:01 mpickering if you don't write down something very concrete then making progress is much more difficult as people love discussing things until they die
2018-02-13 23:59:47 wz1000 yeah, this is just some initial brainstorming
2018-02-14 00:00:03 c_wraith I'd love to be able to get more information out of ghc
2018-02-14 00:00:36 alanz I think there are also two parts to it, getting the info out, and keeping it available, as in storing it on disk or some such
2018-02-14 00:01:09 alanz Because expecting a big project's detail to be in RAM won't work
2018-02-14 00:01:41 mpickering also I don't think reading a .hi file is trivial currently?
2018-02-14 00:02:18 mpickering It's a good idea but needs a clear vision about what to include and how to include it
2018-02-14 00:02:46 mpickering you could also flesh out exactly how you imagine tooling interacting with these files
2018-02-14 00:03:08 mpickering If we had source plugins, you could experiment with a format and then later merge it into GHC if it proved to be successful
2018-02-14 00:08:46 alanz mpickering, what do you mean by that last statement? Experiment with extracting info and storing it somehow, or experiment with amending source?
2018-02-14 00:09:02 alanz I am in favour of the former, nervous of the latter
2018-02-14 00:09:31 mpickering someone writes a source plugin which extracts the kind of information that wz1000 is talking about, into their own format
2018-02-14 00:09:49 mpickering which then tools like haddock and so on consume
2018-02-14 00:10:09 alanz ok, that sounds good
2018-02-14 00:10:11 mpickering so it is decoupled from GHC development and can obey a different release cycle
2018-02-14 00:10:19 mpickering can be iterated on rapidly and so on
2018-02-14 00:10:30 alanz yes, and also can have alternative implementations, doing different things
2018-02-14 00:11:15 alanz Once TTG is fully into place we should be able to more easily transmit info through the AST too
2018-02-14 00:11:22 alanz as an alternative
2018-02-14 00:21:32 wz1000 but haddock is tightly integrated into ghc itself
2018-02-14 00:22:58 wz1000 so if this thing has to be the source of info for haddock it has to move in sync with it
2018-02-14 00:25:19 alanz Well, maybe the way to go is to define a plugin architecture as is being proposed, but also include a DB API, which can potentially write info into a .hi file
2018-02-14 00:25:35 alanz So it can be used by haddock or any other
2018-02-14 00:26:04 alanz especially as haddock is aiming to be more loosely coupled, for the same reasons. i.e. to iterate faster
2018-02-14 00:26:58 alanz and each plugin gets a "directory" in the db.
2018-02-14 00:27:05 alanz how ever it gets structured
2018-02-14 00:27:38 alanz wz1000, a bit like the plugin data stuff we have in hie, but potentially persistent
2018-02-14 00:46:39 mpickering one obvious comment wz1000 is does this not require that we must foresee that we will want to run a certain tool on our program so that we can invoke the plugin when we compile the program?
2018-02-14 00:47:16 mpickering I suppose that is already true for how haddock and other tools work today?
2018-02-14 00:50:46 wz1000 mpickering: so you are suggesting every tool(hie, haddock etc.) have its own plugin to build its db?
2018-02-14 00:59:40 @hvr wz1000: the "a global database of cross referenced data" item seems to deviate from a KISS principle
2018-02-14 01:00:13 @hvr wz1000: i.e. if there's already enough info in the .hi files to construct such a db, then GHC ought not do it
2018-02-14 01:00:47 @hvr wz1000: i.e. such a db would feel redundant
2018-02-14 01:01:19 wz1000 hvr: but stuff inside one module refers to stuff inside other modules
2018-02-14 01:01:34 wz1000 we need some way to sort that out
2018-02-14 01:02:05 @hvr wz1000: we just need a way to reconstruct the scope within a module
2018-02-14 01:02:05 wz1000 a unique identifier for each symbol in the context of one compilation
2018-02-14 01:02:42 @hvr wz1000: otherwise you'd be forced to actually parse e.g. the haddock format, which we do not want to do inside GHC
2018-02-14 01:03:33 wz1000 hvr: is the hyperlinked source generation done as a separate step?
2018-02-14 01:03:48 wz1000 because I imagine that needs to know where everything is defined
2018-02-14 01:03:58 @hvr wz1000: yes, but not only that; if a haddock docstring refers to a name like 'map'
2018-02-14 01:04:17 @hvr wz1000: then resolving that name is currently done during the haddock step
2018-02-14 01:04:18 wz1000 yeah
2018-02-14 01:04:22 wz1000 ok
2018-02-14 01:04:58 @hvr "hi haddock" aims at keeping haddock and ghc reasonably decoupled
2018-02-14 01:05:57 wz1000 so haddock needs to parse and typecheck the source again anyway?
2018-02-14 01:06:26 @hvr there was a subtle hint about that in the "hi haddock" proposal
2018-02-14 01:06:48 @hvr insofar as the hyperlinked-source part may pose a challenge to tackle in future work
2018-02-14 01:07:27 @hvr but it's considered out of scope for now
2018-02-14 01:07:39 wz1000 so the info we need for hyperlinked source is the same as that we need in hie or other tools
2018-02-14 01:08:18 @hvr possibly; but that's too much for a GSOC
2018-02-14 01:08:20 @hvr imho
2018-02-14 01:08:21 wz1000 so my proposal is that we let ghc do the hard work of generating that while compiling(since it already knows about it)
2018-02-14 01:09:03 @hvr the "hi haddock" proposal is specifically scoped in a way that it's realistic to be done within the allotted time window
2018-02-14 01:09:19 wz1000 and then hie, and --hyperlinked-source can do their thing simply by reading the .hi file
2018-02-14 01:10:39 wz1000 yes, but it would be better to have some kind of larger, consistent vision that ties in things like HIE and other tooling
2018-02-14 01:11:24 @hvr sure, as long as perfect doesn't become the enemy of good :)
2018-02-14 01:11:34 @hvr i.e. we got only a couple weeks to get it done
2018-02-14 01:12:17 @hvr that's the thing I worry about ambitious plans for GSOCs
2018-02-14 01:13:09 wz1000 your proposal, if implemented, would also simplify hie a lot
2018-02-14 01:13:53 wz1000 the doc map is pretty much the only thing we need from haddock: https://github.com/haskell/haskell-ide-engine/blob/master/src/Haskell/Ide/Engine/Plugin/Haddock.hs#L131
2018-02-14 01:14:53 @hvr fwiw, I've got additional hidden use-cases for the stuff that "hi haddock" would bring us
2018-02-14 01:15:59 wz1000 hvr: btw, is there any better way to implement these(lookupDocHtmlForModule, lookupSrcHtmlForModule): https://github.com/haskell/haskell-ide-engine/blob/master/src/Haskell/Ide/Engine/Plugin/Haddock.hs#L42
2018-02-14 01:16:11 @hvr and I believe we'd likely see more use-cases emerge once we have more information available in .hi files
2018-02-14 01:17:43 @hvr wz1000: maybe; haddock is slowly migrating to exposing more meta-data via .json files
2018-02-14 01:18:18 @hvr wz1000: initially for the quickjump stuff, but there's more to come in that area to allow for richer doc browsing
2018-02-14 01:19:06 @hvr but the plans haven't been solidified yet, it's all still a bit in flux
2018-02-14 01:23:22 wz1000 hvr: that is exactly the kind of info we need for hie - for each symbol(in the source and in the haddock), where is this defined, which import brought it into scope, where else is it used?
2018-02-14 01:24:05 wz1000 and of course, what is the docstring for it?
2018-02-14 01:24:16 wz1000 and the thing is - ghc already knows all of this
2018-02-14 01:24:35 wz1000 so it would be nice if there was a single source for this
2018-02-14 01:24:46 wz1000 that haddock and HIE and whatever could query
2018-02-14 01:29:38 @hvr ... the .hi files :)
2018-02-14 01:30:33 @hvr (and then there was also the long-term idea to use cbor in .hi files...)
2018-02-14 01:34:24 mpickering but.. I don't think it is trivial to read a .hi file?
2018-02-14 01:37:01 mpickering It seems cleaner to me to have .hi files be "exactly what GHC needs" and ".hie" files or "extended interface files" for this kind of additional information
2018-02-14 01:39:19 mpickering especially separating .hie files from normal GHC development seems like it could be very desirable to me
2018-02-14 01:43:03 @hvr and GHC would generate those .hie files?
2018-02-14 01:43:08 @hvr reliably?
2018-02-14 01:44:30 mpickering via a source plugin potentially
2018-02-14 01:44:31 @hvr w/o requiring to install some add-on tooling
2018-02-14 01:45:05 mpickering add-on tooling = haskell package
2018-02-14 01:45:07 mpickering in my mind
2018-02-14 01:45:11 @hvr oh dear
2018-02-14 01:45:25 @hvr I'm pessimistic about that one ;)
2018-02-14 01:45:37 alanz mpickering, that sounds like my database API
2018-02-14 01:46:05 alanz i.e. some kind of infrastructure provided by GHC that can be used to store and recall info for a plugin
2018-02-14 01:46:05 @hvr it may sound good on paper, but I will only be convinced once it exists ;-)
2018-02-14 01:46:14 alanz no problem
2018-02-14 01:46:29 alanz plenty of people ready to scratch that itch
2018-02-14 01:47:29 @hvr mpickering: also, are source plugins even portable?
2018-02-14 01:48:13 @hvr mpickering: i.e. are those available on all platforms (specifically those that don't support TH nor dynlinking via the system linker?)
2018-02-14 01:49:32 @hvr (the compiler plugins I know weren't)
2018-02-14 01:53:27 wz1000 hvr: for some kinds of analysis(particularly the kind mpickering was talking about), we might need the entire Typechecked AST
2018-02-14 01:53:45 wz1000 so that could potentially blow up the size of hi files
2018-02-14 01:55:12 mpickering portability is a problem, that is a good point
2018-02-14 01:55:50 @hvr wz1000: thoughtpolice iirc had either plans or a proof of concept of compressing .hi files
2018-02-14 01:56:03 mpickering I remember a phab patch for that
2018-02-14 01:56:21 @hvr right... but I don't remember what became of it
2018-02-14 01:56:35 wz1000 wouldn't something standard like gzip do?
2018-02-14 01:56:45 @hvr wz1000: that's so 80ies =)
2018-02-14 01:57:07 @hvr today it's more about things like lz4 or lzop
2018-02-14 01:57:19 @hvr for these kind of on-the-fly compression
2018-02-14 02:00:26 alanz Given the performance requirements of batch compilation, it is probably better to have a separate file for this kind of thing
2018-02-14 02:00:40 alanz Which gets spat out if/when there is a plugin
2018-02-14 02:00:59 @hvr alanz: yeah, but I don't think the plugin plan is possible at all
2018-02-14 02:01:00 alanz and leave the .hi files for their current usage
2018-02-14 02:01:08 alanz why not?
2018-02-14 02:01:14 @hvr alanz: given that it would be unavailable on some platforms
2018-02-14 02:01:34 alanz why would that be the case? GHC is available?
2018-02-14 02:01:36 @hvr and thus would effectively make haddock unavailable
2018-02-14 02:01:43 @hvr alanz: because plugins don't work everywhere
2018-02-14 02:01:58 alanz in what scenarios? cross-compilation?
2018-02-14 02:02:05 @hvr just like TH doesn't work everywhere or GHCi doesn't
2018-02-14 02:02:49 alanz well, IDE targeted stuff is inherently self-limiting, and would exclude those odd cases
2018-02-14 02:03:17 alanz but imo that is more of an argument for a separate file / db then
2018-02-14 02:22:36 dfeuer Will anything go wrong if I unsafeCoerce a function from (a -> a -> a) to the type (a -> a -> (# a #))?
2018-02-14 02:23:20 dfeuer Some testing suggests this is okay, but it's not quite officially sanctioned.
2018-02-14 02:24:36 dfeuer Similarly, it seems like it should be okay to unsafeCoerce from ((# a #) -> r) to (a -> r)....
2018-02-14 02:31:52 mpickering wz1000: https://github.com/nboldi/heed
2018-02-14 02:33:02 mpickering looks exactly like you were suggesting
2018-02-14 02:38:30 mpickering looks like a frontend plugin currently though
2018-02-14 02:38:33 mpickering but would be trivial to change
2018-02-14 02:49:03 @hvr alanz: I don't mind much if it's part of .hi or there's a 2nd .hie file; my main objection is to rely on a mechanism that isn't portable and would thus make it unavailable to haddock
2018-02-14 02:49:38 @hvr thus falling short of the goals set by the "hi haddock" proposal
2018-02-14 02:49:46 @hvr also,
2018-02-14 02:49:59 alanz well, maybe both these things need to happen
2018-02-14 02:50:18 alanz how does hi haddock manage to do it, that a plugin would not?
2018-02-14 02:50:36 @hvr alanz: do what exactly?
2018-02-14 02:50:58 @hvr also, we still wouldn't be able to decouple the plugin from ghc devel
2018-02-14 02:51:05 @hvr it'd still be bundled w/ ghc anyway
2018-02-14 02:51:39 alanz well, hi haddock is able to work without the problems of a plugin. How?
2018-02-14 02:51:50 * alanz has not actually read the proposal
2018-02-14 02:52:08 @hvr alanz: the proposal is about having GHC natively augment the .hi file, w/o any plugin
2018-02-14 02:52:31 alanz ok, so wherever GHC is, that thing is
2018-02-14 02:52:39 @hvr thus allow haddock and ghci to benefit from that
2018-02-14 02:52:55 @hvr and every other tool that can access .hi files
2018-02-14 02:52:57 alanz And what makes you think that a plugin would not be able to do the same thing? if shipped as part of GHC?
2018-02-14 02:53:19 @hvr alanz: because a plugin would require facilities which aren't available everywhere
2018-02-14 02:53:30 @hvr the same ones you'd need for e.g. TH to work
2018-02-14 02:54:11 @hvr it's not about using the plugin API, it's about using the linking mechanism that causes problems
2018-02-14 02:54:16 alanz cross-compilation? What scenarios have that?
2018-02-14 02:54:22 @hvr for starters, AIX
2018-02-14 02:54:40 @hvr that's the base-line
2018-02-14 02:54:57 @hvr actually AIX is slightly above the baseline, as it's a stage2 compiler
2018-02-14 02:55:00 alanz so the problem is putting it in the hi file?
2018-02-14 02:55:02 @hvr and yet doesn't have TH/interp support
2018-02-14 02:55:27 @hvr no, the problem is relying on a plugin that needs to be linked into a `ghc` process dynamically
2018-02-14 02:55:40 @hvr similar to how TH code works
2018-02-14 02:56:05 @hvr i.e. the "dynamic" part is the one that's not portable
2018-02-14 02:56:44 @hvr so, if instead we talk about a statically linked plugin, that is part of GHC's default distro
2018-02-14 02:57:02 @hvr then it would work; but then calling it a plugin is a weird thing to say
2018-02-14 02:57:07 alanz ok. So maybe define the plugin API so it can be static or dynamic, and use haddock on AIX as static
2018-02-14 02:57:21 alanz in which case it is not so weird
2018-02-14 02:57:37 @hvr yeah, but then you can just as well link it statically all the time to avoid the overhead
2018-02-14 02:58:04 @hvr there's no benefit imho to go the trouble of dynamic plugins which are yet another moving piece & point of failure
2018-02-14 02:58:22 @hvr if there's at least one platform that requires that essential plugin to be statically linked anyway
2018-02-14 02:58:23 alanz except that it opens the door for *other* plugins, to be used in architectures that are not as constrained
2018-02-14 02:58:27 alanz which is most of them
2018-02-14 02:58:45 @hvr well, *other* plugins are an orthogonal concern anyway
2018-02-14 02:59:05 wz1000 also, i think the recompilation checker doesn't work with plugins
2018-02-14 02:59:06 alanz well, my point is to possibly use common interfaces for them
2018-02-14 02:59:33 wz1000 so that means ides using plugins to compile haskell take a massive penalty
2018-02-14 02:59:41 alanz my understanding is that the recompilation checker is some piece of voodoo magic that seriously needs some attention
2018-02-14 02:59:53 @hvr this seems like a rabbit hole in the making ;)
2018-02-14 03:00:12 alanz I know it does not honour the flags you pass in to force recompilation if you need to
2018-02-14 03:00:42 alanz GHC as a whole is a rabbit hole. Especially the scaffolding/scheduling stuff
2018-02-14 03:01:03 alanz Built up over 20 years, to cope with all the weird corner/use cases out there
2018-02-14 03:01:04 @hvr I see there are multiple tasks emerging here
2018-02-14 03:01:10 alanz I agree
2018-02-14 03:01:25 @hvr one would be looking at the plugin API to address its shortcomings
2018-02-14 03:01:47 @hvr so that GHC can become more modularised in theory
2018-02-14 03:01:57 @hvr w/o regressions
2018-02-14 03:03:02 alanz yes. And more modularised is an important goal, which enables a lot of other things
2018-02-14 03:03:09 @hvr and then the question is, whether all other tasks which could somehow be expressed in terms of a more perfect plugin API should be held back until the API has become adequate enough
2018-02-14 03:03:39 @hvr i.e. whether to place a moratorium on extensions until the plugin API has been refactored/reengineered
2018-02-14 03:04:05 alanz I am in favour of going ahead, and being prepared to refactor based on actual experience
2018-02-14 03:04:07 @hvr or whether it's ok to go the old-fashioned way in the meantime
2018-02-14 03:04:12 alanz But I am an empiricist
2018-02-14 03:05:04 @hvr (and accept the future cost of porting it over to the new API once it's available)
2018-02-14 03:05:30 wz1000 I think this stuff can live in mainline ghc
2018-02-14 03:05:40 wz1000 If clang can do it, why not ghc?
2018-02-14 03:05:41 alanz the hi haddock stuff?
2018-02-14 03:06:06 alanz I agree. As far as I am concerned IDE support needs to be right in there, as a first class citizen
2018-02-14 03:06:13 wz1000 yes, but along with extra info to support more stuff
2018-02-14 03:06:13 alanz like in Roslyn
2018-02-14 03:06:19 alanz yes
2018-02-14 03:06:52 alanz But I imagine we end up with some standard plugins, and the ability to bring in experimental ones until they become standard
2018-02-14 03:06:58 alanz much like in hie
2018-02-14 03:07:12 @hvr that's an admirable goal obviously
2018-02-14 03:07:57 @hvr also, I'm not sure if cabal already provides the necessary infrastructure for making plugins a good citizen
2018-02-14 03:08:24 @hvr i.e. ability to specify required plugins in the .cabal file etc; I remember discussions about it, but I don't remember them bearing fruits yet
2018-02-14 03:08:38 wz1000 I'm proposing that ghc dumps a fixed set of data: docstrings, definitions, references, imported by, and the typechecked ast
2018-02-14 03:08:46 alanz I see IDE type plugins being orthogonal to the cabal file
2018-02-14 03:08:57 @hvr alanz: oh, those kind of plugins
2018-02-14 03:09:01 alanz they belong to the tooling
2018-02-14 03:09:11 @hvr alanz: I was thinking of plugins which transform the AST
2018-02-14 03:09:27 alanz no, we are explicitly not talking about those
2018-02-14 03:09:30 @hvr like e.g. doing some clever constant folding
2018-02-14 03:09:33 alanz Or at least I'm not
2018-02-14 03:09:52 alanz I think that is a whole different kettle of fish
2018-02-14 03:10:08 @hvr is that a different plugin API?
2018-02-14 03:10:35 alanz There is one currently on ghc-proposals
2018-02-14 03:10:49 alanz https://github.com/ghc-proposals/ghc-proposals/pull/107#issuecomment-362334846
2018-02-14 03:11:36 mpickering I agree with hvr that using a plugin would not be suitable for haddock which needs to work everywhere so it doesn't really impact his proposal
2018-02-14 03:12:02 mpickering but experimental support can be achieved by a plugin, which works on all major platforms, and then folded into GHC once it is stable and if it is desired
2018-02-14 03:12:59 mpickering wz1000: https://github.com/ghc-proposals/ghc-proposals/pull/108
2018-02-14 03:13:46 alanz yes
2018-02-14 03:14:15 alanz mpickering, lots of good ideas floating around
2018-02-14 03:15:56 mpickering The fact remains that .hi files are not intended to be read by anyone but GHC itself, which makes them difficult to work with for anyone else.
2018-02-14 04:01:54 @hvr mpickering: well, it's enough if lib:ghc can read them
2018-02-14 04:01:59 @hvr haddock already links against lib:ghc
2018-02-14 04:02:14 @hvr that's not something we intend to change necessarily
2018-02-14 04:02:47 @hvr so it doesn't really conflict with `.hi files are not intended to be read by anyone but GHC itself`
2018-02-14 04:03:43 @hvr i.e. I don't care about a portable representation; just give me an API which breaks w/ every major GHC release, and I'm happy
2018-02-14 04:05:19 mpickering The GHC API is sufficient.. I can accept that but this is about making it the easiest possible rather than.. possible
2018-02-14 04:06:06 @hvr well, you still don't want everyone to reinvent their own parsers for whatever augmented info format we come up with
2018-02-14 04:06:31 @hvr so you'd still end up with some common library/API for reading that meta-data into convenient Haskell types
2018-02-14 04:06:40 @hvr and those types will likely be closely related to lib:ghc types
2018-02-14 04:06:49 @hvr so you'd still link against lib:ghc's API in some way
2018-02-14 04:07:03 @hvr -> just throw it into lib:ghc already
2018-02-14 04:07:20 mpickering Not necessarily
2018-02-14 04:10:45 mpickering It's a space which needs exploring
2018-02-14 05:24:16 angerman o/
2018-02-14 05:26:03 angerman mpickering: keep in mind that there is this long standing rumor that we could potentially improve ghcs performance by encoding hi files in a way that is faster to deserialize, which of course is much easier without any official format that other tools expect to be able to read.
2018-02-14 05:27:47 mpickering ok but I find the premise implausible :P
2018-02-14 05:30:01 @hvr angerman: hence why I brought up cbor as well as lzo(p)/lz4
2018-02-14 05:30:36 @hvr both have the potential to accelerate reading .hi files
2018-02-14 05:31:17 angerman mpickering: have some library to read them :-) unless you want to go and write in a different language that should be fine no?
2018-02-14 05:31:32 angerman hvr: mmap all the way ;-)
2018-02-14 05:36:41 @hvr angerman: not portable :)
2018-02-14 06:19:02 angerman Hvr: :p
2018-02-14 06:19:22 angerman hvr: fast path everywhere except for aix ;-)
2018-02-14 12:17:23 steshaw[m] angerman: hvr: sounds like a job for https://hackage.haskell.org/package/compact ?
2018-02-14 13:14:51 @hvr steshaw[m]: `compact` is about GC ... serialisation is a separate concern for which you'd e.g. use something like CBOR
2018-02-14 13:15:42 @hvr steshaw[m]: also I don't think `compact` uses an efficient serialisation
2018-02-14 13:15:56 steshaw[m] I thought `compact` was kind of like flatbuffers (or capnproto) which doesn't require deserialisation
2018-02-14 13:16:17 @hvr well, yeah, if you see `compact` as a more portable `mmap`, yes
2018-02-14 13:16:37 @hvr implying that you want to dump internal structures as-is to the disk
2018-02-14 13:16:43 @hvr w/o any optimisation
2018-02-14 13:16:54 @hvr but I don't think that's a good idea
2018-02-14 13:16:54 steshaw[m] No deserialisation is much faster than some deserialisation
2018-02-14 13:17:19 steshaw[m] There is no need for optimisation if there is no deserialisation.
2018-02-14 13:17:33 steshaw[m] Why isn't it a good idea?
2018-02-14 13:17:49 @hvr if we store more information in the .hi files, we also need to make sure to do that in a reasonably efficient way than simply dumping out whatever bloated structure we came up with; i.e. we need to model the serialisation
2018-02-14 13:18:12 @hvr so, compression imho becomes a must
2018-02-14 13:18:34 @hvr both by choosing a good representation as well as employing an additional compression layer
2018-02-14 13:18:53 steshaw[m] You can compress a memory buffer if you want.
2018-02-14 13:19:12 @hvr ...which only addresses half of my request :)
2018-02-14 13:19:29 @hvr it doesn't take care to fine-tune the representation
2018-02-14 13:19:36 steshaw[m] I think you're saying that with `compact`, it's perhaps easy to have stuff included in the .hi file that you didn't expect to end up there.
2018-02-14 13:19:56 @hvr not only that, but I don't trust `compact`'s serialisation to be in fact... "compact"
2018-02-14 13:20:07 @hvr it seems more like generic auto-derived serialisation
2018-02-14 13:20:21 steshaw[m] I see, I haven't tried it yet but it's something that I've dreamed of for many years :)
2018-02-14 13:20:25 @hvr w/o any concern for compactness
2018-02-14 13:20:40 steshaw[m] If it's not actually compact, can it be improved?
2018-02-14 13:20:52 @hvr probably, if we switch it to CBOR =)
2018-02-14 13:21:05 steshaw[m] :)
2018-02-14 13:21:26 @hvr (and/or apply encoding techniques from CBOR)
2018-02-14 13:21:38 steshaw[m] Well, doesn't CBOR require deserialisation to other structures? IMO that is a downside if so.
2018-02-14 13:21:54 @hvr what do you mean by that?
2018-02-14 13:22:12 steshaw[m] If not `compact`, then what about flatbuffers or capnproto?
2018-02-14 13:23:33 wz1000 hvr: that means we need ghc to depend on a cbor library
2018-02-14 13:23:37 steshaw[m] Well, if CBOR allows you to load the buffer (or mmap) it into memory and then access the contents (as if it were a database) then I am in favour of it. This is how flatbuffers works.
2018-02-14 13:23:49 @hvr I haven't looked at how flatbuffers or capnproto encode...
2018-02-14 13:24:07 @hvr wz1000: ...so? that was a bit of a hidden agenda of cborg ;-)
2018-02-14 13:24:23 steshaw[m] The `compact` seemed like flatbuffers but even better because it allowed you to use normal Haskell data structures.
2018-02-14 13:24:51 @hvr steshaw[m]: I haven't looked at how flatbuffers or capnproto specifically encode their data; but I'd expect them to be mindful about it, if they're a network protocol
2018-02-14 13:25:01 wz1000 I was under the impression that ghc was very conservative with what it depends on
2018-02-14 13:26:02 steshaw[m] hvr: neither are a network protocol. It's a format for a "flat" buffer. It allows fast use of the buffer (without deserialisation to other data structures).
2018-02-14 13:26:24 @hvr wz1000: yes, but cborg was supposed to be a faster, better, stronger replacement for `binary`, originally intended to be folded into `binary` proper: http://code.haskell.org/~duncan/binary-experiment/binary.pdf
2018-02-14 13:26:42 wz1000 ok
2018-02-14 13:26:43 @hvr wz1000: but it ended becoming a new library
2018-02-14 13:26:59 @hvr because changing `binary` in-place is messier
2018-02-14 13:28:08 wz1000 so I think we are in agreement that we need to put more info in the .hi files, at least enough so that haddock doesn't need to compile the source again
2018-02-14 13:28:14 @hvr and cborg does change the encoding of `binary`; and cborg actually tries to be more principled about separating the concern of "just serialise a data structure" vs "encode a binary format"
2018-02-14 13:29:11 @hvr fwiw, replacing binary w/ `cborg` in GHC and evaluating its performance before/after would be a GSOC project of its own
2018-02-14 13:29:44 steshaw[m] Not having to decode the network buffer is an old trick from network programming. I vaguely recall that BEA Tuxedo had such a format so that proxies could quickly look into the message and forward it without deserialising it (i.e. the proxy can peek directly at the network buffer to do its work).
2018-02-14 13:30:11 @hvr I've written enough network code in C8x to remember :)
2018-02-14 13:30:37 @hvr but network code is concerned about zero-copy
2018-02-14 13:30:38 steshaw[m] C8x?
2018-02-14 13:30:42 @hvr C89
2018-02-14 13:30:54 steshaw[m] Oh, right :)
2018-02-14 13:30:55 @hvr or rather gnu89, before C99 was a thing
2018-02-14 13:31:36 @hvr also, in network protocols you still care about compact representations,
2018-02-14 13:31:48 steshaw[m] Yes, zero-copy. It is nice. It keeps proxy-type server having low memory overhead.
2018-02-14 13:32:03 @hvr which for servers which need to handle 100k connections is essential
2018-02-14 13:32:48 steshaw[m] Could the `cborg` library operate in a similar way? i.e. to flatbuffers/capnproto?
2018-02-14 13:32:49 @hvr when you design network protocols you try to fit essential data into MTUs
2018-02-14 13:33:15 @hvr to be able to access the essential information ideally asap; and it helps if the data makes it in the first packet
2018-02-14 13:34:19 @hvr steshaw[m]: cborg is a heavily optimised streaming format; you'd still have to traverse the memory buffer in order
2018-02-14 13:35:32 steshaw[m] True, but the BEA Tuxedo thing wasn't really that type of low-level protocol. The kind of proxy I'm thinking of was called a content something-or-other proxy. It is able to look at the entire message (not just the header), to make its decisions.
2018-02-14 13:36:09 wz1000 hvr: you had mentioned "rich doc browsing". is anything written down somewhere?
2018-02-14 13:36:26 @hvr at the end of the day, I don't care how we do it, as long as we manage to keep the serialisation bloat to a minimum (cause w/ cabal new-build we have *lots* of packages; I tend to have easily 10k and more package instances in my ~/.store; so I do care a lot about keeping `.hi` files small)
2018-02-14 13:37:03 @hvr wz1000: no, and I also said that the plans are still rather in flux/vague
2018-02-14 13:37:19 @hvr wz1000: i.e. still in brainstorming phase
2018-02-14 13:37:29 steshaw[m] hvr: is there a lot of bloat in the .hi file atm. I remember this was a problem in the Idris universe.
2018-02-14 13:38:17 @hvr fwiw, I don't mind trading deserialising performance for additional meta-data if it gets us a more compact representation
2018-02-14 13:38:40 @hvr cause if that extra info is only needed for extra tools, GHC can skip reading it
2018-02-14 13:38:44 @hvr so it costs GHC nothing
2018-02-14 13:38:57 wz1000 yeah
2018-02-14 13:39:09 @hvr extreme case: have separate .hie files
2018-02-14 13:39:15 @hvr for stuff that GHC doesn't need
2018-02-14 13:39:38 @hvr and also have a flag to inhibit their generation
2018-02-14 13:39:50 wz1000 yes
2018-02-14 13:39:51 @hvr if for any reason you need to speed up GHCs compilation
2018-02-14 13:39:58 @hvr and know you don't need the .hie files
2018-02-14 13:39:58 wz1000 I think that is a workable approach
2018-02-14 13:40:17 @hvr like we have e.g. the only-typecheck mode in GHC
2018-02-14 13:40:19 steshaw[m] Sounds like there are multiple problems at play.
2018-02-14 13:40:28 @hvr steshaw[m]: yeah, it's never simple =)
2018-02-14 13:40:49 @hvr lots of interoperating concerns/goals in balance w/ each other
2018-02-14 13:41:07 steshaw[m] What is the metadata required by other systems but not GHC?
2018-02-14 13:41:46 wz1000 ghc knows all of the data while compiling
2018-02-14 13:42:05 wz1000 but throws it out and doesn't include it in .hi files
2018-02-14 13:42:25 wz1000 Basically information about the source file itself
2018-02-14 13:42:57 wz1000 where stuff was defined, where stuff was called, its docstrings, its types, which import brought it into scope and so on
2018-02-14 13:43:28 wz1000 info it doesn't need to know for using the module as a dependency
2018-02-14 13:43:33 @hvr wz1000: ...and TH makes this even more fun :)
2018-02-14 13:44:00 @hvr (even more than CPP), as it messes up the source-loc <-> use-site relationships
2018-02-14 13:44:07 wz1000 yes
2018-02-14 13:44:12 wz1000 that is a pain
2018-02-14 13:45:00 @hvr (but which I think for "hi haddock" isn't much of a concern, as this would only affect hyperlinked source highlighting)
2018-02-14 13:45:59 steshaw[m] Yeah, getting GHC to compile faster is important to me. Writing out a giant metadata file would work against that.
2018-02-14 13:46:00 @hvr (...and we deliberately eschew the source highlighting issue cause its a can of worms... =)
2018-02-14 13:46:39 @hvr steshaw[m]: fwiw, for hi haddock I don't think we'd need such a "huge" amount of new meta-data
2018-02-14 13:47:02 steshaw[m] Having that separate file sounds like a good idea.
2018-02-14 13:47:11 @hvr but it quickly gets more if you need to dump out fine-grained src-loc associated metadata
2018-02-14 13:47:47 @hvr for use-sites rather than just def-sites
2018-02-14 13:47:49 steshaw[m] Is "hi haddock" the metadata that would go into .hi files to support haddock?
2018-02-14 13:48:04 alexbiehl basically, yes
2018-02-14 13:48:31 @hvr steshaw[m]: https://summer.haskell.org/ideas.html#hi-haddock
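
Note: the per-symbol information wz1000 lists in the log (where a symbol is defined, where it is used, its type, its docstring, and which import brought it into scope) could be modelled roughly as below. This is a minimal sketch for illustration only. The names `Span`, `SymbolInfo`, `symbolAt`, `example` and every field name are hypothetical; nothing here is a GHC or haddock API, and the discussion did not settle on any data model or on-disk format.

```haskell
import Data.Maybe (maybeToList)

-- | A source span: file plus inclusive (line, column) start and end.
data Span = Span
  { spanFile  :: FilePath
  , spanStart :: (Int, Int)
  , spanEnd   :: (Int, Int)
  } deriving (Eq, Show)

-- | Hypothetical per-symbol record, one field per item in wz1000's list.
data SymbolInfo = SymbolInfo
  { symName       :: String        -- qualified name, e.g. "GHC.Base.map"
  , symDefinedAt  :: Maybe Span    -- Nothing if defined in another module
  , symUsedAt     :: [Span]        -- use sites in this module
  , symType       :: Maybe String  -- pretty-printed type (assumption: a plain string)
  , symDoc        :: Maybe String  -- docstring, if any
  , symImportedBy :: Maybe String  -- the import that brought it into scope
  } deriving (Eq, Show)

-- | Does the span cover the given file position?
-- Positions are (line, column) pairs, compared lexicographically.
contains :: Span -> FilePath -> (Int, Int) -> Bool
contains (Span f s e) file pos = f == file && s <= pos && pos <= e

-- | First symbol whose definition site or any use site covers the position.
symbolAt :: [SymbolInfo] -> FilePath -> (Int, Int) -> Maybe SymbolInfo
symbolAt syms file pos =
  case filter hit syms of
    (s : _) -> Just s
    []      -> Nothing
  where
    hit s = any (\sp -> contains sp file pos)
                (maybeToList (symDefinedAt s) ++ symUsedAt s)

-- | Made-up data for a two-symbol module, purely for demonstration.
example :: [SymbolInfo]
example =
  [ SymbolInfo
      { symName       = "Main.double"
      , symDefinedAt  = Just (Span "Main.hs" (3, 1) (3, 6))
      , symUsedAt     = [Span "Main.hs" (6, 15) (6, 20)]
      , symType       = Just "Int -> Int"
      , symDoc        = Just "Doubles its argument."
      , symImportedBy = Nothing
      }
  , SymbolInfo
      { symName       = "GHC.Base.map"
      , symDefinedAt  = Nothing
      , symUsedAt     = [Span "Main.hs" (6, 11) (6, 13)]
      , symType       = Just "(a -> b) -> [a] -> [b]"
      , symDoc        = Nothing
      , symImportedBy = Just "import Prelude"
      }
  ]
```

In GHCi, `fmap symName (symbolAt example "Main.hs" (6, 12))` selects the record for `map`, which is the kind of lookup an IDE would need for hover or go-to-definition. A linear scan over a list is used only to keep the sketch dependency-free; a real tool would presumably want an indexed structure.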