Created
May 4, 2018 07:29
.hie file discussion on #ghc
2018-02-13 23:57:02 wz1000 alanz, alexbiehl, mpickering: https://gist.github.com/wz1000/81e0bb720237e8ae6193c7ac2d28c913 | |
2018-02-13 23:58:15 alanz wz1000, I like what you are aiming for | |
2018-02-13 23:58:22 alanz well, all of you | |
2018-02-13 23:58:23 mpickering wz1000: I just skimmed it now but if I were you I would write down something very concrete (answer all the questions at the end) | |
2018-02-13 23:58:33 alanz I guess I should actually pay some attention | |
2018-02-13 23:59:01 mpickering if you don't write down something very concrete then making progress is much more difficult as people love discussing things until they die | |
2018-02-13 23:59:47 wz1000 yeah, this is just some initial brainstorming | |
2018-02-14 00:00:03 c_wraith I'd love to be able to get more information out of ghc | |
2018-02-14 00:00:36 alanz I think there are also two parts to it, getting the info out, and keeping it available, as in storing it on disk or some such | |
2018-02-14 00:01:09 alanz Because expecting a big project's detail to be in RAM won't work | |
2018-02-14 00:01:41 mpickering also I don't think reading a .hi file is trivial currently? | |
2018-02-14 00:02:18 mpickering It's a good idea but needs a clear vision about what to include and how to include it | |
2018-02-14 00:02:46 mpickering you could also flesh out exactly how you imagine tooling interacting with these files | |
2018-02-14 00:03:08 mpickering If we had source plugins, you could experiment with a format and then later merge it into GHC if it proved to be successful | |
2018-02-14 00:08:46 alanz mpickering, what do you mean by that last statement? Experiment with extracting info and storing it somehow, or experiment with amending source? | |
2018-02-14 00:09:02 alanz I am in favour of the former, nervous of the latter | |
2018-02-14 00:09:31 mpickering someone writes a source plugin which extracts the kind of information that wz1000 is talking about, into their own format | |
2018-02-14 00:09:49 mpickering which then tools like haddock and so on consume | |
2018-02-14 00:10:09 alanz ok, that sounds good | |
2018-02-14 00:10:11 mpickering so it is decoupled from GHC development and can obey a different release cycle | |
2018-02-14 00:10:19 mpickering can be iterated on rapidly and so on | |
2018-02-14 00:10:30 alanz yes, and also can have alternative implementations, doing different things | |
2018-02-14 00:11:15 alanz Once TTG is fully in place we should be able to more easily transmit info through the AST too | |
2018-02-14 00:11:22 alanz as an alternative | |
2018-02-14 00:21:32 wz1000 but haddock is tightly integrated into ghc itself | |
2018-02-14 00:22:58 wz1000 so if this thing has to be the source of info for haddock it has to move in sync with it | |
2018-02-14 00:25:19 alanz Well, maybe the way to go is to define a plugin architecture as is being proposed, but also include a DB API, which can potentially write info into a .hi file | |
2018-02-14 00:25:35 alanz So it can be used by haddock or any other | |
2018-02-14 00:26:04 alanz especially as haddock is aiming to be more loosely coupled, for the same reasons. i.e. to iterate faster | |
2018-02-14 00:26:58 alanz and each plugin gets a "directory" in the db. | |
2018-02-14 00:27:05 alanz how ever it gets structured | |
2018-02-14 00:27:38 alanz wz1000, a bit like the plugin data stuff we have in hie, but potentially persistent | |
2018-02-14 00:46:39 mpickering one obvious comment wz1000 is does this not require that we must foresee that we will want to run a certain tool on our program so that we can invoke the plugin when we compile the program? | |
2018-02-14 00:47:16 mpickering I suppose that is already true for how haddock and other tools work today? | |
2018-02-14 00:50:46 wz1000 mpickering: so you are suggesting every tool(hie, haddock etc.) have its own plugin to build its db? | |
2018-02-14 00:59:40 @hvr wz1000: the "a global database of cross referenced data" item seems to deviate from a KISS principle | |
2018-02-14 01:00:13 @hvr wz1000: i.e. if there's already enough info in the .hi files to construct such a db, then GHC ought not do it | |
2018-02-14 01:00:47 @hvr wz1000: i.e. such a db would feel redundant | |
2018-02-14 01:01:19 wz1000 hvr: but stuff inside one module refers to stuff inside other modules | |
2018-02-14 01:01:34 wz1000 we need some way to sort that out | |
2018-02-14 01:02:05 @hvr wz1000: we just need a way to reconstruct the scope within a module | |
2018-02-14 01:02:05 wz1000 a unique identifier for each symbol in the context of one compilation | |
2018-02-14 01:02:42 @hvr wz1000: otherwise you'd be forced to actually parse e.g. the haddock format, which we do not want to do inside GHC | |
2018-02-14 01:03:33 wz1000 hvr: is the hyperlinked source generation done as a separate step? | |
2018-02-14 01:03:48 wz1000 because I imagine that needs to know where everything is defined | |
2018-02-14 01:03:58 @hvr wz1000: yes, but not only that; if a haddock docstring refers to a name like 'map' | |
2018-02-14 01:04:17 @hvr wz1000: then resolving that name is currently done during the haddock step | |
2018-02-14 01:04:18 wz1000 yeah | |
2018-02-14 01:04:22 wz1000 ok | |
2018-02-14 01:04:58 @hvr "hi haddock" aims at keeping haddock and ghc reasonably decoupled | |
2018-02-14 01:05:57 wz1000 so haddock needs to parse and typecheck the source again anyway? | |
2018-02-14 01:06:26 @hvr there was a subtle hint about that in the "hi haddock" proposal | |
2018-02-14 01:06:48 @hvr insofar as the hyperlinked-source part may pose a challenge to tackle in future work | |
2018-02-14 01:07:27 @hvr but it's considered out of scope for now | |
2018-02-14 01:07:39 wz1000 so the info we need for hyperlinked source is the same as that we need in hie or other tools | |
2018-02-14 01:08:18 @hvr possibly; but that's too much for a GSOC | |
2018-02-14 01:08:20 @hvr imho | |
2018-02-14 01:08:21 wz1000 so my proposal is that we let ghc do the hard work of generating that while compiling(since it already knows about it) | |
2018-02-14 01:09:03 @hvr the "hi haddock" proposal is specifically scoped in a way that it's realistic to be done within the allotted time window | |
2018-02-14 01:09:19 wz1000 and then hie, and --hyperlinked-source can do their thing simply by reading the .hi file | |
2018-02-14 01:10:39 wz1000 yes, but it would be better to have some kind of larger, consistent vision that ties in things like HIE and other tooling | |
2018-02-14 01:11:24 @hvr sure, as long as perfect doesn't become the enemy of good :) | |
2018-02-14 01:11:34 @hvr i.e. we got only a couple weeks to get it done | |
2018-02-14 01:12:17 @hvr that's the thing I worry about ambitious plans for GSOCs | |
2018-02-14 01:13:09 wz1000 your proposal, if implemented, would also simplify hie a lot | |
2018-02-14 01:13:53 wz1000 the doc map is pretty much the only thing we need from haddock: https://github.com/haskell/haskell-ide-engine/blob/master/src/Haskell/Ide/Engine/Plugin/Haddock.hs#L131 | |
2018-02-14 01:14:53 @hvr fwiw, I've got additional hidden use-cases for the stuff that "hi haddock" would bring us | |
2018-02-14 01:15:59 wz1000 hvr: btw, is there any better way to implement these(lookupDocHtmlForModule, lookupSrcHtmlForModule): https://github.com/haskell/haskell-ide-engine/blob/master/src/Haskell/Ide/Engine/Plugin/Haddock.hs#L42 | |
2018-02-14 01:16:11 @hvr and I believe we'd likely see more use-cases emerge once we have more information available in .hi files | |
2018-02-14 01:17:43 @hvr wz1000: maybe; haddock is slowly migrating to exposing more meta-data via .json files | |
2018-02-14 01:18:18 @hvr wz1000: initially for the quickjump stuff, but there's more to come in that area to allow for richer doc browsing | |
2018-02-14 01:19:06 @hvr but the plans haven't been solidified yet, it's all still a bit in flux | |
2018-02-14 01:23:22 wz1000 hvr: that is exactly the kind of info we need for hie - for each symbol(in the source and in the haddock), where is this defined, which import brought it into scope, where else is it used? | |
2018-02-14 01:24:05 wz1000 and of course, what is the docstring for it? | |
2018-02-14 01:24:16 wz1000 and the thing is - ghc already knows all of this | |
2018-02-14 01:24:35 wz1000 so it would be nice if there was a single source for this | |
2018-02-14 01:24:46 wz1000 that haddock and HIE and whatever could query | |
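A rough sketch of the per-symbol data wz1000 lists above (definition site, the import that brought the name into scope, use sites, docstring), written as plain Haskell types. Every name here (`Loc`, `SymbolId`, `SymbolInfo`, `SymbolTable` and the helper functions) is hypothetical and is not part of the GHC API or of any proposed file format; the positions in `exampleTable` are made up for illustration.

```haskell
import qualified Data.Map.Strict as Map

-- All names in this block are hypothetical; none are GHC API types.

-- | A source position: file, line, column.
data Loc = Loc { locFile :: FilePath, locLine :: Int, locCol :: Int }
  deriving (Eq, Show)

-- | A symbol identified across modules by its defining module and name.
data SymbolId = SymbolId { symModule :: String, symName :: String }
  deriving (Eq, Ord, Show)

-- | The questions from the discussion, one field per question.
data SymbolInfo = SymbolInfo
  { definedAt   :: Loc           -- where is this defined?
  , importedVia :: Maybe String  -- which import brought it into scope?
  , usedAt      :: [Loc]         -- where else is it used?
  , docstring   :: Maybe String  -- what is the docstring for it?
  } deriving (Eq, Show)

type SymbolTable = Map.Map SymbolId SymbolInfo

-- | "Go to definition": Nothing if the symbol is unknown.
lookupDefinition :: SymbolTable -> SymbolId -> Maybe Loc
lookupDefinition table sym = definedAt <$> Map.lookup sym table

-- | "Find references": empty if the symbol is unknown.
lookupReferences :: SymbolTable -> SymbolId -> [Loc]
lookupReferences table sym = maybe [] usedAt (Map.lookup sym table)

-- | Record one more use site; leaves the table unchanged for unknown symbols.
addUse :: Loc -> SymbolId -> SymbolTable -> SymbolTable
addUse loc = Map.adjust (\info -> info { usedAt = loc : usedAt info })

-- | A one-entry table; the positions are illustrative, not real ones.
exampleTable :: SymbolTable
exampleTable = Map.fromList
  [ ( SymbolId "GHC.Base" "map"
    , SymbolInfo
        { definedAt   = Loc "GHC/Base.hs" 100 1
        , importedVia = Just "import Prelude"
        , usedAt      = [Loc "Main.hs" 1 1]
        , docstring   = Just "Apply a function to every element of a list."
        }
    )
  ]
```

A tool such as hie or the hyperlinked-source generator would then only call `lookupDefinition` or `lookupReferences` on a table loaded from disk, rather than typechecking the module again. How the table is serialised is left open here, as it is in the discussion.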
2018-02-14 01:29:38 @hvr ... the .hi files :) | |
2018-02-14 01:30:33 @hvr (and then there was also the long-term idea to use cbor in .hi files...) | |
2018-02-14 01:34:24 mpickering but.. I don't think it is trivial to read a .hi file? | |
2018-02-14 01:37:01 mpickering It seems cleaner to me to have .hi files be "exactly what GHC needs" and ".hie" files or "extended interface files" for this kind of additional information | |
2018-02-14 01:39:19 mpickering especially separating .hie files from normal GHC development seems like it could be very desirable to me | |
2018-02-14 01:43:03 @hvr and GHC would generate those .hie files? | |
2018-02-14 01:43:08 @hvr reliably? | |
2018-02-14 01:44:30 mpickering via a source plugin potentially | |
2018-02-14 01:44:31 @hvr w/o requiring to install some add-on tooling | |
2018-02-14 01:45:05 mpickering add-on tooling = haskell package | |
2018-02-14 01:45:07 mpickering in my mind | |
2018-02-14 01:45:11 @hvr oh dear | |
2018-02-14 01:45:25 @hvr I'm pessimistic about that one ;) | |
2018-02-14 01:45:37 alanz mpickering, that sounds like my database API | |
2018-02-14 01:46:05 alanz i.e. some kind of infrastructure provided by GHC that can be used to store and recall info for a plugin | |
2018-02-14 01:46:05 @hvr it may sound good on paper, but I will only be convinced once it exists ;-) | |
2018-02-14 01:46:14 alanz no problem | |
2018-02-14 01:46:29 alanz plenty of people ready to scratch that itch | |
2018-02-14 01:47:29 @hvr mpickering: also, are source plugins even portable? | |
2018-02-14 01:48:13 @hvr mpickering: i.e. are those available on all platforms (specifically those that don't support TH nor dynlinking via the system linker?) | |
2018-02-14 01:49:32 @hvr (the compiler plugins I know weren't) | |
2018-02-14 01:53:27 wz1000 hvr: for some kinds of analysis(particularly the kind mpickering was talking about), we might need the entire Typechecked AST | |
2018-02-14 01:53:45 wz1000 so that could potentially blow up the size of hi files | |
2018-02-14 01:55:12 mpickering portability is a problem, that is a good point | |
2018-02-14 01:55:50 @hvr wz1000: thoughtpolice iirc had either plans or a proof of concept of compressing .hi files | |
2018-02-14 01:56:03 mpickering I remember a phab patch for that | |
2018-02-14 01:56:21 @hvr right... but I don't remember what became of it | |
2018-02-14 01:56:35 wz1000 wouldn't something standard like gzip do? | |
2018-02-14 01:56:45 @hvr wz1000: that's so 80ies =) | |
2018-02-14 01:57:07 @hvr today it's more about things like lz4 or lzop | |
2018-02-14 01:57:19 @hvr for these kind of on-the-fly compression | |
2018-02-14 02:00:26 alanz Given the performance requirements of batch compilation, it is probably better to have a separate file for this kind of thing | |
2018-02-14 02:00:40 alanz Which gets spat out if/when there is a plugin | |
2018-02-14 02:00:59 @hvr alanz: yeah, but I don't think the plugin plan is possible at all | |
2018-02-14 02:01:00 alanz and leave the .hi files for their current usage | |
2018-02-14 02:01:08 alanz why not? | |
2018-02-14 02:01:14 @hvr alanz: given that it would be unavailable on some platforms | |
2018-02-14 02:01:34 alanz why would that be the case? GHC is available? | |
2018-02-14 02:01:36 @hvr and thus would effectively make haddock unavailable | |
2018-02-14 02:01:43 @hvr alanz: because plugins don't work everywhere | |
2018-02-14 02:01:58 alanz in what scenarios? cross-compilation? | |
2018-02-14 02:02:05 @hvr just like TH doesn't work everywhere or GHCi doesn't | |
2018-02-14 02:02:49 alanz well, IDE targeted stuff is inherently self-limiting, and would exclude those odd cases | |
2018-02-14 02:03:17 alanz but imo that is more of an argument for a separate file / db then | |
2018-02-14 02:22:36 dfeuer Will anything go wrong if I unsafeCoerce a function from (a -> a -> a) to the type (a -> a -> (# a #))? | |
2018-02-14 02:23:20 dfeuer Some testing suggests this is okay, but it's not quite officially sanctioned. | |
2018-02-14 02:24:36 dfeuer Similarly, it seems like it should be okay to unsafeCoerce from ((# a #) -> r) to (a -> r).... | |
2018-02-14 02:31:52 mpickering wz1000: https://github.com/nboldi/heed | |
2018-02-14 02:33:02 mpickering looks exactly like you were suggesting | |
2018-02-14 02:38:30 mpickering looks like a frontend plugin currently though | |
2018-02-14 02:38:33 mpickering but would be trivial to change | |
2018-02-14 02:49:03 @hvr alanz: I don't mind much if it's part of .hi or there's a 2nd .hie file; my main objection is to rely on a mechanism that isn't portable and would thus make it unavailable to haddock | |
2018-02-14 02:49:38 @hvr thus falling short of the goals set by the "hi haddock" proposal | |
2018-02-14 02:49:46 @hvr also, | |
2018-02-14 02:49:59 alanz well, maybe both these things need to happen | |
2018-02-14 02:50:18 alanz how does hi haddock manage to do it, that a plugin would not? | |
2018-02-14 02:50:36 @hvr alanz: do what exactly? | |
2018-02-14 02:50:58 @hvr also, we still wouldn't be able to decouple the plugin from ghc devel | |
2018-02-14 02:51:05 @hvr it'd still be bundled w/ ghc anyway | |
2018-02-14 02:51:39 alanz well, hi haddock is able to work without the problems of a plugin. How? | |
2018-02-14 02:51:50 * alanz has not actually read the proposal | |
2018-02-14 02:52:08 @hvr alanz: the proposal is about having GHC natively augment the .hi file, w/o any plugin | |
2018-02-14 02:52:31 alanz ok, so wherever GHC is, that thing is | |
2018-02-14 02:52:39 @hvr thus allow haddock and ghci to benefit from that | |
2018-02-14 02:52:55 @hvr and every other tool that can access .hi files | |
2018-02-14 02:52:57 alanz And what makes you think that a plugin would not be able to do the same thing? if shipped as part of GHC? | |
2018-02-14 02:53:19 @hvr alanz: because a plugin would require facilities which aren't available everywhere | |
2018-02-14 02:53:30 @hvr the same ones you'd need for e.g. TH to work | |
2018-02-14 02:54:11 @hvr it's not about using the plugin API, it's about using the linking mechanism that causes problems | |
2018-02-14 02:54:16 alanz cross-compilation? What scenarios have that? | |
2018-02-14 02:54:22 @hvr for starters, AIX | |
2018-02-14 02:54:40 @hvr that's the base-line | |
2018-02-14 02:54:57 @hvr actually AIX is slightly above the baseline, as it's a stage2 compiler | |
2018-02-14 02:55:00 alanz so the problem is putting it in the hi file? | |
2018-02-14 02:55:02 @hvr and yet doesn't have TH/interp support | |
2018-02-14 02:55:27 @hvr no, the problem is relying on a plugin that needs to be linked into a `ghc` process dynamically | |
2018-02-14 02:55:40 @hvr similar to how TH code works | |
2018-02-14 02:56:05 @hvr i.e. the "dynamic" part is the one that's not portable | |
2018-02-14 02:56:44 @hvr so, if instead we talk about a statically linked plugin, that is part of GHC's default distro | |
2018-02-14 02:57:02 @hvr then it would work; but then calling it a plugin is a weird thing to say | |
2018-02-14 02:57:07 alanz ok. So maybe define the plugin API so it can be static or dynamic, and use haddock on AIX as static | |
2018-02-14 02:57:21 alanz in which case it is not so weird | |
2018-02-14 02:57:37 @hvr yeah, but then you can just as well link it statically all the time to avoid the overhead | |
2018-02-14 02:58:04 @hvr there's no benefit imho to go the trouble of dynamic plugins which are yet another moving piece & point of failure | |
2018-02-14 02:58:22 @hvr if there's at least one platform that requires that essential plugin to be statically linked anyway | |
2018-02-14 02:58:23 alanz except that it opens the door for *other* plugins, to be used in architectures that are not as constrained | |
2018-02-14 02:58:27 alanz which is most of them | |
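The static-versus-dynamic distinction above can be sketched as follows. This is only an illustration under stated assumptions: `Plugin`, `ModuleSummary`, `PluginSource`, `staticPlugins` and `resolvePlugin` are hypothetical names and do not correspond to GHC's actual plugin API. The point is that a plugin defined as an ordinary record of functions can sit in a table compiled into the executable, which needs no runtime linker; only the dynamic case depends on platform support, and it is deliberately left unimplemented here.

```haskell
import Data.List (find)

-- Hypothetical stand-in for the per-module data a compiler hands to a plugin.
data ModuleSummary = ModuleSummary
  { modName    :: String
  , modExports :: [String]
  }

-- | A plugin is a plain record of functions, so the same definition can be
-- linked statically or (where the platform allows) loaded dynamically.
data Plugin = Plugin
  { pluginName :: String
  , runPlugin  :: ModuleSummary -> [String]  -- lines to write to a side file
  }

-- | Example plugin: list every export, qualified by module name.
exportLister :: Plugin
exportLister = Plugin
  { pluginName = "export-lister"
  , runPlugin  = \m -> [modName m ++ "." ++ e | e <- modExports m]
  }

-- | Static registry: compiled into the executable, no runtime linker needed.
staticPlugins :: [Plugin]
staticPlugins = [exportLister]

-- | Where a plugin comes from.
data PluginSource
  = Static String     -- name of a plugin in 'staticPlugins'
  | Dynamic FilePath  -- object file to load at run time

-- | Resolve a plugin. Only the static path is implemented in this sketch;
-- the dynamic path is the non-portable part and is reported as unavailable.
resolvePlugin :: PluginSource -> Either String Plugin
resolvePlugin (Static name) =
  maybe (Left ("no static plugin named " ++ name)) Right
        (find ((== name) . pluginName) staticPlugins)
resolvePlugin (Dynamic path) =
  Left ("dynamic loading of " ++ path ++ " is not available in this sketch")
```

With this shape, a platform without dynamic linking still gets every plugin listed in `staticPlugins`, while other platforms could add a working `Dynamic` case without changing the `Plugin` type.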
2018-02-14 02:58:45 @hvr well, *other* plugins are an orthogonal concern anyway | |
2018-02-14 02:59:05 wz1000 also, i think the recompilation checker doesn't work with plugins | |
2018-02-14 02:59:06 alanz well, my point is to possibly use common interfaces for them | |
2018-02-14 02:59:33 wz1000 so that means ides using plugins to compile haskell take a massive penalty | |
2018-02-14 02:59:41 alanz my understanding is that the recompilation checker is some piece of voodoo magic that seriously needs some attention | |
2018-02-14 02:59:53 @hvr this seems like a rabbit hole in the making ;) | |
2018-02-14 03:00:12 alanz I know it does not honour the flags you pass in to force recompilation if you need to | |
2018-02-14 03:00:42 alanz GHC as a whole is a rabbit hole. Especially the scaffolding/scheduling stuff | |
2018-02-14 03:01:03 alanz Built up over 20 years, to cope with all the weird corner/use cases out there | |
2018-02-14 03:01:04 @hvr I see there are multiple tasks emerging here | |
2018-02-14 03:01:10 alanz I agree | |
2018-02-14 03:01:25 @hvr one would be looking at the plugin API to address its shortcomings | |
2018-02-14 03:01:47 @hvr so that GHC can become more modularised in theory | |
2018-02-14 03:01:57 @hvr w/o regressions | |
2018-02-14 03:03:02 alanz yes. And more modularised is an important goal, which enables a lot of other things | |
2018-02-14 03:03:09 @hvr and then the question is, whether all other tasks which could somehow be expressed in terms of a more perfect plugin API should be held back until the API has become adequate enough | |
2018-02-14 03:03:39 @hvr i.e. whether to place a moratorium on extensions until the plugin API has been refactored/reengineered | |
2018-02-14 03:04:05 alanz I am in favour of going ahead, and being prepared to refactor based on actual experience | |
2018-02-14 03:04:07 @hvr or whether it's ok to go the old-fashioned way in the meantime | |
2018-02-14 03:04:12 alanz But I am an empiricist | |
2018-02-14 03:05:04 @hvr (and accept the future cost of porting it over to the new API once it's available) | |
2018-02-14 03:05:30 wz1000 I think this stuff can live in mainline ghc | |
2018-02-14 03:05:40 wz1000 If clang can do it, why not ghc? | |
2018-02-14 03:05:41 alanz the hi haddock stuff? | |
2018-02-14 03:06:06 alanz I agree. As far as I am concerned IDE support needs to be right in there, as a first class citizen | |
2018-02-14 03:06:13 wz1000 yes, but along with extra info to support more stuff | |
2018-02-14 03:06:13 alanz like in Roslyn | |
2018-02-14 03:06:19 alanz yes | |
2018-02-14 03:06:52 alanz But I imagine we end up with some standard plugins, and the ability to bring in experimental ones until they become standard | |
2018-02-14 03:06:58 alanz much like in hie | |
2018-02-14 03:07:12 @hvr that's an admirable goal obviously | |
2018-02-14 03:07:57 @hvr also, I'm not sure if cabal already provides the necessary infrastructure for making plugins a good citizen | |
2018-02-14 03:08:24 @hvr i.e. ability to specify required plugins in the .cabal file etc; I remember discussions about it, but I don't remember them bearing fruits yet | |
2018-02-14 03:08:38 wz1000 I'm proposing that ghc dumps a fixed set of data: docstrings, definitions, references, imported by, and the typechecked ast | |
2018-02-14 03:08:46 alanz I see IDE type plugins being orthogonal to the cabal file | |
2018-02-14 03:08:57 @hvr alanz: oh, those kind of plugins | |
2018-02-14 03:09:01 alanz they belong to the tooling | |
2018-02-14 03:09:11 @hvr alanz: I was thinking of plugins which transform the AST | |
2018-02-14 03:09:27 alanz no, we are explicitly not talking about those | |
2018-02-14 03:09:30 @hvr like e.g. doing some clever constant folding | |
2018-02-14 03:09:33 alanz Or at least I'm not | |
2018-02-14 03:09:52 alanz I think that is a whole different kettle of fish | |
2018-02-14 03:10:08 @hvr is that a different plugin API? | |
2018-02-14 03:10:35 alanz There is one currently on ghc-proposals | |
2018-02-14 03:10:49 alanz https://github.com/ghc-proposals/ghc-proposals/pull/107#issuecomment-362334846 | |
2018-02-14 03:11:36 mpickering I agree with hvr that using a plugin would not be suitable for haddock which needs to work everywhere so it doesn't really impact his proposal | |
2018-02-14 03:12:02 mpickering but experimental support can be achieved by a plugin, which works on all major platforms, and then folded into GHC once it is stable and if it is desired | |
2018-02-14 03:12:59 mpickering wz1000: https://github.com/ghc-proposals/ghc-proposals/pull/108 | |
2018-02-14 03:13:46 alanz yes | |
2018-02-14 03:14:15 alanz mpickering, lots of good ideas floating around | |
2018-02-14 03:15:56 mpickering The fact remains that .hi files are not intended to be read by anyone but GHC itself, which makes them difficult to work with for anyone else. | |
2018-02-14 04:01:54 @hvr mpickering: well, it's enough if lib:ghc can read them | |
2018-02-14 04:01:59 @hvr haddock already links against lib:ghc | |
2018-02-14 04:02:14 @hvr that's not something we intend to change necessarily | |
2018-02-14 04:02:47 @hvr so it doesn't really conflict with `.hi files are not intended to be read by anyone but GHC itself` | |
2018-02-14 04:03:43 @hvr i.e. I don't care about a portable representation; just give me an API which breaks w/ every major GHC release, and I'm happy | |
2018-02-14 04:05:19 mpickering The GHC API is sufficient.. I can accept that but this is about making it the easiest possible rather than.. possible | |
2018-02-14 04:06:06 @hvr well, you still don't want everyone to reinvent their own parsers for whatever augmented info format we come up with | |
2018-02-14 04:06:31 @hvr so you'd still end up with some common library/API for reading that meta-data into convenient Haskell types | |
2018-02-14 04:06:40 @hvr and those types will likely be closely related to lib:ghc types | |
2018-02-14 04:06:49 @hvr so you'd still link against lib:ghc's API in some way | |
2018-02-14 04:07:03 @hvr -> just throw it into lib:ghc already | |
2018-02-14 04:07:20 mpickering Not necessarily | |
2018-02-14 04:10:45 mpickering It's a space which needs exploring | |
2018-02-14 05:24:16 angerman o/ | |
2018-02-14 05:26:03 angerman mpickering: keep in mind that there is this long standing rumor that we could potentially improve ghcs performance by encoding hi files in a way that is faster to deserialize, which of course is much easier without any official format that other tools expect to be able to read. | |
2018-02-14 05:27:47 mpickering ok but I find the premise implausible :P | |
2018-02-14 05:30:01 @hvr angerman: hence why I brought up cbor as well as lzo(p)/lz4 | |
2018-02-14 05:30:36 @hvr both have the potential to accelerate reading .hi files | |
2018-02-14 05:31:17 angerman mpickering: have some library to read them :-) unless you want to go and write in a different language that should be fine no? | |
2018-02-14 05:31:32 angerman hvr: mmap all the way ;-) | |
2018-02-14 05:36:41 @hvr angerman: not portable :) | |
2018-02-14 06:19:02 angerman Hvr: :p | |
2018-02-14 06:19:22 angerman hvr: fast path everywhere except for aix ;-) | |
2018-02-14 12:17:23 steshaw[m] angerman: hvr: sounds like a job for https://hackage.haskell.org/package/compact ? | |
2018-02-14 13:14:51 @hvr steshaw[m]: `compact` is about GC ... serialisation is a separate concern for which you'd e.g. use something like CBOR | |
2018-02-14 13:15:42 @hvr steshaw[m]: also I don't think `compact` uses an efficient serialisation | |
2018-02-14 13:15:56 steshaw[m] I thought `compact` was kind of like flatbuffers (or capnproto) which doesn't require deserialisation | |
2018-02-14 13:16:17 @hvr well, yeah, if you see `compact` as a more portable `mmap`, yes | |
2018-02-14 13:16:37 @hvr implying that you want to dump internal structures as-is to the disk | |
2018-02-14 13:16:43 @hvr w/o any optimisation | |
2018-02-14 13:16:54 @hvr but I don't think that's a good idea | |
2018-02-14 13:16:54 steshaw[m] No deserialisation is much faster than some deserialisation | |
2018-02-14 13:17:19 steshaw[m] There is no need for optimisation if there is no deserialisation. | |
2018-02-14 13:17:33 steshaw[m] Why isn't it a good idea? | |
2018-02-14 13:17:49 @hvr if we store more information in the .hi files, we also need to make sure to do that in a reasonably efficient way than simply dumping out whatever bloated structure we came up with; i.e. we need to model the serialisation | |
2018-02-14 13:18:12 @hvr so, compression imho becomes a must | |
2018-02-14 13:18:34 @hvr both by choosing a good representation as well as employing an additional compression layer | |
2018-02-14 13:18:53 steshaw[m] You can compress a memory buffer if you want. | |
2018-02-14 13:19:12 @hvr ...which only addresses half of my request :) | |
2018-02-14 13:19:29 @hvr it doesn't take care to fine-tune the representation | |
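One way to picture "fine-tuning the representation", as opposed to compressing a raw memory dump afterwards, is the sketch below. It is an illustration chosen for this note and not something proposed in the discussion: a sorted list of line numbers is stored as differences between neighbours, and each difference is written as a base-128 varint, so small gaps cost one byte. All function names are hypothetical, and the encoder assumes non-negative, ascending input.

```haskell
import Data.Bits (shiftL, shiftR, testBit, (.&.), (.|.))
import Data.Word (Word8)

-- | Encode a non-negative Int as a little-endian base-128 varint:
-- 7 payload bits per byte, high bit set on every byte except the last.
encodeVarint :: Int -> [Word8]
encodeVarint n
  | n < 0x80  = [fromIntegral n]
  | otherwise = fromIntegral (n .&. 0x7f .|. 0x80) : encodeVarint (n `shiftR` 7)

-- | Decode one varint, returning the value and the remaining bytes.
-- Returns Nothing if the input ends in the middle of a varint.
decodeVarint :: [Word8] -> Maybe (Int, [Word8])
decodeVarint = go 0 0
  where
    go _ _ [] = Nothing
    go shift acc (b : bs)
      | testBit b 7 = go (shift + 7) acc' bs
      | otherwise   = Just (acc', bs)
      where
        acc' = acc .|. (fromIntegral (b .&. 0x7f) `shiftL` shift)

-- | Differences between neighbours; the first element is kept as-is.
deltas :: [Int] -> [Int]
deltas xs = zipWith (-) xs (0 : xs)

-- | Inverse of 'deltas': running sums.
undeltas :: [Int] -> [Int]
undeltas = scanl1 (+)

-- | Encode ascending line numbers as varint-encoded deltas.
encodeLines :: [Int] -> [Word8]
encodeLines = concatMap encodeVarint . deltas

-- | Decode the output of 'encodeLines'.
decodeLines :: [Word8] -> Maybe [Int]
decodeLines bytes = undeltas <$> go bytes
  where
    go [] = Just []
    go bs = do
      (d, rest) <- decodeVarint bs
      (d :) <$> go rest
```

For the line numbers 1000, 1003, 1010 and 2500 this produces six bytes, and `decodeLines` recovers the original list. A general-purpose compressor such as lz4 could still be applied on top, which is the second half of hvr's request.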
2018-02-14 13:19:36 steshaw[m] I think you're saying that with `compact`, it's perhaps easy to have stuff included in the .hi file that you didn't expect to end up there. | |
2018-02-14 13:19:56 @hvr not only that, but I don't trust `compact`'s serialisation to be in fact... "compact" | |
2018-02-14 13:20:07 @hvr it seems more like generic auto-derived serialisation | |
2018-02-14 13:20:21 steshaw[m] I see, I haven't tried it yet but it's something that I've dreamed of for many years :) | |
2018-02-14 13:20:25 @hvr w/o any concern for compactness | |
2018-02-14 13:20:40 steshaw[m] If it's not actually compact, can it be improved? | |
2018-02-14 13:20:52 @hvr probably, if we switch it to CBOR =) | |
2018-02-14 13:21:05 steshaw[m] :) | |
2018-02-14 13:21:26 @hvr (and/or apply encoding techniques from CBOR) | |
2018-02-14 13:21:38 steshaw[m] Well, doesn't CBOR require deserialisation to other structures? IMO that is a downside if so. | |
2018-02-14 13:21:54 @hvr what do you mean by that? | |
2018-02-14 13:22:12 steshaw[m] If not `compact`, then what about flatbuffers or capnproto? | |
2018-02-14 13:23:33 wz1000 hvr: that means we need ghc to depend on a cbor library | |
2018-02-14 13:23:37 steshaw[m] Well, if CBOR allows you to load the buffer (or mmap) it into memory and then access the contents (as if it were a database) then I am in favour of it. This is how flatbuffers works. | |
2018-02-14 13:23:49 @hvr I haven't looked at how flatbuffers or capnproto encode... | |
2018-02-14 13:24:07 @hvr wz1000: ...so? that was a bit of a hidden agenda of cborg ;-) | |
2018-02-14 13:24:23 steshaw[m] The `compact` seemed like flatbuffers but even better because it allowed you to use normal Haskell data structures. | |
2018-02-14 13:24:51 @hvr steshaw[m]: I haven't looked at how flatbuffers or capnproto specifically encode their data; but I'd expect them to be mindful about it, if they're a network protocol | |
2018-02-14 13:25:01 wz1000 I was under the impression that ghc was very conservative with what it depends on | |
2018-02-14 13:26:02 steshaw[m] hvr: neither are a network protocol. It's a format for a "flat" buffer. It allows fast use of the buffer (without deserialisation to other data structures). | |
2018-02-14 13:26:24 @hvr wz1000: yes, but cborg was supposed to be a faster, better, stronger replacement for `binary`, originally intended to be folded into `binary` proper: http://code.haskell.org/~duncan/binary-experiment/binary.pdf | |
2018-02-14 13:26:42 wz1000 ok | |
2018-02-14 13:26:43 @hvr wz1000: but it ended becoming a new library | |
2018-02-14 13:26:59 @hvr because changing `binary` in-place is messier | |
2018-02-14 13:28:08 wz1000 so I think we are in agreement that we need to put more info in the .hi files, at least enough so that haddock doesn't need to compile the source again | |
2018-02-14 13:28:14 @hvr and cborg does change the encoding of `binary`; and cborg actually tries to be more principled about separating the concern of "just serialise a data structure" vs "encode a binary format" | |
2018-02-14 13:29:11 @hvr fwiw, replacing binary w/ `cborg` in GHC and evaluating its performance before/after would be a GSOC project of its own | |
2018-02-14 13:29:44 steshaw[m] Not having to decode the network buffer is an old trick from network programming. I vaguely recall that BEA Tuxedo had such a format so that proxies could quickly look into the message and forward it without deserialising it (i.e. the proxy can peek directly at the network buffer to do its work). | |
2018-02-14 13:30:11 @hvr I've written enough network code in C8x to remember :) | |
2018-02-14 13:30:37 @hvr but network code is concerned about zero-copy | |
2018-02-14 13:30:38 steshaw[m] C8x? | |
2018-02-14 13:30:42 @hvr C89 | |
2018-02-14 13:30:54 steshaw[m] Oh, right :) | |
2018-02-14 13:30:55 @hvr or rather gnu89, before C99 was a thing | |
2018-02-14 13:31:36 @hvr also, in network protocols you still care about compact representations, | |
2018-02-14 13:31:48 steshaw[m] Yes, zero-copy. It is nice. It keeps proxy-type server having low memory overhead. | |
2018-02-14 13:32:03 @hvr which for servers which need to handle 100k connections is essential | |
2018-02-14 13:32:48 steshaw[m] Could the `cborg` library operate in a similar way? i.e. to flatbuffers/capnproto? | |
2018-02-14 13:32:49 @hvr when you design network protocols you try to fit essential data into MTUs | |
2018-02-14 13:33:15 @hvr to be able to access the essential information ideally asap; and it helps if the data makes it in the first packet | |
2018-02-14 13:34:19 @hvr steshaw[m]: cborg is a heavily optimised streaming format; you'd still have to traverse the memory buffer in order | |
2018-02-14 13:35:32 steshaw[m] True, but the BEA Tuxedo thing wasn't really that type of low-level protocol. The kind of proxy I'm thinking of was called a content something-or-other proxy. It is able to look at the entire message (not just the header), to make its decisions. | |
2018-02-14 13:36:09 wz1000 hvr: you had mentioned "rich doc browsing". is anything written down somewhere? | |
2018-02-14 13:36:26 @hvr at the end of the day, I don't care how we do it, as long as we manage to keep the serialisation bloat to a minimum (cause w/ cabal new-build we have *lots* of packages; I tend to have easily 10k and more package instances in my ~/.store; so I do care a lot about keeping `.hi` files small) | |
2018-02-14 13:37:03 @hvr wz1000: no, and I also said that the plans are still rather in flux/vague | |
2018-02-14 13:37:19 @hvr wz1000: i.e. still in brainstorming phase | |
2018-02-14 13:37:29 steshaw[m] hvr: is there a lot of bloat in the .hi file atm. I remember this was a problem in the Idris universe. | |
2018-02-14 13:38:17 @hvr fwiw, I don't mind trading deserialising performance for additional meta-data if it gets us a more compact representation | |
2018-02-14 13:38:40 @hvr cause if that extra info is only needed for extra tools, GHC can skip reading it | |
2018-02-14 13:38:44 @hvr so it costs GHC nothing | |
2018-02-14 13:38:57 wz1000 yeah | |
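The "GHC can skip reading it" point can be sketched with a sectioned container in which every section carries a tag and a length, so a reader jumps over sections it does not need without decoding them. This is a minimal sketch under assumptions, not the .hi format: the tag values, the two-byte big-endian length (so payloads must stay under 65536 bytes) and the names `SectionTag`, `writeSection` and `findSection` are all invented for illustration.

```haskell
import Data.Word (Word8)

-- | Hypothetical section kinds: what the compiler needs vs. what only tools need.
data SectionTag = CompilerData | ToolingData
  deriving (Eq, Show)

tagByte :: SectionTag -> Word8
tagByte CompilerData = 0
tagByte ToolingData  = 1

byteTag :: Word8 -> Maybe SectionTag
byteTag 0 = Just CompilerData
byteTag 1 = Just ToolingData
byteTag _ = Nothing

-- | Layout: 1 tag byte, 2 length bytes (big-endian), then the payload.
-- Assumes the payload is shorter than 65536 bytes.
writeSection :: SectionTag -> [Word8] -> [Word8]
writeSection tag payload =
  tagByte tag
    : fromIntegral (len `div` 256)
    : fromIntegral (len `mod` 256)
    : payload
  where
    len = length payload

-- | Return the payload of the first section with the wanted tag.
-- Other sections are skipped by length; their payload is never inspected.
findSection :: SectionTag -> [Word8] -> Maybe [Word8]
findSection _ [] = Nothing
findSection wanted (t : hi : lo : rest)
  | length rest < len        = Nothing  -- truncated payload
  | byteTag t == Just wanted = Just (take len rest)
  | otherwise                = findSection wanted (drop len rest)
  where
    len = fromIntegral hi * 256 + fromIntegral lo
findSection _ _ = Nothing  -- truncated header
```

A compiler-side reader would call `findSection CompilerData` and pay only for a length-based skip over the tooling section; a tool would ask for `ToolingData` instead. Putting the tooling data in a separate file, as suggested next in the log, removes even that skip.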
2018-02-14 13:39:09 @hvr extreme case: have separate .hie files | |
2018-02-14 13:39:15 @hvr for stuff that GHC doesn't need | |
2018-02-14 13:39:38 @hvr and also have a flag to inhibit their generation | |
2018-02-14 13:39:50 wz1000 yes | |
2018-02-14 13:39:51 @hvr if for any reason you need to speed up GHCs compilation | |
2018-02-14 13:39:58 @hvr and know you don't need the .hie files | |
2018-02-14 13:39:58 wz1000 I think that is a workable approach | |
2018-02-14 13:40:17 @hvr like we have e.g. the only-typecheck mode in GHC | |
2018-02-14 13:40:19 steshaw[m] Sounds like there are multiple problems at play. | |
2018-02-14 13:40:28 @hvr steshaw[m]: yeah, it's never simple =) | |
2018-02-14 13:40:49 @hvr lots of interoperating concerns/goals in balance w/ each other | |
2018-02-14 13:41:07 steshaw[m] What is the metadata required by other systems but not GHC? | |
2018-02-14 13:41:46 wz1000 ghc knows all of the data while compiling | |
2018-02-14 13:42:05 wz1000 but throws it out and doesn't include it in .hi files | |
2018-02-14 13:42:25 wz1000 Basically information about the source file itself | |
2018-02-14 13:42:57 wz1000 where stuff was defined, where stuff was called, its docstrings, its types, which import brought it into scope and so on | |
2018-02-14 13:43:28 wz1000 info it doesn't need to know for using the module as a dependency | |
2018-02-14 13:43:33 @hvr wz1000: ...and TH makes this even more fun :) | |
2018-02-14 13:44:00 @hvr (even more than CPP), as it messes up the source-loc <-> use-site relationships | |
2018-02-14 13:44:07 wz1000 yes | |
2018-02-14 13:44:12 wz1000 that is a pain | |
2018-02-14 13:45:00 @hvr (but which I think for "hi haddock" isn't much of a concern, as this would only affect hyperlinked source highlighting) | |
2018-02-14 13:45:59 steshaw[m] Yeah, getting GHC to compile faster is important to me. Writing out a giant metadata file would work against that. | |
2018-02-14 13:46:00 @hvr (...and we deliberately eschew the source highlighting issue cause it's a can of worms... =) | |
2018-02-14 13:46:39 @hvr steshaw[m]: fwiw, for hi haddock I don't think we'd need such a "huge" amount of new meta-data | |
2018-02-14 13:47:02 steshaw[m] Having that separate file sounds like a good idea. | |
2018-02-14 13:47:11 @hvr but it quickly gets more if you need to dump out fine-grained src-loc associated metadata | |
2018-02-14 13:47:47 @hvr for use-sites rather than just def-sites | |
2018-02-14 13:47:49 steshaw[m] Is "hi haddock" the metadata that would go into .hi files to support haddock? | |
2018-02-14 13:48:04 alexbiehl basically, yes | |
2018-02-14 13:48:31 @hvr steshaw[m]: https://summer.haskell.org/ideas.html#hi-haddock |