First let's analyze Clang, GCC, and Rust and see what we can learn from them.
Diagnostics contain entries of this format (Rust calls them "spans"):
LOCATION: TYPE: MESSAGE
MESSAGE_CONTINUED
SNIPPET
LOCATION is generally of the form PATH:ROW:COL, where PATH is the path to the offending source code, ROW is the line number (1-indexed), and COL is the column number (1-indexed). Rust not only shows the starting location but also the ending location.
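Every tool in this survey starts entries with PATH:ROW:COL, so the prefix is easy to parse mechanically as long as one detail is handled: the path itself may contain colons (a Windows drive letter, for instance), so the row and column have to be split off from the right. The following is a minimal Haskell sketch of that idea. parseLocation and splitLastColon are hypothetical helpers, not part of GHC or any existing library, and only the start position is parsed (no Rust-style end position).

```haskell
import Text.Read (readMaybe)

-- | Hypothetical parser for the PATH:ROW:COL prefix (1-indexed ROW/COL).
-- Splitting from the right keeps paths that contain colons intact.
parseLocation :: String -> Maybe (FilePath, Int, Int)
parseLocation s = do
  (rest, colStr) <- splitLastColon s
  (path, rowStr) <- splitLastColon rest
  row <- readMaybe rowStr
  col <- readMaybe colStr
  -- Reject empty paths and non-positive positions.
  if null path || row < 1 || col < 1
    then Nothing
    else Just (path, row, col)

-- | Split at the last ':' in the string, dropping the colon itself.
splitLastColon :: String -> Maybe (String, String)
splitLastColon str =
  case break (== ':') (reverse str) of
    (_, []) -> Nothing -- no colon at all
    (revAfter, _ : revBefore) -> Just (reverse revBefore, reverse revAfter)
```

For example, parseLocation "Diagram.hs:335:41" gives Just ("Diagram.hs", 335, 41), and a path such as C:\src\Foo.hs survives with its drive letter.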
Here, TYPE is either error (red), warning (magenta/magenta/yellow for Clang/GCC/Rust respectively), note (grey/cyan/green), or help (cyan). Each TYPE has a distinctive color so the user can easily see which ones are important. For warnings, magenta is probabl
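As an illustration, TYPE labels could be colored with ANSI SGR escape sequences. In the sketch below, the codes for error, warning and note are GCC's defaults (bold red, bold magenta, bold cyan); the help color and all names (EntryType, typeLabel, defaultSgr, colorize, renderType) are assumptions made for illustration.

```haskell
-- | The four entry types discussed above (help is Rust-only).
data EntryType = Error | Warning | Note | Help
  deriving (Show, Eq)

-- | Text printed for each entry type.
typeLabel :: EntryType -> String
typeLabel Error   = "error"
typeLabel Warning = "warning"
typeLabel Note    = "note"
typeLabel Help    = "help"

-- | Default SGR parameters. Error, warning and note follow GCC's
-- defaults; the help color is an assumption.
defaultSgr :: EntryType -> String
defaultSgr Error   = "01;31" -- bold red
defaultSgr Warning = "01;35" -- bold magenta
defaultSgr Note    = "01;36" -- bold cyan
defaultSgr Help    = "01;36" -- bold cyan (assumed)

-- | Wrap text in an ANSI escape sequence, resetting attributes after it.
colorize :: String -> String -> String
colorize sgr s = "\ESC[" ++ sgr ++ "m" ++ s ++ "\ESC[0m"

-- | Render the TYPE label, colored only when colors are enabled.
renderType :: Bool -> EntryType -> String
renderType useColors t
  | useColors = colorize (defaultSgr t) (typeLabel t)
  | otherwise = typeLabel t
```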
A single diagnostic may contain several of these entries, but the first one is always error or warning (the primary entry), while the remaining ones (if any) can only be note or help (auxiliary entries).
Entries of type note draw attention to parts of the code that are further away from the source of the error. GHC already does this to some extent, but the formatting is ad hoc and differs from error to error. Moreover, GHC seems to pretty-print long expressions in its own format. Personally, I think it would be better to print them as-is using notes.
The help entry is Rust-specific: it reminds the user how to seek detailed help on a specific error. This may be useful for newcomers, but I think it would probably be overkill to display it for every diagnostic as Rust does. If GHC ever gets a similar feature and wants to make it newcomer-friendly, it would be sufficient to print this exactly once per compilation.
The MESSAGE contains flowing text wrapped to around 80 characters. It may contain fragments of code, which are colored distinctively from the text (except in Rust). The primary message may also be suffixed with [ID], where ID is some sort of identifier for this diagnostic (or class of diagnostics). Clang/GCC use a string to identify the class of diagnostic (e.g. -Wunused), which can in turn be used to silence the diagnostic if needed. Rust appears to have a unique numeric ID for each diagnostic (e.g. E0308).
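The wrapping and the [ID] suffix can be sketched as a greedy word wrap. This is a minimal version under a few assumptions: the ID is appended before wrapping so that it flows with the text, runs of whitespace in the message are collapsed (via words), and wrapWords, withId and formatMessage are hypothetical names.

```haskell
-- | Greedy word wrap: pack words into lines of at most `width`
-- characters. A word longer than `width` gets a line to itself.
wrapWords :: Int -> String -> [String]
wrapWords width = map unwords . go . words
  where
    go [] = []
    go (w : ws) =
      let (line, rest) = fill (length w) [w] ws
      in line : go rest
    -- Keep adding words while the line (with separating spaces) fits.
    fill _ acc [] = (reverse acc, [])
    fill len acc (x : xs)
      | len + 1 + length x <= width = fill (len + 1 + length x) (x : acc) xs
      | otherwise = (reverse acc, x : xs)

-- | Append the optional [ID] suffix to the message text.
withId :: Maybe String -> String -> String
withId Nothing msg = msg
withId (Just ident) msg = msg ++ " [" ++ ident ++ "]"

-- | Suffix the ID, then wrap the whole message to the given width.
formatMessage :: Int -> Maybe String -> String -> [String]
formatMessage width diagId msg = wrapWords width (withId diagId msg)
```

For example, formatMessage 80 (Just "-Wunused") "unused variable" yields the single line "unused variable [-Wunused]".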
The SNIPPET contains code pulled directly from the source file. It may also contain underlining to indicate which parts of the line are relevant to the problem (Clang/GCC call them "caret diagnostics").
Underlining: Clang has two different forms, ^ and ~~~~~; GCC only has ^; Rust uses ^~~~~. Unless GHC has a reason to distinguish between different kinds of highlights, Rust's approach is probably the best, as it's easier to see than a single caret.
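Rust-style underlining amounts to padding up to the start column and emitting a caret followed by tildes. The sketch below assumes 1-indexed, inclusive column ranges; underline and underlineLine are hypothetical helpers. The second one copies tabs from the source prefix so the caret stays aligned when the line is indented with tabs.

```haskell
-- | Underline columns [startCol, endCol] (1-indexed, inclusive)
-- with a caret followed by tildes, e.g. "  ^~~~".
underline :: Int -> Int -> String
underline startCol endCol =
  replicate (max 0 (startCol - 1)) ' '
    ++ "^"
    ++ replicate (max 0 (endCol - startCol)) '~'

-- | Like underline, but mirror tabs from the source line's prefix
-- so the marker lines up however the terminal expands tabs.
underlineLine :: String -> Int -> Int -> String
underlineLine src startCol endCol =
  map (\c -> if c == '\t' then '\t' else ' ') (take (startCol - 1) src)
    ++ "^"
    ++ replicate (max 0 (endCol - startCol)) '~'
```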
Both Clang and Rust show a one-line summary of how many errors and warnings were generated by the code. (I'm not sure how useful this would be.)
Adjustable colors. Just as GCC supports GCC_COLORS, GHC could support GHC_COLORS using a syntax such as error=01;31:warning=01;35.
Interestingly, Clang hard-codes the colors so they can't be changed without recompiling (or hacking the binary). (I have no info about this on Rust, but I suspect it's similar.)
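Parsing a GCC_COLORS-style string is straightforward. The sketch below assumes that the proposed GHC_COLORS variable uses exactly the colon-separated key=SGR pairs shown above. Skipping malformed items instead of reporting them is a design choice made here, not something the proposal specifies, and parseColors, splitOn and colorFor are hypothetical names.

```haskell
-- | Parse a spec such as "error=01;31:warning=01;35" into
-- (capability, SGR) pairs. Malformed items are silently skipped.
parseColors :: String -> [(String, String)]
parseColors = concatMap item . splitOn ':'
  where
    item s = case break (== '=') s of
      (key, '=' : sgr) | not (null key) -> [(key, sgr)]
      _ -> []

-- | Split a string on a separator character (empty fields are kept).
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (field, []) -> [field]
  (field, _ : rest) -> field : splitOn sep rest

-- | Look up a capability, falling back to a built-in default.
colorFor :: [(String, String)] -> String -> String -> String
colorFor spec key def = maybe def id (lookup key spec)
```

In a real driver the string would come from lookupEnv "GHC_COLORS" (from System.Environment), with the built-in defaults passed as the fallback.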
Now let's look at our goals for GHC.
In the grand scheme of things, it would be useful to produce diagnostics in a structured format that can be easily parsed without ambiguity. However, the format needs to be flexible enough to allow new errors to be added, so a sum type would probably be too rigid for this.
The goal is not unlike that of HTML, where the format is structured but ultimately the information is primarily meant for consumption by humans. Therefore, it would be nice to borrow some of its good ideas.
Source code is worth a thousand words, so here's a rough sketch of how diagnostics could be encoded:
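-- A diagnostic: an identifier, one primary entry (error or warning),
-- and zero or more auxiliary entries (notes).
data Diagnostic
  = Diagnostic
      DiagnosticID
      (Entry PrimaryEntryType)
      [Entry AuxiliaryEntryType]
  deriving Show

-- Identifier for this diagnostic (or class of diagnostics).
newtype DiagnosticID
  = DiagnosticID String
  deriving Show

-- One entry; the parameter t restricts which entry types are allowed
-- (`type` is a reserved word, so it cannot be used as the parameter name).
data Entry t
  = Entry
      Location
      t
      Message
  deriving Show

data Location
  = Location
      FilePath
      Int Int -- row range
      Int Int -- col range
  | Module String -- a whole module, e.g. [Control.Lens.Tuple]
  deriving Show

data PrimaryEntryType = Error | Warning
  deriving Show

data AuxiliaryEntryType = Note
  deriving Show

-- Flowing text interleaved with code fragments (colored distinctively).
newtype Message
  = Message
      [MessageElem]
  deriving Show

data MessageElem
  = Text String
  | Code String
  deriving Show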
We can follow the same format as Clang/GCC:
PATH:ROW:COL: TYPE: MESSAGE
MESSAGE_CONTINUED
SNIPPET
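Rendering one entry in this layout only needs the location, the type, the already-wrapped message lines and the snippet. A minimal sketch with simplified, hypothetical types (Severity, renderEntry) rather than the full Diagnostic types above; the message lines are assumed to be wrapped already.

```haskell
-- | Entry types that GHC would need (no Rust-style help).
data Severity = Error | Warning | Note

-- | Render one entry as
--   PATH:ROW:COL: TYPE: MESSAGE
--   MESSAGE_CONTINUED
--   SNIPPET
renderEntry :: FilePath -> Int -> Int -> Severity -> [String] -> [String] -> String
renderEntry path row col sev msgLines snippet =
  unlines (header : continued ++ snippet)
  where
    -- The first message line shares the header line; the rest follow it.
    (firstLine, continued) = case msgLines of
      [] -> ("", [])
      (l : ls) -> (l, ls)
    header = path ++ ":" ++ show row ++ ":" ++ show col ++ ": "
      ++ typeName sev ++ ": " ++ firstLine
    typeName Error   = "error"
    typeName Warning = "warning"
    typeName Note    = "note"
```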
Colors are enabled if GHC can detect that the terminal is a TTY that supports colors; otherwise they are off. If -fcolor-diagnostics[=always] is given, colors are always on. If -fcolor-diagnostics=never or -fno-color-diagnostics is given, they are disabled. The default is -fcolor-diagnostics=auto, which is implied when no flag is given. (Note that
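The color policy above can be sketched as a pure function. The flag spellings are the ones proposed here; the assumption that the last flag wins, and the names ColorMode, colorModeFromFlags and useColor, are illustrative only. Whether the output is a terminal is passed in as a Bool (in GHC it might come from hIsTerminalDevice stderr in System.IO); this sketch does not check whether the terminal actually supports colors.

```haskell
data ColorMode = Always | Never | Auto
  deriving (Show, Eq)

-- | Interpret the proposed flags, assuming the last one wins.
-- Unrelated flags are ignored; Auto is the default.
colorModeFromFlags :: [String] -> ColorMode
colorModeFromFlags = foldl step Auto
  where
    step _ "-fcolor-diagnostics"        = Always
    step _ "-fcolor-diagnostics=always" = Always
    step _ "-fcolor-diagnostics=never"  = Never
    step _ "-fno-color-diagnostics"     = Never
    step _ "-fcolor-diagnostics=auto"   = Auto
    step m _                            = m

-- | Decide whether to emit colors, given whether output is a terminal.
useColor :: ColorMode -> Bool -> Bool
useColor Always _     = True
useColor Never  _     = False
useColor Auto   isTty = isTty
```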
Location ranges are helpful, but for human readers they add a lot of visual noise. There should be some sort of flag to turn ranges on or off. If the flag is turned on, locations are printed in the format R-S:C-D, even if R == S or C == D. This way parsers won't have to handle special cases.
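A minimal sketch of the two location formats, assuming 1-indexed rows and columns; the Span record and renderSpan function are hypothetical names.

```haskell
-- | A source span; rows and columns are 1-indexed.
data Span = Span
  { spanFile     :: FilePath
  , spanStartRow :: Int
  , spanEndRow   :: Int
  , spanStartCol :: Int
  , spanEndCol   :: Int
  }

-- | With ranges on, always print PATH:R-S:C-D, even when R == S or
-- C == D, so parsers never need a special case. With ranges off,
-- print only the start position as PATH:ROW:COL.
renderSpan :: Bool -> Span -> String
renderSpan showRanges sp
  | showRanges =
      spanFile sp ++ ":" ++ range (spanStartRow sp) (spanEndRow sp)
        ++ ":" ++ range (spanStartCol sp) (spanEndCol sp)
  | otherwise =
      spanFile sp ++ ":" ++ show (spanStartRow sp)
        ++ ":" ++ show (spanStartCol sp)
  where
    range a b = show a ++ "-" ++ show b
```

With ranges enabled, a single-character span at Diagram.hs:335:41 still prints as Diagram.hs:335-335:41-41.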
Would it be useful to also have a 3rd color for deferred errors?
Here, we focus on the actual content of diagnostics. I have a few ideas of my own, but if you have any, feel free to add them to this list.
Context lines such as the following don't provide anything that the user doesn't already know from the source location:
In the second argument of ‘(.)’, namely ‘_3’
In the first argument of ‘use’, namely ‘(ix viDest . _3)’
In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’
Frequently I see something like "(rigid, skolem)" in the error message, which looks like a 2-tuple of identifiers. If the intent is to indicate that the two words are synonyms, why not write it as rigid/skolem?
GHC's errors are quite understandable once you're familiar with the jargon. However, one of the main reasons type errors are still frustrating is that even though I understand the error, I don't actually know why it happens, because of a phenomenon best described as: "it all made sense in my head when I wrote it".
Generally, it's because of something that I failed to consider when I unified the types in my head. So it would be helpful for GHC to point out where the flaw in my logic is.
Obviously, one can't expect a compiler to guess what the logic error is. In fact, it shouldn't. Rather, it should present, in an orderly fashion, how it inferred a given type.
Here's an example of what I mean:
Diagram.hs:335:41:
    Couldn't match type ‘Int’ with ‘t0 (Link 'Incoming e)’
      arising from a functional dependency between:
        constraint ‘Field3
                      (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                      (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                      Int
                      Int’
          arising from a use of ‘_3’
        instance ‘Field3 (a, b, c) (a, b, c') c c'’ at <no location info>
    Relevant bindings include
      e :: e (bound at Diagram.hs:334:52)
      loSrc :: [(e, VertexIx)] (bound at Diagram.hs:333:15)
      lo :: Vector (Link 'Outgoing e)
        (bound at Diagram.hs:332:49)
      reformedNodes :: Seq
                         (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:330:5)
      nodes0 :: Seq (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:346:5)
      nodes :: DiagramT Vector e v (bound at Diagram.hs:321:55)
      diagram :: Diagram e v (bound at Diagram.hs:321:38)
      normalizeDiagramEdges :: Coeffed (Diagram e v)
                               -> Coeffed (Diagram e v)
        (bound at Diagram.hs:321:1)
    In the second argument of ‘(.)’, namely ‘_3’
    In the first argument of ‘use’, namely ‘(ix viDest . _3)’
    In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’
Here's a mockup of what would be more useful (colored version):
Diagram.hs:335:41: error:
    cannot match Int with t0 (Link 'Incoming e) arising from a functional
    dependency between constraint and instance, where the constraint is
        Field3 (…, …, t0 (Link 'Incoming e)) … Int …
    arising from a use of
        _3 :: Field3 s t a b => …
    EdgeIx <$> use (ix viDest . _3) <* do
                                ^^
[Control.Lens.Tuple]: note:
    instance Field3 (a, b, c) (a, b, c') c c'
                           ^             ^
Diagram.hs:335:13: note:
    Int arises from a use of EdgeIx :: Int -> …
    EdgeIx <$> use (ix viDest . _3) <* do
    ^^^^^^
Diagram.hs:331:20: note:
    t0 (Link 'Incoming e) arises from a use of
        nodes0 :: Seq (…, …, t0 (Link 'Incoming e))
    (`execState` nodes0) $
                 ^^^^^^