Rufflewind/improved-ghc-diagnostics.md

## improved-ghc-diagnostics.md

      
Improved GHC diagnostics

Analysis of Clang, GCC, and Rust

First let's analyze Clang, GCC, and Rust and see what we can learn from them.
Diagnostic entries

Diagnostics contain entries of this format (Rust calls them "spans"):
LOCATION: TYPE: MESSAGE
  MESSAGE_CONTINUED
SNIPPET

LOCATION is generally of the form PATH:ROW:COL where PATH is the path to the
offending source code, ROW is the line number (1-indexed), and COL is the
column number (1-indexed).  Rust not only shows the starting location but also
the ending location.
Here, TYPE is either error (red), warning (magenta/magenta/yellow), note
(grey/cyan/green), or help (cyan); where three colors are listed, they are for
Clang/GCC/Rust respectively.  Each TYPE has a distinctive color so the
user can easily see which ones are important.  For warnings, magenta is probabl
A single diagnostic may contain several of these entries, but the first one is
always error or warning (primary entry), while the remaining ones (if any)
can only be note or help (auxiliary entries).
note entries bring attention to parts of the code that are further away from
the source of the error.  GHC already does this to some extent, but the
formatting is ad hoc and differs from error to error.  Moreover, GHC seems to
pretty-format long expressions in its own format.  Personally I think it would
be better to print them as-is using notes.
help is a Rust-specific entry that reminds the user of how to seek detailed
help on a specific error.  This may be useful for newcomers, but I think it
would probably be overkill to display this for every diagnostic like Rust
does.  If GHC ever gets a similar feature and wants to make it
newcomer-friendly, it would be sufficient to print this exactly once per
compilation.
The MESSAGE contains flowing text wrapped to around 80 characters.  It may
contain fragments of code, which are colored distinctively from the text
(except in Rust).  The primary message may also be suffixed with [ID], where
ID is some sort of identifier for this diagnostic (or class of diagnostics).
Clang/GCC use a string to identify the class of diagnostic (e.g. -Wunused),
which can in turn be used to silence the diagnostic if needed.  Rust appears
to have a unique numeric ID for each diagnostic.
The SNIPPET contains code pulled directly from the source file.  It may also
contain underlining to indicate which parts of the line are relevant to the
problem (Clang/GCC call them "caret diagnostics").
Underlining: Clang has two different forms: ^ and ~~~~~; GCC only has ^;
Rust uses ^~~~~.  Unless GHC has a reason to distinguish between different
kinds of highlights, Rust's approach is probably the best as it's easier to
see than a single caret.
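
As a rough illustration of the Rust-style ^~~~~ underline, here is a minimal sketch.  The function name underline and its arguments (a 1-indexed start column and a width in columns) are assumptions made for this example, not anything GHC provides.

```haskell
-- Hypothetical helper: build a Rust-style underline for a snippet line.
-- startCol is 1-indexed (like COL above); width is the number of columns
-- to highlight.  A non-positive width produces no underline at all.
underline :: Int -> Int -> String
underline startCol width
  | width <= 0 = ""
  | otherwise  =
      replicate (startCol - 1) ' '   -- pad up to the starting column
        ++ "^"                       -- caret marks the first column
        ++ replicate (width - 1) '~' -- tildes cover the rest of the range

-- Pair a source line with its underline, one per output line.
renderSnippet :: String -> Int -> Int -> [String]
renderSnippet sourceLine startCol width =
  [sourceLine, underline startCol width]
```

For example, underline 5 3 pads with four spaces and then emits ^~~, so the caret sits under column 5.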
Diagnostic summary

Both Clang and Rust show a one-line summary of how many errors and warnings
were generated by the code.  (I'm not sure how useful this would be.)
Adjustable colors

Since GCC supports GCC_COLORS, GHC could likewise support
GHC_COLORS using a syntax such as
error=01;31:warning=01;35

Interestingly, Clang hard-codes the colors so they can't be changed without
recompiling (or hacking the binary).  (I have no info about this on Rust, but
I suspect it's similar.)
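
The GHC_COLORS syntax suggested above could be parsed along these lines.  This is only a sketch: the names splitOn, parseColors, and colorize are made up for illustration, and it assumes the values are SGR parameter strings as in GCC_COLORS.

```haskell
-- Hypothetical helper: split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- Parse a spec such as "error=01;31:warning=01;35" into key/value pairs.
-- Entries without '=' or with an empty key are silently skipped.
parseColors :: String -> [(String, String)]
parseColors spec =
  [ (key, value)
  | entry <- splitOn ':' spec
  , (key, '=' : value) <- [break (== '=') entry]
  , not (null key)
  ]

-- Wrap text in the escape sequence configured for a category, if any.
-- Categories that are not configured are left uncolored.
colorize :: [(String, String)] -> String -> String -> String
colorize colors category text =
  case lookup category colors of
    Just sgr -> "\ESC[" ++ sgr ++ "m" ++ text ++ "\ESC[0m"
    Nothing  -> text
```

In a real implementation the spec would presumably be read from the environment (for instance with System.Environment.lookupEnv) and merged with built-in defaults.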
Semantic output

Now let's look at our goals for GHC.
In the grand scheme of things, it would be useful to produce diagnostics in a
structured format that can be easily parsed without ambiguity.  However, the
format needs to be flexible enough to allow new errors to be added, so a sum
type would probably be too rigid for this.
The goal is not unlike that of HTML, where the format is structured but
ultimately the information is primarily meant for consumption by humans.
Therefore, it would be nice to borrow some of its good ideas.
Source code is worth a thousand words, so here's a rough sketch of how
diagnostics could be encoded:
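-- A diagnostic: an identifier, exactly one primary entry, and any number
-- of auxiliary entries.
data Diagnostic
  = Diagnostic
      DiagnosticID
      (Entry PrimaryEntryType)
      [Entry AuxiliaryEntryType]

data DiagnosticID
  = DiagnosticID String

-- "type" is a reserved word in Haskell, so the parameter is named "ty".
data Entry ty
  = Entry
      Location
      ty
      Message

data Location
  = Location
      FilePath
      Int Int -- row range
      Int Int -- col range
  | Module

data PrimaryEntryType   = Error | Warning
data AuxiliaryEntryType = Note

data Message
  = Message
      [MessageElem]

-- A message is a mix of flowing text and code fragments.
data MessageElem
  = Text String
  | Code String
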

Colored diagnostics

We can follow the same format as Clang/GCC:
PATH:ROW:COL: TYPE: MESSAGE
  MESSAGE_CONTINUED
SNIPPET
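
To make the format concrete, here is a minimal sketch of a plain-text renderer for a single entry.  The types are simplified stand-ins for the ones sketched earlier (one entry type, a single row and column, no snippet), and every function name is an assumption made for this example.

```haskell
import Data.List (intercalate)

-- Simplified stand-ins for the types in the sketch above.
data EntryType   = Error | Warning | Note
data Location    = Location FilePath Int Int   -- path, row, col (1-indexed)
data MessageElem = Text String | Code String
data Entry       = Entry Location EntryType [MessageElem]

typeLabel :: EntryType -> String
typeLabel Error   = "error"
typeLabel Warning = "warning"
typeLabel Note    = "note"

renderLocation :: Location -> String
renderLocation (Location path row col) =
  path ++ ":" ++ show row ++ ":" ++ show col

-- Without colors, code fragments are printed as-is; a colored renderer
-- would wrap the Code case in an escape sequence instead.
renderElem :: MessageElem -> String
renderElem (Text s) = s
renderElem (Code s) = s

-- Indent every line after the first by two spaces (MESSAGE_CONTINUED).
indentContinued :: String -> String
indentContinued s = case lines s of
  []       -> ""
  (l : ls) -> intercalate "\n" (l : map ("  " ++) ls)

-- Produce "PATH:ROW:COL: TYPE: MESSAGE" with indented continuation lines.
renderEntry :: Entry -> String
renderEntry (Entry loc ty msg) =
  renderLocation loc ++ ": " ++ typeLabel ty ++ ": "
    ++ indentContinued (concatMap renderElem msg)
```

Keeping Text and Code separate until the last step means the same Entry value can feed both a plain renderer like this one and a colored one.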

Toggling

By default, colors are enabled if GHC can detect that the output is a TTY that
supports colors; otherwise they are off.  If -fcolor-diagnostics[=always] is
given, colors are always on.  If -fcolor-diagnostics=never or
-fno-color-diagnostics is given, colors are always off.  The default behavior
corresponds to -fcolor-diagnostics=auto, which is implied when no flag is
given.  (Note that
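
The toggling rules in this section could be sketched as follows.  The type ColorMode and the function names are hypothetical, "the last flag wins" is an assumption about how repeated flags would be resolved, and the TTY check is a simplification: hIsTerminalDevice only reports whether the handle is a terminal, not whether that terminal supports colors.

```haskell
import System.IO (hIsTerminalDevice, stderr)

data ColorMode = Always | Never | Auto
  deriving (Eq, Show)

-- Map a single command-line flag to a mode, if it is one of ours.
parseColorFlag :: String -> Maybe ColorMode
parseColorFlag "-fcolor-diagnostics"        = Just Always
parseColorFlag "-fcolor-diagnostics=always" = Just Always
parseColorFlag "-fcolor-diagnostics=never"  = Just Never
parseColorFlag "-fno-color-diagnostics"     = Just Never
parseColorFlag "-fcolor-diagnostics=auto"   = Just Auto
parseColorFlag _                            = Nothing

-- Assumption: the last recognized flag wins; with no flag, Auto applies.
colorMode :: [String] -> ColorMode
colorMode args = case [m | Just m <- map parseColorFlag args] of
  [] -> Auto
  ms -> last ms

-- Pure decision: only Auto depends on whether the output is a terminal.
useColor :: ColorMode -> Bool -> Bool
useColor Always _     = True
useColor Never  _     = False
useColor Auto   isTty = isTty

-- Decide for diagnostics written to stderr.
shouldColorStderr :: ColorMode -> IO Bool
shouldColorStderr mode = useColor mode <$> hIsTerminalDevice stderr
```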
Ranges

Location ranges are helpful, but for human readers they add a lot of visual
noise.  There should be some sort of flag to turn ranges on or off.  If the
flag is turned on, locations are printed in the format R-S:C-D, even if
R == S or C == D.  This way parsers won't have to handle special cases.
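
A minimal sketch of the two location formats, assuming a hypothetical Range type and formatLocation function.  Whether S and D are inclusive is left open in the text, so the function only prints whatever bounds it is given.

```haskell
-- Hypothetical range type: start row, end row, start column, end column.
data Range = Range Int Int Int Int

-- With ranges enabled, always print R-S:C-D, even when R == S or C == D,
-- so that parsers never see a shortened form.  With ranges disabled, print
-- only the starting row and column.
formatLocation :: Bool -> FilePath -> Range -> String
formatLocation showRanges path (Range r s c d)
  | showRanges =
      path ++ ":" ++ show r ++ "-" ++ show s
           ++ ":" ++ show c ++ "-" ++ show d
  | otherwise  =
      path ++ ":" ++ show r ++ ":" ++ show c
```

For example, formatLocation True "Diagram.hs" (Range 335 335 41 42) gives Diagram.hs:335-335:41-42, while passing False gives Diagram.hs:335:41.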
Questions

Would it be useful to also have a 3rd color for deferred errors?
Improving diagnostic information

Here, we focus on the actual content of diagnostics.  I have a few ideas of my
own, but if you have any, feel free to add them to this list.
Discarding the "in n-th argument of"

These context lines don't tell the user anything that isn't already clear from
the source location:
In the second argument of ‘(.)’, namely ‘_3’
In the first argument of ‘use’, namely ‘(ix viDest . _3)’
In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’

(rigid, skolem)

Frequently I see something like "(rigid, skolem)" in the error message,
which looks like a 2-tuple of identifiers.  If the intent is to indicate that
they are both synonyms, why not write it as rigid/skolem?
Type tracing

GHC's errors are quite understandable once you've familiarized yourself with
the jargon.  However, one of the main reasons why type errors are still
frustrating is that even though I understand the error, I don't actually know
why it happens, because of a phenomenon that can be best described as: "it
all made sense in my head when I wrote it".
Generally, it's because of something that I failed to consider when I unified
the types in my head.  So it would be helpful for GHC to point out where the
flaw in my logic is.
Obviously one can't expect a compiler to guess what the logic error is.  In
fact, it shouldn't.  Rather, it should present (in an orderly fashion) how
it inferred a given type.
Here's an example of what I mean:
Diagram.hs:335:41:
    Couldn't match type ‘Int’ with ‘t0 (Link 'Incoming e)’
    arising from a functional dependency between:
      constraint ‘Field3
                    (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                    (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                    Int
                    Int’
        arising from a use of ‘_3’
      instance ‘Field3 (a, b, c) (a, b, c') c c'’ at <no location info>
    Relevant bindings include
      e :: e (bound at Diagram.hs:334:52)
      loSrc :: [(e, VertexIx)] (bound at Diagram.hs:333:15)
      lo :: Vector (Link 'Outgoing e)
        (bound at Diagram.hs:332:49)
      reformedNodes :: Seq
                         (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:330:5)
      nodes0 :: Seq (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:346:5)
      nodes :: DiagramT Vector e v (bound at Diagram.hs:321:55)
      diagram :: Diagram e v (bound at Diagram.hs:321:38)
      normalizeDiagramEdges :: Coeffed (Diagram e v)
                               -> Coeffed (Diagram e v)
        (bound at Diagram.hs:321:1)
    In the second argument of ‘(.)’, namely ‘_3’
    In the first argument of ‘use’, namely ‘(ix viDest . _3)’
    In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’

Here's a mockup of what would be more useful (colored version):
Diagram.hs:335:41: error:
  cannot match Int with t0 (Link 'Incoming e) arising from a functional
  dependency between constraint and instance, where the constraint is
    Field3 (…, …, t0 (Link 'Incoming e)) … Int …
  arising from a use of
    _3 :: Field3 s t a b => …
            EdgeIx <$> use (ix viDest . _3) <* do
                                        ^^
[Control.Lens.Tuple]: note:
  instance Field3 (a, b, c) (a, b, c') c c'
                         ^             ^
Diagram.hs:335:13: note: 
  Int arises from a use of EdgeIx :: Int -> …
            EdgeIx <$> use (ix viDest . _3) <* do
            ^^^^^^
Diagram.hs:331:20: note:
  t0 (Link 'Incoming e) arises from a use of
  nodes0 :: Seq (…, …, t0 (Link 'Incoming e))
      (`execState` nodes0) $
                   ^^^^^^


## z_example.html

<style>
  body { color: white; background: #111; }
  .location { color: yellow; }
  .error
  , .underline.error
  { color: red; }
  .note
  , .underline.note
  { color: cyan; }
  .message { font-weight: bold; }
  .message code { color: grey; }
  .snippet { color: white; }
  .underline { color: lime !important; }
</style>
<body>
<pre><span class="location">Diagram.hs:335:41:</span> <span class="error">error:</span>
  <span class="message">cannot match <code>Int</code> with <code>t0 (Link 'Incoming e)</code> arising from a functional
  dependency between constraint and instance, where the constraint is
    <code>Field3 (…, …, t0 (Link 'Incoming e)) … Int …</code>
  arising from a use of
    <code>_3 :: Field3 s t a b => …</code></span>
<span class="snippet">            EdgeIx &lt;$&gt; use (ix viDest . _3) &lt;* do</span>
<span class="error underline">                                        ^^</span>
<span class="location">[Control.Lens.Tuple]:</span> <span class="note">note:</span>
  <span class="message">instance <code>Field3 (a, b, c) (a, b, c') c c'</code></span>
<span class="note underline">                         ^             ^</span>
<span class="location">Diagram.hs:335:13:</span> <span class="note">note: </span>
  <span class="message"><code>Int</code> arises from a use of <code>EdgeIx :: Int -> …</code></span>
<span class="snippet">            EdgeIx &lt;$&gt; use (ix viDest . _3) &lt;* do</span>
<span class="note underline">            ^^^^^^</span>
<span class="location">Diagram.hs:331:20:</span> <span class="note">note:</span>
  <span class="message"><code>t0 (Link 'Incoming e)</code> arises from a use of
  <code>nodes0 :: Seq (…, …, t0 (Link 'Incoming e))</code></span>
<span class="snippet">      (`execState` nodes0) $</span>
<span class="note underline">                   ^^^^^^</span>
</pre>
</body>
