@Rufflewind
Last active September 28, 2015 00:21

Improved GHC diagnostics

Analysis of Clang, GCC, and Rust

First let's analyze Clang, GCC, and Rust and see what we can learn from them.

Diagnostic entries

Diagnostics contain entries of this format (Rust calls them "spans"):

LOCATION: TYPE: MESSAGE
  MESSAGE_CONTINUED
SNIPPET

LOCATION is generally of the form PATH:ROW:COL where PATH is the path to the offending source code, ROW is the line number (1-indexed), and COL is the column number (1-indexed). Rust not only shows the starting location but also the ending location.

Here, TYPE is either error (red), warning (magenta/magenta/yellow), note (grey/cyan/green), or help (cyan). Each TYPE has a distinctive color so the user can easily see which ones are important. For warnings, magenta is probabl

A single diagnostic may contain several of these entries, but the first one is always error or warning (primary entry), while the remaining ones (if any) can only be note or help (auxiliary entries).

note entries bring attention to parts of the code that are further away from the source of the error. GHC already does this to some extent, but the formatting is ad hoc and differs from error to error. Moreover, GHC seems to pretty-format long expressions in its own format. Personally I think it would be better to print them as-is using notes.

help is a Rust-specific entry that reminds the user how to get detailed help on a specific error. This may be useful for newcomers, but I think it would probably be overkill to display it for every diagnostic like Rust does. If GHC ever gets a similar feature and wants to make it newcomer-friendly, it would be sufficient to print this exactly once per compilation.

The MESSAGE contains flowing text wrapped to around 80 characters. It may contain fragments of code, which are colored distinctively from the text (except in Rust). The primary message may also be suffixed with [ID], where ID is some sort of identifier for this diagnostic (or class of diagnostics). Clang/GCC use a string to identify the class of diagnostic (e.g. -Wunused), which can in turn be used to silence the diagnostic if needed. Rust appears to have a unique numeric ID for each diagnostic.

The SNIPPET contains code pulled directly from the source file. It may also contain underlining to indicate which parts of the line are relevant to the problem (Clang/GCC call them "caret diagnostics").

Underlining: Clang has two different forms: ^ and ~~~~~; GCC only has ^; Rust uses ^~~~~. Unless GHC has a reason to distinguish between different kinds of highlights, Rust's approach is probably the best as it's easier to see than a single caret.
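
As a rough illustration of the Rust-style underline, here is a minimal Haskell sketch that builds the marker line for a 1-indexed, inclusive column range. The names underline and snippet are hypothetical and are not taken from GHC or from any of the compilers discussed.

```haskell
-- | Build a Rust-style underline (a caret followed by tildes) for the
-- 1-indexed, inclusive column range [startCol, endCol].
-- If endCol <= startCol, only a single caret is produced.
underline :: Int -> Int -> String
underline startCol endCol =
  replicate (startCol - 1) ' ' ++ "^" ++ replicate (endCol - startCol) '~'

-- | Pair a source line with its underline, ready to print as a SNIPPET.
snippet :: String -> Int -> Int -> [String]
snippet sourceLine startCol endCol =
  [sourceLine, underline startCol endCol]
```

For example, snippet "foo bar" 5 7 yields the source line followed by a marker line with a caret under column 5 and tildes under columns 6 and 7.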

Diagnostic summary

Both Clang and Rust show a one-line summary of how many errors and warnings were generated by the code. (I'm not sure how useful this would be.)

Adjustable colors

Since GCC supports GCC_COLORS, GHC could also support GHC_COLORS using a syntax such as

error=01;31:warning=01;35

Interestingly, Clang hard-codes the colors so they can't be changed without recompiling (or hacking the binary). (I have no info about this on Rust, but I suspect it's similar.)
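
To make the proposed syntax concrete, here is a minimal sketch of how a GHC_COLORS-style string could be parsed and applied. It assumes the value is a colon-separated list of key=SGR pairs, as in the example above; the function names splitOn, parseColors, and colorize are hypothetical, and reading the actual environment variable is left out.

```haskell
-- | Split a string on a separator character.
splitOn :: Char -> String -> [String]
splitOn sep s = case break (== sep) s of
  (chunk, [])       -> [chunk]
  (chunk, _ : rest) -> chunk : splitOn sep rest

-- | Parse a string such as "error=01;31:warning=01;35" into
-- (entry type, SGR parameters) pairs. Malformed fields are skipped.
parseColors :: String -> [(String, String)]
parseColors spec =
  [ (key, sgr)
  | field <- splitOn ':' spec
  , (key, '=' : sgr) <- [break (== '=') field]
  , not (null key)
  ]

-- | Wrap text in the ANSI escape sequence configured for an entry type.
-- The text is returned unchanged when no color is configured.
colorize :: [(String, String)] -> String -> String -> String
colorize colors entryType text =
  case lookup entryType colors of
    Just sgr -> "\ESC[" ++ sgr ++ "m" ++ text ++ "\ESC[0m"
    Nothing  -> text
```

With the example string, colorize would wrap "error:" in the bold red sequence 01;31 and leave "note:" untouched, since no note color is configured.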

Semantic output

Now let's look at our goals for GHC.

In the grand scheme of things, it would be useful to produce diagnostics in a structured format that can be easily parsed without ambiguity. However, the format needs to be flexible enough to allow new errors to be added, so a sum type would probably be too rigid for this.

The goal is not unlike that of HTML, where the format is structured but ultimately the information is primarily meant for consumption by humans. Therefore, it would be nice to borrow some of its good ideas.

Source code is worth a thousand words, so here's a rough sketch of how diagnostics could be encoded:

-- One primary entry, followed by zero or more auxiliary entries.
data Diagnostic
  = Diagnostic
      DiagnosticID
      (Entry PrimaryEntryType)
      [Entry AuxiliaryEntryType]

data DiagnosticID
  = DiagnosticID String

-- The parameter is named ty because type is a reserved word in Haskell.
data Entry ty
  = Entry
      Location
      ty
      Message

data Location
  = Location
      FilePath
      Int Int -- row range
      Int Int -- col range
  | Module

data PrimaryEntryType   = Error | Warning
data AuxiliaryEntryType = Note

data Message
  = Message
      [MessageElem]

data MessageElem
  = Text String
  | Code String

Colored diagnostics

We can follow the same format as Clang/GCC:

PATH:ROW:COL: TYPE: MESSAGE
  MESSAGE_CONTINUED
SNIPPET
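
A minimal sketch of a renderer for this layout, without colors, might look as follows. The Entry record here is a simplified, hypothetical stand-in for the types sketched earlier (a single start position, message lines as plain strings), chosen only to keep the example self-contained.

```haskell
data EntryType = Error | Warning | Note

-- | A simplified entry: start position, type, message lines, snippet lines.
data Entry = Entry
  { entryPath    :: FilePath
  , entryRow     :: Int
  , entryCol     :: Int
  , entryType    :: EntryType
  , entryMessage :: [String]  -- first line, then continuation lines
  , entrySnippet :: [String]  -- source line(s) plus underline
  }

typeLabel :: EntryType -> String
typeLabel Error   = "error"
typeLabel Warning = "warning"
typeLabel Note    = "note"

-- | Render one entry as PATH:ROW:COL: TYPE: MESSAGE, with continuation
-- lines indented by two spaces and the snippet printed as-is.
renderEntry :: Entry -> [String]
renderEntry e = messageLines ++ entrySnippet e
  where
    header =
      entryPath e ++ ":" ++ show (entryRow e) ++ ":" ++ show (entryCol e)
        ++ ": " ++ typeLabel (entryType e) ++ ":"
    messageLines = case entryMessage e of
      []             -> [header]
      (first : rest) -> (header ++ " " ++ first) : map ("  " ++) rest
```

Printing a whole diagnostic is then a matter of concatenating renderEntry over the primary entry and its notes and passing the result to putStr . unlines.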

Toggling

By default, colors are enabled if GHC can detect that the output is a TTY that supports colors; otherwise they are off. This default corresponds to -fcolor-diagnostics=auto. Passing -fcolor-diagnostics or -fcolor-diagnostics=always turns colors on unconditionally, while -fcolor-diagnostics=never or -fno-color-diagnostics turns them off. (Note that
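
The toggling rules could be sketched roughly as below. The flag spellings are the ones proposed here; the names ColorMode, parseColorFlag, resolveColorMode, and shouldUseColor are hypothetical. Two simplifying assumptions are made: the last color flag on the command line wins, and auto mode only checks whether stderr is a terminal rather than whether that terminal actually supports colors.

```haskell
import System.IO (hIsTerminalDevice, stderr)

data ColorMode = ColorAlways | ColorNever | ColorAuto
  deriving (Eq, Show)

-- | Interpret one command-line flag; Nothing for unrelated flags.
parseColorFlag :: String -> Maybe ColorMode
parseColorFlag flag = case flag of
  "-fcolor-diagnostics"        -> Just ColorAlways
  "-fcolor-diagnostics=always" -> Just ColorAlways
  "-fcolor-diagnostics=never"  -> Just ColorNever
  "-fno-color-diagnostics"     -> Just ColorNever
  "-fcolor-diagnostics=auto"   -> Just ColorAuto
  _                            -> Nothing

-- | Fold over all flags, starting from the implied default (auto).
-- Assumption: a later color flag overrides an earlier one.
resolveColorMode :: [String] -> ColorMode
resolveColorMode = foldl step ColorAuto
  where
    step mode flag = maybe mode id (parseColorFlag flag)

-- | Decide whether to emit colors for diagnostics written to stderr.
-- Assumption: auto mode treats any terminal as color-capable.
shouldUseColor :: ColorMode -> IO Bool
shouldUseColor ColorAlways = pure True
shouldUseColor ColorNever  = pure False
shouldUseColor ColorAuto   = hIsTerminalDevice stderr
```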

Ranges

Location ranges are helpful, but for human readers they add a lot of visual noise. There should be some sort of flag to turn ranges on or off. If the flag is turned on, locations are printed in the format R-S:C-D, even if R == S or C == D. This way parsers won't have to handle special cases.
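
As a small sketch of this rule, the function below always prints both ends of each range when ranges are requested, and only the starting point otherwise. The name renderLocation and the Bool parameter standing in for the (unnamed) flag are hypothetical.

```haskell
-- | Render a location. With ranges enabled the output is PATH:R-S:C-D,
-- even when R == S or C == D; otherwise it is PATH:R:C.
renderLocation
  :: Bool        -- ^ whether the range flag is on
  -> FilePath
  -> (Int, Int)  -- ^ row range (R, S)
  -> (Int, Int)  -- ^ column range (C, D)
  -> String
renderLocation showRanges path (r, s) (c, d)
  | showRanges = path ++ ":" ++ range r s ++ ":" ++ range c d
  | otherwise  = path ++ ":" ++ show r ++ ":" ++ show c
  where
    range lo hi = show lo ++ "-" ++ show hi
```

For the error in the example further down, renderLocation True "Diagram.hs" (335, 335) (41, 42) would give Diagram.hs:335-335:41-42.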

Questions

Would it be useful to also have a 3rd color for deferred errors?

Improving diagnostic information

Here, we focus on the actual content of diagnostics. I have a few ideas of my own, but if you have any, feel free to add them to this list.

Discarding the "in n-th argument of"

These lines don't tell the user anything that isn't already clear from the source location:

In the second argument of ‘(.)’, namely ‘_3’
In the first argument of ‘use’, namely ‘(ix viDest . _3)’
In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’

(rigid, skolem)

Frequently I see something like "(rigid, skolem)" in the error message, which looks like a 2-tuple of identifiers. If the intent is to indicate that they are both synonyms, why not write it as rigid/skolem?

Type tracing

GHC's errors are quite understandable once you've familiarized yourself with the jargon. However, one of the main reasons why type errors are still frustrating is that even though I understand the error, I don't actually know why it happens, because of a phenomenon that can be best described as: "it all made sense in my head when I wrote it".

Generally, it's because of something that I failed to consider when I unified the types in my head. So it would be helpful for GHC to point out where the flaw in my logic is.

Obviously one can't expect a compiler to guess what the logic error is. In fact, it shouldn't. Rather, it should present (in an orderly fashion) how it inferred a given type.

Here's an example of what I mean:

Diagram.hs:335:41:
    Couldn't match type ‘Int’ with ‘t0 (Link 'Incoming e)’
    arising from a functional dependency between:
      constraint ‘Field3
                    (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                    (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
                    Int
                    Int’
        arising from a use of ‘_3’
      instance ‘Field3 (a, b, c) (a, b, c') c c'’ at <no location info>
    Relevant bindings include
      e :: e (bound at Diagram.hs:334:52)
      loSrc :: [(e, VertexIx)] (bound at Diagram.hs:333:15)
      lo :: Vector (Link 'Outgoing e)
        (bound at Diagram.hs:332:49)
      reformedNodes :: Seq
                         (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:330:5)
      nodes0 :: Seq (v, Seq (Link 'Outgoing e), t0 (Link 'Incoming e))
        (bound at Diagram.hs:346:5)
      nodes :: DiagramT Vector e v (bound at Diagram.hs:321:55)
      diagram :: Diagram e v (bound at Diagram.hs:321:38)
      normalizeDiagramEdges :: Coeffed (Diagram e v)
                               -> Coeffed (Diagram e v)
        (bound at Diagram.hs:321:1)
    In the second argument of ‘(.)’, namely ‘_3’
    In the first argument of ‘use’, namely ‘(ix viDest . _3)’
    In the second argument of ‘(<$>)’, namely ‘use (ix viDest . _3)’

Here's a mockup of what would be more useful (colored version):

Diagram.hs:335:41: error:
  cannot match Int with t0 (Link 'Incoming e) arising from a functional
  dependency between constraint and instance, where the constraint is
    Field3 (…, …, t0 (Link 'Incoming e)) … Int …
  arising from a use of
    _3 :: Field3 s t a b => …
            EdgeIx <$> use (ix viDest . _3) <* do
                                        ^^
[Control.Lens.Tuple]: note:
  instance Field3 (a, b, c) (a, b, c') c c'
                         ^             ^
Diagram.hs:335:13: note: 
  Int arises from a use of EdgeIx :: Int -> …
            EdgeIx <$> use (ix viDest . _3) <* do
            ^^^^^^
Diagram.hs:331:20: note:
  t0 (Link 'Incoming e) arises from a use of
  nodes0 :: Seq (…, …, t0 (Link 'Incoming e))
      (`execState` nodes0) $
                   ^^^^^^

Screenshot

<style>
body { color: white; background: #111; }
.location { color: yellow; }
.error
, .underline.error
{ color: red; }
.note
, .underline.note
{ color: cyan; }
.message { font-weight: bold; }
.message code { color: grey; }
.snippet { color: white; }
.underline { color: lime !important; }
</style>
<body>
<pre><span class="location">Diagram.hs:335:41:</span> <span class="error">error:</span>
<span class="message">  cannot match <code>Int</code> with <code>t0 (Link 'Incoming e)</code> arising from a functional
  dependency between constraint and instance, where the constraint is
    <code>Field3 (…, …, t0 (Link 'Incoming e)) … Int …</code>
  arising from a use of
    <code>_3 :: Field3 s t a b =&gt; …</code></span>
<span class="snippet">            EdgeIx &lt;$&gt; use (ix viDest . _3) &lt;* do</span>
<span class="error underline">                                        ^^</span>
<span class="location">[Control.Lens.Tuple]:</span> <span class="note">note:</span>
<span class="message">  instance <code>Field3 (a, b, c) (a, b, c') c c'</code></span>
<span class="note underline">                         ^             ^</span>
<span class="location">Diagram.hs:335:13:</span> <span class="note">note:</span>
<span class="message">  <code>Int</code> arises from a use of <code>EdgeIx :: Int -&gt; …</code></span>
<span class="snippet">            EdgeIx &lt;$&gt; use (ix viDest . _3) &lt;* do</span>
<span class="note underline">            ^^^^^^</span>
<span class="location">Diagram.hs:331:20:</span> <span class="note">note:</span>
<span class="message">  <code>t0 (Link 'Incoming e)</code> arises from a use of
  <code>nodes0 :: Seq (…, …, t0 (Link 'Incoming e))</code></span>
<span class="snippet">      (`execState` nodes0) $</span>
<span class="note underline">                   ^^^^^^</span>
</pre>
</body>