Skip to content

Instantly share code, notes, and snippets.

@envp
Last active July 20, 2022 18:27
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save envp/6abc3230dcc5043416c86aefb3d24419 to your computer and use it in GitHub Desktop.
Save envp/6abc3230dcc5043416c86aefb3d24419 to your computer and use it in GitHub Desktop.

SARIF Diagnostics in Clang

Objective

Implement a -fdiagnostics=sarif flag for the clang frontend, which instructs clang binary to emit diagnostics formatted using the SARIF specification.

Analysis

There are several subtasks for this:

  1. - [X] FLAG Extend -fdiagnostics= to accept sarif as a value, ensure that the effect of flag is a no-op.
  2. - [ ] BRIDGE Create a DiagnosticConsumer that is capable of accepting diagnostics from clang and storing them for usage later. There should be a method provided to flush stored diagnostics out into some kind of writer
  3. - [X] EMITTER Create a SARIF emitter based on code present in libStaticAnalyzer
  4. - [ ] END_TO_END Link the SARIF Consumer to the frontend and write tests for the outpu

Task Notes

Prior Art

clangStaticAnalyzer already implements a SARIF consumer, but the implementation there appears to be targeted towards “path-sensitive diagnostics”. What are these?

FLAG (Unreleased) Add a -fdiagnostics-format=sarif flag

Command line options are generated via LLVM-Tablegen

Of interest are clang’s options; which come from: clang/include/Options.td. This file is used to generate an Options.inc file in-situ that is used by clang/include/Options.h to define accepted CLI opts.

This results in an -Wswitch-enum warning due to the unused TextDiagnostic::SARIF enum, which will be resolved at a later stage.

[BRIDGE]

Analysis

The objective here is to create an internal client, that can be hooked to the diagnostics engine. But there are two possibilities for what this should derive from:

  • DiagnosticRenderer: This seems to be a utility class focused purely on formatting diagnostics, what is confusing is that when the clang driver is initialized it comes with a renderer, and not a consumer?
  • DiagnosticConsumer: This, according to the Internals Manual is what is responsible for handling diagnostics passed in by the current compiler instance

The main reason with going for creating a DiagnosticConsumer, is the following:

[EMITTER]

Hang-ups

A source of confusion while figuring the above out, was the Text{DiagnosticPrinter,DiagnosticBuffer,Diagnostic} family of classes.

After some digging it appears they work as follows:

  • TextDiagnosticPrinter is a DiagnosticConsumer:
  • TextDiagnosticPrinter internally uses a TextDiagnostic object to perform all the special formatting for each of the options (-fdiagnostics-format=[clang|vi|msvc])
  • TextDiagnosticBuffer is a class used only to buffer the diagnostic text without any special pretty printing. This is useful for -verify via the VerifyDiagnosticConsumer , also useful for tests. More context is provided in (libflangFrontend) https://reviews.llvm.org/D87774

Implementation Notes

[ITERATION 0,1] Refactoring clangStaticAnalyzer – Later

  • [X] Create clang/include/SarifDiagnosticConsumer.h and clang/include/SarifDiagnosticConsumer.cpp
  • [ ] Create unit tests (similar to VerifyDiagnosticConsumer) that check if diagnostics passed to this consumer are stored properly
  • [X] Modify CompilerInstance::createDiagnostics so that it sets the consumer when -fdiagnostics-format=sarif is specified
  • [X] Ensure TextDiagnosticPrinter cannot use -fdiagnostics-format=sarif

The code in clangStaticAnalyzer is useful for SarifDiagnosticBuffer and needs to be refactored into it’s own templated class. The templating is necessary since the prior implementation’s createRule, createRules functions load rule names from a .inc file. A class and a template specification for PathDiagnostic is necessary to cover this, while keeping a common interface between StaticAnalyzer and the compiler internals.

  • [X] Refactor all the static functions for creating SARIF fragments into a templated class / method, and create a specialization for PathDiagnostic

[ITERATION 2] Implement a SarifDocumentWriter

One of the hangups of ITERATION-0 turned out to be there were too many moving parts that were coupled together through the notion of “diagnostic”. The original approach was to make template specializations for each of the diagnostic types. It is not straightfoward to have a clean, common implementation this way. What turns out to be a better approach is to separate the concerns by action:

  1. Writing SARIF documents through a common interface `SarifDocumentWriter`
  2. Consuming diagnostics and translating them into SARIF

[ITERATION 3] Implement a SarifDocumentWriter + Value Semantic Types for SARIF objects

Split up the SarifDocumentWriter -> SarifLog, SarifLogSerializer

Create: SarifLog::validate() to recursively validate the SARIF to be seriailzed.

[END_TO_END]

Need a good testing tool for this. Possibly use pydantic + datamodel_codegen to create an object model from the schema, then use this to validate SARIF produced by test tools? This still doesn’t solve some of the other problems, such as Microsoft’s SARIF validator (https://sarifweb.azurewebsites.net/Validation) has some extra rules such as:

  • version values must be numeric (The spec says it can be anything the tool natively outputs, so it deviates here, other cases will come as we build larger tests.)

Secondary Tasks

  • [ ] Improve documentation for TextDiagnostic, TextDiagnosticPrinter, and TextDiagnosticBuffer
  • [ ] Investigate why -fdiagnostics-format=<value> doesn’t show up in clang --help

References

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment