Implement a -fdiagnostics=sarif
flag for the clang frontend, which
instructs clang
binary to emit diagnostics formatted using the SARIF
specification.
There are several subtasks for this:
- - [X]
FLAG
Extend-fdiagnostics=
to acceptsarif
as a value, ensure that the effect of flag is a no-op. - - [ ]
BRIDGE
Create aDiagnosticConsumer
that is capable of accepting diagnostics from clang and storing them for usage later. There should be a method provided to flush stored diagnostics out into some kind of writer - - [X]
EMITTER
Create a SARIF emitter based on code present inlibStaticAnalyzer
- - [ ]
END_TO_END
Link the SARIF Consumer to the frontend and write tests for the outpu
clangStaticAnalyzer
already implements a SARIF
consumer, but the implementation there appears to be targeted towards “path-sensitive diagnostics”. What are these?
Command line options are generated via LLVM-Tablegen
Of interest are clang
’s options; which come from:
clang/include/Options.td.
This file is used to generate an Options.inc
file in-situ that is used by
clang/include/Options.h
to define accepted CLI opts.
This results in an -Wswitch-enum
warning due to the unused TextDiagnostic::SARIF
enum, which will be resolved at a later stage.
The objective here is to create an internal client, that can be hooked to the diagnostics engine. But there are two possibilities for what this should derive from:
DiagnosticRenderer
: This seems to be a utility class focused purely on formatting diagnostics, what is confusing is that when theclang
driver is initialized it comes with a renderer, and not a consumer?DiagnosticConsumer
: This, according to the Internals Manual is what is responsible for handling diagnostics passed in by the current compiler instance
The main reason with going for creating a DiagnosticConsumer
, is the following:
- The Diagnostic Internals Manual says so (rather unlucky to have found this towards the end my search)
-
CompilerInstance::createDiagnostics
accepts a(DiagnosticConsumer *)
parameter.
A source of confusion while figuring the above out, was the Text{DiagnosticPrinter,DiagnosticBuffer,Diagnostic}
family of classes.
After some digging it appears they work as follows:
TextDiagnosticPrinter
is aDiagnosticConsumer
:TextDiagnosticPrinter
internally uses aTextDiagnostic
object to perform all the special formatting for each of the options (-fdiagnostics-format=[clang|vi|msvc]
)TextDiagnosticBuffer
is a class used only to buffer the diagnostic text without any special pretty printing. This is useful for-verify
via theVerifyDiagnosticConsumer
, also useful for tests. More context is provided in (libflangFrontend
) https://reviews.llvm.org/D87774
- [X] Create
clang/include/SarifDiagnosticConsumer.h
andclang/include/SarifDiagnosticConsumer.cpp
- [ ] Create unit tests (similar to
VerifyDiagnosticConsumer
) that check if diagnostics passed to this consumer are stored properly - [X] Modify
CompilerInstance::createDiagnostics
so that it sets the consumer when-fdiagnostics-format=sarif
is specified - [X] Ensure
TextDiagnosticPrinter
cannot use-fdiagnostics-format=sarif
The code in clangStaticAnalyzer
is useful for SarifDiagnosticBuffer and needs to be refactored into it’s own templated class. The templating is necessary since the prior implementation’s createRule
, createRules
functions load rule names from a .inc
file. A class and a template specification for PathDiagnostic
is necessary to cover this, while keeping a common interface between StaticAnalyzer and the compiler internals.
- [X] Refactor all the
static
functions for creating SARIF fragments into a templated class / method, and create a specialization forPathDiagnostic
One of the hangups of ITERATION-0 turned out to be there were too many moving parts that were coupled together through the notion of “diagnostic”. The original approach was to make template specializations for each of the diagnostic types. It is not straightfoward to have a clean, common implementation this way. What turns out to be a better approach is to separate the concerns by action:
- Writing SARIF documents through a common interface `SarifDocumentWriter`
- Consuming diagnostics and translating them into SARIF
Split up the SarifDocumentWriter
-> SarifLog
, SarifLogSerializer
Create: SarifLog::validate()
to recursively validate the SARIF to be seriailzed.
Need a good testing tool for this. Possibly use pydantic
+ datamodel_codegen
to create an object model from the schema, then use this to validate SARIF produced by test tools? This still doesn’t solve some of the other problems, such as Microsoft’s SARIF validator (https://sarifweb.azurewebsites.net/Validation) has some extra rules such as:
version
values must be numeric (The spec says it can be anything the tool natively outputs, so it deviates here, other cases will come as we build larger tests.)
- [ ] Improve documentation for
TextDiagnostic
,TextDiagnosticPrinter
, andTextDiagnosticBuffer
- [ ] Investigate why
-fdiagnostics-format=<value>
doesn’t show up inclang --help