1. Compile Time Measurements

This stage involves producing a tool ("Benchquark"?) that can do the following (on Linux, for the technical reasons discussed below):

Measure what against what?

This section will measure the compilation of D code against (mainly) each major/minor compiler version, but also against individual commits to the dmd git repository.

As for what D code: it must be arbitrary, so the tool should be sufficiently generic; however, the system running on CI will use a basket of representative D codebases. For example, Phobos is heavily templated but doesn't use CTFE much, whereas vibe.d uses a lot of CTFE.

Measure Binary sizes and Compile times

Binaries

The binary sizes of all object files produced by the compiler will be recorded. It could also be useful to collect some data on them as ELF files, e.g. the size of the symbol table or how large the segments are.
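As a rough sketch of what that collection could look like (using binutils' size(1) for the section breakdown is my own choice here, not a settled design), it could be as simple as recording each object file's size on disk plus its text/data/bss totals:

```d
// Sketch: record each object file's size on disk plus a text/data/bss
// breakdown obtained by shelling out to binutils' `size`.
import std.array : split;
import std.conv : to;
import std.file : getSize;
import std.process : execute;
import std.stdio : writefln;

struct ObjectSize
{
    string file;
    ulong fileSize;          // bytes on disk
    ulong text, data, bss;   // section sizes reported by `size`
}

ObjectSize measureObject(string path)
{
    ObjectSize result;
    result.file = path;
    result.fileSize = getSize(path);

    // `size` prints a header line, then "text data bss dec hex filename".
    auto run = execute(["size", path]);
    if (run.status == 0)
    {
        auto lines = run.output.split("\n");
        if (lines.length > 1)
        {
            auto cols = lines[1].split();
            result.text = cols[0].to!ulong;
            result.data = cols[1].to!ulong;
            result.bss  = cols[2].to!ulong;
        }
    }
    return result;
}

void main(string[] args)
{
    foreach (obj; args[1 .. $])
    {
        auto m = measureObject(obj);
        writefln("%s: %s bytes (text=%s data=%s bss=%s)",
                 m.file, m.fileSize, m.text, m.data, m.bss);
    }
}
```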

Compile Times

The method is obvious; however, it may raise issues, as ensuring consistent performance is close to impossible on a virtualised cloud machine. This can be mitigated to some extent by always having the same machine do the measurements.

Vladimir has been using callgrind's instruction counter, which will be a useful heuristic for execution time (it correlates with wall-clock time but isn't a perfect proxy; even microcontrollers are superscalar these days).
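For illustration, a minimal sketch of that approach (the compiler invocation is a placeholder): the total instruction count can be read from the "summary:" line of the callgrind output file.

```d
// Sketch: run a compile under callgrind and read the total instruction
// count (Ir) from the "summary:" line of the output file. The compiler
// invocation in main() is a placeholder.
import std.algorithm.searching : startsWith;
import std.array : split;
import std.conv : to;
import std.file : readText;
import std.process : execute;
import std.stdio : writeln;
import std.string : lineSplitter, strip;

long callgrindInstructions(string[] compileCommand)
{
    enum outFile = "callgrind.out.bench";
    auto cmd = ["valgrind", "--tool=callgrind",
                "--callgrind-out-file=" ~ outFile] ~ compileCommand;
    execute(cmd);

    // The output file ends with a line like "summary: 123456789".
    foreach (line; readText(outFile).lineSplitter)
        if (line.startsWith("summary:"))
            return line.split[1].strip.to!long;
    return -1;
}

void main()
{
    writeln("instructions: ", callgrindInstructions(["dmd", "-c", "example.d"]));
}
```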

Compile Time Memory Usage

Valgrind has multiple tools for measuring memory usage; however, for this purpose GNU time (not the shell built-in) can provide basic information on a program's execution (maximum resident set size, page faults, the number of voluntary/involuntary context switches, etc.).

Vladimir says that using Valgrind's Massif may be a better choice in this regard. I have already implemented GNU time support, but this would be a useful step forward (Massif can provide a memory-over-time view of the program's execution rather than just peak heap usage).
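For reference, the GNU time side boils down to something like the following sketch (not the existing implementation, just an illustration; the compile command is a placeholder):

```d
// Sketch: wrap the compile in GNU time (-v) and extract the maximum
// resident set size. The compile command is a placeholder.
import std.conv : to;
import std.process : execute;
import std.regex : matchFirst;
import std.stdio : writeln;

long peakRssKiB(string[] compileCommand)
{
    // GNU time writes its report to stderr; execute() captures stderr
    // together with stdout, so the report ends up in run.output.
    auto run = execute(["/usr/bin/time", "-v"] ~ compileCommand);
    auto m = matchFirst(run.output,
                        `Maximum resident set size \(kbytes\): (\d+)`);
    return m.empty ? -1 : m[1].to!long;
}

void main()
{
    writeln("peak RSS (KiB): ", peakRssKiB(["dmd", "-c", "example.d"]));
}
```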

More detailed profiling of the compiler

Tracing the compiler

There is a wealth of Linux profiling (and tracing) tools, some of which are probably overkill, but a tool like callgrind can nonetheless provide useful insight into how a program behaves at runtime. It also gives access to KCachegrind, which provides a mature graphical representation of Valgrind data. I think it would be wise to provide the output from callgrind rather than parsing it (to start with, at least), as this avoids reimplementing a reasonably large wheel in the form of KCachegrind.

DMDProf Integration

Vladimir Panteleev has a tool which can peer inside DMD and profile it on a per-module basis (and produce output which can be converted to a graphical representation of the compilation). On the assumption that it scales well, this will be very useful, so running it against the compilation of each project is a goal.

It may also be possible to use some of the newer Linux tracing magic to do this, but it seems too heavyweight for this purpose (Laeeth Isharc suggested eBPF, but as far as I can tell running it in a cloud environment is difficult, certainly not as easy as GDB). If it is feasible, it could mean being able to analyse GDC and LDC too, but I'm not overly familiar with their codebases, so that would have to be something for milestone N+1.

Data about D files

Recently mentioned on the forum was a desire to collect high-level statistics about how D features are used. This actually isn't particularly difficult to implement: it uses dmd-as-a-library to drink from the firehose and get information about D files, for example:

  • What is the average length of functions in this file?
  • How many functions are @safe?
  • How long is the average mangled name? (This one came to mind because of D's habit of encouraging nested templates to the nth degree.)

Numerical data like this could be plotted as part of the next section. I'm not entirely sure exactly which statistics should be collected as of today, so this will require a forum post (or similar) asking for suggestions; a rough sketch of the collection mechanism follows this list.
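The sketch below assumes the dmd-as-a-library frontend (the "dmd" dub package). The exact API (initDMD, parseModule, the visitor hierarchy) is not stable between compiler releases, so the names here are assumptions rather than a settled interface:

```d
/+ dub.sdl:
    dependency "dmd" version="~master"
+/
// Illustrative sketch only: count the function declarations in a D module
// using dmd as a library. The frontend API may differ between dmd releases.
import std.stdio : writefln;

import dmd.frontend : initDMD, parseModule;
import dmd.func : FuncDeclaration;
import dmd.visitor : SemanticTimeTransitiveVisitor;

extern (C++) final class FunctionCounter : SemanticTimeTransitiveVisitor
{
    alias visit = SemanticTimeTransitiveVisitor.visit;
    size_t functions;

    override void visit(FuncDeclaration fd)
    {
        ++functions;
        super.visit(fd);   // keep walking into the function body
    }
}

void main(string[] args)
{
    initDMD();                          // set up global compiler state
    auto parsed = parseModule(args[1]); // parse only, no full semantic pass
    scope counter = new FunctionCounter();
    parsed.module_.accept(counter);
    writefln("%s: %s functions", args[1], counter.functions);
}
```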

In this section specifically, I am mainly interested in getting the bedrock done rather than spending too much time on it, because although we have some ideas for using this data in the rest of the project, they would probably take too long to do.

2. Flip the Switch (Getting it actually connected to the CI)

The exact ordering of this one depends on how useful the first milestone is by itself; it might be more fruitful to get the runtime milestone working first and then do this last.

This is fairly self-explanatory: we take the data from compiling (say) Phobos and present it in a pretty manner online. Vladimir already has a basic implementation of this (now at <perf.dlang.io>), however the eventual result should be closer to https://perf.rust-lang.org/ in scope and in the data it collects. Vladimir suggests extending his site rather than making a new one, and given that he is (of course) its author, I think this is probably a good idea.

If possible I would like to do some basic data processing on top of this (e.g. peak detection), both for better visualisation and for accessibility. This could also allow the system to comment with a warning on PRs which cause spikes in bad metrics, although detecting these reliably may be non-trivial (a basic summary would of course not be).
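The peak detection itself can start very simple. As an illustration (window size and threshold are arbitrary starting points, not a tuned design), flag any measurement that deviates from the trailing window's mean by more than a few standard deviations:

```d
// Sketch: flag measurements that deviate from the trailing window's mean by
// more than `threshold` standard deviations. Window size and threshold are
// arbitrary starting points.
import std.algorithm : map, sum;
import std.math : sqrt;
import std.stdio : writefln;

size_t[] detectSpikes(const double[] samples, size_t window = 10,
                      double threshold = 3.0)
{
    size_t[] spikes;
    foreach (i; window .. samples.length)
    {
        const slice = samples[i - window .. i];
        const mean = slice.sum / window;
        const variance = slice.map!(x => (x - mean) ^^ 2).sum / window;
        const sigma = sqrt(variance);
        if (sigma > 0 && (samples[i] - mean) > threshold * sigma)
            spikes ~= i;
    }
    return spikes;
}

void main()
{
    // e.g. compile times in seconds, with a regression at the end
    double[] times = [10.1, 10.0, 10.2, 9.9, 10.1, 10.0, 10.2, 10.1,
                      10.0, 10.1, 10.0, 14.5];
    foreach (i; detectSpikes(times, 8))
        writefln("possible regression at measurement %s: %s", i, times[i]);
}
```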

3. Runtime Performance of D

As well as being measured against the compiler version (or potentially individual commits, depending on how long a measurement takes), these will also be measured against Phobos commits.

Phobos and DRuntime

Using the unittests as crude benchmarks

Benchmarking software usually implies writing benchmarks, but it struck me that good D programs already have simple unittests built into them. These could be used to get a basic overview of performance over time (I demonstrated this in my initial proposal). The issue is that when they are compiled with optimisations enabled, data-flow analysis immediately eliminates any unittests without side effects. This puts a barrier on their use as proper benchmarks (although I think it might not be impossible to "link" a debug unittest build against a release build of a library, assuming the ABI is consistent, or just to textually grab the unittests).
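To illustrate the basic idea (a sketch, not the version from my proposal): druntime allows a program to replace the default unittest runner, so existing unittest blocks can be timed per module when the code under test is compiled with -unittest:

```d
// Sketch: time each module's unittest blocks by installing a custom
// module unit tester. Compile the code under test with -unittest; the
// reporting here is deliberately crude.
import core.runtime : Runtime;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

shared static this()
{
    // Module constructors run before the unit tests, so this replaces the
    // default runner in time.
    Runtime.moduleUnitTester = function bool()
    {
        foreach (m; ModuleInfo)
        {
            if (m is null) continue;
            auto test = m.unitTest;   // null if the module has no unittests
            if (test is null) continue;

            auto sw = StopWatch(AutoStart.yes);
            test();
            writefln("%s: %s", m.name, sw.peek);
        }
        return true;   // continue on to main()
    };
}

void main() {}
```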

Asymptotic performance

Around 4 years ago, Andrei started work on representing the asymptotic complexity of Phobos operations as UDAs. This section doesn't aim to fully resurrect that idea; however, it will test the asymptotic complexity of Phobos operations (holding them to their documented contract, e.g. O(n^2) time complexity).
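The simplest form of such a check is to time an operation at n and 2n and verify that the growth ratio is compatible with the advertised bound. A sketch with an arbitrary tolerance, using std.algorithm.sort as a stand-in for the operation under test:

```d
// Sketch: crude asymptotic check. Time an operation at n and 2n and check
// that the growth ratio is compatible with the advertised O(n log n) bound
// (here std.algorithm.sort stands in for the operation under test).
import std.algorithm : map, sort;
import std.array : array;
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.math : log2;
import std.random : uniform;
import std.range : iota;
import std.stdio : writefln;

double timeSort(size_t n)
{
    auto data = iota(n).map!(_ => uniform(0, int.max)).array;
    auto sw = StopWatch(AutoStart.yes);
    foreach (i; 0 .. 10)
        data.dup.sort();
    return sw.peek.total!"nsecs" / 10.0;
}

void main()
{
    enum n = 100_000;
    const t1 = timeSort(n);
    const t2 = timeSort(2 * n);

    // For O(n log n), doubling n should scale time by roughly 2*log(2n)/log(n).
    const expected = 2.0 * log2(2.0 * n) / log2(cast(double) n);
    const observed = t2 / t1;
    writefln("observed ratio %.2f, expected about %.2f", observed, expected);

    // Generous slack: timing noise makes a tight bound meaningless.
    assert(observed < expected * 2.0,
           "sort appears to grow faster than n log n");
}
```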

A Benchmarking framework

This is fairly self-explanatory; however, the framework will be able to expose its benchmarks in the dynamic symbol table so they can be loaded by the tool and operated on by the following section, as well as by a usual in-source benchmark runner.

The exact design of this is intended to be easily applied to Phobos and druntime code.
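One possible shape for this (the benchmark_* naming convention and the symbol names are made up for illustration) is to export each benchmark as an extern(C) symbol and have the harness discover it via dlopen/dlsym:

```d
// --- library side (compile with -shared) ----------------------------------
// Each benchmark is exported under a predictable extern(C) name so an
// external harness can find it with dlsym. The "benchmark_" convention is
// made up for illustration.
export extern (C) void benchmark_sumInts()
{
    long total;
    foreach (i; 0 .. 1_000_000)
        total += i;
}

// --- harness side (Linux only, built as a separate program) ---------------
version (Harness)
{
    import core.sys.posix.dlfcn : dlopen, dlsym, RTLD_NOW;
    import std.datetime.stopwatch : AutoStart, StopWatch;
    import std.stdio : writefln;
    import std.string : toStringz;

    void main(string[] args)
    {
        // Assumes the library and the harness both link the shared druntime.
        auto lib = dlopen(args[1].toStringz, RTLD_NOW);
        assert(lib !is null, "could not load benchmark library");

        auto sym = dlsym(lib, "benchmark_sumInts");
        assert(sym !is null, "benchmark symbol not found");

        auto run = cast(void function()) sym;
        auto sw = StopWatch(AutoStart.yes);
        run();
        writefln("benchmark_sumInts: %s", sw.peek);
    }
}
```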

More detailed microarchitectural benchmarking

These two parts aren't intended to be shown in the CI infrastructure, as their scope is too narrow to be meaningfully applied over a whole program, although they could be used to measure long-term progress in compiler optimisation of D code. That is out of the scope of this project, however.

  • Dynamic Analysis: The perf_events subsystem provides access to both kernel and hardware performance counters (when available), which can be recorded to measure (for example) the IPC of hot loops in numerical code. This will include collecting perf_events data while each benchmark runs, and possibly a wrapper to make it easier to trace D programs, although I don't know yet whether that is feasible (a minimal wrapper sketch follows this list).

  • Static Analysis: llvm-mca (think Intel IACA) is a static analysis tool that uses LLVM's scheduling models to estimate the performance of loop kernels. Wrapping this would be useful when considering the performance of loops (without having to manually annotate the assembly to tell llvm-mca where to look). This will involve taking the compiled output, giving it to llvm-mca and then collecting its output (it prints a visual, textual, representation of its data by default, so it would probably be easier to show that verbatim, or with a bit of colour, rather than parsing it).
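As an example of the dynamic half (see the note in the first bullet), the perf_events wrapper could start as little more than a call to perf stat in CSV mode; the event list and the ./bench path are placeholders:

```d
// Sketch: run a benchmark binary under `perf stat` in CSV mode (-x,) and
// pull out a few hardware counters. The event list and ./bench path are
// placeholders; perf writes its report to stderr, which execute() captures.
import std.algorithm.searching : startsWith;
import std.array : join, split;
import std.conv : to;
import std.process : execute;
import std.stdio : writefln;
import std.string : isNumeric, lineSplitter;

long[string] perfCounters(string[] benchCommand, string[] events)
{
    auto cmd = ["perf", "stat", "-x,", "-e", events.join(",")] ~ benchCommand;
    auto run = execute(cmd);

    long[string] counters;
    // CSV rows look roughly like "<value>,<unit>,<event>,...".
    foreach (line; run.output.lineSplitter)
    {
        auto cols = line.split(",");
        if (cols.length < 3 || !cols[0].isNumeric)
            continue;
        foreach (event; events)
            if (cols[2].startsWith(event))   // perf may suffix e.g. ":u"
                counters[event] = cols[0].to!long;
    }
    return counters;
}

void main()
{
    foreach (event, value; perfCounters(["./bench"], ["instructions", "cycles"]))
        writefln("%s = %s", event, value);
}
```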

Technical / Legal Considerations

  • Windows? The above steps have been written around the assumption that, in their entirety, they will only work on Linux. This is because many of the required tools/libraries/technologies are either unavailable or much more difficult to use on Windows (for example, accessing detailed performance data (a la perf) on Windows requires a totally separate implementation).
  • Licensing: It may be necessary to license the tool under GPL v2 or v3, because both Valgrind and perf (and the libraries associated with them) are licensed under the GPL.