@LoganDark
Last active February 2, 2024 21:32
I can't profile on my Windows PC

Update #2

I was able to rebuild perf in order to make it work. My previous version of perf was built with these features:

...                         dwarf: [ OFF ]
...            dwarf_getlocations: [ OFF ]
...                         glibc: [ on  ]
...                        libbfd: [ OFF ]
...                libbfd-buildid: [ OFF ]
...                        libcap: [ OFF ]
...                        libelf: [ OFF ]
...                       libnuma: [ OFF ]
...        numa_num_possible_cpus: [ OFF ]
...                       libperl: [ OFF ]
...                     libpython: [ OFF ]
...                     libcrypto: [ OFF ]
...                     libunwind: [ OFF ]
...            libdw-dwarf-unwind: [ OFF ]
...                          zlib: [ OFF ]
...                          lzma: [ OFF ]
...                     get_cpuid: [ on  ]
...                           bpf: [ on  ]
...                        libaio: [ on  ]
...                       libzstd: [ OFF ]
...        disassembler-four-args: [ OFF ]

As you may notice, a lot of the libraries that matter for stack unwinding and symbol resolution (dwarf, libunwind, libelf, etc.) are turned off. That's because I didn't have the relevant headers installed on my system when I built it.

I installed these packages: libunwind8-dev libdwarf-dev libelf-dev libdw-dev systemtap-sdt-dev libssl-dev libslang2-dev binutils-dev libzstd-dev libbabeltrace-dev libiberty-dev numactl-devel. A couple of them, along with the perf build scripts that use them, require Python 3.

Then I had this:

...                         dwarf: [ on  ]
...            dwarf_getlocations: [ on  ]
...                         glibc: [ on  ]
...                        libbfd: [ on  ]
...                libbfd-buildid: [ on  ]
...                        libcap: [ on  ]
...                        libelf: [ on  ]
...                       libnuma: [ on  ]
...        numa_num_possible_cpus: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ OFF ]
...                     libcrypto: [ on  ]
...                     libunwind: [ on  ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]
...                     get_cpuid: [ on  ]
...                           bpf: [ on  ]
...                        libaio: [ on  ]
...                       libzstd: [ on  ]
...        disassembler-four-args: [ on  ]

Only libpython is still off, and I don't care about Python support, so this was more than good enough for me.

After the build, the python3 that was pulled in by systemtap-sdt-dev (or by one of the other packages) can be uninstalled again to keep your system light.

This new perf binary works!!

image

Update

As it turns out, older versions of Windows Defender prevented ETW tracing from working correctly. Wouldn't you know it, my Windows Defender was not up to date, because I don't use it and because I have Windows Update disabled. In a miraculous feat of control I managed to get Windows Update to ONLY update the Windows Defender definitions and not destroy my computer. I went from 1.303.25.0 to 1.367.41.0.

After updating, cargo flamegraph works on bare-metal Windows using ETW:

image

Success, kind of. I've found a way to profile on Windows. I would still like to be able to profile within WSL, though, and the WSL2 profiler still doesn't work properly. The original gist follows.

I can't profile on my Windows PC

As you may know, I make Rust software. I like to make my Rust software fast. I'm currently working on a library called getargs that is, as far as I can tell, the fastest option parser in the entire Rust ecosystem.

However, my latest additions to the library have caused the speed to suffer a little. It's ~5-10% slower than it used to be, and I call that a regression. Usually, in cases like these, I'd break out the profiler.

Except in this case it hasn't worked. I can't seem to find a solution that works. I will go on and document everything I've tried and how it's all utterly broken. All of it. Every single thing.

1. Using WSL1 to profile

I would like to do this, but profiling under WSL1 is intentionally unsupported.

2. Using WSL2 to profile

Using CLion

I installed WSL2 for the express purpose of profiling after being told that WSL1 is not compatible with profiling because it does not run in a virtual machine. That's fine, I can take the performance hit as long as the relative proportions of time spent stay the same.

I manually compiled perf from the WSL2 Linux kernel and added it to my PATH, then went to use IntelliJ-Rust's profiler from CLion.

No dice.

The profiler data is usually empty:

image

Even if I crank the sample rate up to extreme levels (1000000000 samples per second), I still usually don't get anything:

image

Except this time, I have some nice error messages in the console:

Warning:
Processed 230 events and lost 4 chunks!

Check IO/CPU overload!

Over 15 profiles I did not get a single sample.
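For a sense of scale, Linux caps the perf sampling frequency at the value in /proc/sys/kernel/perf_event_max_sample_rate (and can lower that value at runtime), so a request for 1000000000 samples per second cannot be honoured literally. The sketch below only does the arithmetic. The function names and the 100000 Hz ceiling used in the usage note are illustrative assumptions, not values taken from my machine.

```rust
use std::fs;
use std::time::Duration;

/// Read the kernel's ceiling on sampling frequency, if the file is readable.
/// Returns None on systems where the sysctl does not exist.
fn read_max_sample_rate() -> Option<u64> {
    fs::read_to_string("/proc/sys/kernel/perf_event_max_sample_rate")
        .ok()?
        .trim()
        .parse()
        .ok()
}

/// The rate a profiler can actually get: the request, clamped to the ceiling.
fn effective_rate_hz(requested_hz: u64, max_hz: u64) -> u64 {
    requested_hz.min(max_hz)
}

/// Upper bound on samples for one busy thread running for `run_time`.
fn max_samples(rate_hz: u64, run_time: Duration) -> u64 {
    (rate_hz as u128 * run_time.as_micros() / 1_000_000) as u64
}
```

With an assumed ceiling of 100000 Hz, `effective_rate_hz(1_000_000_000, 100_000)` gives 100000, and a benchmark that runs for 50 ms could produce at most `max_samples(100_000, Duration::from_millis(50))` = 5000 samples. Zero samples is far below even that bound, so the sample rate is not the real problem.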

The culprit, for now? /proc/sys/kernel/perf_event_paranoid is set to 2 on boot. With this command:

$ echo -n -1 | sudo tee /proc/sys/kernel/perf_event_paranoid

I get something:

image

But have you noticed something? At a rate of 1000000000 samples per second, I got only 45 samples, and they are ALL of boring kernel stuff.

How am I supposed to use this to improve my application's performance?

Not useful. Moving on.
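For reference, here is a minimal sketch of what the perf_event_paranoid levels mean for a user without extra privileges, based on the kernel's sysctl documentation. The function names are my own, and the descriptions are short summaries rather than the kernel's exact wording.

```rust
use std::fs;

/// Parse the contents of the sysctl file, e.g. "2\n" -> Some(2).
fn parse_paranoid(contents: &str) -> Option<i32> {
    contents.trim().parse().ok()
}

/// Rough summary of the restrictions at each level. Each level adds
/// to the restrictions of the levels below it.
fn describe_paranoid(level: i32) -> &'static str {
    if level < 0 {
        "no restrictions for unprivileged users"
    } else if level == 0 {
        "raw and ftrace tracepoint access is restricted"
    } else if level == 1 {
        "CPU-wide event access is also restricted"
    } else {
        "kernel profiling is also restricted"
    }
}

/// Read the current level; None if the file is missing or unreadable.
fn read_paranoid() -> Option<i32> {
    let text = fs::read_to_string("/proc/sys/kernel/perf_event_paranoid").ok()?;
    parse_paranoid(&text)
}
```

Calling `read_paranoid().map(describe_paranoid)` before a profiling run is a cheap way to check whether the setting has been reset, since a value written through /proc like this is normally not persistent.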

Using cargo flamegraph

cargo flamegraph usually exits with this error because it doesn't end up with any stack samples:

writing flamegraph to "flamegraph.svg"
Error: unable to generate a flamegraph from the collapsed stack data

Caused by:
    0: I/O error: No stack counts found
    1: No stack counts found

On the off-chance it does find something, it is, again, just boring kernel stuff.

I even tried installing Rust and using cargo-flamegraph as root, but same issue.
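To make "No stack counts found" a bit more concrete: flamegraph tooling consumes "folded" stacks, where each line is a semicolon-separated list of frames followed by a space and a sample count. The sketch below assumes cargo flamegraph's collapsed data follows that usual convention. The function names are hypothetical and this is not the tool's actual code.

```rust
/// Parse one folded-stack line such as "main;parse;next_opt 42"
/// into its frames and its sample count.
fn parse_folded_line(line: &str) -> Option<(Vec<&str>, u64)> {
    // The count is everything after the last space.
    let (stack, count) = line.trim().rsplit_once(' ')?;
    let count: u64 = count.parse().ok()?;
    if stack.is_empty() {
        return None;
    }
    Some((stack.split(';').collect(), count))
}

/// Total number of samples across all well-formed lines.
/// Zero here corresponds to having nothing to draw.
fn total_samples(folded: &str) -> u64 {
    folded
        .lines()
        .filter_map(parse_folded_line)
        .map(|(_, count)| count)
        .sum()
}
```

If the recorder captured no usable samples, the folded data is empty (or contains no well-formed lines), `total_samples` is 0, and there is simply nothing to render.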

3. Using DTrace on Windows to profile

cargo flamegraph supports DTrace on Windows, but it doesn't work with my DTrace installation.

dtrace: description 'profile-997 ' matched 1 probe
dtrace: pid 8000 has exited

(and then an infinite hang)

4. Using ETW on Windows (via cargo flamegraph) to profile

Removing DTrace from my Windows PATH makes cargo flamegraph fall back to an alternative strategy that uses ETW. Too bad it also doesn't work:

writing flamegraph to "flamegraph.svg"
Error: unable to generate a flamegraph from the collapsed stack data

Caused by:
    0: I/O error: No stack counts found
    1: No stack counts found

5. Using Windows Performance Recorder/Analyzer

These tools don't support taking stack snapshots of programs, unless I'm mistaken and just can't figure it out.

6. Using the Visual Studio Debugger

image

Sure doesn't work.

I can profile on my Mac with absolutely zero effort

sudo cargo flamegraph works every time, guaranteed, and gives detailed traces of userspace code.

Problem is that my Mac is remote. I'm not using it anymore because it has a broken keyboard & USB ports. I really just want to find a solution that works on my machine.
