@blu
blu / aarch64_vector_codegen_survey.md
Last active April 4, 2024 19:56
aarch64 vector codegen survey

After checking the status quo of aarch64 vector code generation by recent-enough compilers, particularly when it comes to intrinsics codegen, a few points surfaced, which I'll try to reason about here. The originally participating compilers were gcc-11 through 13 and clang-11 through 13. Here we will focus on just one "lineage" of the sampled compilers -- clang, as its vector codegen is as good as or better than that of its peers. In particular, the focus will be on how perceived optimisations may affect the codegen, and thus performance, in the context of a simple function -- a vec3-by-matrix3x3 transformation, i.e. multiplication.
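For orientation, here is a minimal sketch of what such a vec3-by-matrix3x3 transformation can look like. It is not the surveyed code: the `mat3x4_t` layout (rows padded to four floats), the function name `vec3_mul_mat3` and the row-vector convention are all assumptions made for illustration. The aarch64 path uses standard NEON intrinsics; a scalar fallback keeps the sketch buildable elsewhere.

```c
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical layout (an assumption, not the gist's): three rows,
 * each padded to 4 floats so a row fits one 128-bit vector register. */
typedef struct { float row[3][4]; } mat3x4_t;

/* out = v * m, row-vector convention.
 * v and out hold 4 floats each; lane 3 is padding. */
static void vec3_mul_mat3(float out[4], const float v[4], const mat3x4_t *m)
{
#if defined(__aarch64__)
    /* acc = row0 * v.x, then accumulate row1 * v.y and row2 * v.z via fused multiply-add */
    float32x4_t acc = vmulq_n_f32(vld1q_f32(m->row[0]), v[0]);
    acc = vfmaq_n_f32(acc, vld1q_f32(m->row[1]), v[1]);
    acc = vfmaq_n_f32(acc, vld1q_f32(m->row[2]), v[2]);
    vst1q_f32(out, acc);
#else
    /* Scalar reference: the same linear combination of rows, lane by lane */
    for (int i = 0; i < 4; ++i)
        out[i] = v[0] * m->row[0][i] + v[1] * m->row[1][i] + v[2] * m->row[2][i];
#endif
}
```

Expressing the product as a linear combination of matrix rows avoids horizontal adds, which is one common way to map this operation onto 4-wide vectors; whether a given compiler keeps that shape is exactly the kind of question a codegen survey looks at.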

Performance surveyed on the following uarches:

  • cortex-a76 -- most of aarch64 cloud install base, raspberry pi 5, et al; clang-13
  • cortex-a78 -- a fair midrange performer these days; NV Orin et al; clang-11 & 12
  • apple M1 -- the uarch which set the bar of aarch64 high-end performance a few years ago; apple clang-12

Surveyed code

blu / rpi5_impressions.md
Last active March 17, 2024 05:27
rpi5 impressions re compute

First impressions after a night of perf & conformance testing

BCM2712 brief

Setting the expectations

  • BCM2712 is a 16nm part. That dictates its Power-Performance-Area (PPA) parameters & capabilities, but also pricing and availability
  • CPU complex: 4x cortex-a76 cores. Base factory clock 1.5GHz, peak factory clock 2.4GHz. Those clocks are dictated by litho process and form factor, i.e. dissipation budget, and in both those categories rpi5 is a budget product. When passively cooled (via a heatsink) BCM2712 can run at full load for a dozen minutes before throttling down. The rpi5 stock active cooler (not tested) is reported to provide all-around unthrottled performance.
blu / gw_homebrew_macos.md
Last active April 10, 2023 07:39
game-and-watch homebrew quickstart on macos Big Sur
blu / morfe_all.patch
Last active February 21, 2022 18:12
Morfe cumulative patch against 8dd9012
diff --git a/cmd/morfe/main.go b/cmd/morfe/main.go
index 6526b3b..d150606 100644
--- a/cmd/morfe/main.go
+++ b/cmd/morfe/main.go
@@ -129,7 +129,6 @@ func (gui *GUI) newRendererAndTexture(window *sdl.Window) {
if err != nil {
log.Fatalf("Failed to create renderer: %s\n", err)
}
- debugRendererInfo(gui.renderer)