After checking the status quo of aarch64 vector code generation by recent-enough compilers, particularly when it comes to intrinsics codegen, a few points surfaced, which I'll try to reason about here. The originally participating compilers were gcc-11 through 13 and clang-11 through 13. Here we will focus on just one "lineage" of the sampled compilers -- clang, as its vector codegen is as good as or better than that of its peers. In particular, the focus will be on how perceived optimisations may affect the codegen, and thus performance, in the context of a simple function -- a vec3-by-matrix3x3 transformation, i.e. multiplication.
Performance was surveyed on the following uarches:
- cortex-a76 -- most of the aarch64 cloud install base, Raspberry Pi 5, et al; clang-13
- cortex-a78 -- a fair midrange performer these days; NV Orin et al; clang-11 & 12
- apple M1 -- the uarch which set the bar for aarch64 high-end performance a few years ago; apple clang-12