
FelixCLC FCLC
@FCLC
FCLC / Understanding how modern processors got fast: SIMD, multiple pipes and Out of Order execution.md
Last active November 13, 2023 06:07
An approachable introduction to how modern CPUs got fast, beyond throwing more GHz at the problem

Context

I was helping a few computer science students and enthusiasts understand “how” modern processors got to be “so fast” outside of clock speed increases.  

Here is the main ;p excerpt  

Acronyms:  

SIMD: Single Instruction, Multiple Data  

FCLC / M1 Cluster follow up: Jetson Orin.md
Last active January 15, 2023 19:22
What if we built a Jetson Orin AGX micro cluster?

The follow-up to the M1 Cluster: Jetson Orin AGX

Original M1 Piece here: https://gist.github.com/FCLC/6e0f0e79e9d4f5740573f09d7579eb72

No system exists in a vacuum, and so as a follow-up to the M1 Cluster, I thought I’d look at a similar cluster based on another integrated ARM device.

Oracle typically builds an RPi cluster every few years. Their most recent unit, built using 1060 RPi 3B+ boards, is an interesting piece of tech. Another is the 750 Pi cluster built by LANL. But Pi clusters seem like the domain of Jeff Geerling and co., so let’s look at something else. The most popular developer board is the Nvidia Jetson series, and the most powerful unit is the latest Orin AGX 64GB.

Setting the stage

FCLC / Mac_studio Cluster.md
Last active January 16, 2023 22:01
MacStudio Cluster: What if we threw sanity to the wind?

The Beginning

During the 13th of January 2023 HPC Huddle (now hosted by hpc.social) the topic of #HPC development and workloads on Apple silicon came up briefly.

Thinking on it, once #Asahi Linux has GPU compute support squared away, I can see a world where devices like the Mac Studio with M1 Ultra are augmented by Thunderbolt 4 networking cards. Even if it is for PR, vendors like Oracle, amongst others, have demonstrated a willingness to build weird and wonderful clusters as a “because we can.” It is far from ideal, but we have done worse to get less. Beyond Oracle and the Pi cluster, the US DOD/Air Force ran a PS3 cluster for years. https://phys.org/news/2010-12-air-playstation-3s-supercomputer.html

Setting the stage

A few baselines before I go on:

FCLC / deprecated_M1_cluster.md
Last active January 15, 2023 19:28
Thinking about what a small M1 Ultra cluster would look like

Ignore this post and read the new one instead: https://gist.github.com/FCLC/6e0f0e79e9d4f5740573f09d7579eb72

Originally this was a borderline copy/paste of a Mastodon exchange. It was fairly crap, so I rewrote the whole thing; the updated version is available via the link above. I prefer not to hide this sort of thing, so the archive will remain public.

# Warnings and alarm bells

"What Cursed thing are you talking about now?"
FCLC / spack install hipsycl%"apple-clang@14.0.0".sh
Created November 29, 2022 15:55
log of hip_sycl apple-clang-14-fails after building llvm-12
Felix$ spack install hipsycl%"apple-clang@14.0.0"
==> Installing boost-1.69.0-z4oleaz6rrkaqoibusrp4e4bol3wpbv7
==> No binary for boost-1.69.0-z4oleaz6rrkaqoibusrp4e4bol3wpbv7 found: installing from source
==> Using cached archive: /usr/local/Cellar/spack/0.19.0/var/spack/cache/_source-cache/archive/8f/8f32d4617390d1c2d16f26a27ab60d97807b35440d45891fa340fc2648b04406.tar.bz2
==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/darwin_clang_version.patch
==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/system-non-virtual-dtor-include.patch
==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/system-non-virtual-dtor-test.patch
==> Applied patch /usr/local/Cellar/spack/0.19.0/var/spack/repos/builtin/packages/boost/pthread-stack-min-fix.patch
==> Ran patch() for boost
==> boost: Executing phase: 'install'
FCLC / A not so brief discussion of Alder Lake, the new AVX512 FP 16 extensions, Sapphire Rapids, its history, and why it requires a custom kernel.md
Last active January 7, 2024 13:23
On AVX512 FP16, Alder Lake, custom kernels, and how "Mistakes were made" has never rung so true

Warning: This is going to be a long one.  

  

I'm assuming general knowledge of x86_64 hardware extensions, and some insight into the workings of large hardware vendors. 

Understanding why AVX512 is useful not only in HPC, but also for gamers in emulation, or for more efficient use of execution ports, is a bonus. 

You don't need to have published 2 dozen papers on optimizing compute architecture. 

FCLC / PopOS_amdgpu.md
Last active April 26, 2024 00:12
Installing amdgpu-pro on popOS to enable OpenCL, ROCm and HIP

currently out of date as of September 2022, needs a fresh update

TL;DR: edit the amdgpu-install script to add pop as a supported Debian distribution, comment out the check for linux-modules-extra-[versions] since they're provided by linux-modules [per Jeremy Solle of System_76], then run with --no-dkms

EDIT: 4 things need to be done.

Problem 1) Pop!_OS is not a valid install target: "Unsupported OS: /etc/os-release ID 'pop'". As of now, amdgpu-install doesn't recognize pop as a valid installation candidate. The solution is to add pop as a valid target in the amdgpu-install script.
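
As a rough sketch of what "adding pop as a valid target" could look like, the snippet below applies the edit to a mock fragment rather than the real script. The `case` pattern `ubuntu|linuxmint|debian)` and the temporary file are assumptions made for illustration; the actual amdgpu-install script may be laid out differently, so check its OS detection code before adapting the `sed` expression.

```shell
# Mock stand-in for the OS check (assumed layout, not copied from amdgpu-install).
tmp="$(mktemp)"
cat > "$tmp" <<'EOF'
case "$ID" in
ubuntu|linuxmint|debian)
    echo "supported"
    ;;
*)
    echo "Unsupported OS: /etc/os-release ID '$ID'"
    exit 1
    ;;
esac
EOF

# Append pop to the list of accepted Debian-family IDs, keeping a .bak backup.
sed -i.bak 's/ubuntu|linuxmint|debian)/ubuntu|linuxmint|debian|pop)/' "$tmp"

# Run the patched fragment as if /etc/os-release had reported ID=pop.
result="$(ID=pop sh "$tmp")"
echo "$result"

# Clean up the mock file and its backup.
rm -f "$tmp" "$tmp.bak"
```

On a real system, the same `sed` idea would be pointed at a copy of the installer script, leaving the original intact in case the pattern does not match.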