Youngsuk Kim JOE1994

@zingaburga
zingaburga / sve2.md
Last active June 4, 2024 08:54
ARM’s Scalable Vector Extensions: A Critical Look at SVE2 For Integer Workloads

Scalable Vector Extensions (SVE) is ARM’s latest SIMD extension to their instruction set, which was announced back in 2016. A follow-up SVE2 extension was announced in 2019, designed to incorporate all functionality from ARM’s current primary SIMD extension, NEON (aka ASIMD).

Despite being announced 5 years ago, there is currently no generally available CPU which supports any form of SVE (which excludes the [Fugaku supercomputer](https://www.fujitsu.com/global/about/innovation/

Rust 2020 reflections

This is my third roadmap post... you can find the first two here:

This year, I am quite divided due to a bunch of competing desires:

  • I still want all of the things that I wanted last year. In particular, it would be great if OS development could finally be done on stable Rust. Things like inline asm are blocking this.
@cuviper
cuviper / all-cpus.txt
Last active October 23, 2023 17:25
BabelStream, OpenMP vs. Rayon
$ perf stat -d ./omp-stream
BabelStream
Version: 3.4
Implementation: OpenMP
Running kernels 100 times
Precision: double
Array size: 268.4 MB (=0.3 GB)
Total size: 805.3 MB (=0.8 GB)
Function    MBytes/sec  Min (sec)   Max         Average
Copy        108865.244  0.00493     0.05374     0.00834
@HenningTimm
HenningTimm / rust_mem_profiling.md
Last active May 4, 2024 03:48
Memory profiling Rust code with heaptrack in 2019
@HadrienG2
HadrienG2 / High_Performance_Rust.md
Last active July 2, 2024 08:11
Making Rust a perfect fit for high-performance computations

Hello, Rust community!

My name is Hadrien and I am a software performance engineer in a particle physics lab. My daily job is to figure out ways to make scientific software use hardware more efficiently without sacrificing its correctness, primarily by adapting old-ish codebases to the changes that have occurred in the software and computing landscape since the days when they were designed:

  • CPU clock rates and instruction-level parallelism stopped going up, so optimizing code is now more important.
  • Multi-core CPUs went from an exotic niche to a cheap commodity, so parallelism is not optional anymore.
  • Core counts grow faster than RAM prices go down, so multi-processing is not enough anymore.
  • SIMD vectors become wider and wider, so vectorization is not a gimmick anymore.
BISECT: running pass (1) Simplify the CFG on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$5ok_or17h4281056672a8d3efE)
BISECT: running pass (2) SROA on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$5ok_or17h4281056672a8d3efE)
BISECT: running pass (3) Early CSE on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$5ok_or17h4281056672a8d3efE)
BISECT: running pass (4) Simplify the CFG on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$6unwrap17hb5bd41d1ab85ed34E)
BISECT: running pass (5) SROA on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$6unwrap17hb5bd41d1ab85ed34E)
BISECT: running pass (6) Early CSE on function (_ZN38_$LT$core..option..Option$LT$T$GT$$GT$6unwrap17hb5bd41d1ab85ed34E)
BISECT: running pass (7) Simplify the CFG on function (_ZN3std2rt10lang_start17h87222d106ff7b973E)
BISECT: running pass (8) SROA on function (_ZN3std2rt10lang_start17h87222d106ff7b973E)
BISECT: running pass (9) Early CSE on function (_ZN3std2rt10lang_start17h87222d106ff7b973E)
BISECT: running pass (10)
@gaearon
gaearon / minification.md
Last active June 8, 2024 08:15
How to Set Up Minification

In production, it is recommended to minify any JavaScript code that is included with your application. Minification can help your website load several times faster, especially as the size of your JavaScript source code grows.

Here's one way to set it up:

  1. Install Node.js
  2. Run npm init -y in your project folder (don't skip this step!)
  3. Run npm install terser

Now, to minify a file called like_button.js, run in the terminal:

@eira-fransham
eira-fransham / unsound-insert.rs
Last active November 27, 2019 03:35
Explanation of unsoundness in `SmallVec::insert_many`
extern crate smallvec;
use smallvec::SmallVec;
struct Printer(usize);
impl Drop for Printer {
    fn drop(&mut self) {
        println!("Dropping {}", self.0);
    }
@shafik
shafik / WhatIsStrictAliasingAndWhyDoWeCare.md
Last active July 17, 2024 07:40
What is Strict Aliasing and Why do we Care?

What is the Strict Aliasing Rule and Why do we care?

(OR Type Punning, Undefined Behavior and Alignment, Oh My!)

What is strict aliasing? First we will describe what aliasing is, and then we can learn what being strict about it means.

In C and C++, aliasing has to do with what expression types we are allowed to access stored values through. In both C and C++, the standard specifies which expression types are allowed to alias which types. The compiler and optimizer are allowed to assume we follow the aliasing rules strictly, hence the term strict aliasing rule. If we attempt to access a value using a type that is not allowed, it is classified as undefined behavior (UB). Once we have undefined behavior, all bets are off: the results of our program are no longer reliable.

Unfortunately, with strict aliasing violations we will often obtain the results we expect, leaving the possibility that a future version of a compiler with a new optimization will break code we th