JasonCC

## analysis.draft.md

      
MattPD / analysis.draft.md, last active June 22, 2024 07:19

Program Analysis Resources (WIP draft)

Program Analysis Resources

(draft; work in progress)

See also:

- Compilers
  - correctness

Program analysis:

- Dynamic analysis - instrumentation, translation, sanitizers


## Matrix.md

      
nadavrot / Matrix.md, last active July 1, 2024 17:31

Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix
multiplication program on modern processors. In this tutorial I will use a
single core of the Skylake-client CPU with AVX2, but the principles in this post
also apply to other processors with different instruction sets (such as AVX512).
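For orientation, the operation being tuned can be written as a plain triple loop. The block below is a minimal baseline sketch, not the AVX2 kernel the post develops; the function name `matmul_naive` and the row-major `std::vector<float>` layout are assumptions made for illustration. The loops run in i, p, j order so that the innermost loop walks contiguous rows of B and C.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Baseline sketch: C = A * B for row-major matrices.
// A is m x k, B is k x n, and the returned C is m x n.
// (Hypothetical helper for illustration; not taken from the post.)
std::vector<float> matmul_naive(const std::vector<float>& a,
                                const std::vector<float>& b,
                                std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t p = 0; p < k; ++p) {
            const float a_ip = a[i * k + p];  // reused across the whole inner loop
            for (std::size_t j = 0; j < n; ++j) {
                // Inner loop reads one row of B and updates one row of C,
                // both contiguous in memory.
                c[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    return c;
}
```

Multiplying the 2x2 matrices {1, 2, 3, 4} and {5, 6, 7, 8} with this function yields {19, 22, 43, 50}. A tuned version would typically keep the same arithmetic and change only how the loops are tiled and vectorized.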

Intro

Matrix multiplication is a mathematical operation that defines the product of

  
## iterm2-solarized.md

      
kevin-smets / iterm2-solarized.md, last active July 2, 2024 12:33

iTerm2 + Oh My Zsh + Solarized color scheme + Source Code Pro Powerline + Font Awesome + [Powerlevel10k] - (macOS)

Default


Powerlevel10k


## magic_ring.cpp
```cpp
#define _CRT_SECURE_NO_DEPRECATE

#include <stdio.h>
#include <string.h>
#include <Windows.h>

// This allocates a "magic ring buffer" that is mapped twice, with the two
// copies being contiguous in (virtual) memory. The advantage of this is
// that this allows any function that expects data to be contiguous in
// memory to read from (or write to) such a buffer. It also means that
```
## latency.txt
```text
Latency Comparison Numbers (~2012)
----------------------------------
L1 cache reference                           0.5 ns
Branch mispredict                            5   ns
L2 cache reference                           7   ns                      14x L1 cache
Mutex lock/unlock                           25   ns
Main memory reference                      100   ns                      20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy             3,000   ns        3 us
Send 1K bytes over 1 Gbps network       10,000   ns       10 us
Read 4K randomly from SSD*             150,000   ns      150 us          ~1GB/sec SSD
```
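The unit conversions and the "x L1 cache" annotations in the table are simple arithmetic on the nanosecond column. The sketch below reproduces a few of them; the constant and function names (`kL1CacheNs`, `slowdown`, `ns_to_us`, and so on) are hypothetical, and the values are copied from the rows above.

```cpp
#include <cassert>

// Latencies in nanoseconds, copied from the table rows above.
constexpr double kL1CacheNs    = 0.5;
constexpr double kL2CacheNs    = 7.0;
constexpr double kMainMemoryNs = 100.0;
constexpr double kZippy1KNs    = 3000.0;
constexpr double kSsdRead4KNs  = 150000.0;

// How many times slower `slow_ns` is than `fast_ns`.
inline double slowdown(double slow_ns, double fast_ns) {
    assert(fast_ns > 0.0);
    return slow_ns / fast_ns;
}

// Converts nanoseconds to microseconds (1 us = 1,000 ns).
inline double ns_to_us(double ns) {
    return ns / 1000.0;
}
```

With these definitions, `slowdown(kL2CacheNs, kL1CacheNs)` evaluates to 14 and `slowdown(kMainMemoryNs, kL1CacheNs)` to 200, matching the "14x L1 cache" and "200x L1 cache" notes, while `ns_to_us(kSsdRead4KNs)` gives the 150 us shown in the second column.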