Skip to content

Instantly share code, notes, and snippets.

@mingfeima
mingfeima / part_1_memory_format_and_channels_last_optimization.md
Last active June 28, 2024 11:03
PyTorch CPU Performance Optimization Tutorial - Section I
@nadavrot
nadavrot / Matrix.md
Last active July 16, 2024 10:05
Efficient matrix multiplication

High-Performance Matrix Multiplication

This is a short post that explains how to write a high-performance matrix multiplication program on modern processors. In this tutorial I will use a single core of the Skylake-client CPU with AVX2, but the principles in this post also apply to other processors with different instruction sets (such as AVX512).

Intro

Matrix multiplication is a mathematical operation that defines the product of

@xianyi
xianyi / test_cblas_dgemm.c
Created October 11, 2013 06:59
cc -static -o test_cblas_open test_cblas_dgemm.c -I /opt/OpenBLAS/include/ -L/opt/OpenBLAS/lib -lopenblas -lpthread -lgfortran
#include <cblas.h>
#include <stdio.h>
void main()
{
int i=0;
double A[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0};
double B[6] = {1.0,2.0,1.0,-3.0,4.0,-1.0};
double C[9] = {.5,.5,.5,.5,.5,.5,.5,.5,.5};
cblas_dgemm(CblasColMajor, CblasNoTrans, CblasTrans,3,3,2,1,A, 3, B, 3,2,C,3);