5kg / latency.txt
Created September 5, 2012 09:26 — forked from jboner/latency.txt
Latency Numbers Every Programmer Should Know
Latency Comparison Numbers
--------------------------
L1 cache reference                         0.5 ns
Branch mispredict                          5   ns
L2 cache reference                         7   ns             14x L1 cache
Mutex lock/unlock                         25   ns
Main memory reference                    100   ns             20x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy           3,000   ns
Send 1K bytes over 1 Gbps network     10,000   ns    0.01 ms
Read 4K randomly from SSD*           150,000   ns    0.15 ms
A few weeks ago Linus Torvalds answered some questions (http://meta.slashdot.org/story/12/10/11/0030249/linus-torvalds-answers-your-questions) on Slashdot. All his responses make good reading, but one in particular caught my eye. Asked to describe his favourite kernel hack, Torvalds grumbles that he rarely looks at code these days, unless it’s to sort out someone else’s mess. He then pauses to admit he’s proud of the kernel’s fiendishly cunning filename lookup cache before continuing to moan about incompetence.
____
At the opposite end of the spectrum, I actually wish more people understood the really core low-level kind of coding. Not big, complex stuff like the lockless name lookup, but simply good use of pointers-to-pointers etc. For example, I’ve seen too many people who delete a singly-linked list entry by keeping track of the prev entry, and then t
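The pointer-to-pointer idea mentioned in the quote can be sketched as follows. This is only an illustrative sketch, not code from the quoted answer: the `struct node` type and the `push` and `remove_value` helpers are hypothetical names introduced here. The point it tries to show is that walking the list with a pointer to the incoming link removes the need for a separate `prev` variable and for a special case at the head.

```c
#include <stdlib.h>

/* Hypothetical node type, introduced only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/* Push a new node at the front of the list; returns 0 on success, -1 on
   allocation failure. */
static int push(struct node **head, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *head;
    *head = n;
    return 0;
}

/*
 * Remove the first node holding `value`; returns 1 if a node was removed,
 * 0 otherwise.
 *
 * `pp` always points at the link that leads to the current node: either
 * the head pointer itself or the `next` field of the previous node.
 * Unlinking is therefore a single store through `pp`, whether the node
 * is the head, in the middle, or the tail.
 */
static int remove_value(struct node **head, int value)
{
    struct node **pp = head;

    while (*pp != NULL) {
        struct node *entry = *pp;
        if (entry->value == value) {
            *pp = entry->next;  /* rewrite the incoming link */
            free(entry);
            return 1;
        }
        pp = &entry->next;      /* advance to the next link */
    }
    return 0;
}
```

A caller would pass the address of its head pointer, for example `remove_value(&head, 1)`, so that removing the first node updates `head` in place.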
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <assert.h>
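/* tic() opens a do { ... } while (0) block and records the start time;
   toc() prints the elapsed seconds and closes that block, so use them as a pair. */
#define tic() do { struct timespec ts_start, ts_end; clock_gettime(CLOCK_MONOTONIC, &ts_start)
#define toc() clock_gettime(CLOCK_MONOTONIC, &ts_end); \
    printf("%lfs\n", (ts_end.tv_sec - ts_start.tv_sec) + (double)(ts_end.tv_nsec - ts_start.tv_nsec)/1e9); } \
    while (0)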
5kg / output
Created January 24, 2013 06:24
0.900109s
1.508606s
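In operating-system programming, program performance is critical. On modern computer architectures, the CPU cache has a decisive effect on that performance. This series of articles uses experiments to illustrate some recommendations for making good use of the CPU cache.
== Sequential Memory Access
This article tests cache performance with a simple parallel array-summation program. It implements two functions, `foo()` and `bar()`: `foo()` sums the array by reading its elements in an interleaved fashion, while `bar()` splits the array into blocks and sums each block sequentially. The program is as follows: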
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
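The two access patterns described above can be sketched independently of the listing. This is a minimal single-threaded sketch under stated assumptions, not the article's `foo()` and `bar()`: the names `sum_interleaved` and `sum_blocked`, and the `id` / `nworkers` parameters, are hypothetical. Each function computes the partial sum that one worker out of `nworkers` would be responsible for.

```c
#include <stddef.h>

/* Interleaved partition: worker `id` reads elements id, id + nworkers,
   id + 2 * nworkers, ... so neighbouring elements belong to different
   workers. */
static long sum_interleaved(const int *a, size_t n, size_t id, size_t nworkers)
{
    long sum = 0;
    for (size_t i = id; i < n; i += nworkers)
        sum += a[i];
    return sum;
}

/* Blocked partition: worker `id` reads one contiguous chunk of the array
   from start to end. */
static long sum_blocked(const int *a, size_t n, size_t id, size_t nworkers)
{
    size_t chunk = (n + nworkers - 1) / nworkers;  /* ceiling division */
    size_t begin = id * chunk;
    size_t end = begin + chunk;
    long sum = 0;

    if (begin > n)
        begin = n;
    if (end > n)
        end = n;
    for (size_t i = begin; i < end; i++)
        sum += a[i];
    return sum;
}
```

Adding up the results for every `id` from 0 to `nworkers - 1` gives the same total with either function; only the order in which memory is touched differs.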
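In operating-system programming, program performance is critical. On modern computer architectures, the CPU cache has a decisive effect on that performance. This series of articles uses experiments to illustrate some recommendations for making good use of the CPU cache.
== Avoiding Cache Synchronization
Can the parallel array-summation program from the previous article be optimized further? Here we make one simple change: instead of writing each partial result back to `s`, we accumulate into a temporary variable `sum` and copy it into `s` only at the end. See the code below: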
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
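The change described above, accumulating in a local variable and storing once, can be sketched in isolation. This is an assumption-laden sketch rather than the article's code: `sum_into_slot` and `sum_via_local` are hypothetical names, and `slot` stands in for one worker's entry in the shared array `s`. The `volatile` qualifier is used here only so the compiler keeps the per-iteration stores of the first variant visible. When several threads' slots sit in the same cache line, every such store is likely to trigger cache-coherence traffic between cores, which the second variant avoids.

```c
#include <stddef.h>

/* Variant A: update the shared slot on every iteration, one store to
   shared memory per element. */
static void sum_into_slot(const int *a, size_t n, volatile long *slot)
{
    *slot = 0;
    for (size_t i = 0; i < n; i++)
        *slot += a[i];
}

/* Variant B: accumulate in a local variable (which can live in a
   register) and write the shared slot exactly once at the end. */
static void sum_via_local(const int *a, size_t n, volatile long *slot)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    *slot = sum;
}
```

In a threaded program each worker would call one of these with its own slot, for example `sum_via_local(block, len, &s[id])`, where `block`, `len`, and `id` are again placeholders.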
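In operating-system programming, program performance is critical. On modern computer architectures, the CPU cache has a decisive effect on that performance. This series of articles uses experiments to illustrate some recommendations for making good use of the CPU cache.
== Preserving Data Locality
This time we again start from the first article's program. `foo` stays exactly the same; the only change is that `bar()` now reads memory in an interleaved fashion. The code is as follows: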
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#define N 65536
#define T 10000
#define SIZE 63
struct {
char s[SIZE];
5kg / jhwhw.cls
Created April 6, 2013 13:15 — forked from jhwilson/jhwhw.cls
%=====================================================================
% jhwhw.cls
% Provide jhwhw.cls class
%=====================================================================
%=====================================================================
% Identification
%=====================================================================
\NeedsTeXFormat{LaTeX2e}
\ProvidesClass{jhwhw}[2009/02/11 Justin Wilson's Homework Class]