package main

import (
	"fmt"
	"github.com/golang/protobuf/proto"
	pb "github.com/taskgraph/taskgraph/example/bwmf/proto"
	"io/ioutil"
	"os"
	"strconv"
)
diff --git a/example/bwmf/kldiv_loss.go b/example/bwmf/kldiv_loss.go
index 764a46d..3dbd771 100644
--- a/example/bwmf/kldiv_loss.go
+++ b/example/bwmf/kldiv_loss.go
@@ -3,6 +3,8 @@ package bwmf
 import (
 	"math"
 	"sort"
+	"time"
+	"fmt"
const char * const string;
static char *handle;
void foo()
{
	xxx(handle);
}
package main

import (
	"log"
	"io/ioutil"
	"github.com/golang/protobuf/proto"
	"./example"
)
Latency Comparison Numbers
--------------------------
L1 cache reference                      0.5 ns
Branch mispredict                         5 ns
L2 cache reference                        7 ns              14x L1 cache
Mutex lock/unlock                        25 ns
Main memory reference                   100 ns              ~14x L2 cache, 200x L1 cache
Compress 1K bytes with Zippy          3,000 ns
Send 1K bytes over 1 Gbps network    10,000 ns    0.01 ms
Read 4K randomly from SSD*          150,000 ns    0.15 ms
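The ratios in this table are plain division, so a few lines of C make them easy to recheck. This is a minimal sketch: the `struct latency` table and the helpers `latency_ns`, `latency_ratio`, and `ns_to_ms` are illustrative names, not part of the original list. By these numbers, a main memory reference is about 14x (100 / 7) an L2 reference and 200x an L1 reference.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the table; values copied verbatim, in nanoseconds. */
struct latency {
    const char *op;
    double ns;
};

static const struct latency latencies[] = {
    {"L1 cache reference", 0.5},
    {"Branch mispredict", 5},
    {"L2 cache reference", 7},
    {"Mutex lock/unlock", 25},
    {"Main memory reference", 100},
    {"Compress 1K bytes with Zippy", 3000},
    {"Send 1K bytes over 1 Gbps network", 10000},
    {"Read 4K randomly from SSD", 150000},
};

/* Returns the latency of `op` in ns, or -1.0 if `op` is not in the table. */
double latency_ns(const char *op) {
    for (size_t i = 0; i < sizeof latencies / sizeof latencies[0]; i++) {
        if (strcmp(latencies[i].op, op) == 0) {
            return latencies[i].ns;
        }
    }
    return -1.0;
}

/* How many times longer `slow` takes than `fast`; both must be in the table. */
double latency_ratio(const char *slow, const char *fast) {
    double s = latency_ns(slow);
    double f = latency_ns(fast);
    assert(s > 0.0 && f > 0.0);
    return s / f;
}

/* Converts nanoseconds to milliseconds, as in the third column. */
double ns_to_ms(double ns) {
    return ns / 1e6;
}
```

For example, `latency_ratio("Main memory reference", "L2 cache reference")` returns roughly 14.29.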
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
#include <pthread.h>
#include <string.h>
#include <assert.h>
/* tic() opens a block that toc() closes, so the two must be used as a pair in one scope. */
#define tic() do { struct timespec ts_start, ts_end; clock_gettime(CLOCK_MONOTONIC, &ts_start)
#define toc() clock_gettime(CLOCK_MONOTONIC, &ts_end); \
    printf("%fs\n", (ts_end.tv_sec - ts_start.tv_sec) + (double)(ts_end.tv_nsec - ts_start.tv_nsec)/1e9); } while (0)
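The `tic()`/`toc()` pair splits one `do { ... }` block across two macros, so both must appear in the same scope. Where that is inconvenient, the same measurement can be written as ordinary functions. This is a minimal sketch: `elapsed_seconds` and `time_call` are hypothetical helpers, and `_POSIX_C_SOURCE` is defined only so that `clock_gettime` is declared under strict ISO compiler modes.

```c
#define _POSIX_C_SOURCE 199309L /* declare clock_gettime under strict ISO modes */
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Seconds between two timestamps. The nanosecond difference may be
 * negative (a borrow from tv_sec); adding it as a signed value handles that. */
double elapsed_seconds(struct timespec start, struct timespec end) {
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_nsec - start.tv_nsec) / 1e9;
}

/* Runs fn(arg) once and returns its duration in seconds, measured with
 * CLOCK_MONOTONIC like tic()/toc(); returns -1.0 if the clock cannot be read. */
double time_call(void (*fn)(void *), void *arg) {
    struct timespec start, end;
    assert(fn != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &start) != 0) {
        return -1.0;
    }
    fn(arg);
    if (clock_gettime(CLOCK_MONOTONIC, &end) != 0) {
        return -1.0;
    }
    return elapsed_seconds(start, end);
}
```

A call looks like `time_call(work, &data)`, where `work` is any `void work(void *)`; both `work` and `data` are placeholders.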
0.900109s
1.508606s
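In systems programming, program performance is critical, and in modern computer architectures the CPU cache has a decisive influence on it. Through a series of experiments, this article series presents some recommendations for making good use of the CPU cache.
== Sequential Memory Access
This article measures cache performance with a simple parallel array-summation program. It implements two functions, `foo()` and `bar()`: `foo()` sums the array by reading its elements in an interleaved order, while `bar()` splits the array into blocks and reads each block sequentially. The program is as follows: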
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
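The two functions differ only in how the index range is divided among threads. The per-thread loops can be sketched as plain functions. The names `sum_interleaved` and `sum_blocked` are hypothetical, the full `foo()`/`bar()` bodies are not shown in this excerpt, and the comments assume 8-byte `long` values and 64-byte cache lines.

```c
#include <assert.h>
#include <stddef.h>

/* Interleaved split, in the style of foo(): thread tid reads a[tid],
 * a[tid + nthreads], ... With 8-byte elements and 64-byte lines (assumed),
 * each cache line holds elements for several threads, so every thread
 * ends up loading most or all lines of the array. */
long sum_interleaved(const long *a, size_t n, size_t tid, size_t nthreads) {
    assert(nthreads > 0 && tid < nthreads);
    long s = 0;
    for (size_t i = tid; i < n; i += nthreads) {
        s += a[i];
    }
    return s;
}

/* Blocked split, in the style of bar(): thread tid reads one contiguous
 * chunk sequentially, so each cache line it loads is consumed in full. */
long sum_blocked(const long *a, size_t n, size_t tid, size_t nthreads) {
    assert(nthreads > 0 && tid < nthreads);
    size_t begin = n * tid / nthreads;
    size_t end = n * (tid + 1) / nthreads;
    long s = 0;
    for (size_t i = begin; i < end; i++) {
        s += a[i];
    }
    return s;
}
```

Summed over all `tid` values from `0` to `nthreads - 1`, both functions visit every element exactly once and return the same total; only the order in which memory is read differs, which is what the comparison of `foo()` and `bar()` measures.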
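== Avoiding Cache Synchronization
Can the parallel array-summation program from the previous article be optimized further? Here we make one simple change: instead of writing each partial result back to `s`, we accumulate it in a local variable `sum` and copy it into `s` only at the end. See the code below: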
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
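Writing to a shared array on every iteration is costly because neighbouring slots of `s` usually sit in the same cache line, so each write by one core invalidates the copy held by the others (commonly called false sharing). Below is a minimal pthreads sketch of the local-accumulator pattern, assuming each thread owns one slot of a shared result array. The article's full code is not shown here, so `struct task`, `sum_block`, `parallel_sum`, and `MAX_THREADS` are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16 /* arbitrary cap for this sketch */

/* Work description for one thread. */
struct task {
    const long *a;
    size_t begin;
    size_t end;
    long *out; /* this thread's slot in the shared result array */
};

/* Sums a[begin..end) into a local variable, then writes the shared slot once.
 * Updating *out inside the loop instead would repeatedly write a cache line
 * that neighbouring slots, owned by other threads, also live on. */
static void *sum_block(void *p) {
    struct task *t = p;
    long sum = 0;
    for (size_t i = t->begin; i < t->end; i++) {
        sum += t->a[i];
    }
    *t->out = sum;
    return NULL;
}

/* Splits a[0..n) into nthreads contiguous blocks and sums them in parallel. */
long parallel_sum(const long *a, size_t n, size_t nthreads) {
    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    pthread_t tids[MAX_THREADS];
    int started[MAX_THREADS];
    struct task tasks[MAX_THREADS];
    long s[MAX_THREADS]; /* shared results; adjacent slots share cache lines */

    for (size_t k = 0; k < nthreads; k++) {
        tasks[k] = (struct task){.a = a,
                                 .begin = n * k / nthreads,
                                 .end = n * (k + 1) / nthreads,
                                 .out = &s[k]};
        started[k] = pthread_create(&tids[k], NULL, sum_block, &tasks[k]) == 0;
        if (!started[k]) {
            sum_block(&tasks[k]); /* could not spawn a thread: do the work here */
        }
    }
    long total = 0;
    for (size_t k = 0; k < nthreads; k++) {
        if (started[k]) {
            pthread_join(tids[k], NULL);
        }
        total += s[k];
    }
    return total;
}
```

When a thread genuinely has to update shared memory repeatedly, padding each slot to its own cache line (for example with `_Alignas(64)`, assuming 64-byte lines) is a common alternative. Older toolchains may need `-pthread` at link time.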
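== Preserving Data Locality
This time we again start from the program in the first article. `foo()` is left completely unchanged; only `bar()` is modified so that it reads memory in an interleaved order. The code is as follows: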
[source, c]
----
#include <time.h>
#include <stdio.h>
#include <stdlib.h>
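One way to see why interleaved reads hurt locality is to count the cache lines a single thread must bring in. This is a minimal sketch, assuming 64-byte cache lines, 8-byte elements, and a line-aligned array. `lines_touched` and `LINE_BYTES` are hypothetical names, and the function does not reproduce the article's modified `bar()`, whose body is not shown here.

```c
#include <assert.h>
#include <stddef.h>

#define LINE_BYTES 64 /* assumed cache-line size */

/* Counts the distinct cache lines touched when visiting indices
 * first, first + stride, ... below last, for elements of elem_bytes each.
 * Assumes the array starts on a cache-line boundary. Because i only
 * increases, line numbers never decrease, so comparing with the previous
 * line is enough to count distinct lines. */
size_t lines_touched(size_t first, size_t last, size_t stride, size_t elem_bytes) {
    assert(stride > 0 && elem_bytes > 0);
    size_t count = 0;
    size_t prev_line = (size_t)-1;
    for (size_t i = first; i < last; i += stride) {
        size_t line = i * elem_bytes / LINE_BYTES;
        if (line != prev_line) {
            count++;
            prev_line = line;
        }
    }
    return count;
}
```

For 1024 elements split across 4 threads, thread 0 reads the same 256 elements either way, but touches 32 lines with a contiguous block (`lines_touched(0, 256, 1, 8)`) and 128 lines when reading every fourth element (`lines_touched(0, 1024, 4, 8)`).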