cmake_minimum_required(VERSION 2.8)
# create_source_group(relativeSourcePath sourceGroupName files)
#
# Creates a source group with the specified name relative to the relative path
# specified.
#
# Parameters:
# - sourceGroupName: Name of the source group to create.
# - relativeSourcePath: Relative path to the files.
Dstat 0.7.2 CSV output
Author: Dag Wieers <dag@wieers.com> URL: http://dag.wieers.com/home-made/dstat/
Host: bm1 User: azureuser
Cmdline: dstat -tcmrd --disk-util -ny --output stats.out Date: 12 Aug 2013 02:49:03 UTC
system,total cpu usage,,,,,,memory usage,,,,io/total,,dsk/total,,sda,sdb,sr0,net/total,,system,
time,usr,sys,idl,wai,hiq,siq,used,buff,cach,free,read,writ,read,writ,util,util,util,recv,send,int,csw
12/8/2014 2:49,2.296,0.901,94.702,1.99,0,0.112,157990912,44568576,550817792,6556995584,2.465,20.54,54106.362,1821659.921,7.757,0.756,0.001,0,0,1155.05,333.637
12/8/2014 2:49,0,0.249,99.751,0,0,0,158093312,44568576,550805504,6556905472,0,0,0,0,0,0,0,252,1444,1040,144
12/8/2014 2:49,0,0,100,0,0,0,158220288,44568576,550805504,6556778496,0,0,0,0,0,0,0,66,530,1029,96

Each index is stored as segments of sorted 32-bit integers, 8 per segment here to keep the example small enough to read. Each integer is the record_id of a row that matches the term. Each segment is stored in a key/value store.

Index A represents the rows that have the term "Canada" in a Country column. Index B represents the rows that have the term "Ontario" in a Province column.

To evaluate a conjunction (AND) query, segments from both indexes are read off disk through the key/value store and intersected.

Index A | Index B
-----------------
     Segment 1
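
The intersection step described above can be sketched as a two-pointer merge over sorted record ids. This is a minimal illustration under stated assumptions, not the actual implementation: the in-memory dict stands in for the key/value store, the `(index_name, segment_number)` key layout is hypothetical, and the sample record ids are made up.

```python
from typing import Dict, Iterator, List, Tuple

# Hypothetical stand-in for the key/value store: (index name, segment number) -> sorted ids.
Store = Dict[Tuple[str, int], List[int]]


def read_index(store: Store, name: str) -> Iterator[int]:
    """Yield the record ids of one index by reading its segments in order."""
    seg_no = 0
    while (name, seg_no) in store:
        yield from store[(name, seg_no)]
        seg_no += 1


def intersect_sorted(a: List[int], b: List[int]) -> List[int]:
    """Return the ids present in both sorted lists (two-pointer merge)."""
    i = j = 0
    out: List[int] = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a is behind, advance it
        else:
            j += 1  # b is behind, advance it
    return out


# Made-up sample data: two segments of 8 ids per index.
store: Store = {
    ("A", 0): [1, 3, 4, 7, 9, 12, 15, 18],
    ("A", 1): [21, 22, 30, 31, 40, 41, 50, 52],
    ("B", 0): [2, 3, 7, 8, 12, 13, 18, 21],
    ("B", 1): [25, 30, 44, 50, 51, 60, 61, 70],
}

# Rows matching Country = "Canada" AND Province = "Ontario" in this toy data.
result = intersect_sorted(list(read_index(store, "A")), list(read_index(store, "B")))
print(result)
```

Because both inputs are sorted, the merge visits each id at most once, so the cost is linear in the combined length of the two indexes.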

cmake_minimum_required(VERSION 2.6)
# create_source_group(relativeSourcePath sourceGroupName files)
#
# Creates a source group with the specified name relative to the relative path
# specified.
#
# Parameters:
# - sourceGroupName: Name of the source group to create.
# - relativeSourcePath: Relative path to the files.

$ likwid-perfctr -C S0:0@S1:0 -g MEM ./example 1

412,892,333/sec

+-----------------------------+-------------+---------+
|           Metric            |   core 0    | core 1  |
+-----------------------------+-------------+---------+
|     Runtime (RDTSC) [s]     |   7.81101   | 7.81101 |
|    Runtime unhalted [s]     | 0.00114729  | 5.27201 |

|             CPI             |   1.40394   | 2.24556 |

$ likwid-topology -g

-------------------------------------------------------------
CPU type:       Intel Core 2 45nm processor
*************************************************************
Hardware Thread Topology
*************************************************************
Sockets:        2
Cores per socket:       4

Threads per core:       1

I'm running some SSE and AVX instructions on Harpertown and Sandy Bridge systems, and I noticed that the Sandy Bridge system scaled to more cores before performance flatlined. The Harpertown system showed no improvement going from 1 thread to 2, so I started looking into why.

Running with 1 thread:
likwid-perfctr -C S0:0@S1:0 -g MEM ./example 1

+-----------------------------+-------------+---------+
|           Metric            |   core 0    | core 1  |
+-----------------------------+-------------+---------+
|     Runtime (RDTSC) [s]     |    7.795    |  7.795  |
|    Runtime unhalted [s]     | 0.00113065  | 5.25528 |

Running 10s test @ http://skynet1:8080
  24 threads and 24 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   292.17us   91.68us  16.32ms   98.52%
    Req/Sec     3.44k   204.55     4.89k    58.86%
  Latency Distribution
     50%  285.00us
     75%  323.00us
     90%  347.00us
     99%  389.00us

Plaintext 8 threads

Connections: 8		Haywire		Requests/sec:  57334.75
Connections: 8		Go		    Requests/sec:  42901.57
-------------------------------------------------------
Connections: 16		Haywire		Requests/sec:  71460.01
Connections: 16		Go		    Requests/sec:  71891.07
-------------------------------------------------------
Connections: 32		Haywire		Requests/sec:  78421.15
Connections: 32		Go		    Requests/sec:  89202.96
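
To compare the two servers at each connection count, the result lines above can be parsed and reduced to a Haywire/Go throughput ratio. This is a rough sketch: the regular expression and the helper names `parse` and `haywire_to_go` are illustrative assumptions, not part of any benchmark tooling.

```python
import re
from collections import defaultdict
from typing import Dict

# The result lines from above, copied verbatim (whitespace normalized).
RESULTS = """\
Connections: 8   Haywire  Requests/sec:  57334.75
Connections: 8   Go       Requests/sec:  42901.57
-------------------------------------------------------
Connections: 16  Haywire  Requests/sec:  71460.01
Connections: 16  Go       Requests/sec:  71891.07
-------------------------------------------------------
Connections: 32  Haywire  Requests/sec:  78421.15
Connections: 32  Go       Requests/sec:  89202.96
"""

LINE = re.compile(r"Connections:\s*(\d+)\s+(\S+)\s+Requests/sec:\s*([\d.]+)")


def parse(text: str) -> Dict[int, Dict[str, float]]:
    """Map connection count -> {server name: requests/sec}; separator lines are skipped."""
    table: Dict[int, Dict[str, float]] = defaultdict(dict)
    for match in LINE.finditer(text):
        conns, server, rps = match.groups()
        table[int(conns)][server] = float(rps)
    return dict(table)


def haywire_to_go(table: Dict[int, Dict[str, float]]) -> Dict[int, float]:
    """Ratio of Haywire to Go throughput per connection count, rounded to 2 places."""
    return {c: round(r["Haywire"] / r["Go"], 2) for c, r in sorted(table.items())}


table = parse(RESULTS)
ratios = haywire_to_go(table)
for conns, ratio in ratios.items():
    print(f"{conns:>3} connections: Haywire/Go = {ratio:.2f}")
```

A ratio above 1 means Haywire served more requests per second at that connection count; below 1 means Go did.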

Go 1.2 JSON Serialization

This test was run on Azure, so it is more comparable to the EC2 results published by TechEmpower here:
http://www.techempower.com/benchmarks/#section=data-r8&hw=ec2&test=json

Running 15s test @ http://server:8000/json
  8 threads and 8 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     2.99ms    5.74ms  24.37ms   87.76%
    Req/Sec     0.86k   448.25     2.00k    69.91%