- http://www.quantcup.org/
- http://www.quantcup.org/home/spec
- http://www.quantcup.org/home/judgeplatform
#define MAX_LIVE_ORDERS 65536 typedef uint64 t_orderid; typedef uint64 t_ordersize; typedef uint16 t_price;
$ cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 6 model : 23 model name : Intel(R) Core(TM)2 Duo CPU U9600 @ 1.60GHz stepping : 10 cpu MHz : 1601.000 cache size : 3072 KB ... flags : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dts tpr_shadow vnmi flexpriority bogomips : 3191.44 clflush size : 64 cache_alignment : 64
- http://en.wikipedia.org/wiki/CPU_cache
- http://x264dev.blogspot.com/2008/05/cacheline-splits-aka-intel-hell.html
(cycles) (ns) Register 1 < 1 L1 4 ~ 1 L2 10 ~ 3 L3 42 ~ 15 DRAM ~ 65
"Core 2 is a 'four wide' architecture meaning that it can issue and retire four instructions per clock cycle." "Core 2 Duo and Core 2 Extreme use a 14-stage pipeline"
insert(score, payload) remove(score) or remove(payload) min(score, payload)
skip list? score: uint16 price payload: NodeRef first, NodeRef last
void rpush(last_node, orderid) orderid lpop(first_node)
TODO: consider unrolling
typedef uint16 NodeRef; struct { NodeRef prev; // 2 NodeRef next // 2 } Node; Node nodes[65536]; // 262,144 NodeRef free_nodes[65536]; // 131,072 -- and the locality will be awesome! uint32 num_free_nodes; // 4 // total: 393,220 bytes -- cache-tastic! // only needed when there's a cross -- and crosses are rare uint64 orderid_for_node[65536] // 524,288