andrewschaaf/gist:889486

## gistfile1.md

      
    Raw
  

              gistfile1.md
            
          
    QuantCup


http://www.quantcup.org/
http://www.quantcup.org/home/spec
http://www.quantcup.org/home/judgeplatform

Constraints

#define MAX_LIVE_ORDERS 65536

typedef uint64 t_orderid;
typedef uint64 t_ordersize;
typedef uint16 t_price;

CPU

$   cat /proc/cpuinfo
processor     : 0
vendor_id     : GenuineIntel
cpu family    : 6
model         : 23
model name    : Intel(R) Core(TM)2 Duo CPU     U9600  @ 1.60GHz
stepping      : 10
cpu MHz       : 1601.000
cache size    : 3072 KB
...
flags         : fpu vme de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe nx lm constant_tsc arch_perfmon pebs bts aperfmperf pni dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm sse4_1 xsave lahf_lm ida dts tpr_shadow vnmi flexpriority
bogomips      : 3191.44
clflush size  : 64
cache_alignment  : 64

Some thoughts for an attempt

Caches


http://en.wikipedia.org/wiki/CPU_cache
http://x264dev.blogspot.com/2008/05/cacheline-splits-aka-intel-hell.html

         (cycles)    (ns)
Register     1     <  1
L1           4     ~  1
L2          10     ~  3
L3          42     ~ 15
DRAM               ~ 65

Pipelining

"Core 2 is a 'four wide' architecture
  meaning that it can issue and retire four instructions per clock cycle."

"Core 2 Duo and Core 2 Extreme use a 14-stage pipeline"

Data

Sorted set of active prices

Interface

insert(score, payload)
remove(score) or remove(payload)
min(score, payload)

Implementation

skip list?

score:      uint16 price
payload:    NodeRef first, NodeRef last

(Doubly-linked-list of orderids) for each price

Interface

void        rpush(last_node, orderid)
orderid     lpop(first_node)

Implementation

TODO: consider unrolling
typedef uint16 NodeRef;

struct {
  NodeRef prev;     // 2
  NodeRef next      // 2
} Node;

Node nodes[65536];              // 262,144
NodeRef free_nodes[65536];      // 131,072 -- and the locality will be awesome!
uint32 num_free_nodes;          // 4
                                // total: 393,220 bytes -- cache-tastic!

// only needed when there's a cross -- and crosses are rare
uint64 orderid_for_node[65536]  // 524,288