PharkMillups/16

## 16
duiod # Are there any published benchmarks of riak at all? In general, ideally with small objects.

benblack # there are, i believe. the basho folks would be the right ones to ask when they are around

duiod # Cool, I'll stick around.

benblack # http://twitter.com/dizzyco/status/13014285189

benblack # http://pl.atyp.us/wordpress/?p=2868

duiod # Yes, I read those already, but curious how much overhead riak itself adds to that.

arg # hi. someone was looking for perf numbers?

duiod # arg: that'd be me.

arg # we're releasing our benchmarking tools publicly soon

duiod # re: the perf numbers. I'd do some tests myself, but there's no C/C++ client, so I'd have to write something that used the protocol buffers stuff. So I'm trying to avoid that by just asking for numbers here.

arg # duiod: c protobufs api coming soon. what size objects and what read/write ratio?

duiod # each object I'm writing is ~1.5KB. Reads are uniformly random. I've modeled my data with Cassandra's data model atm (i.e., supercolumns), but essentially it's something like: 40% write, 50% re-write, 10% read.
and by re-write, I mean add something to an existing key.

arg # ok what kinda boxes?
i'd suggest asking on the riak-users list so you reach the whole dev team and they can probably do a better benchmark for you

duiod # 5-10 7200rpm 2TB disks per box, 3 boxes to start with. dual quad 5520s, 24G RAM.

arg # well i can do about 1500 1.5k writes/sec against a single node on my mac pro
thru the (slow) python protobufs client

duiod # according to some rough napkin math, it works out to around ~200-300GB of data/day, with all the indexes and such.
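
The napkin math can be sanity-checked against the 1500 writes/sec figure. This is a minimal sketch, reading the estimate as 200 to 300 GB per day and assuming decimal gigabytes and 1.5KB (1500 bytes) per object. Since the daily volume includes indexes, the result is an upper bound on object writes rather than an exact rate. The function name is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def implied_writes_per_sec(gb_per_day, object_bytes=1500):
    """Average object writes/sec needed to produce gb_per_day of data.

    Assumes decimal gigabytes (1e9 bytes) and a steady write rate
    spread evenly over the whole day.
    """
    return gb_per_day * 1e9 / object_bytes / SECONDS_PER_DAY

for gb in (200, 300):
    print(f"{gb} GB/day ~ {implied_writes_per_sec(gb):.0f} objects/sec")
```

Under these assumptions the sustained rate across the whole cluster lands in the same order of magnitude as the single-node number quoted above, before accounting for replication or peak load.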

arg # reads are less expensive in riak than cassandra

duiod # which storage engine? bitcask or inno?

arg # bitcask

arg # i'd suggest using bitcask for new stuff. there's a few rough edges still, but it's going to be the default soon

duiod # 1500 seems quite slow, I assume that's limited by the client, not the server process?

arg # well that's with fsync() after each write

duiod # ah

arg # and thru a slow python protobufs library on a macbook. and it's also doing an extra get() on each write. by default it gets the object back that it just put, so you get any updates that may have happened concurrently with your write
let me turn that off and see what happens. obviously these are not real benchmarks

duiod # Right, I'm just looking for some rough numbers to compare against cassandra right now. Primarily because of vnodes with riak, which I think will cause me far less pain in the long run.

arg # what do you plan to do with the data? i mean query access, is it mostly by key? or do you need range-requests or secondary indices?

duiod # Yes, all by key. Each key has all the keys of other data that might be associated with it, etc. And data is written out to multiple keys. A single read probably triggers 20-50 other random reads, but it's all point queries by key in other CFs.

arg # yeah you can simulate that kinda stuff in riak by using the same key in different buckets
to store the associations
or use links
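
The "same key in different buckets" layout can be sketched as follows. This uses a plain dict as a hypothetical stand-in for a store addressed by (bucket, key). It is not the Riak client API, and the bucket names, keys, and helper functions are invented for illustration.

```python
# Hypothetical in-memory stand-in for a store addressed by (bucket, key).
# This is NOT the Riak client API; it only illustrates the data layout.
store = {}

def put(bucket, key, value):
    store[(bucket, key)] = value

def get(bucket, key):
    return store.get((bucket, key))

# The same key holds the main record in one bucket
# and the keys of its associated records in another.
put("items", "item42", {"name": "example"})
put("item_assoc", "item42", ["item7", "item9"])
put("items", "item7", {"name": "related"})

def read_with_associations(key):
    """One point read for the record, one for its association list,
    then one point read per associated key (the read fan-out)."""
    record = get("items", key)
    related_keys = get("item_assoc", key) or []
    related = {k: get("items", k) for k in related_keys}
    return record, related

record, related = read_with_associations("item42")
```

Every lookup here is a point query by key, which matches the access pattern described above. A missing associated record simply comes back as `None`.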

duiod # yea, I'm still getting familiar with the terms. Is there anything like a supercolumn (i.e., data stored contiguously + insert new cells without replacing the entire structure) within a key, with riak? or are links the way to go for that? (as links sound like they'd be less performant, as "links" aren't stored contiguously)

arg # no "upsert" functionality. you have to write a new doc, but you can break the doc up into linked docs, or docs with the same key in different buckets, to handle stuff that needs to be updated separately
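
Without an in-place append, the "re-write" case becomes a read-modify-write of the whole document. The sketch below shows that cycle against a plain dict used as a hypothetical stand-in for the store. It is not the Riak client API, the names are invented, and it ignores concurrent writers entirely.

```python
# Hypothetical in-memory stand-in for a store addressed by (bucket, key).
# This is NOT the Riak client API.
store = {}

def append_cell(bucket, key, cell):
    """Add one cell to the document stored under (bucket, key).

    With no upsert, this means fetching the whole document,
    changing it in memory, and writing the whole document back.
    """
    doc = list(store.get((bucket, key), []))  # read (empty if absent)
    doc.append(cell)                          # modify
    store[(bucket, key)] = doc                # write the full doc back
    return len(doc)

# Data that changes often lives under the same key in its own bucket,
# so each rewrite only touches that smaller document.
store[("profiles", "user1")] = {"name": "example"}
append_cell("events", "user1", "login")
append_cell("events", "user1", "logout")
```

Splitting the frequently appended part into its own bucket does not remove the rewrite, it only keeps the rewritten document small.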

duiod # i see, that's a big downside for me :(