@PharkMillups PharkMillups/16
Created May 17, 2010

duiod # Are there any published benchmarks of riak in general (ideally with small objects)?
benblack # there are, i believe. the basho folks would be the right ones to ask when they are around
duiod # Cool, I'll stick around.
benblack # http://twitter.com/dizzyco/status/13014285189
benblack # http://pl.atyp.us/wordpress/?p=2868
duiod # Yes, I read those already, but curious how much overhead riak itself adds to that.
arg # hi. someone was looking for perf numbers?
duiod # arg: that'd be me.
arg # we're releasing our benchmarking tools publicly soon
duiod # re: the perf numbers. I'd do some tests myself, but there's no C/C++ client, so I'd have to write something that used the protocol buffers stuff. So I'm trying to avoid that by just asking for numbers here.
arg # duiod: c protobufs api coming soon. what size objects and what read/write ratio?
duiod # each object I'm writing is ~1.5KB. Reads are uniformly random. I've modeled my data with Cassandra's data model atm (i.e., supercolumns), but essentially it's something like: 40% write, 50% re-write, 10% read.
and by re-write, I mean add something to an existing key.
arg # ok what kinda boxes?
i'd suggest asking on the riak-users list so you reach the whole dev team and they can probably do a better benchmark for you
duiod # 5-10 7200 RPM 2TB disks per box, 3 boxes to start with. dual quad-core 5520s, 24GB RAM.
arg # well i can do about 1500 1.5k writes/sec against a single node on my mac pro
thru the (slow) python protobufs client
duiod # according to some rough napkin math, it works out to around 200-300GB of data per day, with all the indexes and such.
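The two figures quoted so far (about 1500 writes/sec of ~1.5KB objects, and 200-300GB of data per day) can be put on the same scale with some quick arithmetic. This is only a rough sketch: it assumes decimal units (1KB = 1000 bytes, 1GB = 10^9 bytes), treats the whole daily volume as 1.5KB object writes, and ignores replication and index overhead. The function names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60
OBJECT_BYTES = 1500  # ~1.5KB per object, taking 1KB = 1000 bytes (assumption)
GB = 10**9           # decimal gigabytes (assumption)


def gb_per_day(writes_per_sec):
    """Daily volume produced by a sustained write rate of 1.5KB objects."""
    return writes_per_sec * OBJECT_BYTES * SECONDS_PER_DAY / GB


def writes_per_sec_for(gb_day):
    """Sustained write rate needed to produce a given daily volume."""
    return gb_day * GB / (OBJECT_BYTES * SECONDS_PER_DAY)


print(gb_per_day(1500))          # volume at the quoted single-node rate
print(writes_per_sec_for(200))   # rate for the low end of the estimate
print(writes_per_sec_for(300))   # rate for the high end of the estimate
```

Under these assumptions, 1500 writes/sec comes to about 194GB per day, and 200-300GB per day corresponds to an average of roughly 1540-2310 writes/sec across the whole cluster.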
arg # reads are less expensive in riak than cassandra
duiod # which storage engine? bitcask or inno?
arg # bitcask
arg # i'd suggest using bitcask for new stuff. there's a few rough edges still but it's going to be the default soon
duiod # 1500 seems quite slow. I assume that's limited by the client, not the server process?
arg # well that's with fsync() after each write
duiod # ah
arg # and thru a slow python protobufs library on a macbook. and it's also doing an extra get() on each write: by default it gets the object back that it just put, so you get any updates that may have happened concurrently with your write
let me turn that off and see what happens. obviously these are not real benchmarks
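To make the cost of that default concrete, here is a toy in-memory model of a put that optionally reads the object back. `ToyStore`, its counters and its `return_body` flag are illustrative names for this sketch only; this is not the Riak client API and it says nothing about real timings.

```python
class ToyStore:
    """In-memory stand-in for a key/value node that counts operations."""

    def __init__(self):
        self.data = {}
        self.reads = 0
        self.writes = 0

    def get(self, key):
        self.reads += 1
        return self.data.get(key)

    def put(self, key, value, return_body=True):
        self.writes += 1
        self.data[key] = value
        if return_body:
            # Extra read: fetch the object that was just stored.
            return self.get(key)
        return None


with_body = ToyStore()
without_body = ToyStore()
for i in range(1000):
    with_body.put(f"k{i}", b"x" * 1500)
    without_body.put(f"k{i}", b"x" * 1500, return_body=False)

print(with_body.writes, with_body.reads)
print(without_body.writes, without_body.reads)
```

In this model every put with the read-back enabled costs one write plus one read, so turning it off halves the number of operations per write.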
duiod # Right, I'm just looking for some rough numbers to compare against cassandra right now. Primarily because of vnodes with riak, which I think will cause me far less pain in the long run.
arg # what do you plan to do with the data? i mean query access: is it mostly by key, or do you need range requests or secondary indices?
duiod # Yes, all by key. Each key has all the keys of other data that might be associated with it, etc. And data is written out to multiple keys. A single read probably triggers 20-50 other random reads, but it's all point queries by key in other CFs.
arg # yeah you can simulate that kinda stuff in riak by using the same key in different buckets
to store the associations
or use links
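The "same key in different buckets" idea can be sketched with a plain dict keyed by (bucket, key). The bucket names `docs` and `assoc` and the helper functions are hypothetical, and a real client would issue network requests where this uses dict lookups.

```python
store = {}  # (bucket, key) -> value; stand-in for a bucketed key/value store


def put(bucket, key, value):
    store[(bucket, key)] = value


def get(bucket, key):
    return store.get((bucket, key))


def fetch_with_related(key):
    """Read a doc, then follow its associations with point reads by key."""
    doc = get("docs", key)
    related_keys = get("assoc", key) or []  # same key, different bucket
    related = [get("docs", k) for k in related_keys]
    return doc, related


put("docs", "a", {"body": "main"})
put("docs", "b", {"body": "first neighbour"})
put("docs", "c", {"body": "second neighbour"})
put("assoc", "a", ["b", "c"])

print(fetch_with_related("a"))
```

Updating the association list only rewrites the small `assoc` entry and leaves the main document untouched.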
duiod # yea, I'm still getting familiar with the terms. Is there anything like a supercolumn (i.e., data stored contiguously + insert new cells without replacing the entire structure) within a key, with riak? or are links the way to go for that? (as links sound like they'd be less performant, since "links" aren't stored contiguously)
arg # no "upsert" functionality. you have to write a new doc, but you can break the doc up into linked docs, or docs with the same key in different buckets, to handle stuff that needs to be updated separately
duiod # i see, that's a big downside for me :(
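For the "re-write" case (adding something to an existing key), the difference between rewriting a whole document and splitting it into separately stored pieces can be sketched as below. This is one possible reading of "break the doc up", using a plain dict as the store; the key layout and function names are assumptions, and concurrent writers are not handled.

```python
def append_rewrite(store, key, item):
    """No upsert: read the whole doc, modify it, write the whole doc back."""
    doc = store.get(key, [])
    store[key] = doc + [item]


def append_split(store, key, item):
    """Store each item as its own small doc, plus a counter doc per key."""
    count_key = (key, "count")
    n = store.get(count_key, 0)
    store[(key, n)] = item       # new piece, existing pieces untouched
    store[count_key] = n + 1     # only the small counter is rewritten


def read_split(store, key):
    """Reassemble the pieces with one point read per item."""
    n = store.get((key, "count"), 0)
    return [store[(key, i)] for i in range(n)]


whole, pieces = {}, {}
for item in ["x", "y", "z"]:
    append_rewrite(whole, "k", item)
    append_split(pieces, "k", item)

print(whole["k"])
print(read_split(pieces, "k"))
```

The split layout keeps each write small at the cost of more reads, and the pieces are no longer stored contiguously, which is the trade-off raised in the conversation.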