@PharkMillups
Created October 25, 2010 22:07
21:01 <allen_> just tested, and found that row data sizes of 2000 bytes and 4000 bytes show no difference. interesting...
21:02 <benblack> differences in what?
21:02 <benblack> (and what's a row? ;) )
21:03 <allen_> difference in data size. row as in db, i meant
21:04 <allen_> fyi, my case is disk io intensive.
21:07 <benblack> are you testing a relational database?
21:07 <allen_> benblack: i think u r mocking me, i m testing riak with bitcask
21:07 <benblack> i would not expect a difference between 2k and 4k. the OS read ahead
behavior makes it likely it is reading about the same in both cases.
21:08 <benblack> allen_: i am wondering why you are calling them rows. helps not to use
relational terminology in this.
21:10 <allen_> benblack: thanks for the correction. if I change the data size to 4100, it
will be very different, i guess.
21:10 <benblack> again, depends on what the OS is doing underneath.
21:11 <benblack> are you just reading the same document over and over?
21:11 <allen_> y, i already tested with 5000 bytes, it was not good
21:11 <benblack> generally, really, how are you testing?
21:11 <allen_> tsung
21:12 <benblack> through http?
21:12 <allen_> yes
21:12 <benblack> if your goal is performance, you know the protobufs interface is _much_ faster?
21:12 <allen_> y, i know. a few ms is not a big deal for me.
21:12 <benblack> it's not a few ms
21:13 <allen_> then how much?
21:13 <benblack> tsung is a tool, but doesn't tell me how you are testing
21:13 <benblack> how many documents? what access pattern?
21:14 <benblack> is your working set larger than memory in the cluster?
21:14 <benblack> what r/w/n_vals?
21:15 <allen_> 30M documents, 4:1 r:w, yes larger than memory. I tested with r=1, w=1, n=2
21:15 <benblack> your working set or your dataset is larger than memory?
21:16 <allen_> don't know the meaning of working set.
21:16 <benblack> the set of things most accessed
21:16 <benblack> is your access pattern completely random across all 30M documents?
21:17 <allen_> yes, i use uniform access
21:17 <benblack> is that your actual access pattern?
21:17 <allen_> yes
21:17 <benblack> have you tested this with other databases?
21:17 <benblack> with the exact same hardware
21:17 <allen_> nope
21:17 <benblack> ok
21:17 <benblack> here's the situation
21:17 <benblack> it doesn't matter what db you use
21:18 <benblack> you are describing the worst case scenario
21:18 <benblack> you either need to increase the total RAM in your cluster to allow
your entire dataset to be in cache or you need SSDs
21:18 <benblack> or you just accept the latency of going to disk for the constant misses
21:19 <allen_> kool, thanks I will recommend that
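A quick back-of-the-envelope check of that advice, as a Python sketch (the 30M-document count and n=2 come from above; the ~4 KB object size matches the values tested here, and the even per-node split is an assumption):

```python
# Back-of-the-envelope sizing; figures taken from this log, even split assumed.
DOCS = 30_000_000          # documents in the dataset
OBJ_BYTES = 4_096          # ~4 KB objects, as tested above
N_VAL = 2                  # replicas per object (n=2 above)
NODES = 5                  # cluster size

total = DOCS * OBJ_BYTES * N_VAL
print(f"replicated dataset: {total / 2**30:.0f} GiB")                 # ~229 GiB
print(f"cache per node to hold it all: {total / NODES / 2**30:.0f} GiB")  # ~46 GiB
```

Bitcask also keeps every key in an in-memory keydir, so key count adds RAM pressure on top of the value cache; with uniform access over roughly 229 GiB, the page cache misses on nearly every read, which is exactly the worst case being described.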
21:20 <benblack> it's much more common that access patterns across datasets are heavily
biased to a subset of the data
21:20 <benblack> so you can have much less RAM than the total dataset and only rarely
need to hit disk
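To see why a skewed working set changes things, here is a toy cache simulation (not a Riak benchmark; the key space, cache size, and Zipf-ish weights are made-up numbers):

```python
# Toy illustration: LRU cache hit rate under uniform vs skewed access.
import random
from collections import OrderedDict
from itertools import accumulate

KEYS, CACHE, REQS = 100_000, 10_000, 200_000   # 10% of keys fit in "RAM"

def hit_rate(draw):
    cache, hits = OrderedDict(), 0
    for _ in range(REQS):
        k = draw()
        if k in cache:
            hits += 1
            cache.move_to_end(k)               # mark as recently used
        else:
            cache[k] = True
            if len(cache) > CACHE:
                cache.popitem(last=False)      # evict least recently used
    return hits / REQS

uniform = lambda: random.randrange(KEYS)
cum = list(accumulate(1.0 / (i + 1) for i in range(KEYS)))  # Zipf-ish skew
skewed = lambda: random.choices(range(KEYS), cum_weights=cum)[0]

print(f"uniform: {hit_rate(uniform):.0%}   skewed: {hit_rate(skewed):.0%}")
```

Under uniform access the hit rate is roughly the fraction of keys that fit in cache (about 10% here); under the skewed draw most requests land on the hot subset, so far less RAM goes much further.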
21:21 <allen_> k, question: how much faster is pb than http access?
21:21 <benblack> as you obviously know, some apps just have random/uniform access
across their entire dataset
21:21 <benblack> you'd need to measure for your app, but you could see throughput
more than double (assuming your throughput isn't dominated by disk latency)
21:21 <benblack> something to test
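A rough way to measure it, sketched in Python against Riak's old-style HTTP API (the host, bucket, and key are placeholders; the protobufs side would use a PB client such as riak-python-client against port 8087):

```python
# Rough latency/throughput probe over Riak's HTTP interface (a sketch;
# host, bucket, and key below are placeholders, not real endpoints).
import http.client, time

HOST, PORT = "127.0.0.1", 8098          # default Riak HTTP port
PATH = "/riak/mybucket/mykey"           # old-style HTTP API path
N = 1000

conn = http.client.HTTPConnection(HOST, PORT)
t0 = time.perf_counter()
for _ in range(N):
    conn.request("GET", PATH)
    resp = conn.getresponse()
    resp.read()                          # drain body so the conn is reusable
elapsed = time.perf_counter() - t0
print(f"{N / elapsed:.0f} req/s, {1000 * elapsed / N:.2f} ms avg")
```

Per-request HTTP overhead (header parsing, connection handling) is why PB throughput can more than double once disk latency isn't dominating.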
21:22 <benblack> have you tried using basho_bench?
21:22 <allen_> yes
21:22 <allen_> basho_bench is serializing requests in a worker, and doing its best.
21:23 <benblack> have you increased the number of workers?
21:24 <allen_> I did for the worst case, it did not give me a single error.
21:24 <benblack> asking something different: you said it is serializing requests in a
single worker. you increased the number of workers and all requests went through only 1?
21:25 <allen_> i meant it serializes requests within a single worker
21:25 <benblack> right, so you increased workers
21:25 <benblack> how many workers did you use?
21:25 <allen_> increased workers up to 100
21:26 <benblack> what mode?
21:26 <benblack> and what hardware on server vs client
21:26 <allen_> max mode.
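For reference, a basho_bench config along the lines described here, as a sketch: mode max, 100 concurrent workers, uniform keys over 30M documents, ~4 KB values, roughly a 4:1 read:write mix. The driver module, option names, and operation names are assumptions based on basho_bench's stock examples.

```erlang
%% Sketch of a basho_bench config matching this test; values follow the
%% numbers in this log, driver/option/operation names are assumptions.
{mode, max}.                                 % run as fast as possible
{duration, 10}.                              % minutes
{concurrent, 100}.                           % worker processes
{driver, basho_bench_driver_http_raw}.       % Riak HTTP interface
{http_raw_ips, ["10.0.0.1", "10.0.0.2", "10.0.0.3",
                "10.0.0.4", "10.0.0.5"]}.    % spread across the 5 nodes
{key_generator, {uniform_int, 30000000}}.    % uniform over 30M keys
{value_generator, {fixed_bin, 4096}}.        % ~4 KB objects
{operations, [{get, 4}, {update, 1}]}.       % ~4:1 read:write
```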
21:27 <allen_> is hardware relevant for basho_bench?
21:27 <benblack> the relative performance of the client and server is
21:32 <allen_> to answer it, the same hardware on server and client. solaris
21:37 <benblack> with how many nodes in the cluster?
21:37 <allen_> 5
21:37 <benblack> how many clients?
21:38 <allen_> only one client, basho_bench test, I think i misunderstood.
21:38 <benblack> how many client machines?
21:40 <allen_> benblack: client machines? i m doing load testing, sending requests from
load tester to riak servers.
21:41 <benblack> allen_: i understand, my question is how many "load tester"
machines you are using
21:41 <allen_> oh.. one machine
21:42 <benblack> allen_: can i suggest there is a serious flaw in your methodology?
21:42 <allen_> sure
21:42 <benblack> you have a 5 node cluster
21:42 <benblack> and you are testing from 1 machine
21:43 <benblack> it is entirely possible you are running out of capacity (cpu or network bandwidth)
on that test machine
21:43 <benblack> so the performance limit you are seeing is not riak at all
21:43 <benblack> are you distributing the request load across all 5 cluster nodes or sending
all requests to a single node?
21:44 <allen_> it's in the same DC, and sending requests to the 5 nodes, round-robin
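Round-robin from a single box looks like this in miniature (a sketch; the addresses are placeholders), which also shows why one client caps the test: every request still funnels through one machine's CPU and NIC.

```python
# Minimal round-robin dispatch across cluster nodes (addresses made up).
from itertools import cycle
import http.client

NODES = cycle([("10.0.0.%d" % i, 8098) for i in range(1, 6)])

def get(path):
    host, port = next(NODES)                 # next node in rotation
    conn = http.client.HTTPConnection(host, port)
    conn.request("GET", path)
    return conn.getresponse().read()
```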
21:44 <benblack> allen_: what throughput are you using with that setup?
21:45 <DeadZen> a single load testing server should have like 3 network cards ;)
21:45 <benblack> s/using/seeing/ with that setup, allen_
21:46 <allen_> benblack: 14ms/sec
21:46 <benblack> since, for example, riak requires entire objects be written at once
21:47 <benblack> allen_: sorry, what?
21:47 <benblack> 14ms/sec? i don't understand
21:47 <allen_> sorry 14ms avg
21:47 <benblack> allen_: avg not so useful...what is the request rate?
21:48 <allen_> 1700tps
21:50 <benblack> with what size objects?
21:50 <allen_> 4K
21:50 <benblack> and with 2k?
21:50 <allen_> yes
21:50 <benblack> what is the CPU load on the test client during this?
21:51 <allen_> since it is a vm, it varies: min 1.8, max 5.5 load
21:51 <benblack> not load
21:51 <benblack> %
21:51 <benblack> but what you are telling me is you are most likely
maxing out your client
21:52 <allen_> I don't have data, but it was very low.
21:52 <benblack> it is capable of 1700 reqs/sec with your testing.
21:52 <benblack> is this on your own infrastructure or on EC2 or something?
21:53 <allen_> it's on Joyent cloud.
21:53 <benblack> oy vey
21:53 <allen_> ?
21:53 <benblack> here is my recommendation: run multiple test clients at once
on multiple machines
21:54 <benblack> (oy vey-> if you are so concerned about performance, use physical machines)
21:54 <benblack> i don't know what exactly your 5 cluster nodes are
21:54 <allen_> physical machines? u mean dedicated servers?
21:54 <benblack> you said you had strong performance requirements
21:54 <benblack> so do i
21:55 <benblack> that's why i use dedicated servers.
21:55 <allen_> y I wish I could, I just followed the Basho blog.
21:55 <benblack> again, i don't know what the cluster nodes are, but what you are describing
sounds a lot like a client bottleneck, not a server side issue.
21:56 <allen_> client bottleneck, hmm .
21:58 <benblack> start multiple clients and run your tests from them at the same time.
21:58 <benblack> assuming you aren't bottlenecking on something else, i am guessing the
total throughput will be higher than 1700 reqs/sec.
22:00 <allen_> more than 5 client machines? costly..
22:00 <benblack> try 2.
22:00 <benblack> if things go faster, you are probably seeing a client bottleneck.
22:00 <allen_> kool
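The check is simple arithmetic: run the identical workload from one client, then from two at once, and compare combined throughput (the second number below is hypothetical):

```python
one_client = 1700      # req/s observed above
two_clients = 3300     # hypothetical combined figure from two test boxes

if two_clients / one_client > 1.5:
    print("near-linear scaling: the single client was the bottleneck")
else:
    print("little change: the limit is server-side (disk, CPU, or network)")
```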
22:01 <allen_> http://blog.basho.com/category/joyent/
22:01 <allen_> that's how I have servers on Joyent
22:01 <benblack> i'm sure it's fine.
22:01 <benblack> you just need to benchmark better.
22:02 <benblack> (and tell arg to just open a socket)
22:03 <allen_> thanks benblack, I will use multiple clients and see the result
22:03 <allen_> gotta sleep