How we hit one million Cassandra writes on Google Compute Engine - Reproducing Results
A. Disclaimers:
1. The scripts are offered for instructional purposes.
2. Cassandra.yaml and cassandra-env.sh are edited by one of the test scripts. We DO change the default settings to gain performance, and the new settings are specifically designed to work on n1-standard-8 data nodes. For instance, we set the Java heap size to a large value that might not fit on other VM types.
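For reference, sizing the heap on an n1-standard-8 (8 vCPUs, 30 GB of RAM) usually means overriding the auto-calculated values in cassandra-env.sh. The values below are illustrative assumptions only, not necessarily the exact settings the test script writes:

# Illustrative cassandra-env.sh overrides for an n1-standard-8 (assumed values)
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"    # roughly 100 MB per vCPU, per the guidance in cassandra-env.sh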
B. Assumptions:
The scripts assume that your username on the target VMs is the same as on your local development server. More specifically, that the output of `whoami` on your development server is the same as on the VMs.
C. Prerequisites:
1. Download the test-scripts archive to a local folder and untar it.
To download: wget http://storage.googleapis.com/p3rf-downloads/cassandra_1m_writes_per_sec_gist.tgz
To untar: tar xzf cassandra_1m_writes_per_sec_gist.tgz
2. Download the Cassandra binary distribution tarball into the tarballs folder. You can find detailed instructions at http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installTarball_t.html
$ curl -L -o tarballs/dsc.tar.gz http://downloads.datastax.com/community/dsc.tar.gz
3. Download the Oracle Java 7 tarball into the tarballs folder (we used server-jre-7u40-linux-x64.tar.gz). You can replace this step by installing the JDK, but we did not measure the performance impact of other releases. You can find the binary download here: http://docs.oracle.com/javase/7/docs/webnotes/install/linux/linux-server-jre.html (sign up required). A quick check of both downloads is shown after this list.
4. You'll need to replace [project_name] with an actual project name or ID in the commands below.
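Once steps 2 and 3 are done, the tarballs folder should contain both archives; a quick sanity check (filenames assume the dsc.tar.gz and server-jre-7u40 downloads mentioned above):

$ ls tarballs/
dsc.tar.gz  server-jre-7u40-linux-x64.tar.gz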
D. Creating the Cassandra cluster and data loaders:
1. Create disks.
$ gcutil --project=[project_name] adddisk --zone=us-central1-b --wait_until_complete --size_gb=1000 `for i in {1..300}; do echo -n pd1t-$i " "; done`
2. Create data nodes.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-standard-8 `for i in {1..300}; do echo -n cas-$i " "; done`
3. Create loaders.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-highcpu-8 `for i in {1..30}; do echo -n l-$i " "; done`
4. Attach the disks to data nodes.
$ for i in {1..300}; do gcutil --project=[project_name] attachdisk --zone=us-central1-b --disk=pd1t-$i cas-$i; done
5. Authorize one of the loaders to ssh and rsync everywhere. This takes roughly 5 minutes to complete.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
$ ssh-keygen -t rsa
$ exit
6. Download the public key.
$ gcutil --project=[project_name] pull --zone=us-central1-b l-1 /home/`whoami`/.ssh/id_rsa.pub l-1.id_rsa.pub
7. Upload the key to all other VMs.
$ for i in {1..30}; do gcutil --project=[project_name] push --zone=us-central1-b l-$i l-1.id_rsa.pub /home/`whoami`/.ssh/; done
$ for i in {1..300}; do gcutil --project=[project_name] push --zone=us-central1-b cas-$i l-1.id_rsa.pub /home/`whoami`/.ssh/; done
8. Authorize l-1 to ssh into every VM in the project.
$ for vm in `gcutil --project=[project_name] listinstances | awk '{print $10;}' | sed ':a;N;$!ba;s/\n/ /g'`; do ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /home/`whoami`/.ssh/google_compute_engine -A -p 22 `whoami`@$vm "cat /home/`whoami`/.ssh/l-1.id_rsa.pub >> /home/`whoami`/.ssh/authorized_keys" ; done
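To confirm the key distribution worked, it should now be possible to log into l-1 and reach any other VM over ssh without a password. The check below assumes GCE's internal DNS resolves instance names such as cas-1 from within the project:

$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
$ ssh -o StrictHostKeyChecking=no cas-1 hostname    # should print cas-1 without asking for a password
$ exit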
9. Generate the cluster configuration file.
$ echo SUDOUSER=\"`whoami`\" >benchmark.conf; echo DATA_FOLDER=\"cassandra_data\">>benchmark.conf ; for r in `gcutil 2>/dev/null --project=[project_name] listinstances --zone=us-central1-b | awk 'BEGIN {c=0; l=0;} /cas/ { print "CASSANDRA"++c"=\""$10":"$8":/dev/sdb\"";} /l\-[0-9]/ { print "LOAD_GENERATOR"++l"=\""$10"\""; }'`; do echo $r; done >> benchmark.conf
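The resulting benchmark.conf should look roughly like this; the addresses shown are placeholders for the two IP-address columns the awk script extracts from gcutil listinstances:

SUDOUSER="<your-username>"
DATA_FOLDER="cassandra_data"
CASSANDRA1="<ip-address>:<ip-address>:/dev/sdb"
CASSANDRA2="<ip-address>:<ip-address>:/dev/sdb"
...
LOAD_GENERATOR1="<ip-address>"
LOAD_GENERATOR2="<ip-address>"
...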
10. Upload all test scripts to l-1.
$ tar czf scripts.tgz *
$ gcutil --project=[project_name] push --zone=us-central1-b l-1 scripts.tgz /home/`whoami`
11. ssh to l-1 to set up the cluster.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
12. Unpack the scripts.
$ tar xzf scripts.tgz
13. Run setup_cluster.sh. Please make sure that all nodes are up and running.
$ ./setup_cluster.sh
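One way to verify that all 300 data nodes have joined the ring is to run nodetool status on any data node. The path below is an assumption (the DataStax Community tarball unpacked in the home directory); adjust it to wherever setup_cluster.sh installs Cassandra:

$ ssh cas-1
$ ~/dsc-cassandra-*/bin/nodetool status | grep -c "^UN"    # UN = Up/Normal; expect 300
$ exit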
14. Run tests.
$ ./inserts_test.sh
15. Gather results from each loader.
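One possible way to collect the output, run from l-1, is to copy it from every loader with scp. The filename inserts_test.log is a hypothetical placeholder; use whatever file inserts_test.sh actually writes on the loaders:

$ mkdir -p results
$ for i in {1..30}; do scp -o StrictHostKeyChecking=no l-$i:/home/`whoami`/inserts_test.log results/l-$i.log; done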
E. Deleting the cluster:
1. Delete data nodes.
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..300}; do echo -n cas-$i " "; done` --force --delete_boot_pd
2. Delete data loaders.
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..30}; do echo -n l-$i " "; done` --force --delete_boot_pd
3. Delete disks.
$ gcutil --project=[project_name] deletedisk --zone=us-central1-b `for i in {1..300}; do echo -n pd1t-$i " "; done` --force
Line 2: the `curl -L ...` command is missing `| tar xz` at the end.
Great effort... But can anyone explain to me how to set the JVM heap size based on processor size and speed?
Thanks for sharing this.
I created a bash script based on the above:
https://github.com/tzach/cassandra-benchmark-gce