yencarnacion/README Secret
Forked from ivansmf/README
Created May 27, 2017
How we hit one million Cassandra writes on Google Compute Engine - Reproducing Results
A. Disclaimers:
1. The scripts are offered for instructional purposes
2. cassandra.yaml and other configuration files are edited by one of the test scripts. We DO change the default settings to gain performance, and the new settings are specifically designed to work on n1-standard-8 data nodes. For instance, we set the Java heap size to a large value that might not fit on other VM types.
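As an illustration of the kind of override involved, heap settings live in Cassandra's cassandra-env.sh. The values below are placeholders only, not the benchmark's actual settings, which the test script sets itself:

```
# Illustrative placeholders only -- the test script computes values tuned
# for n1-standard-8 data nodes; do not copy these numbers.
MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"
```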
B. Assumptions:
The scripts assume that your username on the target VMs is the same as on your local development server. More specifically, the output of `whoami` on your development server must be the same as on each VM.
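A quick way to record that assumption before starting (a sketch; the actual comparison against a VM is left to a manual `gcutil ssh` followed by `whoami`):

```shell
# Capture the local username that every remote path in this guide
# (/home/`whoami`/...) will be built from.
local_user=$(whoami)
echo "Scripts will assume user '$local_user' exists on every VM"
```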
C. Prerequisites:
1. Download the test-script tarball to a local folder and untar it.
To download: wget
To untar: tar xzf cassandra_1m_writes_per_sec_gist.tgz
2. Download Cassandra binary distribution tarball into the tarballs folder. You can find detailed instructions at
$ curl -L
3. Download the Oracle Java 7 tarball into the tarballs folder (we used server-jre-7u40-linux-x64.tar.gz). You can replace this step by installing the JDK instead, but we did not measure the performance impact of other releases. You can find the binary download here: (sign up required).
4. You'll need to replace [project_name] with an actual project name or ID.
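One way to avoid editing every command by hand is to put your project ID in a shell variable first (the ID below is hypothetical):

```shell
# Hypothetical project ID -- substitute your own project name or ID.
PROJECT=my-project-123

# Later commands can then be written with --project=$PROJECT, e.g.:
cmd="gcutil --project=$PROJECT listinstances --zone=us-central1-b"
echo "$cmd"
```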
D. Creating the Cassandra cluster and data loaders:
1. Create disks.
$ gcutil --project=[project_name] adddisk --zone=us-central1-b --wait_until_complete --size_gb=1000 `for i in {1..300}; do echo -n pd1t-$i " "; done`
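The backtick loop in the command above simply expands to a space-separated list of disk names, which gcutil accepts as positional arguments. For example, with 3 disks instead of 300:

```shell
# Expand disk names the same way the adddisk command's embedded loop does.
names=$(for i in {1..3}; do echo -n pd1t-$i " "; done)
echo "$names"
```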
2. Create data nodes.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-standard-8 `for i in {1..300}; do echo -n cas-$i " "; done`
3. Create loaders.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-highcpu-8 `for i in {1..30}; do echo -n l-$i " "; done`
4. Attach the disks to data nodes.
$ for i in {1..300}; do gcutil --project=[project_name] attachdisk --zone=us-central1-b --disk=pd1t-$i cas-$i; done
5. Authorize one of the loaders to ssh and rsync everywhere. Time to complete: about 5 minutes.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
$ ssh-keygen -t rsa
$ exit
6. Download the public key.
$ gcutil --project=[project_name] pull --zone=us-central1-b l-1 /home/`whoami`/.ssh/
7. Upload the key to all other VMs.
$ for i in {1..30}; do gcutil --project=[project_name] push --zone=us-central1-b l-$i /home/`whoami`/.ssh/; done
$ for i in {1..300}; do gcutil --project=[project_name] push --zone=us-central1-b cas-$i /home/`whoami`/.ssh/; done
8. Authorize l-1 to ssh into every VM in the project.
$ for vm in `gcutil --project=[project_name] listinstances | awk '{print $10;}' | sed ':a;N;$!ba;s/\n/ /g'`; do ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /home/`whoami`/.ssh/google_compute_engine -A -p 22 `whoami`@$vm "cat /home/`whoami`/.ssh/ >> /home/`whoami`/.ssh/authorized_keys" ; done
9. Generate the cluster configuration file.
$ echo SUDOUSER=\"`whoami`\" >benchmark.conf; echo DATA_FOLDER=\"cassandra_data\">>benchmark.conf ; for r in `gcutil 2>/dev/null --project=[project_name] listinstances --zone=us-central1-b | awk 'BEGIN {c=0; l=0;} /cas/ { print "CASSANDRA"++c"=\""$10":"$8":/dev/sdb\"";} /l\-[0-9]/ { print "LOAD_GENERATOR"++l"=\""$10"\""; }'`; do echo $r; done >> benchmark.conf
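Given the awk script above, the generated benchmark.conf should look roughly like this (the names and addresses below are illustrative, not real output):

```
SUDOUSER="alice"
DATA_FOLDER="cassandra_data"
CASSANDRA1="cas-1:10.240.77.2:/dev/sdb"
CASSANDRA2="cas-2:10.240.77.3:/dev/sdb"
LOAD_GENERATOR1="l-1"
LOAD_GENERATOR2="l-2"
```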
10. Upload all test scripts to l-1.
$ tar czf scripts.tgz *
$ gcutil --project=[project_name] push --zone=us-central1-b l-1 scripts.tgz /home/`whoami`
11. SSH to l-1 to set up the cluster.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
12. Unpack the scripts.
$ tar xzf scripts.tgz
13. Run the setup script. Please make sure that all nodes are up and running first.
$ ./
14. Run the tests.
$ ./
15. Gather results from each loader.
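No command is given for this step; a minimal sketch reusing the `gcutil pull` mechanism from step 6 (the results file name below is hypothetical -- use whatever path the test script actually writes on each loader):

```shell
# Sketch: build a pull command per loader; swap `echo` for `eval`
# (or run the gcutil line directly) to fetch the files for real.
mkdir -p results
for i in {1..30}; do
  cmd="gcutil --project=[project_name] pull --zone=us-central1-b l-$i /home/$(whoami)/results.log results/l-$i.log"
  echo "$cmd"
done
```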
E. Deleting the cluster:
1. Delete data nodes
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..300}; do echo -n cas-$i " "; done` --force --delete_boot_pd
2. Delete data loaders
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..30}; do echo -n l-$i " "; done` --force --delete_boot_pd
3. Delete disks
$ gcutil --project=[project_name] deletedisk --zone=us-central1-b `for i in {1..300}; do echo -n pd1t-$i " "; done` --force