ivansmf/README Secret

Last active October 7, 2020 01:57
How we hit one million Cassandra writes on Google Compute Engine - Reproducing Results
A. Disclaimers:
1. The scripts are offered for instructional purposes only.
2. cassandra.yaml and related configuration files are edited by one of the test scripts. We DO change the default settings to gain performance, and the new settings are specifically tuned for n1-standard-8 data nodes. For instance, we set the Java heap size to a large value that may not fit on other VM types.
B. Assumptions:
The scripts assume that your username on the target VMs is the same as on the local development server; more specifically, that the output of `whoami` on your development server matches its output on the VMs.
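A minimal sanity check for this assumption, runnable from the development server. The VM-side comparison is shown commented out because it needs a live instance; `l-1` is the loader created later in section D:

```shell
# Verify the username assumption the scripts depend on.
LOCAL_USER="$(whoami)"
echo "Local user: ${LOCAL_USER}"

# With a live cluster, compare against a VM, e.g.:
#   REMOTE_USER="$(gcutil --project=[project_name] ssh --zone=us-central1-b l-1 whoami)"
#   [ "$LOCAL_USER" = "$REMOTE_USER" ] || echo "username mismatch -- the scripts will fail"
```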
C. Prerequisites:
1. Download the test scripts archive to a local folder and untar it.
To download: wget
To untar: tar xzf cassandra_1m_writes_per_sec_gist.tgz
2. Download Cassandra binary distribution tarball into the tarballs folder. You can find detailed instructions at
$ curl -L | tar xz
3. Download the Oracle Java 7 tarball into the tarballs folder (we used server-jre-7u40-linux-x64.tar.gz). You can replace this step by installing the JDK, but we did not measure the performance impact of other releases. You can find the binary download here: (sign-up required).
4. You'll need to replace [project_name] with an actual project name or ID.
D. Creating the Cassandra cluster and data loaders:
1. Create disks.
$ gcutil --project=[project_name] adddisk --zone=us-central1-b --wait_until_complete --size_gb=1000 `for i in {1..300}; do echo -n pd1t-$i " "; done`
2. Create data nodes.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-standard-8 `for i in {1..300}; do echo -n cas-$i " "; done`
3. Create loaders.
$ gcutil --project=[project_name] addinstance --zone=us-central1-b --add_compute_key_to_project --auto_delete_boot_disk --automatic_restart --use_compute_key --wait_until_running --image=debian-7-wheezy-v20131120 --machine_type=n1-highcpu-8 `for i in {1..30}; do echo -n l-$i " "; done`
4. Attach the disks to data nodes.
$ for i in {1..300}; do gcutil --project=[project_name] attachdisk --zone=us-central1-b --disk=pd1t-$i cas-$i; done
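The backtick loops in steps 1-4 simply expand to a space-separated list of resource names before gcutil runs. A quick way to sanity-check that expansion (pure bash, no gcutil needed):

```shell
# Reproduce the name list that the backticked loop hands to gcutil.
names="$(for i in {1..300}; do echo -n "pd1t-$i "; done)"

# All 300 disk names should be present, from pd1t-1 to pd1t-300.
echo "$names" | wc -w                  # 300
echo "$names" | awk '{print $1, $NF}'  # pd1t-1 pd1t-300
```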
5. Authorize one of the loaders to ssh and rsync everywhere. Time to complete: about 5 minutes.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
$ ssh-keygen -t rsa
$ exit
6. Download the public key from l-1.
$ gcutil --project=[project_name] pull --zone=us-central1-b l-1 /home/`whoami`/.ssh/
7. Upload the key to all other VMs.
$ for i in {1..30}; do gcutil --project=[project_name] push --zone=us-central1-b l-$i /home/`whoami`/.ssh/; done
$ for i in {1..300}; do gcutil --project=[project_name] push --zone=us-central1-b cas-$i /home/`whoami`/.ssh/; done
8. Authorize l-1 to ssh into every VM in the project.
$ for vm in `gcutil --project=[project_name] listinstances | awk '{print $10;}' | sed ':a;N;$!ba;s/\n/ /g'`; do ssh -o UserKnownHostsFile=/dev/null -o CheckHostIP=no -o StrictHostKeyChecking=no -i /home/`whoami`/.ssh/google_compute_engine -A -p 22 `whoami`@$vm "cat /home/`whoami`/.ssh/ >> /home/`whoami`/.ssh/authorized_keys" ; done
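The address-extraction pipeline inside that one-liner is the tricky part: awk picks column 10 of the listinstances output, and the sed program `:a;N;$!ba;s/\n/ /g` slurps all lines into one pattern space and replaces the newlines with spaces. It can be exercised on fake output before touching real VMs (the hostnames below are made up):

```shell
# Stand-in for `gcutil listinstances` output: 10 whitespace-separated
# columns per row, with the address in column 10 (all values fabricated).
fake_listinstances() {
  echo "p1 p2 p3 p4 p5 p6 p7 p8 RUNNING cas-1.example"
  echo "p1 p2 p3 p4 p5 p6 p7 p8 RUNNING l-1.example"
}

# Same extraction as step 8: column 10, newlines flattened to spaces.
vms="$(fake_listinstances | awk '{print $10;}' | sed ':a;N;$!ba;s/\n/ /g')"
echo "$vms"   # cas-1.example l-1.example
```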
9. Generate the cluster configuration file.
$ echo SUDOUSER=\"`whoami`\" >benchmark.conf; echo DATA_FOLDER=\"cassandra_data\">>benchmark.conf ; for r in `gcutil 2>/dev/null --project=[project_name] listinstances --zone=us-central1-b | awk 'BEGIN {c=0; l=0;} /cas/ { print "CASSANDRA"++c"=\""$10":"$8":/dev/sdb\"";} /l\-[0-9]/ { print "LOAD_GENERATOR"++l"=\""$10"\""; }'`; do echo $r; done >> benchmark.conf
10. Upload all test scripts to l-1.
$ tar czf scripts.tgz *
$ gcutil --project=[project_name] push --zone=us-central1-b l-1 scripts.tgz /home/`whoami`
11. ssh to l-1 to set up the cluster.
$ gcutil --project=[project_name] ssh --zone=us-central1-b l-1
12. Unpack the scripts.
$ tar xzf scripts.tgz
13. Run the setup script. Please make sure that all nodes are up and running first.
$ ./
14. Run the tests.
$ ./
15. Gather results from each loader.
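One way to script that collection step, shown here as a dry run: `echo` prints each `gcutil pull` command instead of executing it, and the remote path `results-l-$i.log` is a hypothetical placeholder for wherever your test script actually writes its per-loader output.

```shell
# Dry run: print one `gcutil pull` per loader; drop the leading `echo`
# to actually fetch. results-l-$i.log is a hypothetical filename --
# substitute the real per-loader output file.
for i in {1..30}; do
  echo gcutil --project=[project_name] pull --zone=us-central1-b \
    l-$i /home/$(whoami)/results-l-$i.log .
done
```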
E. Deleting the cluster:
1. Delete data nodes.
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..300}; do echo -n cas-$i " "; done` --force --delete_boot_pd
2. Delete data loaders.
$ gcutil --project=[project_name] deleteinstance --zone=us-central1-b `for i in {1..30}; do echo -n l-$i " "; done` --force --delete_boot_pd
3. Delete disks.
$ gcutil --project=[project_name] deletedisk --zone=us-central1-b `for i in {1..300}; do echo -n pd1t-$i " "; done` --force

tzach commented Jul 1, 2014

Thanks for sharing this.
I created a bash script based on the above.


pygupta commented Dec 9, 2014

Line 2: curl -L ... is missing at the end: | tar xz


Great effort! But can anyone explain how to choose the JVM heap size based on processor count and speed?
