@rayhassan
Last active June 26, 2019 09:42
Cassandra sizing notes
References:
https://dzone.com/articles/cassandra-design-best-practices
https://www.datastax.com/wp-content/uploads/2014/04/WP-DataStax-Enterprise-Best-Practices.pdf
https://docs.datastax.com/en/dse/5.1/dse-admin/datastax_enterprise/config/configRecommendedSettings.html
Bear in mind that most of the tips/advice below apply to the guest OS running Cassandra.
Use the noop I/O scheduler.
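For example, the noop scheduler can be set per block device via sysfs (the device name sda is an assumption, check lsblk for the actual device; requires root):

```shell
# Set the noop I/O scheduler for one block device (persists only until
# reboot; add elevator=noop to the kernel line to make it permanent).
echo noop > /sys/block/sda/queue/scheduler

# Verify: the active scheduler is shown in square brackets.
cat /sys/block/sda/queue/scheduler
```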
set OS limits
* soft nofile 32768
* hard nofile 32768
root soft nofile 32768
root hard nofile 32768
* soft memlock unlimited
* hard memlock unlimited
root soft memlock unlimited
root hard memlock unlimited
* soft as unlimited
* hard as unlimited
root soft as unlimited
root hard as unlimited
Set the nproc limits to 32768 in the /etc/security/limits.d/90-nproc.conf configuration file:
cassandra_user - nproc 32768
Tarball installation: /etc/security/limits.conf
Package installation: /etc/security/limits.d/cassandra.conf
Configure the following settings for the <cassandra_user> in the configuration file:
<cassandra_user> - memlock unlimited
<cassandra_user> - nofile 1048576
<cassandra_user> - nproc 32768
<cassandra_user> - as unlimited
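As a sketch, the per-user limits above can be written out as a limits file. The user name cassandra and the local file name are assumptions here; for a package install the real target is /etc/security/limits.d/cassandra.conf:

```shell
# Generate a limits file for the Cassandra service user.
CASS_USER="cassandra"                 # assumption: adjust to your install
LIMITS_FILE="cassandra-limits.conf"   # normally /etc/security/limits.d/cassandra.conf

cat > "$LIMITS_FILE" <<EOF
${CASS_USER} - memlock unlimited
${CASS_USER} - nofile 1048576
${CASS_USER} - nproc 32768
${CASS_USER} - as unlimited
EOF
```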
Set the following kernel network parameters (e.g. in /etc/sysctl.conf):
net.ipv4.tcp_keepalive_time=60
net.ipv4.tcp_keepalive_probes=3
net.ipv4.tcp_keepalive_intvl=10
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.rmem_default=16777216
net.core.wmem_default=16777216
net.core.optmem_max=40960
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
sysctl -w vm.max_map_count=1048575
To make this setting permanent, add the following to /etc/sysctl.conf:
vm.max_map_count = 1048575
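A sketch of persisting these settings, writing to a local file for illustration (the file name is an assumption; the real target is /etc/sysctl.conf or a file under /etc/sysctl.d/):

```shell
# Persist a subset of the recommended kernel settings.
SYSCTL_FILE="99-cassandra.conf"   # normally /etc/sysctl.d/99-cassandra.conf

cat > "$SYSCTL_FILE" <<EOF
vm.max_map_count = 1048575
net.ipv4.tcp_keepalive_time = 60
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
EOF

# Apply without a reboot (requires root):
# sysctl -p "$SYSCTL_FILE"
```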
Disable frequency scaling / set the CPU frequency governor:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/power_management_guide/cpufreq_setup#enabling_a_cpufreq_governor
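A minimal sketch of setting the performance governor through sysfs, following the RHEL docs linked above (requires root and cpufreq support; persists only until reboot):

```shell
# Pin every CPU to the "performance" governor so the clock speed does
# not scale down under a bursty Cassandra workload.
for g in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
    echo performance > "$g"
done
```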
Disable NUMA and/or zone_reclaim:
https://www.thegeekdiary.com/how-to-disable-numa-in-centos-rhel-67/
To disable NUMA, add numa=off to the kernel line in the grub.conf file.
To disable zone reclaim:
echo 0 > /proc/sys/vm/zone_reclaim_mode
So that the Java Virtual Machine (JVM) does not endlessly swap, all paging spaces should be
deactivated: sudo swapoff --all (or, as root, swapoff --all). This change can be made
permanent by removing the swap file entries from /etc/fstab.
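The fstab change can be sketched as follows, demonstrated on a sample copy so nothing system-wide is touched (the UUIDs are made up):

```shell
# Build a sample fstab with one swap entry, then comment that entry out.
# On a real node you would run swapoff --all first and edit /etc/fstab.
cat > fstab.sample <<EOF
UUID=1111-2222 /     ext4 defaults 0 1
UUID=3333-4444 none  swap sw       0 0
EOF

# Prefix swap lines with '#' so they are ignored at boot.
sed -i '/swap/s/^/#/' fstab.sample
```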
Tune the JVM Heap size on each node of the cluster, depending on the amount of memory on that
specific node. Too big a Heap size can impair Cassandra’s efficiency. Recommended settings are:
RAM          Heap Size
< 2GB        ½ of RAM
2GB to 4GB   1GB
> 4GB        ¼ of RAM, but not more than 8GB
The database automatically calculates the maximum heap size (MAX_HEAP_SIZE) based on this formula:
max(min(1/2 ram, 1024 megabytes), min(1/4 ram, 32765 megabytes))   <- the 32765 MB figure is taken from blog (1)
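The auto-calculation above can be sketched as a small shell function (the function name is mine, not from Cassandra's scripts; values are in megabytes):

```shell
# max(min(1/2 ram, 1024 MB), min(1/4 ram, 32765 MB)), all in MB.
max_heap_size_mb() {
    ram_mb=$1
    half=$(( ram_mb / 2 ));    [ $half -gt 1024 ] && half=1024
    quarter=$(( ram_mb / 4 )); [ $quarter -gt 32765 ] && quarter=32765
    [ $half -gt $quarter ] && echo $half || echo $quarter
}

# A 64 GB node: min(32768,1024)=1024, min(16384,32765)=16384, max=16384.
max_heap_size_mb 65536   # -> 16384
```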
(1) http://java-performance.info/over-32g-heap-java
(2) http://blog.ragozin.info/2012/03/secret-hotspot-option-improving-gc.html
Writes are done as updates; the original data is tombstoned for later deletion after compaction.
Often, you will need double the amount of space relative to the data to be deleted. They may have
a handle on that already, and will hopefully know how often they overwrite/update data.
Otherwise, check incremental backups perhaps?
One other thing to consider, which relates to Cassandra RF=3: if they need n VMs to host the
entire working set in RAM, then the usual caveat is that they need 3n VMs for the replica set to
provide redundancy, as per Cassandra guidelines.
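A quick sketch of that RF=3 sizing rule (the working-set and per-VM RAM figures are illustrative):

```shell
# Number of VMs needed to hold one copy of the working set in RAM,
# then scaled by the replication factor for redundancy.
working_set_gb=600
ram_per_vm_gb=200
rf=3

# Ceiling division: VMs needed for a single copy of the data.
n=$(( (working_set_gb + ram_per_vm_gb - 1) / ram_per_vm_gb ))
total=$(( n * rf ))

echo "$n VMs per copy, $total VMs with RF=$rf"
```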
DataStax Enterprise (DSE) 5.1 uses the Garbage-First collector (G1) by default. G1 is recommended for the following reasons:
- Heap sizes from 16 GB to 64 GB.
- G1 performs better than CMS (concurrent mark sweep) for larger heaps because it scans the regions of the heap containing the most garbage objects first, and compacts the heap on the go, while CMS stops the application when performing garbage collection.
- The workload is variable, that is, the cluster is performing different processes all the time.
- Future proofing, as CMS will be deprecated in Java 9.
- G1 is easier to configure and is self-tuning; you only need to set MAX_HEAP_SIZE.
However, G1 incurs some latency due to profiling.
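As a hedged example, enabling G1 in cassandra-env.sh / jvm.options typically amounts to something like the following (the 16G heap and the 500 ms pause target are illustrative values, not recommendations for every node):

```shell
# Sketch of G1 settings for cassandra-env.sh; with G1 you normally set
# only the heap size and leave the CMS flags commented out.
MAX_HEAP_SIZE="16G"

JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
```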
