tamsky/meetup-notes

## meetup-notes
Jon Haddad: Diagnosing Cassandra Problems in Production
http://www.meetup.com/ladevops/events/213408212/

where do systems fail?
spof
typical : replication - master failure
master insert bottleneck
san&nas performance
hardware/software upgrades
any # of cloud issues
lack of viz - prepare for everything
dc failure == ??


----
"it worked in dev"
- devs rarely use legit datasets
- unpredictable query perf (joins are insane)
- index updates
- test on ssd, deploy to spinning rust?
- usually no contention in dev
- clusters are very different beasts from single instances


the problem with most dbs
- clustering is an afterthought
 - bolt on
 - developers start with acid, but give it up when they start using replication
 - bad practices are encouraged
- failover is an afterthought
 - edge cases in failover process
- multidc requires RT to master xcountry to INSERT/UPDATE
- you can't cheat latency
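
A rough worked example of why a cross-country round trip to a single master can't be engineered away. The distance (about 3,940 km, roughly Los Angeles to New York in a straight line) and the fiber speed (about two thirds of c) are assumptions for illustration, not figures from the talk. Real routes are longer and add switching delay, so this is only a lower bound.

```python
# Lower bound on a cross-country round trip (RT) to a single master.
# Both constants are illustrative assumptions, not measurements.
DISTANCE_KM = 3_940            # approx. straight-line LA <-> NYC
FIBER_SPEED_KM_S = 200_000     # light in fiber, roughly 2/3 of c

one_way_ms = DISTANCE_KM / FIBER_SPEED_KM_S * 1000
rtt_ms = 2 * one_way_ms

# Each INSERT/UPDATE that has to reach the remote master pays at least one RTT,
# so a single connection issuing writes one at a time is capped at 1000 / RTT.
writes_per_sec_serial = 1000 / rtt_ms

print(f"min RTT: {rtt_ms:.1f} ms")
print(f"max serial writes/sec per connection: {writes_per_sec_serial:.0f}")
```

Under these assumptions the floor is a bit under 40 ms per write, before any database work happens.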


--

how is cassandra diff?

ring based replication
only 1 type of server (cassandra)
all nodes hold data and can answer queries
data is stored on RF=N servers
no spof
eventually consistent
data is found by key (CQL)
built for HA & scale
multi-dc
runs on JVM
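
A minimal sketch of the ring idea in the list above: every node sits at a token on a ring, a key hashes to a position, and the data lives on the next RF nodes walking clockwise. This is a toy model, not Cassandra's implementation. The `Ring` class, the `token` helper, the node names, and the use of MD5 are all made up for illustration; real clusters use their own partitioner and typically many tokens per node.

```python
import bisect
import hashlib


def token(name: str) -> int:
    """Map a string to a position on the ring (MD5 purely for illustration)."""
    return int(hashlib.md5(name.encode()).hexdigest(), 16)


class Ring:
    """Toy ring: one token per node, replicas chosen by walking clockwise."""

    def __init__(self, nodes, rf):
        pairs = sorted((token(n), n) for n in nodes)
        self.tokens = [t for t, _ in pairs]
        self.nodes = [n for _, n in pairs]
        self.rf = min(rf, len(self.nodes))

    def replicas(self, key):
        """Return the rf nodes that hold `key`."""
        # First node whose token is >= the key's token, wrapping around the ring.
        start = bisect.bisect_left(self.tokens, token(key)) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)] for i in range(self.rf)]


ring = Ring(["node-a", "node-b", "node-c", "node-d", "node-e"], rf=3)
print(ring.replicas("user:42"))
```

Any node can run the same calculation, which is why there is only one type of server and no master to fail.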


----

preventative measures
- opscenter
- metrics integration
- munin
- monit
- icinga
- graphite / statsd (app level)
- logstash

--------
- weird consistency issues - NTP working?
 - last write wins -- time skew
 - problems with streaming / repair - version conflicts
 - run cleanup after you add nodes (reclaim disk space)
 - slow queries
  - compaction
  - histograms
  - tracing
- nodes flapping/failing
 - check opscenter
  - dig into system metrics
  - jvm gc issues
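
The "last write wins -- time skew" item above can be sketched like this. It is a toy model of timestamp-based conflict resolution, not Cassandra's code; the `Write` class, the `resolve` function, and the 5-second skew are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp_us: int  # timestamp attached by whoever issued the write


def resolve(a: Write, b: Write) -> Write:
    """Last write wins: keep the write with the larger timestamp (ties keep `a`)."""
    return a if a.timestamp_us >= b.timestamp_us else b


REAL_TIME_US = 1_000_000_000
SKEW_US = 5_000_000  # hypothetical: one machine's clock runs 5 seconds fast

# "old" is written first, but by the machine with the fast clock.
first = Write("old", REAL_TIME_US + SKEW_US)
# "new" is written one real second later by a machine with a correct clock.
second = Write("new", REAL_TIME_US + 1_000_000)

winner = resolve(first, second)
print(winner.value)
```

The older value wins because its timestamp is larger, which is the kind of "weird consistency issue" that makes checking NTP the first step.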


---
compaction
 - compaction merges sstables
 - too much compaction?
 - opscenter provides insight into compaction cluster-wide
 - nodetool
  - compactionhistory
  - getcompactionthroughput
- leveled vs SizeTiered vs DateTiered
 - leveled on SSD + Read Heavy
 - size tiered on spinning rust
 - size tiered is great for write heavy time series workloads
 - DateTiered -- (NEW) good for time series ?
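
A minimal sketch of what "compaction merges sstables" means, assuming each sstable is a run of `(key, timestamp, value)` tuples sorted by key and then timestamp. The `compact` function and the sample data are hypothetical; tombstones, TTLs, and the strategy-specific choice of which sstables to merge are left out.

```python
import heapq


def compact(*sstables):
    """Merge sorted (key, timestamp, value) runs, keeping the newest cell per key."""
    merged = []
    # heapq.merge streams the inputs in sorted order without loading them all at once.
    for key, ts, value in heapq.merge(*sstables):
        if merged and merged[-1][0] == key:
            # Same key again: timestamps arrive in ascending order, so this one is newer.
            merged[-1] = (key, ts, value)
        else:
            merged.append((key, ts, value))
    return merged


older = [("a", 1, "a1"), ("b", 1, "b1")]
newer = [("a", 2, "a2"), ("c", 2, "c2")]

print(compact(older, newer))
```

Before compaction a read of key `a` has to consult both runs; afterwards there is one run with one version per key, which is why too little compaction hurts reads and too much competes with them for disk I/O.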

------  sysutils

iostat
htop
iftop & netstat
dstat
strace

------
- proxyhistograms
 - high level read and write times
 - includes network latency
- cfhistograms <keyspace> <table>
 - reports stats for single table on a single node
 - used to identify tables with perf prob
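
A small sketch of why latency is looked at as a histogram rather than an average. The `percentile` helper uses the nearest-rank method and the latency samples are invented for illustration; this does not reproduce nodetool's output format.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical read latencies in microseconds: nine fast reads and one very slow one.
latencies_us = [120, 130, 125, 140, 135, 128, 132, 138, 122, 9000]

mean_us = sum(latencies_us) / len(latencies_us)
p50_us = percentile(latencies_us, 50)
p99_us = percentile(latencies_us, 99)

print(f"mean={mean_us:.0f}us p50={p50_us}us p99={p99_us}us")
```

The mean describes none of the requests well here, while the median and the tail together show that most reads are fine and a few are badly slow.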

--- query tracing

TRACING on;
select * from blah where pk=1 limit 100;


---- jvm gc

- generational gc (parnew & cms)
 - new gen (eden + survivor0 + survivor1)
 - old generation
- new obj are created in eden
- minor gc
 - occurs when new gen fills up
 - stops the world
 - dead objects are removed
 - live obj are promoted into survivor, then old gen
 - removing objects is fast, promoting objects is slow

---- old gen
- obj are promoted to old gen from new gen
- major gc
 - triggered when old gen fills up to some %
 - mostly concurrent
 - 2 short stop the world pauses
- full gc
 - occurs when old gen fills up or obj can't be promoted
 - stop the world
 - collects all generations
 - these are bad!

------------- GC profiling

- opscenter gc stats
 - look for correlations between gc spikes and r/w latency

jstat -gcutil 89760(pid) 250(interval ms) 10000(count)
- cassandra gc logging
 - can be activated in cassandra-env.sh
- jstat
 - prints gc activity


--- look out for
- long multi-second pauses
 - caused by full gcs: old gen is filling up faster than the concurrent gc can keep up with it
 - typ. means garbage is being promoted out of the new gen too soon
- long minor gc
 - many of the objects in the new gen are being promoted to the old gen
 - most commonly caused by new gen being too big
 - sometimes caused by obj being promoted prematurely

smaller new gen size = shorter, more predictable pauses when gc occurs

---------
@rustyrazorblade
in depth disk analysis @AlTobey
planetcassandra.com
blake eggleston blog on JVM tuning
http://tech.shift.com/post/74311817513/cassandra-tuning-the-jvm-for-read-heavy-workloads

---
tablesnap - can do backups automatically to S3