#Ruby Meetup Group
Ezra

Limitations of SQL
(horizontally scaled databases) - not covered in MySQL

- Limitations
 - don't scale past a single master

#lesssql
	- hybrid systems (solution)
	*find a small part of the system that is not on the critical path (session data, logs, etc.)
	'Redis' - alternative database
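A minimal sketch of the hybrid approach, assuming sessions are the non-critical piece moved out of MySQL: a session store with per-key expiry, modelled in-process. A real deployment would back this with Redis (for example its SETEX and GET commands); the `SessionStore` class and its method names are hypothetical.

```python
import time


class SessionStore:
    """Hypothetical in-memory stand-in for a Redis-backed session store."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock   # injectable clock so expiry can be tested
        self._data = {}      # session_id -> (expires_at, payload)

    def save(self, session_id, payload):
        # Like Redis SETEX: store the value together with an expiry time.
        self._data[session_id] = (self.clock() + self.ttl, payload)

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None:
            return None
        expires_at, payload = entry
        if self.clock() >= expires_at:
            # Expired sessions are dropped lazily when read.
            del self._data[session_id]
            return None
        return payload


# Usage with a fake clock so the example is deterministic.
now = [0.0]
store = SessionStore(ttl_seconds=30, clock=lambda: now[0])
store.save("abc123", {"user_id": 42})
print(store.load("abc123"))  # {'user_id': 42}
now[0] = 31.0
print(store.load("abc123"))  # None
```

Only the session code talks to this store, so the core MySQL schema stays untouched.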

Tour of New Database Types:
	Redis
		- fast, in memory key/value store
		- alternate data types -> lists, sets (hash table)
		- set intersections (commonalities across users, lookup)
		- use for: sessions, hit counters, log buffers
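To make the Redis data types above concrete, here is an in-memory model of three of those patterns. The Redis command each step mimics (SADD/SINTER, INCR, LPUSH plus LTRIM) is noted in the comments; this is plain Python rather than a Redis client, and the key names are made up for illustration.

```python
from collections import defaultdict

sets = defaultdict(set)      # key -> set, like Redis sets
counters = defaultdict(int)  # key -> integer, like a Redis counter
lists = defaultdict(list)    # key -> list, like Redis lists

# SADD: record who each user follows (hypothetical keys).
sets["follows:alice"].update({"bob", "carol", "dave"})
sets["follows:erin"].update({"carol", "dave", "frank"})

# SINTER: commonalities across users.
common = sets["follows:alice"] & sets["follows:erin"]
print(sorted(common))  # ['carol', 'dave']

# INCR: hit counter.
for _ in range(3):
    counters["hits:/home"] += 1
print(counters["hits:/home"])  # 3


def push_log(key, line, max_len=1000):
    # LPUSH then LTRIM 0 max_len-1: keep only the newest max_len entries.
    lists[key].insert(0, line)
    del lists[key][max_len:]


for i in range(5):
    push_log("log:app", f"event {i}", max_len=3)
print(lists["log:app"])  # ['event 4', 'event 3', 'event 2']
```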

		Pros
			- all operations happen in memory

		Cons
			- data has to fit in memory
			- data structure server

		Re-Distribute-Your-Load
		Efficient Data Handling (IO based)

		Scales like memcache (single threaded)
			- spread instances out over multiple server machines
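Because each instance is single threaded, the memcache-style way to scale is to run several instances and let the client pick one per key. A minimal sketch of that client-side sharding, assuming simple hash-mod-N placement; the server addresses are hypothetical, and consistent hashing is the usual refinement when servers are added or removed.

```python
import hashlib

SERVERS = ["10.0.0.1:6379", "10.0.0.2:6379", "10.0.0.3:6379"]  # hypothetical


def server_for(key, servers=SERVERS):
    # Use a stable hash (not Python's per-process randomized hash()) so every
    # client maps the same key to the same server.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:4], "big") % len(servers)]


for key in ["session:1", "session:2", "hits:/home"]:
    print(key, "->", server_for(key))
```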

		Uses: as fast as you can get from a data store

	Tokyo Cabinet
		- Large Data workhorse
		- Fully synchronous, no chance of losing data
		- Memory caching

		More key/value type
			- can have extensible code structures built into system

		Pros:
			- Tokyo Server (80 GB)
			- Fixed Length Records
			- Efficient, smallest on-disk footprint
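Fixed-length records help keep lookups cheap: a record's position is computed from its ID, so no separate index is needed. A minimal sketch of the idea using a plain file; this is not Tokyo Cabinet's API or file format, and the 16-byte record size is an arbitrary assumption.

```python
import os
import tempfile

RECORD_SIZE = 16  # assumed fixed width in bytes


def write_record(f, record_id, value):
    data = value.encode("utf-8")
    if len(data) > RECORD_SIZE:
        raise ValueError("value too long for a fixed-length record")
    f.seek(record_id * RECORD_SIZE)          # offset computed from the ID
    f.write(data.ljust(RECORD_SIZE, b"\0"))  # pad to the fixed width


def read_record(f, record_id):
    f.seek(record_id * RECORD_SIZE)
    return f.read(RECORD_SIZE).rstrip(b"\0").decode("utf-8")


path = os.path.join(tempfile.mkdtemp(), "fixed.db")
with open(path, "w+b") as f:
    write_record(f, 0, "alice")
    write_record(f, 5, "bob")  # the gap between IDs is zero-filled
    print(read_record(f, 5))   # bob
```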

		Cons:
			- above 70GB, gets funky

		Replication
			- master, master
			- master, slave
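Master/slave replication is commonly built on an update log that slaves replay; master/master runs the same idea in both directions. A minimal sketch of log-shipping replication under that assumption (the `Master` and `Slave` classes are hypothetical and do not reproduce Tokyo Tyrant's actual protocol):

```python
class Master:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered update log: (sequence_no, key, value)

    def put(self, key, value):
        self.data[key] = value
        self.log.append((len(self.log) + 1, key, value))


class Slave:
    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # highest sequence number replayed so far

    def sync(self):
        # Pull and replay only the log entries not applied yet.
        for seq, key, value in self.master.log[self.applied:]:
            self.data[key] = value
            self.applied = seq


master = Master()
slave = Slave(master)
master.put("a", 1)
master.put("b", 2)
slave.sync()
master.put("a", 3)
print(slave.data)  # {'a': 1, 'b': 2}  (the slave lags until the next sync)
slave.sync()
print(slave.data)  # {'a': 3, 'b': 2}
```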

		Uses: fastest option for storing large amounts of data; tunable server RAM usage
			- can be embedded in-process


	MongoDB
		- document database
		(the MySQL of key/value stores) - easiest step from MySQL databases
			- tables are collections of documents
			- rolling buffer (capped collections)
			- good support for complex queries
			- index on attributes
			- (Not Tied down to schema)
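To illustrate documents without a fixed schema and an index on an attribute, here is a toy in-memory collection. It models the ideas only; it is not the MongoDB driver API, and the `Collection` class and its methods are hypothetical.

```python
from collections import defaultdict


class Collection:
    """Toy document collection: documents are dicts with no fixed schema."""

    def __init__(self):
        self.docs = []
        self.indexes = {}  # attribute -> {value: [doc, ...]}

    def create_index(self, attr):
        index = defaultdict(list)
        for doc in self.docs:
            if attr in doc:
                index[doc[attr]].append(doc)
        self.indexes[attr] = index

    def insert(self, doc):
        self.docs.append(doc)
        for attr, index in self.indexes.items():
            if attr in doc:
                index[doc[attr]].append(doc)

    def find(self, **criteria):
        # Narrow by the first indexed attribute, otherwise scan everything.
        candidates = self.docs
        for attr, value in criteria.items():
            if attr in self.indexes:
                candidates = self.indexes[attr].get(value, [])
                break
        return [d for d in candidates
                if all(d.get(k) == v for k, v in criteria.items())]


users = Collection()
users.create_index("city")
users.insert({"name": "ann", "city": "SF", "langs": ["ruby"]})
users.insert({"name": "bo", "city": "NYC"})
users.insert({"name": "cy", "city": "SF", "age": 30})  # different fields: fine
print([d["name"] for d in users.find(city="SF")])  # ['ann', 'cy']
```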

		- Collections can be set as shardable (auto-rebalancing)
		- JSON document database
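Auto-rebalancing in a sharded setup generally means splitting data into chunks and migrating chunks from the most loaded shard to the least loaded one until they are roughly even. A minimal sketch of such a balancer, assuming chunk count is the only load measure and a migration threshold of 2 chunks (both assumptions; MongoDB's real balancer has its own rules):

```python
def rebalance(shards, threshold=2):
    """Move chunks until no two shards differ by `threshold` or more chunks.

    shards: dict of shard name -> list of chunk ids (mutated in place).
    Returns the list of (chunk, from_shard, to_shard) moves performed.
    """
    if threshold < 2:
        # With a threshold of 1 a single chunk would bounce back and forth.
        raise ValueError("threshold must be at least 2")
    moves = []
    while True:
        most = max(shards, key=lambda s: len(shards[s]))
        least = min(shards, key=lambda s: len(shards[s]))
        if len(shards[most]) - len(shards[least]) < threshold:
            return moves
        chunk = shards[most].pop()  # migrate one chunk at a time
        shards[least].append(chunk)
        moves.append((chunk, most, least))


shards = {"shard0": list(range(8)), "shard1": [], "shard2": [8]}
moves = rebalance(shards)
print({name: len(chunks) for name, chunks in shards.items()})
# {'shard0': 3, 'shard1': 3, 'shard2': 3}
```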

		Cons: no transactions

		Pros: recovery tools
			- advanced query system
			- GridFS: grid file system with open/write file I/O
			- scales horizontally

		MongoDB - fast synchronous writes, good for web, logging, statistics
			- can use hugely complex queries
			- have flexibility in queries

	Riak
		- Document Oriented DB
			- HTTP/JSON query interface
			- Add and Remove Nodes
			- Erlang map/reduce query interface
			- Tunable knobs (rule sets), e.g. "I want you to write to 3 servers"
			http://riak.basho.com
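The "write to 3 servers" knob follows the Dynamo-style N/R/W model: each key lives on N replicas, a write succeeds once W of them acknowledge, and a read consults R of them. A minimal sketch of that quorum rule (the function names and parameters are illustrative, not Riak's client API):

```python
def quorum_write(replicas_up, n=3, w=2):
    """Return True if a write to n replicas gets at least w acknowledgements.

    replicas_up: list of booleans, one per replica, True if it acknowledges.
    """
    acks = sum(1 for up in replicas_up[:n] if up)
    return acks >= w


def read_sees_latest_write(n, r, w):
    # If R + W > N, every read quorum overlaps every write quorum in at least
    # one replica (ignoring failures and hinted handoff), so a read reaches
    # a replica holding the latest acknowledged write.
    return r + w > n


print(quorum_write([True, False, True], n=3, w=2))   # True
print(quorum_write([True, False, False], n=3, w=2))  # False
print(read_sees_latest_write(n=3, r=2, w=2))         # True
print(read_sees_latest_write(n=3, r=1, w=1))         # False
```

Lowering W makes writes survive more failed nodes; raising R and W buys consistency at the cost of latency.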

			Pros:
				- schemaless
				- wants to stay alive

			Cons:
				- interface via http, json
				- ruby binding

			Uses for:
				- manage
				- add nodes when you need them

	Cassandra
		- Eventually consistent node distribution
			- column families, etc.
			- structured key/value store
			- can easily get data back sorted
			RULES: rack aware, data aware, location aware
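A column family can be pictured as a map from row key to columns kept sorted by column name, which is why getting data back sorted (and slicing a range of columns) is cheap. A minimal in-memory sketch of that data model, assuming string column names; `ColumnFamily`, `insert` and `get_slice` here are hypothetical, not a Cassandra client API.

```python
import bisect


class ColumnFamily:
    def __init__(self):
        self.rows = {}  # row key -> (sorted list of column names, {name: value})

    def insert(self, row_key, column, value):
        names, values = self.rows.setdefault(row_key, ([], {}))
        if column not in values:
            bisect.insort(names, column)  # keep column names sorted on write
        values[column] = value

    def get_slice(self, row_key, start, finish):
        """Columns with start <= name <= finish, in sorted order."""
        names, values = self.rows.get(row_key, ([], {}))
        lo = bisect.bisect_left(names, start)
        hi = bisect.bisect_right(names, finish)
        return [(name, values[name]) for name in names[lo:hi]]


timeline = ColumnFamily()
# Hypothetical layout: one row per user, one column per dated entry.
timeline.insert("user:1", "2009-06-03", "hello")
timeline.insert("user:1", "2009-06-01", "first")
timeline.insert("user:1", "2009-06-02", "again")
print(timeline.get_slice("user:1", "2009-06-01", "2009-06-02"))
# [('2009-06-01', 'first'), ('2009-06-02', 'again')]
```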

		- When you need to scale out huge amounts of data

		- Writes will always succeed

		Pros:
			- Can add as many nodes as you need
			- Twitter will jump on board
			- Scales out past a petabyte

		Cons:

	Dynomite
		- cliffmoon/dynomite

		- no high level types
		- Based on Amazon's Dynamo paper
		- key/blob

		Uses:
			- Large numbers of static files that you want to serve

		Cons:
			- bringing new nodes into the cluster can easily overload the system
			  while it re-balances data
		*in active development
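Dynamo-style systems place keys on a consistent-hash ring, so adding a node moves only the keys in the ranges it takes over instead of reshuffling everything; those moved keys are the re-balancing load noted above. A minimal sketch of a ring with virtual nodes, assuming MD5 as the hash and 64 virtual nodes per server (both arbitrary choices; Dynomite's own implementation may differ).

```python
import bisect
import hashlib


def _hash(value):
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted hash positions on the ring
        self._owner = {}   # hash position -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            self._owner[point] = node
            bisect.insort(self._points, point)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash.
        idx = bisect.bisect_left(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]


keys = [f"image:{i}.png" for i in range(1000)]
ring = HashRing(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}
ring.add_node("node-d")
moved = sum(1 for k in keys if ring.node_for(k) != before[k])
print(f"{moved} of {len(keys)} keys moved")  # only keys claimed by node-d move
```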

		Use when you want to scale easily
		Use as image asset store

	Redis, Tokyo, MongoDB (stable)
	*being used in production

	*Cassandra (look out for stable release)

	- Chef recipes on GitHub

	Pitfalls of #lesssql
		- no referential integrity
		- not as much tooling
		- almost nonexistent disaster recovery tools
		- not as much production ("used in anger") experience
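Without referential integrity in the database, the application has to enforce it. A minimal sketch of an application-level check before saving a record that points at another one, assuming two plain key/value stores and a hypothetical `user_id` reference field:

```python
users = {"u1": {"name": "ann"}}
posts = {}


class MissingReference(Exception):
    pass


def save_post(post_id, post):
    # The store will not reject a dangling reference, so check it here.
    if post["user_id"] not in users:
        raise MissingReference(f"user {post['user_id']!r} does not exist")
    posts[post_id] = post


save_post("p1", {"user_id": "u1", "body": "hi"})
try:
    save_post("p2", {"user_id": "u9", "body": "orphan"})
except MissingReference as err:
    print(err)  # user 'u9' does not exist
```

This check is not atomic: the user could be deleted between the check and the write, which is exactly the guarantee a relational database would otherwise provide.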

	*Customers care (save the data!)

	Cloud-Computing
		- horizontal cloud computing
		- add more nodes when you need them (cloud data)

	*Hypertable (offline, large batch processing)
		- map reduce, offline cron based processing
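Offline map/reduce jobs share a simple shape: a map step emits key/value pairs from each record, and a reduce step combines all values that share a key. A minimal single-process sketch, counting hits per page from log lines; the log format is a made-up example, and a real batch job would distribute both steps across machines.

```python
from collections import defaultdict


def map_step(line):
    # Hypothetical log format: "<timestamp> <path> <status>"
    _, path, _ = line.split()
    yield path, 1


def reduce_step(key, values):
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    groups = defaultdict(list)  # the "shuffle": group mapped values by key
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(k, vs) for k, vs in groups.items())


logs = [
    "1000 /home 200",
    "1001 /about 200",
    "1002 /home 304",
]
print(map_reduce(logs, map_step, reduce_step))  # {'/home': 2, '/about': 1}
```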

	*HyperCube
		- object relational mapping

	*remapping
		- logic trees (easier to build out in new style dbs)

	Moneta (github)

	*Infobright engine (for MySQL)


	------------------------------------------------------
	joins can be done within the client
		- scalable by taking data from multiple end-points
	------------------------------------------------------
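A join done in the client means fetching from each store separately and stitching the results together in application code. A minimal sketch, assuming two hypothetical stores (orders in one system, users in another) represented as dicts; each bulk fetch stands in for one request to a different end-point.

```python
def fetch_many(store, keys):
    # Stand-in for one bulk request to a store (e.g. a multi-get).
    return {k: store[k] for k in keys if k in store}


def join_orders_with_users(order_store, user_store, order_ids):
    orders = fetch_many(order_store, order_ids)
    # Collect the foreign keys first so the second store gets one bulk
    # request instead of one request per order.
    user_ids = {o["user_id"] for o in orders.values()}
    users = fetch_many(user_store, user_ids)
    return [
        {**order, "user_name": users.get(order["user_id"], {}).get("name")}
        for order in (orders[oid] for oid in order_ids if oid in orders)
    ]


order_store = {"o1": {"id": "o1", "user_id": "u1"}, "o2": {"id": "o2", "user_id": "u2"}}
user_store = {"u1": {"name": "ann"}}
print(join_orders_with_users(order_store, user_store, ["o1", "o2"]))
# [{'id': 'o1', 'user_id': 'u1', 'user_name': 'ann'}, {'id': 'o2', 'user_id': 'u2', 'user_name': None}]
```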

	Day to Day Issues
		- what happens when you hit your limits
		- memcache in front of MySQL; redistribute other data into multiple / single systems
		- solid state drives (put hot spots on SSD)
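Putting memcache in front of MySQL is usually done as cache-aside: read from the cache, fall back to the database on a miss, then populate the cache; on writes, update the database and invalidate the cached copy. A minimal sketch with both tiers modelled as dicts (the `CachedStore` name and key scheme are hypothetical):

```python
class CachedStore:
    def __init__(self, db):
        self.db = db        # stand-in for MySQL
        self.cache = {}     # stand-in for memcache
        self.db_reads = 0   # how often reads fall through to the database

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        self.db_reads += 1
        value = self.db.get(key)
        if value is not None:
            self.cache[key] = value  # populate on miss
        return value

    def put(self, key, value):
        self.db[key] = value
        self.cache.pop(key, None)    # invalidate rather than update the cache


store = CachedStore({"user:1": "ann"})
store.get("user:1")
store.get("user:1")
print(store.db_reads)  # 1  (second read served from the cache)
store.put("user:1", "anne")
print(store.get("user:1"), store.db_reads)  # anne 2
```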
	------------------------------------------------------
	Fusion-io (solid state)
		- controllers getting smarter

	Riak
		- boot config, simple to configure

	SlideShare - (post slides)
	Google App Engine (Data Store) - always slow, but consistently slow
	Benchmarking: (?) - no huge studies

	Cassandra - nodes talk to each other
		- eventually consistent
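Eventual consistency means replicas may disagree for a while and converge as nodes exchange data; Cassandra resolves conflicting column values by timestamp, keeping the newest (last write wins). A minimal sketch of that merge rule between two replicas; the replica layout is a simplification, and breaking timestamp ties by comparing values is an assumption made here so the result does not depend on merge order.

```python
def merge(replica_a, replica_b):
    """Each replica maps column -> (timestamp, value); the newest cell wins."""
    merged = dict(replica_a)
    for column, cell in replica_b.items():
        # Compare (timestamp, value) tuples: newest timestamp wins, and a
        # tie falls back to the value, so merge(a, b) == merge(b, a).
        if column not in merged or cell > merged[column]:
            merged[column] = cell
    return merged


a = {"email": (100, "old@example.com"), "name": (90, "Ann")}
b = {"email": (120, "new@example.com")}  # a later write that reached only b
converged = merge(a, b)
print(converged["email"][1])  # new@example.com
```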

	Key/Value convergence on the move.

	MongoDB
		(+) Mongo team helps via IRC
		(+) Feature Requests
		(+) Good first step, document store

	Redis -> Tokyo
	(s1):(s2)

	*breakdown of object model
		- saving now takes multiple queries (save, build, etc.); a crash midway leaves a dead, half-written state

	- store code as rows in the DB; use the DB to run it and return results