@newfront
Created April 25, 2011 19:56
Ezra NoSQL Talk
#Ruby Meetup Group
Ezra
Limitations of SQL
(horizontal databases) - not covered in mysql
- Limitations
- don't scale past a single master
#lesssql
- hybrid systems (solution)
* find a small part of the system that is not on the critical path (session data, logs, etc.)
'Redis' - alternative database
Tour of New Database Types:
Redis
- fast, in memory key/value store
- alternate data types -> lists, sets (hash table)
- set intersections (commonalities across users, lookup)
- sessions, hit counters, log buffers (can use)
Pros
- all operations happen in memory
Cons
- data has to fit in memory
- data structure server
Re-Distribute-Your-Load
Efficient Data Handling (IO based)
Scales (single threaded) (Like memcache)
- allows you to spread out over server machines
Uses: as fast as you can get from a data store
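
A minimal sketch of the set-intersection idea noted above ("commonalities across users"). It uses plain Python sets as a stand-in and does not talk to a Redis server; the helpers only mirror the behaviour of the Redis SADD and SINTER commands. The key names and user ids are hypothetical.

```python
# Plain Python sets standing in for Redis sets; no Redis server is involved.
# Key names and user ids are hypothetical.
store = {}

def sadd(key, *members):
    """Add members to the set stored at key, creating it if needed (like SADD)."""
    store.setdefault(key, set()).update(members)

def sinter(*keys):
    """Return the members present in every named set (like SINTER)."""
    sets = [store.get(k, set()) for k in keys]
    return set.intersection(*sets) if sets else set()

sadd("followers:alice", "u1", "u2", "u3")
sadd("followers:bob", "u2", "u3", "u4")

common = sinter("followers:alice", "followers:bob")
print(sorted(common))  # ['u2', 'u3']
```

Because the whole structure lives in memory, the lookup cost is a set operation rather than a disk read, which is the trade-off listed under Pros and Cons above.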
Tokyo Cabinet
- Large Data workhorse
- Fully synchronous, no chance of losing data
- Memory caching
More key/value type
- can have extensible code structures built into system
Pros:
- Tokyo Server (80 GB)
- Fixed Length Records
- Efficient, smallest on-disk footprint
Cons:
- above 70GB, gets funky
Replication
- master, master
- master, slave
Uses For: Fastest, store large amounts of data, tune RAM server usage
- gets embedded in process
MongoDB
- document database
(the MySQL of key/value stores) - easiest step from MySQL databases
- tables are collections of documents
- rolling buffer
- great complex queries
- index on attributes
- (Not Tied down to schema)
- Set collections as shardable (auto-rebalancing)
- JSON document database
Cons: no transaction
Pros: recovery tools
- advanced query system
- I/O open, write - grid file system
- scales horizontally
MongoDB - fast synchronous writes, good for web, logging, statistics
- can use hugely complex queries
- have flexibility in queries
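
The "rolling buffer" bullet above most likely refers to what MongoDB calls capped collections: fixed-size collections where the oldest documents are dropped as new ones arrive. A rough sketch of that behaviour in plain Python, assuming a hypothetical `RollingBuffer` class and a simple equality-only `find` (this is not the MongoDB API):

```python
from collections import deque

class RollingBuffer:
    """Fixed-size document store; the oldest documents fall off the front."""

    def __init__(self, max_docs):
        # deque(maxlen=...) discards the oldest item once the limit is reached
        self._docs = deque(maxlen=max_docs)

    def insert(self, doc):
        self._docs.append(doc)

    def find(self, **criteria):
        """Return documents whose fields equal all given criteria."""
        return [
            d for d in self._docs
            if all(d.get(field) == value for field, value in criteria.items())
        ]

log = RollingBuffer(max_docs=3)
for i in range(5):
    log.insert({"seq": i, "level": "error" if i % 2 else "info"})

kept = [d["seq"] for d in log.find()]
errors = [d["seq"] for d in log.find(level="error")]
print(kept, errors)  # [2, 3, 4] [3]
```

This shape suits the logging and statistics uses mentioned above, since the store never grows past its configured size.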
Riak
- Document Oriented DB
- HTTP/JSON query interface
- Add and Remove Nodes
- Erlang map/reduce query interface
- Tunable knobs, e.g. "I want you to write to 3 servers" (rule sets)
http://riak.basho.com
Pros:
- schemaless
- wants to stay alive
Cons:
- interface via http, json
- ruby binding
Uses for:
- manage
- add nodes when you need them
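
The "tunable knobs" idea (for example, "write to 3 servers") can be pictured as a quorum rule: a write counts as successful only if at least `w` replicas acknowledge it. The sketch below is an in-process illustration under that assumption; `Replica` and `quorum_put` are hypothetical names, not Riak's client API.

```python
class Replica:
    """Hypothetical in-memory replica that can be marked as down."""

    def __init__(self, name, up=True):
        self.name = name
        self.up = up
        self.data = {}

    def put(self, key, value):
        # A replica that is down cannot acknowledge the write
        if not self.up:
            return False
        self.data[key] = value
        return True

def quorum_put(replicas, key, value, w):
    """Send the write to every replica; succeed if at least w acknowledge."""
    acks = sum(1 for replica in replicas if replica.put(key, value))
    return acks >= w

nodes = [Replica("a"), Replica("b"), Replica("c", up=False)]

ok_with_two = quorum_put(nodes, "session:42", "data", w=2)
ok_with_three = quorum_put(nodes, "session:42", "data", w=3)
print(ok_with_two, ok_with_three)  # True False
```

Lowering `w` favours availability ("wants to stay alive"), while raising it favours durability at the cost of failing writes when nodes are down.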
Cassandra
- Eventually consistent node distribution
- column families, etc
- structured key/value store
- can easily get data back sorted
RULES: rack aware, data aware, location aware
- When you need to scale out huge amounts of data
- Writes will always succeed
Pros:
- Can add as many nodes as you need
- Twitter will jump on board
- Scale out over petabyte
Cons:
Dynomite
- cliffmoon/dynomite
- no high level types
- Based on Amazon's Dynamo Papers
- key/blob
Uses:
- Large amount of files (static) that you want to serve
Cons:
- bring new nodes into cluster (system can easily get overloaded)
- (re-balance data)
*in active development
Use when you want to scale easily
Use as image asset store
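
The re-balancing cost noted above when new nodes join the cluster is what consistent hashing, the partitioning scheme described in Amazon's Dynamo paper, tries to limit. The sketch below is a simplified ring with virtual nodes; the `Ring` class, the node names, and the choice of SHA-256 are assumptions for illustration and are not taken from Dynomite's code.

```python
import bisect
import hashlib

def _hash(value):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

class Ring:
    """Simplified consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes=64):
        self.vnodes = vnodes
        self._points = []  # sorted list of (position, node name)
        for node in nodes:
            self.add(node)

    def add(self, node):
        # Each physical node owns several positions to even out the load
        for i in range(self.vnodes):
            bisect.insort(self._points, (_hash(f"{node}#{i}"), node))

    def node_for(self, key):
        # First ring position at or after the key's hash, wrapping around
        idx = bisect.bisect(self._points, (_hash(key), "")) % len(self._points)
        return self._points[idx][1]

keys = [f"image-{i}.png" for i in range(1000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.node_for(k) for k in keys}

ring.add("node-d")
after = {k: ring.node_for(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(len(moved), "of", len(keys), "keys moved to", {after[k] for k in moved})
```

Only the keys that now belong to the new node move; the rest stay where they were. The data that does move still has to be copied, which is where the overload risk mentioned under Cons comes from.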
Redis, Tokyo, MongoDB (stable)
*being used in production
*cassandra (look out for stable release)
- Chef Recipes on github
Pitfalls of #lesssql
- no referential Integrity
- not as much tooling
- almost non existent disaster recovery tools
- not as much production ("used in anger") experience
*Customers care (save the data!)
Cloud-Computing
- horizontal cloud computing
- add more nodes when you need them (cloud data)
*Hypertable (offline, large batch processing)
- map reduce, offline cron based processing
*HyperCube
- object relational mapping
*remapping
- logic trees (easier to build out in new style dbs)
Moneta (github)
*InfoBrightEngine (for MySQL)
------------------------------------------------------
joins can be done within the client
- scalable by taking data from multiple end-points
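
A rough sketch of a join done in the client, assuming the two record lists have already been fetched from separate end-points. The field names and the `hash_join` helper are hypothetical.

```python
# Records as they might arrive from two different stores (hypothetical data)
users = [
    {"user_id": 1, "name": "ann"},
    {"user_id": 2, "name": "bob"},
    {"user_id": 3, "name": "cat"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25},
    {"order_id": 11, "user_id": 1, "total": 40},
    {"order_id": 12, "user_id": 2, "total": 15},
]

def hash_join(left, right, key):
    """Inner join of two lists of dicts on a shared key, done in the client."""
    # Index the right side once so each left row is matched in constant time
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for left_row in left:
        for right_row in index.get(left_row[key], []):
            joined.append({**left_row, **right_row})
    return joined

rows = hash_join(users, orders, "user_id")
print([(r["name"], r["order_id"]) for r in rows])
# [('ann', 10), ('ann', 11), ('bob', 12)]
```

The stores only serve simple lookups, and the application pays the cost of combining the results.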
------------------------------------------------------
Day to Day Issues
- what happens when you hit your limits
- memcache in front of MySQL, redistribute other data into multiple / single systems
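
Putting a cache in front of the database usually follows a cache-aside pattern: read from the cache, fall back to the database on a miss, then store the result. The sketch below uses a dict in place of memcache and a plain function in place of a MySQL query; `CacheAside` and the key names are hypothetical.

```python
class CacheAside:
    """Read-through wrapper: check the cache first, then the slower store."""

    def __init__(self, load_from_db):
        self._load = load_from_db  # callable standing in for a database query
        self._cache = {}           # dict standing in for memcache
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: hit the database once and remember the answer
        value = self._load(key)
        self.db_reads += 1
        self._cache[key] = value
        return value

    def invalidate(self, key):
        """Drop a cached entry after the underlying row changes."""
        self._cache.pop(key, None)

fake_db = {"user:1": "ann"}
cache = CacheAside(fake_db.get)

first = cache.get("user:1")
second = cache.get("user:1")
reads_after_two_gets = cache.db_reads

fake_db["user:1"] = "ann b."
cache.invalidate("user:1")
third = cache.get("user:1")
print(first, second, third, cache.db_reads)  # ann ann ann b. 2
```

Repeated reads stop reaching the database, which buys headroom before the single-master limit is hit, at the price of having to invalidate entries on writes.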
- Solid State Drives (ssd - hotspots on ssd)
------------------------------------------------------
Fusion I/O (solid state)
- controllers getting smarter
Riak
- boot config, simple to configure
SlideShare - (post slides)
Google App Engine (Data Store) - always slow, but always same slow
Benchmarking: (?) - no huge studies
Cassandra - nodes talk to each other
- eventually consistent
Key/Value convergence on the move.
MongoDB
(+) Mongo team helps via IRC
(+) Feature Requests
(+) Good first step, document store
Redis -> Tokyo
(s1):(s2)
*breakdown of object model
- now multiple queries to save, build, etc (crash = dead state)
- save code as rows in db, utilize db to run and return code