What is Hadoop? "Hadoop is an open source software platform for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware" - Hortonworks
- Distributed storage: stores data across many hard drives & has backup copies