Divay Jindal divayjindal95

divayjindal95 / terminologies
Created April 17, 2016 17:58 — forked from karimkhanp/terminologies
A dump of the terminology, tools, and technologies required for Big Data
Apache Spark - Apache Spark is an open-source cluster computing framework for data analytics, originally developed in the AMPLab at UC Berkeley.[1] Spark fits into the Hadoop open-source community, building on top of the Hadoop Distributed File System (HDFS).[2] However, Spark is not tied to the two-stage MapReduce paradigm, and it promises performance up to 100 times faster than Hadoop MapReduce for certain applications.
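To make the "two-stage MapReduce paradigm" concrete, here is a minimal sketch of a word count in plain Python. It is only an illustration of the map and reduce stages, not Hadoop or Spark code, and the names map_phase, reduce_phase, and word_count are made up for this example.

```python
from collections import defaultdict


def map_phase(lines):
    # Stage 1 (map): emit a (word, 1) pair for every word in every line.
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_phase(pairs):
    # Shuffle: group the emitted values by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Stage 2 (reduce): collapse each group to a single total.
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    # A classic MapReduce job is exactly one map stage followed by one reduce stage.
    return reduce_phase(map_phase(lines))


counts = word_count(["big data big tools", "data pipelines"])
print(counts)
```

In classic MapReduce, a job with more steps has to be expressed as several such map/reduce rounds run one after another. Not being tied to that fixed two-stage shape is the difference the Spark entry above refers to.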
Database pipelining - http://www.tuplejump.com/img/ff08.theplatform.png
As you will notice, it's not just about processing the data; it involves a lot of other components. Collection, storage, exploration, ML, and visualization are all critical to the project's success.
Solr - Apache Solr is an open-source search platform built on Apache Lucene. It can be used to build a highly scalable data analytics engine that lets customers engage in lightning-fast, real-time knowledge discovery.