manku-timma/gist:06dc665db1ae4f4becff

## gistfile1.md

      
          
Spark supports SQL, Scala, Java, Python, etc.
It supports Hive tables, JSON tables, native Scala data structures, etc. as RDDs
It supports batch, streaming, interactive, iterative, etc. modes of computation
It runs on AWS, Mesos, YARN, OpenStack, etc. as underlying compute platforms
It supports Tachyon, HDFS, S3, and Hive as storage engines, as well as relational and NoSQL databases

Useful Scala links

http://www.scala-lang.org/docu/files/ScalaOverview.pdf
http://www.cs.ucsb.edu/~benh/162/Programming-in-Scala.pdf

Interesting things to think about:

Tachyon - in-memory distributed file system that relies on lineage (recomputation) for fault tolerance
MDCC and other BDAS components that focus on point updates to big data
GraphX
MLbase and MLlib
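
To make the lineage idea in the Tachyon note concrete, here is a minimal plain-Python sketch. It is not Tachyon's or Spark's API: the `LineageDataset` class and its `map`, `collect`, and `evict` methods are hypothetical names invented for illustration. The only point is that a dataset which remembers how it was derived can be rebuilt after its in-memory copy is lost, without keeping a replica.

```python
from typing import Callable, List, Optional


class LineageDataset:
    """Hypothetical dataset that records how it was derived (its lineage)."""

    def __init__(
        self,
        source: Optional[List[int]] = None,
        parent: Optional["LineageDataset"] = None,
        transform: Optional[Callable[[int], int]] = None,
    ) -> None:
        self.source = source        # durable input, set only on the root
        self.parent = parent        # dataset this one was derived from
        self.transform = transform  # function applied to each parent element
        self.cache: Optional[List[int]] = None  # in-memory copy, may be lost

    def map(self, fn: Callable[[int], int]) -> "LineageDataset":
        # Nothing is computed here; only the lineage is recorded.
        return LineageDataset(parent=self, transform=fn)

    def collect(self) -> List[int]:
        # Rebuild the in-memory copy from the lineage if it is missing.
        if self.cache is None:
            if self.parent is None:
                self.cache = list(self.source)
            else:
                self.cache = [self.transform(x) for x in self.parent.collect()]
        return list(self.cache)

    def evict(self) -> None:
        # Simulate losing the in-memory copy (for example, a node failure).
        self.cache = None


root = LineageDataset(source=[1, 2, 3])
shifted = root.map(lambda x: x * 2).map(lambda x: x + 1)
before = shifted.collect()
shifted.evict()                     # the cached result is gone
after = shifted.collect()           # recomputed from the recorded lineage
```

In this sketch `before` and `after` hold the same values, because the second `collect` replays the recorded transformations instead of reading a stored copy.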

Martin Odersky observations:

despite all the theoretical advantages of functional programming, some trigger is needed for its wide adoption
parallel and distributed programming is the catalyst; the reason is that there is a lot of parallelism to be utilized (AWS etc.)
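
One way to read the second observation is that pure functions, which share no mutable state, can be handed to many workers without changing the result. The sketch below is an illustration in plain Python, not something from the talk; the function names are made up, and the threads only stand in for workers (CPython threads do not give real CPU parallelism).

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Iterable


def square(x: int) -> int:
    # Pure function: the output depends only on the input.
    return x * x


def add(a: int, b: int) -> int:
    # Associative combiner, so partial results can be merged in any grouping.
    return a + b


def sum_of_squares_sequential(values: Iterable[int]) -> int:
    return reduce(add, map(square, values), 0)


def sum_of_squares_parallel(values: Iterable[int], workers: int = 4) -> int:
    # The same map step, spread over a pool of workers.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        squared = pool.map(square, values)
        return reduce(add, squared, 0)
```

Because `square` has no side effects, swapping the sequential `map` for `pool.map` does not change the answer; that property is what makes the functional style a natural fit for parallel and distributed execution.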

GraphX

tables and graphs are unified with respect to reading and writing
the Spark API for table management and the GraphLab API for graph management are brought together
useful algorithms are:
pagerank
connected components
shortest path
ALS (alternating least squares)
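
As a rough illustration of two of the algorithms listed above, here is a plain-Python sketch of PageRank and connected components on a small edge list. It does not use the GraphX API; the function names, the damping factor of 0.85, and the fixed iteration count are assumptions chosen for the example, and the PageRank variant is the textbook normalized form in which the ranks sum to 1.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def pagerank(edges: List[Edge], damping: float = 0.85,
             iterations: int = 50) -> Dict[int, float]:
    """Textbook PageRank by power iteration over a directed edge list."""
    vertices = sorted({v for edge in edges for v in edge})
    out_links: Dict[int, List[int]] = {v: [] for v in vertices}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(vertices)
    ranks = {v: 1.0 / n for v in vertices}
    for _ in range(iterations):
        # Rank held by vertices without out-links is spread over all vertices.
        dangling = sum(ranks[v] for v in vertices if not out_links[v])
        base = (1.0 - damping) / n + damping * dangling / n
        new_ranks = {v: base for v in vertices}
        for src, targets in out_links.items():
            if not targets:
                continue
            share = damping * ranks[src] / len(targets)
            for dst in targets:
                new_ranks[dst] += share
        ranks = new_ranks
    return ranks


def connected_components(edges: List[Edge]) -> Dict[int, int]:
    """Label every vertex with the smallest vertex id in its component.

    Edges are treated as undirected; labels spread until nothing changes.
    """
    labels: Dict[int, int] = {}
    for src, dst in edges:
        labels.setdefault(src, src)
        labels.setdefault(dst, dst)
    changed = True
    while changed:
        changed = False
        for src, dst in edges:
            low = min(labels[src], labels[dst])
            if labels[src] != low or labels[dst] != low:
                labels[src] = labels[dst] = low
                changed = True
    return labels


ranks = pagerank([(1, 2), (2, 3), (3, 1), (4, 3)])
components = connected_components([(1, 2), (2, 3), (4, 5)])
```

In the first call, vertex 3 receives links from both 2 and 4, so it ends up with the largest rank. In the second, vertices 1, 2, and 3 share one label and vertices 4 and 5 share another.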