Instructions to use Zeppelin with Spark and Cassandra

This procedure is for Spark running in standalone deployment mode

Please follow these instructions:

  1. Clone the Zeppelin project from the master branch on GitHub

  2. If you use DSE 4.8 (and thus Spark 1.4), edit the file $ZEPPELIN_HOME/spark-dependencies/pom.xml: duplicate the Maven profile cassandra-spark-1.3 as cassandra-spark-1.4 and update the spark-cassandra-connector version to 1.4.0

  3. Build it with this Maven command: mvn clean package -Pcassandra-spark-1.3 (or -Pcassandra-spark-1.4 if using DSE 4.8) -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests. Ensure your Maven version is at least 3.x

  4. Duplicate the file $ZEPPELIN_HOME/conf/zeppelin-env.sh.template as $ZEPPELIN_HOME/conf/zeppelin-env.sh

  5. Edit the file $ZEPPELIN_HOME/conf/zeppelin-env.sh and add export MASTER=spark://<spark_DSE_master_IP>:7077

  6. Start Zeppelin with $ZEPPELIN_HOME/bin/zeppelin-daemon.sh start

  7. Go to localhost:8080 to open Zeppelin, then open the Interpreter menu

  8. Edit the Spark interpreter properties: change the property master to spark://<spark_DSE_master_IP>:7077. Also add a new property spark.cassandra.connection.host pointing to a comma-separated list of IP addresses of your Cassandra cluster. Save the changes and confirm with Yes when the popup asks for confirmation (see the verification sketch after this list)

  9. Restart Zeppelin with $ZEPPELIN_HOME/bin/zeppelin-daemon.sh restart
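To check that the interpreter picked up the new settings, you can run a short Scala paragraph in a Zeppelin note. This is a minimal sketch, assuming the default sc SparkContext exposed by the Spark interpreter:

  // Inspect the Spark configuration from a Zeppelin note to verify
  // the master URL and the Cassandra contact points.
  val conf = sc.getConf
  println("master = " + conf.get("spark.master"))
  println("cassandra hosts = " + conf.getOption("spark.cassandra.connection.host").getOrElse("<not set>"))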

Now you can use Spark, Cassandra and the Spark Cassandra connector. Do not forget to import the Scala implicits:

  import org.apache.spark.SparkContext._
  import com.datastax.spark.connector._
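
As a quick smoke test, the sketch below reads a table through the connector and counts its rows. The keyspace test_ks, table users, and column name are placeholders, not part of the original instructions; replace them with your own schema:

  // Hypothetical keyspace/table: adapt test_ks and users to your own schema.
  val rdd = sc.cassandraTable("test_ks", "users")
  println("row count = " + rdd.count())
  // CassandraRow gives typed access to columns, e.g. a text column called "name".
  rdd.take(5).foreach(row => println(row.getString("name")))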