Instructions to use Zeppelin with Spark and Cassandra

This procedure is for Spark running in standalone deployment mode

Please follow these instructions:

  1. Clone the Zeppelin project from the master branch on GitHub

  2. If you use DSE 4.8 (and thus Spark 1.4), edit the file $ZEPPELIN_HOME/spark-dependencies/pom.xml: duplicate the Maven profile cassandra-spark-1.3 as cassandra-spark-1.4 and update the spark-cassandra-connector version to 1.4.0

  3. Build it with this Maven command: mvn clean package -Pcassandra-spark-1.3 (or -Pcassandra-spark-1.4 if using DSE 4.8) -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests. Ensure your Maven version is at least 3.x

  4. Duplicate the file $ZEPPELIN_HOME/conf/zeppelin-env.sh.template as $ZEPPELIN_HOME/conf/zeppelin-env.sh

  5. Edit the file $ZEPPELIN_HOME/conf/zeppelin-env.sh and add export MASTER=spark://<spark_DSE_master_IP>:7077

  6. Start Zeppelin with $ZEPPELIN_HOME/bin/zeppelin-daemon.sh start

  7. Go to localhost:8080 to open Zeppelin, then open the Interpreter menu

  8. Edit the Spark interpreter properties: change the property master to spark://<spark_DSE_master_IP>:7077. Also add a new property spark.cassandra.connection.host pointing to a comma-separated list of IP addresses of your Cassandra cluster. Save the changes and confirm with Yes when the popup asks for confirmation (see the verification sketch after this list)

  9. Restart Zeppelin with $ZEPPELIN_HOME/bin/zeppelin-daemon.sh restart
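To check that the interpreter picked up the new settings, you can run a short Scala paragraph in a Zeppelin note. This is a minimal sketch, assuming the default sc SparkContext exposed by the Spark interpreter:

  // Inspect the Spark configuration from a Zeppelin note to verify
  // the master URL and the Cassandra contact points.
  val conf = sc.getConf
  println("master = " + conf.get("spark.master"))
  println("cassandra hosts = " + conf.getOption("spark.cassandra.connection.host").getOrElse("<not set>"))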

Now you can use Spark, Cassandra and the Spark Cassandra connector. Do not forget to import the Scala implicits:

  import org.apache.spark.SparkContext._
  import com.datastax.spark.connector._
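
As a quick smoke test, the sketch below reads a table through the connector and counts its rows. The keyspace test_ks, table users, and column name are placeholders, not part of the original instructions; replace them with your own schema:

  // Hypothetical keyspace/table: adapt test_ks and users to your own schema.
  val rdd = sc.cassandraTable("test_ks", "users")
  println("row count = " + rdd.count())
  // CassandraRow gives typed access to columns, e.g. a text column called "name".
  rdd.take(5).foreach(row => println(row.getString("name")))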