Instructions to use Zeppelin with Spark and Cassandra

This procedure is for Spark running in stand-alone deployment mode.

Please follow these instructions:

  1. Clone the Zeppelin project from the master branch on GitHub.

  2. If you use DSE 4.8 (and thus Spark 1.4), edit the file $ZEPPELIN_HOME/spark-dependencies/pom.xml: duplicate the Maven profile cassandra-spark-1.3 as cassandra-spark-1.4 and update the spark-cassandra-connector version to 1.4.0.

  3. Build it with the Maven command mvn clean package -Pcassandra-spark-1.3 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests (use -Pcassandra-spark-1.4 instead if you are on DSE 4.8). Make sure you have Maven 3.x or later.

  4. Duplicate the file $ZEPPELIN_HOME/conf/ to $ZEPPELIN_HOME/conf/

  5. Edit the file $ZEPPELIN_HOME/conf/ and add export MASTER=spark://<spark_DSE_master_IP>:7077

  6. Start Zeppelin with $ZEPPELIN_HOME/bin/ start

  7. Go to localhost:8080 to open Zeppelin, then open the Interpreter menu.

  8. Edit the Spark interpreter properties: set the property master to spark://<spark_DSE_master_IP>:7077. Also add the new property to point to a list of IP addresses of your Cassandra cluster. Save the change and confirm with Yes when the popup asks you to confirm.

  9. Restart Zeppelin with $ZEPPELIN_HOME/bin/ restart

Now you can use Spark, Cassandra and the Spark Cassandra connector. Do not forget to import the Scala implicits:

  import org.apache.spark.SparkContext._
  import com.datastax.spark.connector._
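
Once the imports are in place, a first Zeppelin paragraph could look like the sketch below. It assumes the Spark interpreter is already connected to your cluster and relies on the `sc` SparkContext that Zeppelin provides in each paragraph. The keyspace and table names (`my_keyspace`, `my_table`) are hypothetical placeholders, so replace them with ones that exist in your Cassandra cluster.

```scala
// Run inside a Zeppelin paragraph bound to the Spark interpreter.
// `sc` is the SparkContext that Zeppelin creates for you.
import org.apache.spark.SparkContext._
import com.datastax.spark.connector._

// Hypothetical keyspace and table names: replace with your own.
val rows = sc.cassandraTable("my_keyspace", "my_table")

// Count the rows in the table, then print a handful of them.
println(rows.count())
rows.take(5).foreach(println)
```

If this paragraph fails with a connection error, it is worth re-checking the Cassandra host property set in the interpreter settings before suspecting the build.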