@hkhamm
Last active August 1, 2019 19:44
Install, Setup, and Test Spark and Cassandra on Mac OS X

This Gist assumes you have already followed the instructions to install Cassandra, created a keyspace and table, and added some data.

Install Apache Spark

brew install apache-spark

Get the Spark Cassandra Connector

Clone the download script from GitHub Gist:

git clone https://gist.github.com/b700fe70f0025a519171.git

Rename the cloned directory:

mv b700fe70f0025a519171 connector

Change into the renamed directory and run the script:

cd connector && bash install_connector.sh

Start the Spark Master and a Worker

/usr/local/Cellar/apache-spark/1.0.2/libexec/sbin/start-all.sh
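
The path above is tied to Spark 1.0.2, so it will differ if Homebrew installed another version. Here is a minimal sketch for locating the script, assuming the same Cellar layout as the command above. `find_start_all` is a hypothetical helper name, not part of Spark or Homebrew.

```shell
# Hypothetical helper: print the start-all.sh path for an installed Spark version.
# Assumes the layout <cellar>/<version>/libexec/sbin/start-all.sh used above.
find_start_all() {
  cellar="${1:-/usr/local/Cellar/apache-spark}"
  found=""
  for script in "$cellar"/*/libexec/sbin/start-all.sh; do
    # An unmatched glob stays literal, so keep only paths that really exist.
    if [ -e "$script" ]; then
      found="$script"
    fi
  done
  if [ -z "$found" ]; then
    echo "no start-all.sh under $cellar" >&2
    return 1
  fi
  # Globs sort lexically, so with several versions installed this is the
  # last name in that order, not necessarily the newest release.
  printf '%s\n' "$found"
}
```

You could then start the master and worker with `"$(find_start_all)"`.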

Testing the install

Make a note of the path to your connector directory.

Open the Spark Shell with the connector:

spark-shell --driver-class-path $(echo path/to/connector/*.jar | sed 's/ /:/g')
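
The `$(echo ... | sed ...)` part expands the glob into a space-separated list of jars and turns the spaces into colons. If the glob matches nothing, the literal pattern is passed through and the shell will most likely start without the connector on its classpath. Paths containing spaces would also be split incorrectly. Here is a small sketch of the same idea that checks for those cases. `build_classpath` is a hypothetical helper name.

```shell
# Hypothetical helper: join every .jar in a directory into a colon-separated classpath.
build_classpath() {
  dir="$1"
  cp=""
  for jar in "$dir"/*.jar; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    if [ -e "$jar" ]; then
      # Add a colon only when cp already holds at least one entry.
      cp="${cp:+$cp:}$jar"
    fi
  done
  if [ -z "$cp" ]; then
    echo "no jars found in $dir" >&2
    return 1
  fi
  printf '%s\n' "$cp"
}
```

With that defined, the shell could be opened with `spark-shell --driver-class-path "$(build_classpath path/to/connector)"`.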

Wait for everything to load. Once it has finished, you'll see the Scala prompt:

scala>

You'll need to stop the default SparkContext, since you'll create your own with the script.

scala> sc.stop

Once that is finished, get ready to paste the script in:

scala> :paste

Paste in this script. Make sure to change the path to the connector, and replace keyspace and table with the names of your own keyspace and table:

import com.datastax.spark.connector._
import org.apache.spark._

val conf = new SparkConf()
conf.set("spark.cassandra.connection.host", "127.0.0.1")
conf.set("spark.home","/usr/local/Cellar/apache-spark/1.0.2/libexec")

// You may not need these two settings if you haven't set up password authentication in Cassandra
conf.set("spark.cassandra.auth.username", "cassandra")
conf.set("spark.cassandra.auth.password", "cassandra")

val sc = new SparkContext("spark://localhost:7077", "Cassandra Connector Test", conf)
sc.addJar("path/to/connector/cassandra-driver-core-2.0.3.jar")
sc.addJar("path/to/connector/cassandra-thrift-2.0.9.jar")
sc.addJar("path/to/connector/commons-codec-1.2.jar")
sc.addJar("path/to/connector/commons-lang3-3.1.jar")
sc.addJar("path/to/connector/commons-logging-1.1.1.jar")
sc.addJar("path/to/connector/guava-16.0.1.jar")
sc.addJar("path/to/connector/httpclient-4.2.5.jar")
sc.addJar("path/to/connector/httpcore-4.2.4.jar")
sc.addJar("path/to/connector/joda-convert-1.6.jar")
sc.addJar("path/to/connector/joda-time-2.3.jar")
sc.addJar("path/to/connector/libthrift-0.9.1.jar")
sc.addJar("path/to/connector/lz4-1.2.0.jar")
sc.addJar("path/to/connector/metrics-core-3.0.2.jar")
sc.addJar("path/to/connector/netty-3.9.0.Final.jar")
sc.addJar("path/to/connector/slf4j-api-1.7.5.jar")
sc.addJar("path/to/connector/snappy-java-1.0.5.jar")
sc.addJar("path/to/connector/spark-cassandra-connector_2.10-1.0.0-rc2.jar")

val table = sc.cassandraTable("keyspace", "table")
table.count

Make sure you are on a new line after 'table.count', then press Ctrl-D to exit paste mode.

If everything is set up correctly, the script will start running and finish by printing the number of rows in your Cassandra table.
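
The sc.addJar lines in the script hard-code jar versions, so they only match if the install script downloaded exactly those files. As a rough sketch, you could generate the lines from whatever is actually in your connector directory and paste those instead. `print_addjar_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: print one sc.addJar(...) line per jar found in a directory.
print_addjar_lines() {
  dir="$1"
  for jar in "$dir"/*.jar; do
    # An unmatched glob stays literal, so skip anything that is not a real file.
    if [ -e "$jar" ]; then
      printf 'sc.addJar("%s")\n' "$jar"
    fi
  done
}
```

Running `print_addjar_lines path/to/connector` prints lines you can paste in place of the hard-coded list.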

Thanks to Al Tobey, Open Source Mechanic at DataStax, for the connector installation script and for the blog post that helped me write this guide.

Have fun with Spark and Cassandra!

tehong commented Sep 24, 2015

I've updated the install_connector.sh to use the latest ivy jar and latest spark-cassandra-connector:

[master][~/Downloads/tmp/connector]$ cat install_connector.sh

#!/bin/bash

# Installs the spark-cassandra-connector and support libs

mkdir /opt/connector
cd /opt/connector

rm *.jar

curl -o ivy-2.4.0.jar 'https://repo1.maven.org/maven2/org/apache/ivy/ivy/2.4.0/ivy-2.4.0.jar'
curl -o spark-cassandra-connector_2.11-1.5.0-M1.jar 'https://repo1.maven.org/maven2/com/datastax/spark/spark-cassandra-connector_2.11/1.5.0-M1/spark-cassandra-connector_2.11-1.5.0-M1.jar'

java -jar ivy-2.4.0.jar -dependency org.apache.cassandra cassandra-thrift 2.2.1 -retrieve "[artifact]-revision.[ext]"
java -jar ivy-2.4.0.jar -dependency com.datastax.cassandra cassandra-driver-core 2.1.7.1 -retrieve "[artifact]-revision.[ext]"
java -jar ivy-2.4.0.jar -dependency joda-time joda-time 2.8.2 -retrieve "[artifact]-revision.[ext]"
java -jar ivy-2.4.0.jar -dependency org.joda joda-convert 1.7 -retrieve "[artifact]-revision.[ext]"

rm -f *-{sources,javadoc}.jar

However, I had to delete /opt/connector/log4j-over-slf4j-1.7.7.jar manually, since the latest Spark now uses slf4j-log4j instead.
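
The manual cleanup described above could be sketched as a small function. It assumes the jar name follows the `log4j-over-slf4j-<version>.jar` pattern seen here, and the version on your machine may differ. `remove_log4j_bridge` is a hypothetical helper name.

```shell
# Hypothetical helper: delete any log4j-over-slf4j jar from the connector directory.
remove_log4j_bridge() {
  dir="${1:-/opt/connector}"
  for jar in "$dir"/log4j-over-slf4j-*.jar; do
    # An unmatched glob stays literal, so only touch files that really exist.
    if [ -e "$jar" ]; then
      rm -f "$jar"
      echo "removed $jar"
    fi
  done
}
```

Calling `remove_log4j_bridge` with no argument targets /opt/connector, the directory used by the script above.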

ghost commented Jan 23, 2016

As with most gists, out-of-date versions failing: situation normal. Thong's solution has a markdown paste error; you need these retrieve patterns:

-retrieve "[artifact]-[revision](-[classifier]).[ext]"
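
For illustration, here is a sketch of how that pattern might fit into one of the Ivy calls from the script above. It assumes ivy-2.4.0.jar sits in the current directory. `ivy_retrieve` and the `DRY_RUN` switch are hypothetical names added so the command can be inspected without downloading anything.

```shell
# Pattern quoted from the comment above; single quotes keep the shell away from the brackets.
RETRIEVE_PATTERN='[artifact]-[revision](-[classifier]).[ext]'

# Hypothetical wrapper: fetch one dependency with Ivy, or only print the command when DRY_RUN=1.
ivy_retrieve() {
  org="$1"
  name="$2"
  rev="$3"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf 'java -jar ivy-2.4.0.jar -dependency %s %s %s -retrieve "%s"\n' \
      "$org" "$name" "$rev" "$RETRIEVE_PATTERN"
  else
    java -jar ivy-2.4.0.jar -dependency "$org" "$name" "$rev" -retrieve "$RETRIEVE_PATTERN"
  fi
}
```

For example, `DRY_RUN=1 ivy_retrieve joda-time joda-time 2.8.2` prints the full command for the joda-time dependency without running it.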

The jars in the paste script are now invalid: there are now newer versions, and older versions (yep) and jars that no longer exist at all. Punting and going back to Linux for my sandbox.

It's definitely not unique to this setup, these rarely work anywhere but on the author's machine and then only for a week or two -- only half kidding there. So much for Ant, Maven and Gradle. Nothing can save us from Java Jar Hell.

sparksha commented Jan 5, 2017

I am getting this error:
// Exiting paste mode, now interpreting.

<console>:21: error: object datastax is not a member of package com
import com.datastax.spark.connector._
^
<console>:44: error: value cassandraTable is not a member of org.apache.spark.SparkContext
val table = sc.cassandraTable("univ", "student")

abdu22 commented Jul 26, 2019

I am getting the same error:
// Exiting paste mode, now interpreting.

<console>:24: error: object datastax is not a member of package com
import com.datastax.spark.connector._
^
<console>:54: error: value cassandraTable is not a member of org.apache.spark.SparkContext
val table = sc.cassandraTable("lab", "movies")
