Deploy a local multi-node Cassandra cluster using Docker
Run the Spark shell with the connector package: spark-shell --packages datastax:spark-cassandra-connector:2.4.0-s_2.11
Check version compatibility (connector vs. Spark vs. Scala) on the spark-cassandra-connector project page before picking a package version
import com.datastax.spark.connector._
import org.apache.spark.sql._
val spark = SparkSession.builder().
appName("Spark SQL practice").master("local[*]").
config("spark.cassandra.connection.host", "localhost").
getOrCreate()
val sc = spark.sparkContext
val rdd = sc.cassandraTable("keyspace_name", "table_name")
It returns an RDD of CassandraRow objects.
Next, you can read the same table as an RDD of a case class
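Each CassandraRow exposes typed getters such as getLong and getString; a quick sketch of pulling one column out of the RDD (field3 matches the case class below, but the actual column names depend on your schema):

```scala
// Extract a single column from each CassandraRow.
// "field3" is assumed to be a text column of table_name.
val thirdColumn = rdd.map(row => row.getString("field3"))
thirdColumn.take(5).foreach(println)
```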
final case class CustomCaseClass(field1: Long, field2: Long, field3: String)
val merchantTxn = sc.cassandraTable[CustomCaseClass]("keyspace_name", "table_name")
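The same connector import also adds saveToCassandra for writing an RDD back to the table; a minimal sketch, assuming the table's column names match the case-class field names:

```scala
// saveToCassandra comes from the com.datastax.spark.connector._ implicits;
// case-class field names are mapped to the target table's columns.
val rows = sc.parallelize(Seq(CustomCaseClass(1L, 2L, "example")))
rows.saveToCassandra("keyspace_name", "table_name")
```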
# Run the first node and keep it running in the background
docker run --name cassandra-1-3-0 -p 9042:9042 -d cassandra:3.0
INSTANCE1=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-1-3-0)
echo "Instance 1: ${INSTANCE1}"
# Run the second node
docker run --name cassandra-2-3-0 -p 9043:9042 -d -e CASSANDRA_SEEDS=$INSTANCE1 cassandra:3.0
INSTANCE2=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-2-3-0)
echo "Instance 2: ${INSTANCE2}"
# Connect to the cluster using cqlsh
# The --rm flag removes this client container automatically when you exit cqlsh
docker run -it --link cassandra-1-3-0 --rm cassandra:3.0 bash -c "exec cqlsh $INSTANCE1"
# Cleanup
# docker stop cassandra-1-3-0
# docker stop cassandra-2-3-0
# docker rm cassandra-1-3-0
# docker rm cassandra-2-3-0