Deploy a local multi-node Cassandra cluster using Docker
/*
Launch the Spark shell with the Cassandra connector package:
  spark-shell --packages datastax:spark-cassandra-connector:2.4.0-s_2.11
Check version compatibility at https://github.com/datastax/spark-cassandra-connector#version-compatibility
*/
import com.datastax.spark.connector._
import org.apache.spark.sql._
val spark = SparkSession.builder()
  .appName("Spark SQL practice")
  .master("local[*]")
  .config("spark.cassandra.connection.host", "localhost")
  .getOrCreate()
val sc = spark.sparkContext
val rdd = sc.cassandraTable("keyspace_name", "table_name")
/*
This returns an RDD of type
com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow]
Alternatively, you can read the table directly as an RDD of a case class:
*/
final case class CustomCaseClass(field1: Long, field2: Long, field3: String)
val merchantTxn = sc.cassandraTable[CustomCaseClass]("keyspace_name", "table_name")
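Before the Scala snippet above can read anything, the keyspace and table must exist in Cassandra. A minimal sketch of a matching schema, created via cqlsh — `keyspace_name`, `table_name`, and the column names (mirroring the hypothetical `CustomCaseClass` fields) are placeholders, and the replication settings are an assumption suited to the two-node local cluster below:

```sql
-- Hypothetical schema matching CustomCaseClass(field1: Long, field2: Long, field3: String)
CREATE KEYSPACE IF NOT EXISTS keyspace_name
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 2};

CREATE TABLE IF NOT EXISTS keyspace_name.table_name (
  field1 bigint,
  field2 bigint,
  field3 text,
  PRIMARY KEY (field1, field2)
);
```

Cassandra's `bigint` maps to Scala's `Long` and `text` to `String` in the connector's default type mapping.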
#!/bin/bash
# Start the first node and keep it running in the background
docker run --name cassandra-1-3-0 -p 9042:9042 -d cassandra:3.0
INSTANCE1=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-1-3-0)
echo "Instance 1: ${INSTANCE1}"
# Run the second node
docker run --name cassandra-2-3-0 -p 9043:9042 -d -e CASSANDRA_SEEDS="$INSTANCE1" cassandra:3.0
INSTANCE2=$(docker inspect --format='{{ .NetworkSettings.IPAddress }}' cassandra-2-3-0)
echo "Instance 2: ${INSTANCE2}"
# Connect to the cluster using cqlsh
# The --rm flag removes this temporary container automatically when cqlsh exits
docker run -it --link cassandra-1-3-0 --rm cassandra:3.0 bash -c "exec cqlsh $INSTANCE1"
# Cleanup
# docker stop cassandra-1-3-0
# docker stop cassandra-2-3-0
# docker rm cassandra-1-3-0
# docker rm cassandra-2-3-0
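The manual docker run steps above can also be captured declaratively in a Compose file — a sketch, assuming the same cassandra:3.0 image and default port mappings; the service names are illustrative:

```yaml
# docker-compose.yml — sketch of the two-node cluster above
services:
  cassandra-1:
    image: cassandra:3.0
    ports:
      - "9042:9042"
  cassandra-2:
    image: cassandra:3.0
    ports:
      - "9043:9042"
    environment:
      # Point the second node at the first via Compose's built-in DNS,
      # replacing the docker inspect IP lookup used above
      CASSANDRA_SEEDS: cassandra-1
    depends_on:
      - cassandra-1
```

With this in place, `docker compose up -d` starts both nodes and `docker compose down` replaces the manual stop/rm cleanup.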