@PatrickCallaghan
Last active February 8, 2021 23:04

The default is to run Spark on its own node, so you will need to start DSE with dse cassandra -k to create a Spark analytics node.

First run the https://github.com/PatrickCallaghan/datastax-creditcard-demo project to populate the Cassandra cluster (follow the instructions in its README). The project loads credit card data into the cluster, which we can then filter from Scala in the Spark shell. The load runs for approximately one minute and then finishes.

In a new window, start the Spark shell with dse spark

Now you can run the following commands to filter out the required data.

case class credit_card_balance (credit_card_no: String, balance_at: Option[java.util.Date], balance: Option[Double], transaction_time: java.util.Date, transaction_id: String, location: String, items: java.util.Map[String,Double], issuer: String, amount: Double);

val transactionsRdd = sc.cassandraTable[credit_card_balance]("datastax_creditcard_demo","credit_card_transactions_balance").cache


//Get a count of all transactions
transactionsRdd.count


//Get a count of transactions with an amount over 3000
val transactionOver3000 = transactionsRdd.filter((transaction) => transaction.amount > 3000).count

//Get a count of transactions with an amount between 3000 and 4000 (exclusive)
val transactionbetween30004000 = transactionsRdd.filter((transaction) => transaction.amount > 3000 && transaction.amount < 4000)
transactionbetween30004000.count
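
The two filters above test the transaction amount. The case class also carries balance as an Option[Double], so a filter on the balance itself has to decide what to do with rows where no balance is present. The following is a minimal sketch of one way to do that on a plain Scala collection. BalanceRow, BalanceFilterSketch and the sample values are hypothetical stand-ins invented for illustration; they are not part of the demo project.

```scala
// Hypothetical cut-down stand-in for credit_card_balance; all values are made up.
case class BalanceRow(creditCardNo: String, balance: Option[Double], amount: Double)

object BalanceFilterSketch {
  val rows = Seq(
    BalanceRow("1001", Some(3500.0), 20.0),
    BalanceRow("1002", None, 4200.0), // no balance recorded for this row
    BalanceRow("1003", Some(1200.0), 3100.0)
  )

  // Option.exists returns false for None, so rows without a balance are skipped
  val balanceOver3000: Seq[BalanceRow] = rows.filter(_.balance.exists(_ > 3000))

  // For comparison: the same threshold applied to the transaction amount
  val amountOver3000: Seq[BalanceRow] = rows.filter(_.amount > 3000)
}
```

Assuming the table maps to the case class as declared above, the same predicate should carry over to the RDD as transactionsRdd.filter(_.balance.exists(_ > 3000)).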

//Cache the transactions for Currys
val currys = transactionsRdd.filter((transaction) => transaction.issuer.equals("Currys")).cache
currys.count

//Drill down on Currys in Manchester
val currysManchester = currys.filter((transaction) => transaction.location.equals("Manchester")).cache
currysManchester.count

currysManchester.collect.foreach(println)
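
The filter-then-drill-down pattern used above can be tried without a cluster on an ordinary Scala collection, which is handy for checking predicates before running them against Cassandra. This is only a sketch: Txn and LocalFilterSketch are hypothetical names, Txn keeps just three of the real fields, and the sample rows are invented.

```scala
// Hypothetical simplified record; the real case class has more fields.
case class Txn(issuer: String, location: String, amount: Double)

object LocalFilterSketch {
  // Made-up sample data, for illustration only
  val txns = Seq(
    Txn("Currys", "Manchester", 3500.0),
    Txn("Currys", "London", 120.0),
    Txn("Tesco", "Manchester", 4500.0),
    Txn("Currys", "Manchester", 45.0)
  )

  // Count of transactions with an amount over 3000
  val over3000: Int = txns.count(_.amount > 3000)

  // Count of transactions with an amount strictly between 3000 and 4000
  val between3000And4000: Int = txns.count(t => t.amount > 3000 && t.amount < 4000)

  // Narrow to one issuer, then drill down by location on the narrowed set
  val currys: Seq[Txn] = txns.filter(_.issuer == "Currys")
  val currysManchester: Seq[Txn] = currys.filter(_.location == "Manchester")

  def main(args: Array[String]): Unit = {
    println(over3000)
    println(between3000And4000)
    currysManchester.foreach(println)
  }
}
```

On a local Seq there is nothing to cache; in the Spark shell the .cache calls matter because each intermediate RDD is reused by the next step.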