By default this runs on its own node, so you will need to start DSE with dse cassandra -k
to create a Spark analytics node.
First, run the https://github.com/PatrickCallaghan/datastax-creditcard-demo project to populate the Cassandra cluster (follow the instructions in its README). This project populates the cluster with credit card data that we can then filter from Scala in the Spark shell. The load runs for approximately one minute and then finishes.
In a new window, start the Spark shell with dse spark.
Now you can run the following commands to filter out the required data.
case class credit_card_balance (credit_card_no: String, balance_at: Option[java.util.Date], balance: Option[Double], transaction_time: java.util.Date, transaction_id: String, location: String, items: java.util.Map[String,Double], issuer: String, amount: Double);
val transactionsRdd = sc.cassandraTable[credit_card_balance]("datastax_creditcard_demo","credit_card_transactions_balance").cache
//Get a count of all transactions
transactionsRdd.count
//Count all transactions with an amount over 3000
val transactionsOver3000 = transactionsRdd.filter((transaction) => transaction.amount > 3000).count
//Count all transactions with an amount between 3000 and 4000
val transactionbetween30004000 = transactionsRdd.filter((transaction) => transaction.amount > 3000 && transaction.amount < 4000)
transactionbetween30004000.count
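The same predicate can be sanity-checked on a plain Scala collection without a running cluster. A minimal sketch, assuming a cut-down Txn case class and invented sample amounts (none of these come from the demo data):

```scala
// Local sketch of the range filter above, runnable in any Scala REPL.
// Txn and the sample amounts are invented for illustration only.
case class Txn(transactionId: String, amount: Double)

val sample = Seq(
  Txn("t1", 2500.0),
  Txn("t2", 3500.0),
  Txn("t3", 4200.0),
  Txn("t4", 3999.0)
)

// Same predicate as the RDD filter: strictly between 3000 and 4000
val between = sample.filter(t => t.amount > 3000 && t.amount < 4000)
println(between.size)
```

Spark's RDD filter takes the same function-literal predicate as Scala's collection filter, so logic verified locally can be pasted into the Spark shell unchanged.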
//Cache the transactions for Currys
val currys = transactionsRdd.filter((transaction) => transaction.issuer.equals("Currys")).cache
currys.count
//Drill down on Currys in Manchester
val currysManchester = currys.filter((transaction) => transaction.location.equals("Manchester")).cache
currysManchester.count
currysManchester.collect.foreach(println)
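The drill-down pattern above (filter, cache, filter again) can also be sketched with plain collections. cache has no collection analogue (it only tells Spark to keep the intermediate RDD in memory), but the chained filters behave identically. The Row case class and sample rows below are invented for illustration:

```scala
// Local sketch of the issuer/location drill-down; Row and the
// sample data are made up, not taken from the demo keyspace.
case class Row(issuer: String, location: String, amount: Double)

val rows = Seq(
  Row("Currys", "Manchester", 120.0),
  Row("Currys", "London", 80.0),
  Row("Argos", "Manchester", 45.0)
)

// First narrow to one issuer, then drill down by location,
// mirroring the currys -> currysManchester chain above.
val currysLocal = rows.filter(_.issuer == "Currys")
val currysManchesterLocal = currysLocal.filter(_.location == "Manchester")
currysManchesterLocal.foreach(println)
```

In the Spark version, caching the first filter result means the second, narrower filter reuses the in-memory subset instead of rescanning the Cassandra table.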