
@rajkiran485
rajkiran485 / jsonParser.scala
Created September 18, 2020 06:24 — forked from chaitanyapolipalli/jsonParser.scala
Spark Program to read Nested JSON
package hr_data
import java.io.IOException
import java.text.SimpleDateFormat
import java.util.Calendar
import com.typesafe.config.ConfigFactory
import org.apache.spark.sql.SparkSession
object jsonParser {
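
The preview above cuts off at the object declaration. As a rough sketch of the same idea — reading nested JSON with Spark SQL — assuming a hypothetical file path and field names (not taken from the gist):

```scala
import org.apache.spark.sql.SparkSession

object NestedJsonExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NestedJsonExample")
      .master("local[*]")
      .getOrCreate()

    // multiLine is needed when a single JSON record spans several lines
    val df = spark.read.option("multiLine", "true").json("employees.json")

    // Nested fields are addressed with dot notation
    df.select("name", "address.city").show()

    spark.stop()
  }
}
```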

rajkiran485 / PropertyTests.scala
Created September 16, 2020 14:07 — forked from davidallsopp/PropertyTests.scala
Examples of writing mixed unit/property-based (ScalaTest with ScalaCheck) tests. Includes tables and generators as well as 'traditional' tests.
/**
* Examples of writing mixed unit/property-based (ScalaCheck) tests.
*
* Includes tables and generators as well as 'traditional' tests.
*
* @see http://www.scalatest.org/user_guide/selecting_a_style
* @see http://www.scalatest.org/user_guide/property_based_testing
*/
import org.scalatest._
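
The import is where this preview stops. A minimal mixed unit/property-based test in the spirit of the gist, with illustrative class and property names, might look like this (assuming ScalaTest with its ScalaCheck integration on the classpath):

```scala
import org.scalatest.FunSuite
import org.scalatest.prop.GeneratorDrivenPropertyChecks

class ReverseSpec extends FunSuite with GeneratorDrivenPropertyChecks {
  // 'traditional' unit test: one concrete example
  test("reverse of a two-element list") {
    assert(List(1, 2).reverse == List(2, 1))
  }

  // property-based test: reversing twice yields the original list,
  // checked against many generated lists
  test("reverse is an involution") {
    forAll { (xs: List[Int]) =>
      assert(xs.reverse.reverse == xs)
    }
  }
}
```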
rajkiran485 / MyTestSuite.scala
Created November 27, 2019 19:10 — forked from melrief/MyTestSuite.scala
Problem of Spark with FunSuite and defaultParallelism
import org.scalatest.FunSuite
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
class MyTestSuite extends FunSuite {
val conf = new SparkConf()
.setAppName("My Spark test")
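
A plausible continuation of the truncated suite above: the defaultParallelism problem the gist names arises because a `SparkConf` without a master cannot resolve a parallelism level. The `"local[2]"` master and the test body here are assumptions for illustration:

```scala
import org.scalatest.FunSuite
import org.apache.spark.{SparkConf, SparkContext}

class MyTestSuite extends FunSuite {
  val conf = new SparkConf()
    .setAppName("My Spark test")
    .setMaster("local[2]") // without a master, defaultParallelism cannot be resolved

  test("defaultParallelism matches the local thread count") {
    val sc = new SparkContext(conf)
    try {
      assert(sc.defaultParallelism == 2)
    } finally {
      sc.stop() // always stop the context so later suites can create their own
    }
  }
}
```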
rajkiran485 / BIGDATA_Spark_Python.md
Created August 13, 2019 17:10
Big Data Spark Cheat sheet focusing on using Python API

Big Data Spark Cheat Sheet

This cheat sheet assumes the following software versions:

  • Spark 2.2, which requires JDK 1.8
  • CDH 5.13
  • JDK 1.8
rajkiran485 / BIGDATA_HIVE_Syntax.md
Created August 13, 2019 16:45 — forked from kzhangkzhang/BIGDATA_HIVE_Syntax.md
Hive Syntax Cheat Sheet

Hive Syntax Cheat Sheet

General rules

  • interchangeable constructs
  • Hive is not case-sensitive
  • a semicolon terminates each statement

Hive Data Types

Metadata

#Selecting a database
USE database;

#Listing databases
SHOW DATABASES;
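
The same metadata commands can be issued from Spark with Hive support enabled. A sketch assuming a Hive-enabled SparkSession; the database name is illustrative:

```scala
import org.apache.spark.sql.SparkSession

object HiveMetadataExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveMetadataExample")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("USE default")           // selecting a database
    spark.sql("SHOW DATABASES").show() // listing databases

    spark.stop()
  }
}
```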
rajkiran485 / DataFrameSuite.scala
Created March 27, 2019 09:50 — forked from umbertogriffo/DataFrameSuite.scala
DataFrameSuite allows you to check whether two DataFrames are equal, using the method assertDataFrameEquals. When a DataFrame contains doubles or Spark MLlib Vectors, you can assert that two DataFrames are approximately equal using the method assertDataFrameApproximateEquals.
package test.com.idlike.junit.df
import breeze.numerics.abs
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{Column, DataFrame, Row}
/**
* Created by Umberto on 06/02/2017.
*/
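
The gist preview stops after the imports. As a minimal hand-rolled sketch of the same idea — the real gist's assertDataFrameEquals is richer — two DataFrames can be compared by schema and by row content independent of order:

```scala
import org.apache.spark.sql.DataFrame

object DataFrameCompare {
  // Two DataFrames are considered equal when they have the same schema
  // and the same multiset of rows, regardless of row order.
  def dataFramesEqual(a: DataFrame, b: DataFrame): Boolean = {
    a.schema == b.schema &&
      a.collect().sortBy(_.toString).sameElements(b.collect().sortBy(_.toString))
  }
}
```

Note that `collect()` pulls both DataFrames to the driver, so this sketch is only suitable for the small fixtures typical of unit tests.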
// Classic Spark word count over a text file on HDFS
val textFile = sc.textFile("hdfs://...")
val counts = textFile.flatMap(line => line.split(" "))
  .map(word => (word, 1))
  .reduceByKey(_ + _)
counts.saveAsTextFile("hdfs://...")
import org.apache.spark._
import org.apache.spark.SparkConf
import org.apache.spark.sql.hive.HiveContext
import com.databricks.spark.csv._
object Solution extends App {
val conf = new SparkConf().setAppName("Problem_Execution")
val sc = new SparkContext(conf)
val hiveContext = new HiveContext(sc)
import org.apache.spark.sql.SparkSession
import org.apache.hadoop.hbase.util.Bytes
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
object readHbaseTableAsDF extends Serializable {
case class EmpRow(empID:String, name:String, city:String)
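
The preview ends at the case class. A hedged sketch of how readHbaseTableAsDF might continue, using the imports shown above: scan an HBase table with TableInputFormat and map each Result into an EmpRow. The table, column family, and qualifier names here are assumptions, not taken from the gist:

```scala
def readTable(spark: SparkSession): Unit = {
  val conf = HBaseConfiguration.create()
  conf.set(TableInputFormat.INPUT_TABLE, "employees") // assumed table name

  // Each record is a (row key, Result) pair from the HBase scan
  val rdd = spark.sparkContext.newAPIHadoopRDD(
    conf,
    classOf[TableInputFormat],
    classOf[ImmutableBytesWritable],
    classOf[Result])

  import spark.implicits._
  val df = rdd.map { case (key, result) =>
    EmpRow(
      Bytes.toString(key.get()),
      Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("name"))),
      Bytes.toString(result.getValue(Bytes.toBytes("cf"), Bytes.toBytes("city"))))
  }.toDF()

  df.show()
}
```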