Skip to content

Instantly share code, notes, and snippets.


Kaushal Prajapati skp33

View GitHub Profile
import org.apache.spark.sql.SparkSession;
public class SparkHiveTest {
public static void main(String[] args) {
SparkSession spark = SparkSession
.appName("Java Spark Hive Example")
.config("spark.master", "local")
View spray_nulls_in_left_join.scala
implicit class DataFrameExtended(df: DataFrame) {
import df.sqlContext.implicits._
def anyNull(cols: Seq[Column]): Column = (_ || _)
* LEFT JOIN should not join anything when join-key contains a NULL (but usually this
* would result in shuffling NULL keyed items into single or few reducers).
* This can be easily fixed by adding an additional temporary join condition that:
* - is a random seed when any of the keys is null, thus addressing the NULL skew
skp33 / CreateTableStatement.scala
Created Jan 25, 2020
Create table schema from dataset schema
View CreateTableStatement.scala
import java.lang.reflect.Method
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogStorageFormat, CatalogTable, CatalogTableType}
import org.apache.spark.sql.execution.command.ShowCreateTableCommand
import org.apache.spark.sql.types.StructType
def showCreateTableCommand(
View gist:8e2be47f9b50d704567e6e6cf6f8f7fd
spark.conf.set("spark.sql.streaming.metricsEnabled", "true")
skp33 / SparkUtils.scala
Created Nov 20, 2018 — forked from ibuenros/SparkUtils.scala
Spark productionizing utilities developed by Ooyala, shown in Spark Summit 2014
View SparkUtils.scala
import com.codahale.metrics.{MetricRegistry, Meter, Gauge}
import org.apache.spark.{SparkEnv, Accumulator}
import org.apache.spark.metrics.source.Source
import org.joda.time.DateTime
import scala.collection.mutable
skp33 / PointType.scala
Created Aug 30, 2018 — forked from sadikovi/PointType.scala
Spark UDT and UDAF with custom buffer type
View PointType.scala
package org.apache.spark
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.types._
@SQLUserDefinedType(udt = classOf[PointType])
case class Point(mac: String, start: Long, end: Long) {
override def hashCode(): Int = {
31 * (31 * mac.hashCode) + start.hashCode
skp33 / SudokuSolver.scala
Created Jul 27, 2018 — forked from pathikrit/SudokuSolver.scala
Sudoku Solver in Scala
View SudokuSolver.scala
val n = 9
val s = Math.sqrt(n).toInt
type Board = IndexedSeq[IndexedSeq[Int]]
def solve(board: Board, cell: Int = 0): Option[Board] = (cell%n, cell/n) match {
case (r, `n`) => Some(board)
case (r, c) if board(r)(c) > 0 => solve(board, cell + 1)
case (r, c) =>
def cells(i: Int) = Seq(board(r)(i), board(i)(c), board(s*(r/s) + i/s)(s*(c/s) + i%s))
def guess(x: Int) = solve(board.updated(r, board(r).updated(c, x)), cell + 1)
skp33 / Deserialization.scala
Created Apr 6, 2018 — forked from ramn/Deserialization.scala
Object serialization example in Scala
View Deserialization.scala
class Animal(name: String, age: Int) extends Serializable {
override def toString = s"Animal($name, $age)"
case class Person(name: String)
// or fork := true in sbt
skp33 / CogroupDf.scala
Created Mar 29, 2018 — forked from ahoy-jon/CogroupDf.scala
DataFrame.cogroup is the new HList.flatMap (UNFORTUNATELY, THIS IS VERY SLOW)
View CogroupDf.scala
package org.apache.spark.sql.utils
import org.apache.spark.Partitioner
import org.apache.spark.rdd.{CoGroupedRDD, RDD}
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, ScalaReflection}
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}
import org.apache.spark.sql.{SQLContext, DataFrame, Row}
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag
skp33 / reactive_map.js
Created Mar 17, 2018 — forked from granturing/reactive_map.js
Sample reactive Leaflet code for Zeppelin
View reactive_map.js
<!-- place this in an %angular paragraph -->
<link rel="stylesheet" href="" />
<div id="map" style="height: 800px; width: 100%"></div>
<script type="text/javascript">
function initMap() {
var map ='map').setView([30.00, -30.00], 3);
L.tileLayer('http://{s}{z}/{x}/{y}.png', {