
Eric Maynard eric-maynard

eric-maynard / TripleCounterExample.scala
Last active April 9, 2019 06:22
Custom Spark AccumulatorV2 example
import org.apache.spark.util._
def vectorizedAddUnsafe128x3(a: Array[Byte], b: Array[Byte]): Array[Byte] = {
  val result = new Array[Byte](a.length)
  var carry: Int = 0
  ((a.length - 1) to 0 by -4).foreach(i => {
    // mask bytes to undo sign extension before packing each unsigned 32-bit word
    val x: Long =
      java.lang.Integer.toUnsignedLong(((a(i - 3) & 0xFF) << 24) | ((a(i - 2) & 0xFF) << 16) | ((a(i - 1) & 0xFF) << 8) | (a(i) & 0xFF)) +
      java.lang.Integer.toUnsignedLong(((b(i - 3) & 0xFF) << 24) | ((b(i - 2) & 0xFF) << 16) | ((b(i - 1) & 0xFF) << 8) | (b(i) & 0xFF)) +
      carry
    result(i - 3) = (x >>> 24).toByte; result(i - 2) = (x >>> 16).toByte
    result(i - 1) = (x >>> 8).toByte; result(i) = x.toByte
    // the carry propagates within each 128-bit value, but not across the three packed lanes
    carry = if (x > 0xFFFFFFFFL && (i - 3) % 16 != 0) 1 else 0
  })
  result
}
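As a standalone illustration of the same word-at-a-time trick, here is a hedged sketch covering a single 128-bit value rather than the packed triple (`add128`, the 16-byte layout, and the sample inputs are my assumptions, not from the gist):

```scala
// Hypothetical single-value variant: unsigned 128-bit addition over
// big-endian 16-byte arrays, one 32-bit word at a time with a manual carry.
def add128(a: Array[Byte], b: Array[Byte]): Array[Byte] = {
  require(a.length == 16 && b.length == 16)
  val out = new Array[Byte](16)
  var carry = 0L
  (15 to 3 by -4).foreach { i =>
    // mask to undo sign extension, then pack four bytes into an unsigned 32-bit word
    val wa = java.lang.Integer.toUnsignedLong(((a(i - 3) & 0xFF) << 24) | ((a(i - 2) & 0xFF) << 16) | ((a(i - 1) & 0xFF) << 8) | (a(i) & 0xFF))
    val wb = java.lang.Integer.toUnsignedLong(((b(i - 3) & 0xFF) << 24) | ((b(i - 2) & 0xFF) << 16) | ((b(i - 1) & 0xFF) << 8) | (b(i) & 0xFF))
    val x = wa + wb + carry
    out(i - 3) = (x >>> 24).toByte; out(i - 2) = (x >>> 16).toByte
    out(i - 1) = (x >>> 8).toByte; out(i) = x.toByte
    carry = if (x > 0xFFFFFFFFL) 1L else 0L
  }
  out
}

// Carry crossing a 32-bit word boundary:
val a = Array.fill[Byte](16)(0); (12 to 15).foreach(a(_) = 0xFF.toByte)
val b = Array.fill[Byte](16)(0); b(15) = 1
val sum = add128(a, b) // carry lands in byte 11: sum(11) == 1, bytes 12..15 == 0
```

Masking each byte with `& 0xFF` before shifting matters because `Byte` is signed in Scala and would otherwise sign-extend to 32 bits before the `|`.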
eric-maynard / create-view-table.sh
Last active November 20, 2018 20:31
A script to analyze a large volume of Impala views with an Impala table
#!/bin/bash
##################################################
# Constants #
##################################################
TABLE_NAME="underlying_tables"
DB_COL="db_name"
VIEW_COL="view_name"
TABLE_COL="table_name"
eric-maynard / create-query-table.sh
Created November 7, 2018 20:36
A script to analyze Impala queries using Impala
#!/bin/bash
##################################################
# Constants #
##################################################
# curl constants:
HOST="bkestelman-1.gce.cloudera.com"
CM_USER="admin"
CM_PASS="admin"
eric-maynard / create-partition-table.sh
Last active November 20, 2018 14:31
A script to analyze a large volume of Impala partitions with an Impala table
#!/bin/bash
##################################################
# Constants #
##################################################
delimiter='~'
soft_delimiter=","
column_limit=9
eric-maynard / Example.scala
Last active November 19, 2018 14:30
Manipulating nested Spark DataFrames
package com.cloudera.example
import org.apache.spark.sql._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._
import scala.collection.mutable.ListBuffer
// command-line args:
val buckets = 5
val myBucket = 1 // method 2 only
// helper functions
case class TableWithHash(tableName: String, hash: Int)
def hashTable(tableName: String): TableWithHash = {
  // abs() guards against negative hashCode values producing a negative bucket
  TableWithHash(tableName, math.abs(tableName.hashCode % buckets))
}
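A quick, hedged usage sketch of the bucketing idea: each run keeps only the tables that hash into its bucket, so several runs can split a large table list between them. The table names, and a local `hashTable` with an explicit `buckets` parameter, are illustrative rather than taken from the gist:

```scala
case class TableWithHash(tableName: String, hash: Int)

// abs() guards against negative hashCode values, which would otherwise
// yield a negative bucket index
def hashTable(tableName: String, buckets: Int): TableWithHash =
  TableWithHash(tableName, math.abs(tableName.hashCode % buckets))

val tables   = Seq("db.orders", "db.customers", "db.line_items", "db.events")
val buckets  = 5
val myBucket = 1
// only the tables assigned to this run's bucket survive the filter
val mine = tables.map(hashTable(_, buckets)).filter(_.hash == myBucket).map(_.tableName)
```

Because `hashCode` is deterministic for a given string, every run computes the same bucket assignment without any coordination.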
eric-maynard / cv-rf.scala
Last active December 11, 2017 04:18
A simple example of a cross-validated Random Forest model
import org.apache.spark.ml._
import org.apache.spark.ml.tuning._
import org.apache.spark.ml.evaluation._
import org.apache.spark.mllib.regression._
import org.apache.spark.mllib.linalg.DenseVector
import org.apache.spark.sql._
import org.apache.spark.ml.classification._
import org.apache.spark.ml.feature._
import sqlContext.implicits._
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
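The preview shows only the imports, so as a hedged, Spark-free sketch, here is the fold-splitting bookkeeping that `CrossValidator` performs internally (`kFoldIndices` is my own name, not a Spark API):

```scala
// Assign each row index to one of k folds round-robin, then emit
// (train, test) index pairs: fold j is held out, the rest train.
def kFoldIndices(n: Int, k: Int): Seq[(Seq[Int], Seq[Int])] = {
  (0 until k).map { j =>
    val test  = (0 until n).filter(_ % k == j)
    val train = (0 until n).filterNot(_ % k == j)
    (train, test)
  }
}

val splits = kFoldIndices(10, 3) // 3 (train, test) pairs covering all 10 rows
```

Spark's `CrossValidator` does the analogous split over DataFrames: it fits the estimator on each training split, scores the held-out fold with the evaluator, and averages the metric across folds to pick the best `ParamMap`.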