sadikovi / inotify.scala
Last active January 2, 2020 11:56
HDFS notification system example
import java.net.URI
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs._
import org.apache.hadoop.hdfs.client._
import org.apache.hadoop.hdfs.inotify._
// NameNode URI and a bare Hadoop configuration
val url = new URI("hdfs://localhost:8020")
val conf = new Configuration(false)
// HdfsAdmin exposes the HDFS inotify API (reading the edit stream requires superuser privileges)
val dfs = new HdfsAdmin(url, conf)
// stream of NameNode edit-log events (create, close, append, rename, unlink, metadata)
val stream = dfs.getInotifyEventStream()
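
A minimal sketch of how the returned stream could be consumed (not part of the gist preview; assumes Hadoop 2.7+, where take() returns an EventBatch):

while (true) {
  val batch = stream.take()
  for (event <- batch.getEvents) {
    event.getEventType match {
      case Event.EventType.CREATE =>
        val create = event.asInstanceOf[Event.CreateEvent]
        println(s"created ${create.getPath}")
      case other =>
        println(s"event: $other")
    }
  }
}
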
sadikovi / SystemSimulation.scala
Created February 25, 2015 09:23
Example of a Gatling scenario that uses complex authentication with response processing (asking for an auth token, encrypting it, sending it back, verifying the logon). Each "browsing" request is sent, and based on the response several sub-requests are generated, imitating a drill-down into some piece of data on a website.
package systemsimulation
import io.gatling.core.Predef._
import io.gatling.core.session._
import io.gatling.http.Predef._
import scala.concurrent.duration._
import general._
class SystemSimulation extends Simulation {
  // configure proxy
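
A minimal sketch of the overall shape such a simulation takes (Gatling 3 DSL; the base URL, paths, header name and saved session key are illustrative assumptions, not the gist's real values):

import io.gatling.core.Predef._
import io.gatling.http.Predef._

class MinimalAuthSimulation extends Simulation {
  val httpProtocol = http.baseUrl("http://localhost:8080")

  val scn = scenario("auth and browse")
    .exec(http("request token").get("/auth/token").check(bodyString.saveAs("token")))
    .exec(http("logon").post("/auth/logon").header("X-Auth-Token", "${token}"))
    .exec(http("browse").get("/data"))

  setUp(scn.inject(atOnceUsers(10))).protocols(httpProtocol)
}

The gist builds on this skeleton by deriving the sub-requests from each browsing response instead of hard-coding them.
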
sadikovi / PointType.scala
Last active April 2, 2019 00:48
Spark UDT and UDAF with custom buffer type
package org.apache.spark
import org.apache.spark.sql.catalyst.util._
import org.apache.spark.sql.types._
@SQLUserDefinedType(udt = classOf[PointType])
case class Point(mac: String, start: Long, end: Long) {
  override def hashCode(): Int = {
    31 * (31 * mac.hashCode) + start.hashCode
  }
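
A rough sketch of what the PointType UDT referenced by the annotation might look like (the gist's actual implementation is truncated in this preview, so the field encoding below is an assumption; the file sits in the org.apache.spark package because UserDefinedType is private[spark] in Spark 2.x):

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.GenericInternalRow
import org.apache.spark.unsafe.types.UTF8String

class PointType extends UserDefinedType[Point] {
  override def sqlType: DataType = StructType(Seq(
    StructField("mac", StringType, nullable = false),
    StructField("start", LongType, nullable = false),
    StructField("end", LongType, nullable = false)))

  override def serialize(p: Point): Any = {
    val row = new GenericInternalRow(3)
    row.update(0, UTF8String.fromString(p.mac))
    row.update(1, p.start)
    row.update(2, p.end)
    row
  }

  override def deserialize(datum: Any): Point = datum match {
    case row: InternalRow =>
      Point(row.getUTF8String(0).toString, row.getLong(1), row.getLong(2))
  }

  override def userClass: Class[Point] = classOf[Point]
}
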
sadikovi / parquet_read.scala
Created November 4, 2018 09:46
Read a Parquet file with parquet-mr and list all of its records
////////////////////////////////////////////////////////////////
// == Parquet read ==
////////////////////////////////////////////////////////////////
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce._
import org.apache.hadoop.mapreduce.lib.input.FileSplit
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
import org.apache.parquet.hadoop.ParquetInputSplit
import org.apache.parquet.hadoop.ParquetRecordReader
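
For comparison, a sketch of the simpler high-level path (not the gist's approach, which drives ParquetRecordReader directly): reading every record with ParquetReader and the example Group model. The file path is an illustrative assumption.

import org.apache.parquet.example.data.Group
import org.apache.parquet.hadoop.ParquetReader
import org.apache.parquet.hadoop.example.GroupReadSupport

val reader = ParquetReader.builder(new GroupReadSupport(), new Path("/tmp/sample.parquet")).build()
var group: Group = reader.read()
while (group != null) {
  println(group)
  group = reader.read()
}
reader.close()
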
sadikovi / CustomEncoder.scala
Created September 21, 2018 12:10
Code to create a custom encoder for a class whose fields have different types, including Row
def clazz[T](cls: Class[T], encoders: Seq[(String, ExpressionEncoder[_])]): ExpressionEncoder[T] = {
  encoders.foreach { case (_, enc) => enc.assertUnresolved() }
  val schema = StructType(encoders.map {
    case (fieldName, e) =>
      val (dataType, nullable) = if (e.flat) {
        e.schema.head.dataType -> e.schema.head.nullable
      } else {
        e.schema -> true
      }
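
A hypothetical usage sketch (the helper above is truncated, so the exact call shape is an assumption): building an encoder for a case class that mixes a primitive field with a Row field, then creating a Dataset with it. Wrapper, rowSchema and the in-scope spark session are illustrative, not part of the gist.

import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.encoders.{ExpressionEncoder, RowEncoder}
import org.apache.spark.sql.types._

case class Wrapper(id: Long, payload: Row)

val rowSchema = StructType(Seq(StructField("value", StringType)))
val enc: ExpressionEncoder[Wrapper] = clazz(classOf[Wrapper], Seq(
  "id" -> ExpressionEncoder[Long](),
  "payload" -> RowEncoder(rowSchema)))

val ds = spark.createDataset(Seq(Wrapper(1L, Row("a"))))(enc)
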
sadikovi / Example.java
Last active September 14, 2018 08:23
Issue #158 example
final class Example {
    void /* test */ func() {
        String a = "a";
        String b = "a" + b + "c()";
        Buffer buf = "test" + "new Buffer() {};";
        HashSet<String> test = new HashSet<String>();
    }
    public int get_int() {
sadikovi / DefaultSource.scala
Last active June 18, 2018 12:53
Example of a StreamSinkProvider for Structured Streaming with custom query execution
package org.apache.spark.sql.sadikovi
import java.io.{ObjectInputStream, ObjectOutputStream}
import java.util.UUID
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.mapreduce.{JobContext, TaskAttemptContext}
import org.apache.spark.internal.io._
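
A minimal sketch of the two pieces such a source has to provide (the gist's real classes add the custom query execution on top of this): a StreamSinkProvider that Spark looks up by format name, and a Sink whose addBatch() receives every micro-batch. The class names are illustrative.

import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

class SimpleSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new SimpleSink
}

class SimpleSink extends Sink {
  override def addBatch(batchId: Long, data: DataFrame): Unit = {
    // the micro-batch DataFrame is only meant to be used inside this call
    println(s"batch $batchId with schema ${data.schema.simpleString}")
  }
}
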
sadikovi / module.patch
Created May 31, 2018 07:05
Java 9 module grammar for language-java
diff --git a/grammars/java.cson b/grammars/java.cson
index cb9947a..399c914 100644
--- a/grammars/java.cson
+++ b/grammars/java.cson
@@ -109,6 +109,9 @@
{
'include': '#code'
}
+ {
+ 'include': '#module'
sadikovi / stats.rs
Created April 26, 2018 08:36
Mutable Statistics Buffer (for collecting statistics during writes). For PR https://github.com/sunchao/parquet-rs/pull/94
// ----------------------------------------------------------------------
// Statistics updates
struct MutableStatisticsBuffer<T: DataType> {
    typed: TypedStatistics<T>,
    sort_order: SortOrder
}
impl<T: DataType> MutableStatisticsBuffer<T> {
    pub fn new(column_order: ColumnOrder, is_min_max_deprecated: bool) -> Self {
sadikovi / spellchecker.scala
Last active April 25, 2018 03:50
Simple spell checker based on dynamic programming
abstract class Spelling
case class CorrectSpelling(word: String) extends Spelling
case class IncorrectSpelling(word: String, suggestions: List[String]) extends Spelling
case class Spellchecker(dictionary: String) {
  private val numSuggestions = 10
  private val maxDistance = 5
  // set of valid words (replace with trie for space efficiency)
  private val set = readDict(dictionary)
  private val heap = new java.util.PriorityQueue[(Int, String)](
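
A standalone sketch of the dynamic-programming core such a checker relies on, classic Levenshtein edit distance (the gist's own implementation is truncated in this preview, so this is illustrative rather than a copy of it):

def editDistance(a: String, b: String): Int = {
  val dp = Array.ofDim[Int](a.length + 1, b.length + 1)
  for (i <- 0 to a.length) dp(i)(0) = i
  for (j <- 0 to b.length) dp(0)(j) = j
  for (i <- 1 to a.length; j <- 1 to b.length) {
    val cost = if (a(i - 1) == b(j - 1)) 0 else 1
    dp(i)(j) = math.min(math.min(dp(i - 1)(j) + 1, dp(i)(j - 1) + 1), dp(i - 1)(j - 1) + cost)
  }
  dp(a.length)(b.length)
}

For example, editDistance("speling", "spelling") == 1, well within the maxDistance of 5 used above, so "spelling" would presumably qualify as a candidate suggestion.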