@pathikrit
pathikrit / README.md
Last active April 24, 2021 17:36
My highly opinionated list of things needed to build an app in Scala
pathikrit / SparkDataLoad.scala
Last active June 1, 2020 15:03
Spark utils to ship data
import java.nio.charset.{ Charset, StandardCharsets }
import org.apache.spark.sql._
import org.apache.spark.sql.types._
object SparkDataLoad {
  def fromCsv[A : Encoder](
    path: Set[String],
    encoding: Charset = StandardCharsets.UTF_8,
    useHeader: Boolean = false,
pathikrit / Morph.scala
Created April 29, 2020 15:07
Case class morpher in scala
import shapeless._, syntax.singleton._, record._, ops.hlist._
/**
* Given an instance A, its generic representation AR, and a function f from AR => BR,
* we can convert A to B if we also have BR as the generic representation of B
* We also handle misalignments using shapeless's align typeclass (https://stackoverflow.com/questions/29242873/shapeless-turn-a-case-class-into-another-with-fields-in-different-order)
*/
case class Morph[A, AR](a: A)(implicit reprA: LabelledGeneric.Aux[A, AR]) {
  // Why this DSL you say? Hack to get around scalac idiocy: https://stackoverflow.com/a/46614684/471136
  def to[B] = new {
pathikrit / Config.scala
Last active March 5, 2020 14:22
Better Config
import scala.util.control.NonFatal
import better.files.Scanner.Read
/**
* Extend this trait to create your application config
*
* Pros of this approach:
* 1) Library free approach - only 15 lines of dependency free "library" (four one-line defs for you to override)
* 2) Failures happen when the Config object is loaded instead of when a config value is accessed
* 3) Strongly typed
pathikrit / GzipSplitter.scala
Last active December 12, 2019 21:29
Split a file into multiple GZIP files
import java.io.InputStream
import better.files._
import squants.information._, InformationConversions._
object GzipSplitter {
  /** Splits $inputStream into gzip files of approximately $splitSize each under $outputDirectory */
  def split(
    inputStream     : InputStream,
    outputDirectory : File = File.newTemporaryDirectory(),
pathikrit / IntervalMap.java
Created May 14, 2013 08:35
Interval Map in Java
package com.github.pathikrit.scalgos;
import java.util.Map.Entry;
import java.util.TreeMap;
/**
* Efficient data structure to map intervals of keys to values
* e.g. set(5, 60000, "hello") would set all keys in [5, 60000) to be "hello"
*
* All operations are O(log n), where n is the number of stored segments (usually far smaller than the number of keys covered)
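
The boundary-map idea described in the comment above can be sketched in Scala on top of `java.util.TreeMap`. This is only an illustration under assumptions, not the gist's Java code: the class name `IntervalMapSketch`, the `Int` key type, and the `Option`-returning `get` are all hypothetical choices.

```scala
import java.util.TreeMap

// Hypothetical sketch: each entry marks the start of a segment, and its value
// applies to every key up to (but excluding) the next entry's key.
// None means "nothing set for this segment".
final class IntervalMapSketch[V] {
  private val starts = new TreeMap[Int, Option[V]]()

  /** Value for `key`: the value of the closest segment start at or below it. */
  def get(key: Int): Option[V] =
    Option(starts.floorEntry(key)).flatMap(_.getValue)

  /** Sets every key in [from, to) to `value`. */
  def set(from: Int, to: Int, value: V): Unit = {
    require(from < to, "interval must be non-empty")
    val after = get(to)                          // value in effect at `to` before this update
    starts.subMap(from, true, to, true).clear()  // drop segment starts swallowed by the new interval
    starts.put(from, Some(value))                // new segment begins at `from`
    starts.put(to, after)                        // the old value resumes at `to`
  }
}
```

In this sketch `get` is a single `floorEntry` lookup, so it is logarithmic in the number of stored segment starts; `set` additionally pays for each segment start it removes.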
pathikrit / README.md
Last active November 2, 2019 22:20
Contravariance vs. Covariance
  • Let C be a type constructor (a generic type) applied as C<A>, e.g. in List<Animal>, List is C and Animal is A.
  • Let S be a subtype of T, e.g. in class Cat extends Animal, Cat is S and Animal is T.
  • If C<S> is a subtype of C<T>, then C is covariant on T, e.g. List<Cat> is a subtype of List<Animal>.
  • If C<T> is a subtype of C<S>, then C is contravariant on T, e.g. Predicate<Animal> is a subtype of Predicate<Cat>.
  • If neither C<T> nor C<S> is a subtype of the other, then C is invariant on T.
  • If C<T> and C<S> are subtypes of each other, then C is phantom variant (also called bivariant) on T. This is possible in languages that support phantom types, like Haskell.

In Scala:
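
A minimal sketch of the first three cases, using Scala's `+A` and `-A` variance annotations. The names `VarianceSketch`, `Box`, `Check`, and `Cell` are hypothetical and chosen only for illustration; `Animal` and `Cat` follow the bullets above.

```scala
object VarianceSketch {
  class Animal
  class Cat extends Animal

  // Covariant (+A): Box[Cat] is a subtype of Box[Animal]
  final case class Box[+A](value: A)

  // Contravariant (-A): Check[Animal] is a subtype of Check[Cat]
  trait Check[-A] { def apply(a: A): Boolean }

  // Invariant (no annotation): Cell[Cat] and Cell[Animal] are unrelated
  final class Cell[A](var value: A)

  val catBox: Box[Cat] = Box(new Cat)
  val animalBox: Box[Animal] = catBox // allowed because Box is covariant

  val anyAnimal: Check[Animal] = new Check[Animal] {
    def apply(a: Animal): Boolean = true
  }
  val catCheck: Check[Cat] = anyAnimal // allowed because Check is contravariant

  // val cell: Cell[Animal] = new Cell[Cat](new Cat) // rejected: Cell is invariant
}
```

The commented-out last line is the invariant case: uncommenting it should produce a type mismatch at compile time, since a mutable `Cell[Animal]` would otherwise let a non-`Cat` be written into a `Cell[Cat]`.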

pathikrit / SphericalDistance.scala
Last active September 24, 2019 19:06
Distance calculator between 2 coordinates on a planet
/** Distance between 2 coordinates (in degrees) */
def dist(
  p1: (Double, Double), // Coordinate 1 (in degrees)
  p2: (Double, Double), // Coordinate 2 (in degrees)
  manhattanDist: Boolean = false, // If true, calculate Manhattan distance on the sphere :)
  diameter: Double = 7917.5 // Diameter of Earth in miles; set this to whatever planet/units you want
): Double = {
  import Math._
  def haversine(theta: Double) = (1 - cos(theta))/2
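
For reference, here is a self-contained sketch of the standard haversine great-circle formula built on the same `haversine` helper. It is not the remainder of the listing above: the name `greatCircle` is hypothetical, the tuples are assumed to be (latitude, longitude), and the `manhattanDist` option is left out.

```scala
object SphericalDistanceSketch {
  import Math._

  def haversine(theta: Double): Double = (1 - cos(theta)) / 2

  /** Great-circle distance between two points given in degrees.
    * Assumption: each tuple is (latitude, longitude). */
  def greatCircle(
    p1: (Double, Double),
    p2: (Double, Double),
    diameter: Double = 7917.5 // Earth's diameter in miles, as in the listing above
  ): Double = {
    val (lat1, lon1) = (toRadians(p1._1), toRadians(p1._2))
    val (lat2, lon2) = (toRadians(p2._1), toRadians(p2._2))
    // hav(d / r) = hav(dLat) + cos(lat1) * cos(lat2) * hav(dLon)
    val h = haversine(lat2 - lat1) + cos(lat1) * cos(lat2) * haversine(lon2 - lon1)
    // d = 2r * asin(sqrt(h)); clamp h so rounding error cannot push sqrt(h) above 1
    diameter * asin(sqrt(min(1.0, h)))
  }
}
```

With the default diameter the result is in miles; passing a diameter in another unit (or for another planet) changes the unit of the result accordingly.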
pathikrit / BooleanMonitor.scala
Last active August 16, 2019 13:05
Boolean Monitor
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.Duration
import com.google.common.util.concurrent.Monitor
class BooleanMonitor(monitor: Monitor = new Monitor())(check: => Boolean) {
  private val guard = new Monitor.Guard(monitor) { override def isSatisfied = check }
  def whenSatisfied[U](timeout: Duration = Duration.Inf)(f: => U): U = {
pathikrit / SparkSchemaDsl.scala
Created July 12, 2019 13:01
Spark Schema DSL
import org.apache.spark.sql.types._
import org.apache.spark.sql._
object SchemaDsl {
  case class ScalaToSparkType[ScalaType](sparkType: DataType, isNullable: Boolean = false) {
    def toField(name: String) = StructField(name = name, dataType = sparkType, nullable = isNullable)
  }
  implicit val stringType: ScalaToSparkType[String] = ScalaToSparkType(StringType)
  implicit val intType: ScalaToSparkType[Int] = ScalaToSparkType(IntegerType)