Pathikrit Bhowmick pathikrit

pathikrit / GzipSplitter.scala
Last active Dec 12, 2019
Split a file into multiple GZIP files
import better.files._
import squants.information._, InformationConversions._
object GzipSplitter {
/** Splits the $inputStream into approximately equal chunks of $splitSize gzip files under $outputDirectory */
def split(
inputStream : InputStream,
outputDirectory : File = File.newTemporaryDirectory(),
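The preview above is cut off mid-signature, so the rest of the original is not shown here. As a rough illustration of the idea only, the following sketch splits a stream into gzip parts using just `java.util.zip` and `java.nio.file` instead of better-files and squants. The object name `GzipSplitSketch`, the `part-NNNNN.gz` naming, and the choice to measure `splitSize` in uncompressed bytes are all assumptions, not taken from the gist.

```scala
import java.io.InputStream
import java.nio.file.{Files, Path}
import java.util.zip.GZIPOutputStream

// Hypothetical sketch, not the gist's implementation.
object GzipSplitSketch {

  /** Reads until `buf` is full or the stream ends; returns the number of bytes read. */
  private def fill(in: InputStream, buf: Array[Byte]): Int = {
    var total = 0
    var n = 0
    while (total < buf.length && n != -1) {
      n = in.read(buf, total, buf.length - total)
      if (n > 0) total += n
    }
    total
  }

  /**
   * Writes the stream as part-00000.gz, part-00001.gz, ... under `outputDir`.
   * Each part holds at most `splitSize` UNCOMPRESSED bytes (an assumption),
   * and one whole part is buffered in memory at a time.
   */
  def split(in: InputStream, outputDir: Path, splitSize: Int): Vector[Path] = {
    require(splitSize > 0, "splitSize must be positive")
    val buffer = new Array[Byte](splitSize)
    var parts = Vector.empty[Path]
    var n = fill(in, buffer)
    while (n > 0) {
      val part = outputDir.resolve(f"part-${parts.size}%05d.gz")
      val out = new GZIPOutputStream(Files.newOutputStream(part))
      try out.write(buffer, 0, n) finally out.close()
      parts :+= part
      n = fill(in, buffer)
    }
    parts
  }
}
```

Buffering a full part keeps the sketch short; for large split sizes a streaming copy with a byte counter would likely be preferable.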
pathikrit / SparkDataLoad.scala
Last active Oct 1, 2019
Spark utils to ship data
import java.nio.charset.{ Charset, StandardCharsets }
import org.apache.spark.sql._
import org.apache.spark.sql.types._
object SparkDataLoad {
def fromCsv[A : Encoder](
path: Set[String],
encoding: Charset = StandardCharsets.UTF_8,
useHeader: Boolean = false,
pathikrit / SphericalDistance.scala
Last active Sep 24, 2019
Distance calculator between 2 coordinates on a planet
/** Distance between 2 coordinates (in degrees) */
def dist(
p1: (Double, Double), // Coordinate 1 (in degrees)
p2: (Double, Double), // Coordinate 2 (in degrees)
manhattanDist: Boolean = false, // If true, calculate Manhattan distance on the sphere :)
diameter: Double = 7917.5 // Diameter of Earth in miles; set this to whatever planet/units you want
): Double = {
import Math._
def haversine(theta: Double) = (1 - cos(theta))/2
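The preview stops right after the `haversine` helper. For orientation, here is a minimal sketch of how that helper is typically used in the haversine formula for great-circle distance. It assumes each tuple is `(latitude, longitude)` in degrees, which the snippet does not state, and it leaves out the `manhattanDist` option. The names `SphericalDistanceSketch` and `greatCircle` are hypothetical.

```scala
// Hypothetical sketch, not the gist's implementation.
object SphericalDistanceSketch {
  import scala.math._

  /** hav(theta) = (1 - cos(theta)) / 2, the same definition as in the snippet above. */
  def haversine(theta: Double): Double = (1 - cos(theta)) / 2

  /**
   * Great-circle distance between two points given as (latitude, longitude) in degrees.
   * The (lat, lon) ordering is an assumption. The result is in the units of `diameter`.
   */
  def greatCircle(
    p1: (Double, Double),
    p2: (Double, Double),
    diameter: Double = 7917.5 // Earth's diameter in miles, as in the snippet above
  ): Double = {
    val (lat1, lon1) = (toRadians(p1._1), toRadians(p1._2))
    val (lat2, lon2) = (toRadians(p2._1), toRadians(p2._2))
    val h = haversine(lat2 - lat1) + cos(lat1) * cos(lat2) * haversine(lon2 - lon1)
    // d = 2r * asin(sqrt(h)) = diameter * asin(sqrt(h)); clamp h against rounding above 1
    diameter * asin(sqrt(min(1.0, h)))
  }
}
```

Two points a quarter of the equator apart, such as `(0, 0)` and `(0, 90)`, should come out at `diameter * Pi / 4`.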
pathikrit / BooleanMonitor.scala
import java.util.concurrent.TimeUnit
import scala.concurrent.duration.Duration
class BooleanMonitor(monitor: Monitor = new Monitor())(check: => Boolean) {
private val guard = new Monitor.Guard(monitor) { override def isSatisfied = check }
def whenSatisfied[U](timeout: Duration = Duration.Inf)(f: => U): U = {
pathikrit / SparkSchemaDsl.scala
import org.apache.spark.sql.types._
import org.apache.spark.sql._
object SchemaDsl {
case class ScalaToSparkType[ScalaType](sparkType: DataType, isNullable: Boolean = false) {
def toField(name: String) = StructField(name = name, dataType = sparkType, nullable = isNullable)
implicit val stringType: ScalaToSparkType[String] = ScalaToSparkType(StringType)
implicit val intType: ScalaToSparkType[Int] = ScalaToSparkType(IntegerType)
pathikrit / script.scala
Last active Dec 12, 2018
Move duplicate files to a directory
import better.files._
def moveDupes(
dir: File,
logFile: File = (File.home / "dupes.txt"),
dupeFolder: File = (File.home / 'dupes).createDirectory()
) = {
for {
log <- logFile.printWriter()
(hash, toKeep :: toMove) <- dir.listRecursively.toSeq.groupBy(_.md5).mapValues(_.toList)
pathikrit / GitPunchCard.scala
Last active Apr 15, 2018
Scala Script to print Git PunchCard
* Quick and dirty Scala app to print git commit punch-card e.g.
* ┃08┃09┃10┃11┃12┃13┃14┃15┃16┃17┃18┃19┃20┃21┃22┃23┃00┃01┃02┃03┃04┃05┃06┃07┃
* Sun┃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
* Mon┃▁▁▁▄▄▄▅▅▅▅▅▅▄▄▄▆▆▆▇▇▇▇▇▇███▆▆▆▅▅▅▄▄▄▃▃▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
* Tue┃▁▁▁▃▃▃▆▆▆▅▅▅▅▅▅▅▅▅▆▆▆▇▇▇▆▆▆▆▆▆▅▅▅▅▅▅▄▄▄▃▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
* Wed┃▁▁▁▄▄▄▅▅▅▇▇▇▅▅▅▅▅▅███▇▇▇▅▅▅▆▆▆▇▇▇▃▃▃▃▃▃▂▂▂▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
* Thu┃▁▁▁▂▂▂▄▄▄▆▆▆▅▅▅▆▆▆▇▇▇▇▇▇▆▆▆▇▇▇▅▅▅▄▄▄▃▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
* Fri┃▁▁▁▂▂▂▄▄▄▅▅▅▅▅▅▄▄▄▄▄▄▅▅▅▅▅▅▃▃▃▃▃▃▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁
pathikrit / local.sbt
triggeredMessage := Watched.clearWhenTriggered
libraryDependencies += "com.lihaoyi" % "ammonite" % "latest.release" % "test" cross CrossVersion.full
initialCommands in (Test, console) := """ammonite.Main().run()"""
watchSources ++= (
(baseDirectory.value * "*.sbt").get
++ (baseDirectory.value / "project" * "*.scala").get
++ (baseDirectory.value / "project" * "*.sbt").get
pathikrit / MajorityElement.scala
Last active Jun 12, 2017
Boyer–Moore majority vote algorithm
import scala.collection.generic.Growable
/**
 * Boyer–Moore majority vote algorithm (–Moore_majority_vote_algorithm)
 * A data structure that supports O(1) tracking of the majority element in streaming data
 * (i.e. something that occurs strictly > 50% of the time)
 */
class MajorityElement[A] extends Growable[A] {
private[this] var majorityElement = Option.empty[A]
private[this] var count = 0
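The preview ends before the update logic. A minimal standalone sketch of the Boyer–Moore vote, using the same two pieces of state (a candidate and a counter), might look like the following. It deliberately does not extend `Growable`, so that it compiles on both Scala 2.12 and 2.13. The class name `MajorityVote` and the method `result` are hypothetical.

```scala
// Hypothetical sketch, not the gist's implementation.
final class MajorityVote[A] {
  private var candidate = Option.empty[A]
  private var count = 0

  /** Feeds one element from the stream; O(1) time and O(1) space. */
  def +=(elem: A): this.type = {
    if (count == 0) {
      // No surviving candidate: adopt the new element.
      candidate = Some(elem)
      count = 1
    } else if (candidate.contains(elem)) {
      count += 1
    } else {
      // A differing element cancels one vote for the current candidate.
      count -= 1
    }
    this
  }

  /**
   * The current candidate. If some element occurs strictly more than 50% of the
   * time, it is that element; otherwise the value is arbitrary and would need a
   * second counting pass to verify.
   */
  def result: Option[A] = candidate
}
```

Feeding `1, 2, 1, 3, 1, 1` leaves `1` as the candidate, which is the true majority (4 of 6 elements).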
pathikrit /
Last active Nov 2, 2019
Contravariance vs. Covariance
  • Let C<A> be a generic type (a type constructor applied to a type argument) e.g. in List<Animal>, List is C and Animal is A.
  • Let S be a subtype of T e.g. in class Cat extends Animal, Cat is S and Animal is T
  • If C<S> is a subtype of C<T>, then C is covariant on T e.g. List<Cat> is a subtype of List<Animal>
  • If C<T> is a subtype of C<S>, then C is contravariant on T e.g. Predicate<Animal> is a subtype of Predicate<Cat>
  • If neither C<T> nor C<S> is a subtype of the other, then C is invariant on T
  • If both C<T> and C<S> are subtypes of each other, then C is phantom variant on T. This is possible in languages which support phantom types like Haskell

In Scala:
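A minimal sketch of the first three cases, using Scala's declaration-site variance annotations (`+A` for covariant, `-A` for contravariant, no annotation for invariant). The names `VarianceSketch`, `Predicate`, and `Cell` are illustrative only; `Animal` and `Cat` follow the bullets above.

```scala
// Hypothetical sketch; only List comes from the standard library.
object VarianceSketch {
  class Animal
  class Cat extends Animal

  // Covariant: List is declared as List[+A], so List[Cat] <: List[Animal].
  val cats: List[Cat] = List(new Cat)
  val animals: List[Animal] = cats

  // Contravariant: with -A, Predicate[Animal] <: Predicate[Cat].
  trait Predicate[-A] { def test(a: A): Boolean }
  val isAnimal: Predicate[Animal] = new Predicate[Animal] {
    def test(a: Animal): Boolean = true
  }
  val isCat: Predicate[Cat] = isAnimal // anything that can test an Animal can test a Cat

  // Invariant: no annotation, so Cell[Cat] and Cell[Animal] are unrelated.
  final class Cell[A](var value: A)
  val catCell: Cell[Cat] = new Cell(new Cat)
  // val animalCell: Cell[Animal] = catCell // does not compile: Cell is invariant in A
}
```

The mutable `Cell` has to be invariant: if `Cell[Cat]` were accepted as a `Cell[Animal]`, a non-`Cat` animal could be written into it.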
