Umberto Griffo umbertogriffo
@frgomes
frgomes / AnyToDouble.scala
Last active January 23, 2022 23:15
Scala - Converts Any to Double, to LocalDate and Date
// this flavour is pure magic...
def toDouble: (Any) => Double = { case i: Int => i case f: Float => f case d: Double => d }
// whilst this flavour is longer but you are in full control...
object any2Double extends Function[Any, Double] {
  def apply(any: Any): Double =
    any match { case i: Int => i case f: Float => f case d: Double => d }
}
// like when you can invoke any2Double from another similar conversion...
@tomron
tomron / spark_knn_approximation.py
Created November 19, 2015 16:47
A naive approximation of the k-NN algorithm (k-nearest neighbors) in PySpark. Approximation quality can be controlled by the number of repartitions.
from __future__ import print_function
import sys
from math import sqrt
import argparse
from collections import defaultdict
from random import randint
from pyspark import SparkContext
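The preview above stops at the imports, so the gist's actual logic is not visible here. As a rough illustration of the idea in the description, the following is a minimal pure-Python sketch (no Spark) under one assumed reading: points are scattered into random partitions, exact neighbors are computed only within each partition, and the process is repeated so that candidates accumulate. The names `approximate_knn`, `num_partitions`, and `num_rounds` are hypothetical and do not come from the gist.

```python
import heapq
import random
from collections import defaultdict
from math import sqrt


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def approximate_knn(points, k, num_partitions, num_rounds, seed=0):
    """Approximate k-NN by random partitioning (hypothetical sketch).

    Each round assigns every point to a random partition and compares
    points only with others in the same partition. More rounds or fewer
    partitions give better quality at higher cost.
    """
    rng = random.Random(seed)
    # point index -> {neighbor index: distance}
    candidates = defaultdict(dict)
    for _ in range(num_rounds):
        buckets = defaultdict(list)
        for idx in range(len(points)):
            buckets[rng.randrange(num_partitions)].append(idx)
        for members in buckets.values():
            for i in members:
                for j in members:
                    if i != j:
                        candidates[i][j] = euclidean(points[i], points[j])
    # Keep the k closest candidates seen for each point.
    return {
        i: heapq.nsmallest(k, candidates[i].items(), key=lambda item: item[1])
        for i in range(len(points))
    }
```

With `num_partitions=1` every point shares a partition, so the result is the exact k-NN; raising `num_partitions` trades accuracy for fewer distance computations per round.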
@ahoy-jon
ahoy-jon / CogroupDf.scala
Last active February 3, 2020 11:08
DataFrame.cogroup is the new HList.flatMap (UNFORTUNATELY, THIS IS VERY SLOW)
package org.apache.spark.sql.utils
import org.apache.spark.Partitioner
import org.apache.spark.rdd.{CoGroupedRDD, RDD}
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, ScalaReflection}
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}
import org.apache.spark.sql.{SQLContext, DataFrame, Row}
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag
@squito
squito / AccumulatorListener.scala
Last active March 15, 2019 06:34
Accumulator Examples
import scala.collection.mutable.Map
import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
import org.apache.spark.scheduler.{SparkListenerStageCompleted, SparkListener}
import org.apache.spark.SparkContext._
/**
* just print out the values for all accumulators from the stage.
* you will only get updates from *named* accumulators, though
@iamaziz
iamaziz / cipynb.py
Created February 16, 2015 01:01
Convert all ipython notebook(s) in a given directory into the selected format and place output in a separate folder. Using: ipython nbconvert and find command (Unix-like OS).
#!/usr/bin/env python
__author__ = 'Aziz'
"""
Convert all ipython notebook(s) in a given directory into the selected format and place output in a separate folder.
usages: python cipynb.py `directory` [-to FORMAT]
Using: ipython nbconvert and find command (Unix-like OS).
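The preview is cut off before the script body, so the following is only a sketch of the approach the description names, not the gist's code. It assumes the current `jupyter nbconvert` command with its `--to` and `--output-dir` options in place of the older `ipython nbconvert`, and it walks the directory with `pathlib` rather than the Unix `find` command. The function name `build_nbconvert_commands` and the default folder name `converted` are hypothetical.

```python
from pathlib import Path


def build_nbconvert_commands(directory, to_format="html", output_subdir="converted"):
    """Build one nbconvert command per notebook found under `directory`.

    Returns a list of argument lists; nothing is executed here.
    """
    root = Path(directory)
    out_dir = root / output_subdir
    commands = []
    for notebook in sorted(root.rglob("*.ipynb")):
        # Skip autosave copies kept in checkpoint folders.
        if ".ipynb_checkpoints" in notebook.parts:
            continue
        commands.append([
            "jupyter", "nbconvert",
            "--to", to_format,
            "--output-dir", str(out_dir),
            str(notebook),
        ])
    return commands
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` to perform the conversion; keeping command construction separate from execution makes the script easy to dry-run.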
name := "playground"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
@lesstif
lesstif / tomcat-service.sh
Last active July 27, 2021 01:26
RHEL/CentOS tomcat7 init.d service script.
#!/bin/bash
#
# tomcat
#
# chkconfig: 345 96 30
# description: Start up the Tomcat servlet engine.
#
# processname: java
# pidfile: /var/run/tomcat.pid
#