Skip to content

Instantly share code, notes, and snippets.

View umbertogriffo's full-sized avatar

Umberto Griffo umbertogriffo

View GitHub Profile
@lesstif
lesstif / tomcat-service.sh
Last active July 27, 2021 01:26
RHEL/CentOS tomcat7 init.d service script.
#!/bin/bash
#
# tomcat
#
# chkconfig: 345 96 30
# description: Start up the Tomcat servlet engine.
#
# processname: java
# pidfile: /var/run/tomcat.pid
#
name := "playground"
version := "1.0"
scalaVersion := "2.10.4"
libraryDependencies += "org.apache.spark" %% "spark-core" % "1.1.0"
libraryDependencies += "net.sf.opencsv" % "opencsv" % "2.3"
@iamaziz
iamaziz / cipynb.py
Created February 16, 2015 01:01
Convert all ipython notebook(s) in a given directory into the selected format and place output in a separate folder. Using: ipython nbconvert and find command (Unix-like OS).
#!/usr/bin/env python
__author__ = 'Aziz'
"""
Convert all ipython notebook(s) in a given directory into the selected format and place output in a separate folder.
usages: python cipynb.py `directory` [-to FORMAT]
Using: ipython nbconvert and find command (Unix-like OS).
@squito
squito / AccumulatorListener.scala
Last active March 15, 2019 06:34
Accumulator Examples
import scala.collection.mutable.Map
import org.apache.spark.{Accumulator, AccumulatorParam, SparkContext}
import org.apache.spark.scheduler.{SparkListenerStageCompleted, SparkListener}
import org.apache.spark.SparkContext._
/**
* just print out the values for all accumulators from the stage.
* you will only get updates from *named* accumulators, though
@ahoy-jon
ahoy-jon / CogroupDf.scala
Last active February 3, 2020 11:08
DataFrame.cogroup is the new HList.flatMap (UNFORTUNATELY, THIS IS VERY SLOW)
package org.apache.spark.sql.utils
import org.apache.spark.Partitioner
import org.apache.spark.rdd.{CoGroupedRDD, RDD}
import org.apache.spark.sql.catalyst.{CatalystTypeConverters, ScalaReflection}
import org.apache.spark.sql.execution.LogicalRDD
import org.apache.spark.sql.types.{ArrayType, StructField, StructType}
import org.apache.spark.sql.{SQLContext, DataFrame, Row}
import scala.reflect.ClassTag
import scala.reflect.runtime.universe.TypeTag
@tomron
tomron / spark_knn_approximation.py
Created November 19, 2015 16:47
A naive approximation of k-nn algorithm (k-nearest neighbors) in pyspark. Approximation quality can be controlled by number of repartitions and number of repartition
from __future__ import print_function
import sys
from math import sqrt
import argparse
from collections import defaultdict
from random import randint
from pyspark import SparkContext
@frgomes
frgomes / AnyToDouble.scala
Last active January 23, 2022 23:15
Scala - Converts Any to Double, to LocalDate and Date
// this flavour is pure magic...
def toDouble: (Any) => Double = { case i: Int => i case f: Float => f case d: Double => d }
// whilst this flavour is longer but you are in full control...
object any2Double extends Function[Any,Double] {
def apply(any: Any): Double =
any match { case i: Int => i case f: Float => f case d: Double => d }
}
// like when you can invoke any2Double from another similar conversion...
@gbishop
gbishop / Args.ipynb
Last active July 18, 2022 11:43
Allow arguments to be passed to notebooks via URL or command line.
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
@blakewrege
blakewrege / NodeData.txt
Last active November 6, 2022 14:59
Parallel Prims Algorithm for Spark Graphx
4,6
1,2,5
1,3,8
1,4,4
2,3,8
2,4,7
3,4,1
# Spark Streaming Logging Configuration
# See also: http://spark.apache.org/docs/2.0.2/running-on-yarn.html#debugging-your-application
log4j.rootLogger=INFO, stderr
# application namespace configuration
log4j.logger.de.inovex.mysparkapp=stderr, stdout
# Write all logs to standard Spark stderr file
log4j.appender.stderr=org.apache.log4j.RollingFileAppender