
Andy Petrella andypetrella

andypetrella / GraphFramesExample.snb
Last active March 4, 2016 18:25 — forked from deanwampler/GraphFramesExample.snb
Databricks' Python example for the new GraphFrame API, ported to Scala and the Spark Notebook
{
"metadata" : {
"name" : "GraphFramesExample",
"user_save_timestamp" : "1970-01-01T01:00:00.000Z",
"auto_save_timestamp" : "1970-01-01T01:00:00.000Z",
"language_info" : {
"name" : "scala",
"file_extension" : "scala",
"codemirror_mode" : "text/x-scala"
},
andypetrella / TOC.cell.js
Last active October 2, 2015 18:26
Drop this cell into any Spark Notebook (https://github.com/andypetrella/spark-notebook/) that declares headings, execute it, then hide both its output and its input.
:javascript
require(["jquery", "underscore"], function(j, _) {
j(".toc").remove();
var toc = j(document.createElement("div"));
toc.attr("id", "toc")
.addClass("toc")
.css("position", "fixed")
.css("top", "15%")
.css("width", "8%")
.addClass("panel").addClass("panel-info");
andypetrella / Test MLlib.snb
Last active August 29, 2015 14:15
Import MLlib and netlib in a notebook
{"content":{"metadata":{"name":"Test MLlib","user_save_timestamp":"1970-01-01T01:00:00.000Z","auto_save_timestamp":"1970-01-01T01:00:00.000Z","language_info":{"name":"scala","file_extension":"scala","codemirror_mode":"text/x-scala"},"trusted":true},"cells":[{"metadata":{"trusted":true,"collapsed":false},"cell_type":"code","source":":local-repo /tmp/spark-notebook","outputs":[{"name":"stdout","output_type":"stream","text":"res1: String = Repo changed to /tmp/spark-notebook!\n"},{"metadata":{},"data":{"text/html":"Repo changed to /tmp/spark-notebook!"},"output_type":"execute_result","execution_count":1}]},{"metadata":{"trusted":true,"collapsed":false},"cell_type":"code","source":":dp org.apache.spark % spark-mllib_2.10 % 1.2.0\n- org.apache.spark % spark-core_2.10 % _\n- org.apache.hadoop % _ % _\ncom.github.fommil.netlib % all % 1.1.2 % pom","outputs":[{"name":"stdout","output_type":"stream","text":"jars: Array[String] = [Ljava.lang.String;@4639adb7\nres1: scala.xml.Elem = \n<pre onclick=\"this.style.display=(
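Outside the notebook, the same dependency set can be declared in an sbt build. This is a minimal sketch, assuming that the `-` lines in the `:dp` cell are exclusions (spark-core_2.10 and every org.apache.hadoop artifact) and that the trailing `% pom` corresponds to sbt's `pomOnly()`; it is not taken from the notebook itself.

```scala
// build.sbt sketch (assumption: mirrors the :dp cell of the "Test MLlib" notebook)
libraryDependencies ++= Seq(
  ("org.apache.spark" % "spark-mllib_2.10" % "1.2.0")
    .exclude("org.apache.spark", "spark-core_2.10")                 // assumed meaning of "- org.apache.spark % spark-core_2.10 % _"
    .excludeAll(ExclusionRule(organization = "org.apache.hadoop")), // assumed meaning of "- org.apache.hadoop % _ % _"
  ("com.github.fommil.netlib" % "all" % "1.1.2").pomOnly()          // netlib "all" is published as a POM-only artifact
)
```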
andypetrella / README.md
Last active August 29, 2015 14:08
Zip won't work? → "Honi soit qui mal y pense" (shame on anyone who thinks ill of it)

This gist extracts a problem I'm facing in the Spark-Notebook (see here): dealing with a dynamic form that generates new Spark SQL queries.

It is in a transitional state, meaning that it still contains a lot of legacy constructs that I'd like to get rid of:

  • Connection
  • Observable
  • Observer

But one thing at a time :-D.

Visually, the result will look like the image below (or above :-D).

trait CRDD[S <: Space] {
  def continuum: S
  def break: S#SpacePoint => (S#SpacePoint, S#SpacePoint)
  // the rest follows, more or less, the DStream definition
}
andypetrella / LCLU dans un système distribué.md
Last active August 29, 2015 14:01
Master's thesis topics; ULg, Montefiore.

Using distributed systems for the geospatial and social analysis of land change.

The goal of this work is to implement techniques for learning and predicting how a territory evolves, based on a history of satellite images, and to consider integrating the information present in a social network.

Geospatial analysis of land cover and land use is one of the main research topics in geomatics and remote sensing. It draws on several techniques from various fields, among them constraint-based rule systems, cellular automata, and probabilistic networks (as well as their combinations).
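Of the techniques listed above, cellular automata are the simplest to illustrate. The sketch below is a toy, assuming a hypothetical transition rule that is not taken from the original text: a non-urban cell becomes urban when at least `threshold` of its eight neighbours are urban, and urban cells never revert. A real land-change model would calibrate such rules from the satellite-image history.

```scala
// Toy land-use cellular automaton (hypothetical rule, for illustration only).
// true = urban cell, false = non-urban cell.
def step(grid: Vector[Vector[Boolean]], threshold: Int): Vector[Vector[Boolean]] = {
  val rows = grid.length
  val cols = if (rows == 0) 0 else grid(0).length

  // Count urban cells among the 8 surrounding cells, ignoring positions outside the grid.
  def urbanNeighbours(r: Int, c: Int): Int =
    (for {
      dr <- -1 to 1
      dc <- -1 to 1
      rr = r + dr
      cc = c + dc
      if !(dr == 0 && dc == 0) && rr >= 0 && rr < rows && cc >= 0 && cc < cols && grid(rr)(cc)
    } yield 1).sum

  // Urban cells stay urban; other cells convert once enough neighbours are urban.
  Vector.tabulate(rows, cols) { (r, c) => grid(r)(c) || urbanNeighbours(r, c) >= threshold }
}

val initial = Vector(
  Vector(true, true, true),
  Vector(false, false, false),
  Vector(false, false, false)
)
val next = step(initial, threshold = 3)
```

Each call to `step` computes one time step from the previous grid only, which is what makes such models easy to partition across a distributed system.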

The refinement of these techniques is undeniable, but it will eventually suffer from performance and "scalability" problems. This concern stems from the observation of the growing percentage

andypetrella / build.sbt
Last active January 3, 2016 12:09
An sbt configuration for https://github.com/NightHacking/LambdasHacking. Put this in "Code" and launch sbt. IT REQUIRES SBT > 0.13.1
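libraryDependencies += "junit" % "junit" % "4.11" % "test"

libraryDependencies += "com.novocode" % "junit-interface" % "0.10" % "test" // lets sbt run JUnit tests

javaSource in Test := baseDirectory.value / "test" // test sources live in ./test

testOptions += Tests.Argument(TestFrameworks.JUnit, "-q", "-v") // -q: hide stdout of passing tests; -v: log test events at info level

javacOptions ++= Seq("-source", "1.8") // accept Java 8 syntax (lambdas)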
andypetrella / README.md
Last active December 22, 2015 10:58
Scraping the Belgian PICC

How to scrape the Belgian PICC

Download ESRI PICC data

Use the Query operation on the service, asking for:

  • f=json (JSON output format)
  • outFields=* (all attribute fields)
  • geometryType=esriGeometryEnvelope (filter by a bounding envelope)
  • geometry=...
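The query parameters listed above can be assembled into a request URL as follows. This is a minimal sketch: the endpoint is a placeholder (the gist does not give the PICC service URL), the envelope coordinates are made up, and it assumes the standard ArcGIS REST `outFields` parameter for the field list and the simple `xmin,ymin,xmax,ymax` envelope syntax.

```scala
import java.net.URLEncoder

// Placeholder endpoint (hypothetical): substitute the real PICC layer's /query URL.
val queryEndpoint = "https://example.org/arcgis/rest/services/PICC/MapServer/0/query"

// Builds a Query URL asking for JSON, all attribute fields, and features inside an envelope.
def piccQueryUrl(endpoint: String, xmin: Double, ymin: Double, xmax: Double, ymax: Double): String = {
  val params = List(
    "f" -> "json",
    "outFields" -> "*",
    "geometryType" -> "esriGeometryEnvelope",
    "geometry" -> s"$xmin,$ymin,$xmax,$ymax" // envelope as xmin,ymin,xmax,ymax
  )
  val query = params
    .map { case (k, v) => s"${URLEncoder.encode(k, "UTF-8")}=${URLEncoder.encode(v, "UTF-8")}" }
    .mkString("&")
  s"$endpoint?$query"
}

// Made-up envelope, for illustration only.
val url = piccQueryUrl(queryEndpoint, 150000, 160000, 151000, 161000)
println(url)
```

To scrape a larger area, one would tile it into several envelopes and issue one query per tile.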
andypetrella / MRTweet.scala
Last active December 22, 2015 06:19
Expressiveness, conciseness and behavior
def mr[A, K, V](l: List[A], m: A => List[(K, V)], r: List[V] => V, s: K => K) =
  l.flatMap(m)
    .groupBy(_._1) // map phase
    .map { case (k, kvs) => (k, kvs.map(_._2)) }
    .groupBy { case (x, _) => s(x) } // shuffled
    .map { case (p, xs) => (p, xs.map { case (x, vs) => (x, r(vs)) }) } // reduced
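To see the map, shuffle and reduce steps above run end to end, here is a self-contained word-count sketch. The `mapReduce` name, the input lines and the choice of `s` (bucketing words by their first letter, standing in for a partitioner) are illustrative assumptions. `.map` is used instead of `mapValues` so the code compiles without deprecation warnings on Scala 2.13 and later.

```scala
// Map/shuffle/reduce over plain Scala collections (illustrative sketch).
def mapReduce[A, K, V](l: List[A])(m: A => List[(K, V)], r: List[V] => V, s: K => K): Map[K, Map[K, V]] =
  l.flatMap(m)                                                             // map: emit (key, value) pairs
    .groupBy(_._1)                                                         // group the pairs by key
    .map { case (k, kvs) => (k, kvs.map(_._2)) }                           // keep only the values per key
    .groupBy { case (k, _) => s(k) }                                       // shuffle: bucket keys by s(key)
    .map { case (b, group) => (b, group.map { case (k, vs) => (k, r(vs)) }) } // reduce each key's values

val lines = List("apple banana", "avocado apple", "banana")
val counts = mapReduce(lines)(
  (line: String) => line.split(" ").toList.map(w => (w, 1)), // m: one (word, 1) pair per word
  (vs: List[Int]) => vs.sum,                                 // r: add up the 1s
  (w: String) => w.take(1)                                   // s: bucket words by first letter
)
println(counts)
```

The lambdas are fully annotated because Scala 2 cannot infer `K` and `V` for every function argument in the same parameter list.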
andypetrella / mr.scala
Created August 29, 2013 15:38
MR scala
//def
def mr[A, K, V](l: List[A], m: A => List[(K, V)], r: List[V] => V, s: K => K) =
  //mr
  l.flatMap(m)
    .groupBy(_._1)
    .map { case (k, kvs) => (k, kvs.map(_._2)) }