Louis FRULEUX bluesheeptoken
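import random

p = 1 / 6  # probability of success on a single roll
nb_roll = 4  # rolls per simulation
nb_simulation = 10_000
nb_success = 0
for _ in range(nb_simulation):
    # A simulation succeeds if at least one of its rolls succeeds.
    if any(random.random() < p for _ in range(nb_roll)):
        nb_success += 1  # assumed body: the preview is cut off after the condition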
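The simulation above estimates the probability of at least one success in `nb_roll` independent rolls. As a cross-check, that probability also has a closed form, 1 - (1 - p)^n. The sketch below compares the two; the function names `exact_probability` and `simulate` and the `seed` parameter are illustrative assumptions, not part of the original gist.

```python
import random


def exact_probability(p: float, nb_roll: int) -> float:
    """Probability of at least one success in nb_roll independent rolls."""
    # Complement of "every roll fails".
    return 1 - (1 - p) ** nb_roll


def simulate(p: float, nb_roll: int, nb_simulation: int, seed=None) -> float:
    """Monte Carlo estimate of the same probability (hypothetical helper)."""
    rng = random.Random(seed)  # local generator so runs can be reproduced
    nb_success = sum(
        any(rng.random() < p for _ in range(nb_roll))
        for _ in range(nb_simulation)
    )
    return nb_success / nb_simulation


if __name__ == "__main__":
    print("exact:    ", exact_probability(1 / 6, 4))
    print("simulated:", simulate(1 / 6, 4, 10_000, seed=0))
```

With p = 1/6 and four rolls, the exact value is 671/1296, roughly 0.518, so a 10,000-run estimate should land close to that figure.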
bluesheeptoken /
Created Nov 27, 2020
Change Spark version in each submodule
Small script to change the Apache Spark version in all modules.
It has been tested against Spark 3.0.1.
We used it to rebuild Spark against different versions of its dependencies.
import os

def main():
    old_version = "3.0.1"
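The preview of the script stops here, so its actual approach is not visible. As a rough sketch of how a version bump across submodules could work, the code below walks a source tree and rewrites a version tag in every build file it finds. The function name `replace_version_in_poms`, the `pom.xml` file name, and the `<version>…</version>` tag pattern are all assumptions made for illustration; they are not taken from the gist.

```python
import os


def replace_version_in_poms(root, old_version, new_version, filename="pom.xml"):
    """Rewrite <version>old</version> to <version>new</version> under root.

    Hypothetical helper: returns the sorted list of files that were changed.
    """
    old_tag = f"<version>{old_version}</version>"
    new_tag = f"<version>{new_version}</version>"
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        path = os.path.join(dirpath, filename)
        with open(path, encoding="utf-8") as f:
            content = f.read()
        if old_tag in content:
            # Plain text replacement: every matching tag in the file is rewritten.
            with open(path, "w", encoding="utf-8") as f:
                f.write(content.replace(old_tag, new_tag))
            changed.append(path)
    return sorted(changed)
```

Because this is a plain text replacement, it would also rewrite an unrelated dependency that happens to share the same version string, so the resulting diff is worth reviewing before rebuilding.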
bluesheeptoken / MeanAggregatorUdaf.scala
Created Nov 9, 2020
Examples of a mean UDAF using `UserDefinedAggregateFunction` and `Aggregator`
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.{Aggregator, UserDefinedFunction}
import org.apache.spark.sql.functions._
case class AggregatorState(sum: Long, count: Long)
// Aggregator[IN, BUF, OUT]
val meanAggregator = new Aggregator[Long, AggregatorState, Double]() {
  // Initialize your buffer
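The Scala preview is cut off at the buffer initialization. To illustrate the logic a mean aggregator of this shape typically carries, here is a plain-Python sketch that mirrors the four steps of Spark's `Aggregator` contract (`zero`, `reduce`, `merge`, `finish`) on a `(sum, count)` state. It does not use Spark; the function names, the `reduce_value` spelling, and the choice to return NaN for an empty input are assumptions for illustration only.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AggregatorState:
    sum: int
    count: int


def zero() -> AggregatorState:
    """Initial buffer: nothing seen yet."""
    return AggregatorState(0, 0)


def reduce_value(buffer: AggregatorState, value: int) -> AggregatorState:
    """Fold one input value into a buffer."""
    return AggregatorState(buffer.sum + value, buffer.count + 1)


def merge(b1: AggregatorState, b2: AggregatorState) -> AggregatorState:
    """Combine two partial buffers, e.g. from different partitions."""
    return AggregatorState(b1.sum + b2.sum, b1.count + b2.count)


def finish(buffer: AggregatorState) -> float:
    """Turn the final buffer into the mean (NaN for an empty input, by assumption)."""
    return buffer.sum / buffer.count if buffer.count else float("nan")


if __name__ == "__main__":
    partitions = [[1, 2, 3], [4, 5]]
    partials = [reduce(reduce_value, part, zero()) for part in partitions]
    print(finish(reduce(merge, partials, zero())))  # prints 3.0
```

Keeping `sum` and `count` separately, rather than a running mean, is what makes `merge` a simple addition of partial results.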