Louis FRULEUX bluesheeptoken

View proba_simulation_dice_forge.py
import random
p = 1/6
nb_roll = 4
nb_simulation = 10_000
nb_success = 0
for _ in range(nb_simulation):
    if any(random.random() < p for _ in range(nb_roll)):
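The preview above is cut off inside the loop. As a minimal sketch of what such a simulation might look like, assuming the goal is to estimate the probability of at least one success in `nb_roll` independent tries, the snippet below counts successful runs and compares the estimate with the closed form 1 - (1 - p)^nb_roll. The function names and the `seed` parameter are hypothetical and do not come from the gist.

```python
import random


def estimate_at_least_one(p, nb_roll, nb_simulation, seed=None):
    """Monte Carlo estimate of P(at least one success in nb_roll independent tries)."""
    rng = random.Random(seed)  # local generator so a run can be reproduced
    nb_success = 0
    for _ in range(nb_simulation):
        # One simulated experiment: does any of the nb_roll tries succeed?
        if any(rng.random() < p for _ in range(nb_roll)):
            nb_success += 1
    return nb_success / nb_simulation


def exact_at_least_one(p, nb_roll):
    """Closed form: 1 minus the probability that every try fails."""
    return 1 - (1 - p) ** nb_roll


estimate = estimate_at_least_one(p=1 / 6, nb_roll=4, nb_simulation=10_000, seed=0)
exact = exact_at_least_one(p=1 / 6, nb_roll=4)
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

For p = 1/6 and four rolls the exact value is 671/1296, roughly 0.5177, so the simulated figure should land close to that.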
bluesheeptoken / change_spark_version.py
Created Nov 27, 2020
Change Spark version in each submodule
View change_spark_version.py
"""
Small script to change the Apache Spark version in all the modules.
It has been tested against Spark 3.0.1.
We used it to rebuild Spark against different versions of its dependencies.
"""
import os
def main():
    old_version = "3.0.1"
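The preview stops after the first line of `main()`. The sketch below shows one way such a script could work, assuming the version lives in `<version>` tags of the Maven `pom.xml` file in each module directory. The function name `change_version` and its parameters are hypothetical, not taken from the gist.

```python
import os


def change_version(root, old_version, new_version, filename="pom.xml"):
    """Rewrite <version>old</version> tags in every `filename` found under `root`.

    Returns the list of files that were modified.
    Assumption: the version appears literally as <version>old_version</version>.
    """
    old_tag = f"<version>{old_version}</version>"
    new_tag = f"<version>{new_version}</version>"
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename not in filenames:
            continue
        path = os.path.join(dirpath, filename)
        with open(path, encoding="utf-8") as f:
            content = f.read()
        if old_tag in content:
            # Only rewrite files that actually contain the old version tag.
            with open(path, "w", encoding="utf-8") as f:
                f.write(content.replace(old_tag, new_tag))
            changed.append(path)
    return changed
```

A call such as `change_version(".", "3.0.1", "3.0.2")` would update every matching file below the current directory. Note that a plain string replacement also rewrites any unrelated dependency pinned to exactly the same version, so the resulting diff is worth reviewing.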
bluesheeptoken / MeanAggregatorUdaf.scala
Created Nov 9, 2020
Examples of a mean UDAF using `UserDefinedAggregateFunction` and `Aggregator`
View MeanAggregatorUdaf.scala
import org.apache.spark.sql.Encoder
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
import org.apache.spark.sql.expressions.{Aggregator, UserDefinedFunction}
import org.apache.spark.sql.functions._
case class AggregatorState(sum: Long, count: Long)
// Aggregator[IN, BUF, OUT]
val meanAggregator = new Aggregator[Long, AggregatorState, Double]() {
  // Initialize your buffer
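The Scala preview ends before the body of the aggregator. To illustrate the contract behind `Aggregator[IN, BUF, OUT]`, here is a plain-Python sketch with no Spark dependency. It mirrors the `zero`, `reduce`, `merge` and `finish` methods of Spark's `Aggregator` for a mean over a `(sum, count)` buffer. Returning NaN for an empty input is an assumption; the gist's behavior in that case is not visible here.

```python
from dataclasses import dataclass
from functools import reduce as fold


@dataclass(frozen=True)
class AggregatorState:
    sum: int
    count: int


def zero():
    """Initial buffer: nothing seen yet."""
    return AggregatorState(sum=0, count=0)


def reduce(buffer, value):
    """Fold one input value into a partition-local buffer."""
    return AggregatorState(buffer.sum + value, buffer.count + 1)


def merge(b1, b2):
    """Combine the buffers of two partitions."""
    return AggregatorState(b1.sum + b2.sum, b1.count + b2.count)


def finish(buffer):
    """Turn the final buffer into the output value (assumption: NaN when empty)."""
    return buffer.sum / buffer.count if buffer.count else float("nan")


# Simulate two partitions that are aggregated separately, then merged.
partitions = [[1, 2, 3], [4, 5]]
buffers = [fold(reduce, part, zero()) for part in partitions]
result = finish(fold(merge, buffers, zero()))
print(result)  # 3.0
```

Keeping the sum and count in the buffer, rather than a running mean, is what makes `merge` a simple component-wise addition.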