Umberto Griffo umbertogriffo

@umbertogriffo
umbertogriffo / TestPerformance.scala
Last active April 13, 2017 09:33
This Scala code tests the performance of a Euclidean distance computation implemented with the map-reduce pattern, treeReduce, and treeAggregate.
import org.apache.commons.lang.SystemUtils
import org.apache.spark.mllib.random.RandomRDDs._
import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}
import scala.math.sqrt
/**
* Created by Umberto on 08/02/2017.
*/
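The preview above is truncated, so the following is only a minimal plain-Java sketch of the map-reduce formulation of Euclidean distance, not the gist's Spark code. The class and method names are hypothetical. The map step squares each coordinate difference, the reduce step sums the squares, and the square root is taken at the end. In Spark, treeReduce and treeAggregate perform the same kind of reduction but merge partial results in a multi-level tree instead of sending them all to the driver at once.

```java
import java.util.stream.IntStream;

// Hypothetical illustration: not taken from TestPerformance.scala.
class EuclideanDistanceSketch {

    // Map: squared difference per coordinate. Reduce: sum. Finish: square root.
    static double distance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sumOfSquares = IntStream.range(0, a.length)
                .mapToDouble(i -> (a[i] - b[i]) * (a[i] - b[i])) // map step
                .sum();                                          // reduce step
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] p = {0.0, 0.0};
        double[] q = {3.0, 4.0};
        System.out.println(distance(p, q)); // prints 5.0
    }
}
```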
import java.util.*;
import java.util.Map.Entry;
import java.util.stream.Collectors;
/**
* Created by Umberto on 16/05/2017.
*/
public class HashMapUtils {
umbertogriffo / RddAPI.scala
Last active January 29, 2020 12:57
This is a collection of examples of Apache Spark's RDD API. These examples help me test the RDD functionality.
/*
This is a collection of examples of Apache Spark's RDD API. These examples help me test the RDD functionality.
References:
http://spark.apache.org/docs/latest/programming-guide.html
http://homepage.cs.latrobe.edu.au/zhe/ZhenHeSparkRDDAPIExamples.html
*/
object RddAPI {
umbertogriffo / TopicModelingFromScratchinPython.ipynb
Created February 12, 2018 14:08
Topic Modeling From Scratch in Python
umbertogriffo / JavaRddAPI.java
Created February 23, 2018 11:50
This is a collection of examples of Apache Spark's JavaRDD API. These examples help me test the JavaRDD functionality.
package test.idlike.spark.datastructure;
import org.apache.commons.lang3.SystemUtils;
import org.apache.spark.api.java.JavaDoubleRDD;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import scala.Tuple2;
umbertogriffo / falsehoods-programming-time-list.md
Created August 6, 2019 10:03 — forked from timvisee/falsehoods-programming-time-list.md
Falsehoods programmers believe about time, in a single list

Falsehoods programmers believe about time

This is a compiled list of falsehoods programmers tend to believe about working with time.

Don't re-invent a date time library yourself. If you think you understand everything about time, you're probably doing it wrong.

Falsehoods

  • There are always 24 hours in a day.
  • February is always 28 days long.
  • Any 24-hour period will always begin and end in the same day (or week, or month).
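The first falsehoods in this list can be checked directly with `java.time`. The sketch below is an illustration only: the class name, the helper names, and the choice of the `Europe/Rome` zone and the 2021 dates are assumptions made for the example, not part of the original list. It measures the length of a local day across daylight-saving transitions and the length of February in a leap year.

```java
import java.time.Duration;
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.ZoneId;
import java.time.ZonedDateTime;

// Hypothetical demo class; names and sample dates are chosen for illustration.
class TimeFalsehoodsDemo {

    // Hours between local midnight on `date` and local midnight on the next day.
    static long hoursInDay(LocalDate date, ZoneId zone) {
        ZonedDateTime start = date.atStartOfDay(zone);
        ZonedDateTime end = date.plusDays(1).atStartOfDay(zone);
        return Duration.between(start, end).toHours();
    }

    // Number of days in February of the given year.
    static int daysInFebruary(int year) {
        return YearMonth.of(year, 2).lengthOfMonth();
    }

    public static void main(String[] args) {
        ZoneId rome = ZoneId.of("Europe/Rome");

        // Clocks go forward on 2021-03-28, so that local day is short.
        System.out.println(hoursInDay(LocalDate.of(2021, 3, 28), rome));  // prints 23
        // Clocks go back on 2021-10-31, so that local day is long.
        System.out.println(hoursInDay(LocalDate.of(2021, 10, 31), rome)); // prints 25
        // 2020 is a leap year.
        System.out.println(daysInFebruary(2020));                         // prints 29

        // A 24-hour period starting at 18:00 ends on the following calendar day.
        ZonedDateTime start = ZonedDateTime.of(2021, 6, 15, 18, 0, 0, 0, rome);
        ZonedDateTime end = start.plusHours(24);
        System.out.println(start.toLocalDate().equals(end.toLocalDate())); // prints false
    }
}
```

Letting the library resolve the zone rules, rather than multiplying by 24 or hard-coding 28, is the point of the advice above about not re-inventing a date-time library.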
umbertogriffo / broadcast_join_medium_size.scala
Last active December 11, 2020 16:05
broadcast_join_medium_size
import org.apache.spark.sql.functions._
val mediumDf = Seq((0, "zero"), (4, "one")).toDF("id", "value")
val largeDf = Seq((0, "zero"), (2, "two"), (3, "three"), (4, "four"), (5, "five")).toDF("id", "value")
mediumDf.show()
largeDf.show()
/*
+---+-----+
umbertogriffo / install_python_rosetta.sh
Last active February 23, 2022 12:28
This installs Python under Rosetta and assigns it to pyenv to avoid "ModuleNotFoundError: No module named '_ctypes'" on M1 Apple Silicon.
#!/usr/bin/env bash
# This installs Python under Rosetta and assigns it to pyenv.
# This way of installing Python avoids: ModuleNotFoundError: No module named '_ctypes'
# pyenv has to be installed from GitHub: https://laict.medium.com/install-python-on-macos-11-m1-apple-silicon-using-pyenv-12e0729427a9
version=$1
if [ "$#" -ne 1 ]; then
echo "Illegal number of parameters. Usage:"
umbertogriffo / build_lightgbm_from_gitthub.md
Last active February 24, 2023 12:52
MacOS 12 M1 (Apple Silicon) - Build LightGBM from GitHub

MacOS 12 M1 (Apple Silicon) - Build LightGBM from GitHub

Install CMake (3.16 or higher):

brew install cmake
# On MacOS 11 M1 - Apple Silicon
ibrew install cmake

Install OpenMP:

umbertogriffo / install_tensorflow.md
Last active July 20, 2022 11:54
MacOS 12 M1 (Apple Silicon) - Installs Tensorflow 2.9.1