Umberto Griffo umbertogriffo

@umbertogriffo
umbertogriffo / DataFrameSuite.scala
Last active February 12, 2020 06:13
DataFrameSuite allows you to check whether two DataFrames are equal. You can assert DataFrame equality with the method assertDataFrameEquals. When the DataFrames contain doubles or Spark MLlib Vectors, you can assert that they are approximately equal with the method assertDataFrameApproximateEquals.
package test.com.idlike.junit.df
import breeze.numerics.abs
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.{Column, DataFrame, Row}
/**
* Created by Umberto on 06/02/2017.
*/
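The preview above is cut off before the method bodies, so the real signatures of assertDataFrameEquals and assertDataFrameApproximateEquals are not shown here. As a minimal sketch of the tolerance idea only, assuming rows can be modelled as plain arrays of doubles, the comparison could look like the following Java. The class name `ApproxRowsSketch`, the method `approxEquals`, and the `tol` parameter are hypothetical and are not part of the gist or of Spark.

```java
public final class ApproxRowsSketch {

    /**
     * Returns true when both tables have the same shape and every pair of
     * corresponding cells differs by at most tol. NaN cells never match.
     */
    public static boolean approxEquals(double[][] expected, double[][] actual, double tol) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int r = 0; r < expected.length; r++) {
            if (expected[r].length != actual[r].length) {
                return false;
            }
            for (int c = 0; c < expected[r].length; c++) {
                double diff = Math.abs(expected[r][c] - actual[r][c]);
                // Written as !(diff <= tol) so that a NaN difference fails the check.
                if (!(diff <= tol)) {
                    return false;
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[][] a = {{1.0, 2.0}, {3.0, 4.0}};
        double[][] b = {{1.0005, 2.0}, {3.0, 3.9996}};
        System.out.println(approxEquals(a, b, 1e-3)); // prints "true"
        System.out.println(approxEquals(a, b, 1e-4)); // prints "false"
    }
}
```

An exact comparison is the special case `tol = 0.0`. A Spark-based version would additionally need to handle row ordering and non-numeric columns, which this sketch leaves out.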
@umbertogriffo
umbertogriffo / broadcast_join_medium_size.scala
Last active December 11, 2020 16:05
broadcast_join_medium_size
import org.apache.spark.sql.functions._
val mediumDf = Seq((0, "zero"), (4, "one")).toDF("id", "value")
val largeDf = Seq((0, "zero"), (2, "two"), (3, "three"), (4, "four"), (5, "five")).toDF("id", "value")
mediumDf.show()
largeDf.show()
/*
+---+-----+
@umbertogriffo
umbertogriffo / TwitterSentimentAnalysisAndN-gramWithHadoopAndHiveSQL.md
Last active May 11, 2021 13:22
Step by step Tutorial on Twitter Sentiment Analysis and n-gram with Hadoop and Hive SQL

PREREQUISITES

* Download the JSON SerDe from:
* http://files.cloudera.com/samples/hive-serdes-1.0-SNAPSHOT.jar
* and rename it to hive-serdes-1.0.jar
  • Add the JAR to HIVE_AUX_JARS_PATH of HiveServer2:

    1. Copy the JAR files to the host on which HiveServer2 is running. Save the JARs to any directory you choose, and make a note of the path (create directory in /usr/share/).
@umbertogriffo
umbertogriffo / install_python_rosetta.sh
Last active February 23, 2022 12:28
This installs Python under Rosetta and assigns it to pyenv to avoid: ModuleNotFoundError: No module named '_ctypes' on M1 Apple Silicon
#!/usr/bin/env bash
# This installs Python under Rosetta and assigns it to pyenv.
# Installing Python this way avoids: ModuleNotFoundError: No module named '_ctypes'
# pyenv has to be installed from GitHub; see https://laict.medium.com/install-python-on-macos-11-m1-apple-silicon-using-pyenv-12e0729427a9
version=$1
if [ "$#" -ne 1 ]; then
echo "Illegal number of parameters. Usage:"
@umbertogriffo
umbertogriffo / install_tensorflow.md
Last active July 20, 2022 11:54
MacOS 12 M1 (Apple Silicon) - Install TensorFlow 2.9.1
@umbertogriffo
umbertogriffo / build_opencv.md
Last active February 24, 2023 12:48
MacOS 12 M1 (Apple Silicon) - Build OpenCV

Install CMake (3.16 or higher):

brew install cmake

If it's already installed, uninstall opencv-python:

@umbertogriffo
umbertogriffo / build_lightgbm_from_gitthub.md
Last active February 24, 2023 12:52
MacOS 12 M1 (Apple Silicon) - Build LightGBM from GitHub

Install CMake (3.16 or higher):

brew install cmake
# On MacOS 11 M1 - Apple Silicon
ibrew install cmake

Install OpenMP:

@umbertogriffo
umbertogriffo / UniqueId.java
Last active March 6, 2023 08:16
Generate Long ID from UUID
/**
* Generate a unique ID from a UUID in the positive space
* Reference: http://www.gregbugaj.com/?p=587
* @return long value representing UUID
*/
private Long generateUniqueId()
{
long val = -1;
do
{
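The preview above stops at the start of the loop, so the body of generateUniqueId is not visible here. As a minimal sketch of one way to get a non-negative long from a UUID, assuming it is acceptable to fold the 128 bits into 64 with XOR and to retry until the result is non-negative, the idea could be written as below. The class name `UniqueIdSketch` and the method `generatePositiveId` are hypothetical; this is not necessarily the approach taken in the gist or in the referenced post.

```java
import java.util.UUID;

public final class UniqueIdSketch {

    /**
     * Draws random UUIDs until folding one into 64 bits yields a
     * non-negative value, then returns that value.
     */
    public static long generatePositiveId() {
        long value;
        do {
            UUID uuid = UUID.randomUUID();
            // Fold the two 64-bit halves of the UUID into a single long.
            value = uuid.getMostSignificantBits() ^ uuid.getLeastSignificantBits();
        } while (value < 0); // retry so the result stays in the positive space
        return value;
    }

    public static void main(String[] args) {
        System.out.println(generatePositiveId());
    }
}
```

Reducing a 128-bit UUID to a 63-bit non-negative value gives up the UUID's collision guarantees, so IDs produced this way are only unique with high probability.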
@umbertogriffo
umbertogriffo / HBaseBackup.rb
Last active March 24, 2023 15:01
This code takes a snapshot of all HBase tables using the snapshot command (no file copies are performed). Tested on CDH-5.4.4-1
# Check that the hbase.snapshot.enabled property in hbase-site.xml is set to true
# To run the script, launch this command from the shell: hbase shell HBaseBackup.rb
@clusterToSave = "hdfs:///srv2:8082/hbase"
# CHECK THE PATH OF HBase lib
@libjars = `ls /opt/cloudera/parcels/CDH-5.4.4-1.cdh5.4.4.p0.4/lib/hbase/*.jar | tr "\n" ","`
@ignore = [ /zipkin\..*/i, /.*_temp/i, /.*tmp/i, /test_.*/i, /.*_test/i, /.*_old/i ]
@mappers = "2"
include Java
@umbertogriffo
umbertogriffo / cuda_11.7_installation_on_Ubuntu_22.04
Created April 6, 2023 13:13 — forked from primus852/cuda_11.7_installation_on_Ubuntu_22.04
Instructions for CUDA v11.7 and cuDNN 8.5 installation on Ubuntu 22.04 for PyTorch 1.12.1
#!/bin/bash
### steps ####
# verify the system has a cuda-capable gpu
# download and install the nvidia cuda toolkit and cudnn
# set up environment variables
# verify the installation
###
### to verify your gpu is cuda-enabled, check