mathieu mdespriee

@fcamblor
fcamblor / UpdateMACJDK.md
Last active November 9, 2020 13:49
[MAC users] Updating Oracle JDK to OpenJDK

Starting from February 2019, Oracle JDK updates will require a commercial license if you use Java for any business purpose (e.g., you're a Java developer who does more than develop in Java in your spare time).

This applies to Java not only on your servers, but on your laptop as well.

Source | Source

What's nasty is that Java versions will remain free to download: if you use them as an individual, for personal use only, you aren't affected by the commercial terms of the license.

@felixcheung
felixcheung / sparkPercentile.scala
Last active March 28, 2018 08:26
Spark compute percentile with RDD in Scala
/**
 * compute percentile from an unsorted Spark RDD
 * @param data: input data set of Long integers
 * @param tile: percentile to compute (eg. 85 percentile)
 * @return value of input data at the specified percentile
 */
def computePercentile(data: RDD[Long], tile: Double): Double = {
  // NIST method; data to be sorted in ascending order
  val r = data.sortBy(x => x)
  val c = r.count()
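The preview above is cut off after the count, so the rest of the author's function is not shown here. As a rough illustration of the NIST rank method the comment refers to, here is a minimal sketch in Python on a plain local list rather than a Spark RDD. The function name `compute_percentile` and the handling of empty input are assumptions made for this sketch, not taken from the gist.

```python
import math


def compute_percentile(data, tile):
    """Return the tile-th percentile (0-100) of data using the NIST rank method.

    Hypothetical local-list sketch; the gist itself works on a Spark RDD.
    """
    if not data:
        raise ValueError("data must not be empty")
    ordered = sorted(data)            # NIST method needs ascending order
    n = len(ordered)
    rank = tile / 100.0 * (n + 1)     # 1-based fractional rank
    k = math.floor(rank)              # integer part of the rank
    d = rank - k                      # fractional part of the rank
    if k <= 0:
        return float(ordered[0])      # rank falls before the first value
    if k >= n:
        return float(ordered[-1])     # rank falls at or after the last value
    # interpolate between the k-th and (k+1)-th values (1-based positions)
    return ordered[k - 1] + d * (ordered[k] - ordered[k - 1])
```

For example, `compute_percentile([10, 20, 30, 40], 25)` has rank 1.25, so it interpolates a quarter of the way between the first and second values.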
@keo
keo / bootstrap.sh
Last active January 25, 2024 15:49
Setup encrypted partition for Docker containers
#!/bin/sh
# Setup encrypted disk image
# For Ubuntu 14.04 LTS
CRYPTFS_ROOT=/cryptfs
apt-get update
apt-get -y upgrade
apt-get -y install cryptsetup
@wpm
wpm / spark_parallel_boost.py
Last active December 3, 2018 02:56
A simple example of how to integrate the Spark parallel computing framework and the scikit-learn machine learning toolkit. This script randomly generates test and train data sets, trains an ensemble of decision trees using boosting, and applies the ensemble to the test set. The ensemble training is done in parallel.
from pyspark import SparkContext
import numpy as np
from sklearn.cross_validation import train_test_split, Bootstrap
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier
def run(sc):
@iamatypeofwalrus
iamatypeofwalrus / roll_ipython_in_aws.md
Last active January 22, 2024 11:18
Create an iPython HTML Notebook on Amazon's AWS Free Tier from scratch.

What

Roll your own iPython Notebook server with Amazon Web Services (EC2) using their Free Tier.

What are we using? What do you need?

  • An active AWS account. First-time sign-ups are eligible for the free tier for a year
  • One Micro Tier EC2 Instance
  • With AWS we will use the stock Ubuntu Server AMI and customize it.
  • Anaconda for Python.
  • Coffee/Beer/Time
@sekimura
sekimura / text_strip_margin.py
Created May 13, 2012 04:08
Text (heredoc) strip margin in Python
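import re

def strip_margin(text):
    # drop leading whitespace up to and including a '|' margin marker on each line
    return re.sub(r'\n[ \t]*\|', '\n', text)

def strip_heredoc(text):
    # shortest "newline + indentation" run that precedes a non-blank character
    indent = len(min(re.findall(r'\n[ \t]*(?=\S)', text) or [''], key=len))
    # indent counts the newline too, so subtract 1; clamp at 0 when nothing matched
    pattern = r'\n[ \t]{%d}' % max(indent - 1, 0)
    return re.sub(pattern, '\n', text)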
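A quick usage sketch for `strip_margin`, with the function repeated so the snippet runs on its own. It uses the same pattern as the gist, written as a raw string; the sample SQL text is made up for illustration.

```python
import re


def strip_margin(text):
    # same pattern as the gist's strip_margin, as a raw string
    return re.sub(r'\n[ \t]*\|', '\n', text)


# Each line after the first starts with indentation followed by a '|' marker.
query = strip_margin("""
    |SELECT *
    |FROM users""")

print(query)
```

Everything up to and including the `|` is removed on each line, so the text can be indented along with the surrounding code. The leading newline from the opening triple quote is kept.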