Elie A. eliasah

@eliasah
eliasah / SQLTransformerWithJoin.scala
Created June 17, 2020 12:37 — forked from MLnick/SQLTransformerWithJoin.scala
Using SQLTransformer to join DataFrames in ML Pipeline
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/ '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_77)
Type in expressions to have them evaluated.
Type :help for more information.
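
The preview only captures the spark-shell banner. As a minimal sketch of what the forked gist demonstrates, joining a second DataFrame inside an ML Pipeline via SQLTransformer's __THIS__ placeholder, assuming two toy DataFrames df and lookup (names are illustrative, Spark 2.0 API):

import org.apache.spark.ml.feature.SQLTransformer

val df = Seq((1, "a"), (2, "b")).toDF("id", "feature")
val lookup = Seq((1, 0.0), (2, 1.0)).toDF("id", "label")

// the statement can reference any registered temp view alongside __THIS__,
// which SQLTransformer substitutes with the DataFrame being transformed
lookup.createOrReplaceTempView("lookup")

val sqlTrans = new SQLTransformer().setStatement(
  "SELECT __THIS__.*, lookup.label FROM __THIS__ JOIN lookup ON __THIS__.id = lookup.id")

sqlTrans.transform(df).show()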

Keybase proof

I hereby claim:

  • I am eliasah on github.
  • I am eliasah (https://keybase.io/eliasah) on keybase.
  • I have a public key whose fingerprint is 1766 983D 67EA 3816 AF95 EB70 AEB8 3993 39D9 51E9

To claim this, I am signing this object:

@eliasah
eliasah / multiple-files-remove-prefix.md
Created September 27, 2019 14:35
Remove prefix from multiple files in Linux console

Bash

for file in prefix*; do mv "$file" "${file#prefix}"; done;

The for loop iterates over every file whose name starts with the prefix, and the mv in the loop body renames each one, using the ${file#prefix} parameter expansion to strip the prefix from the name.

Here is an example that removes "bla_" from the following files:

bla_1.txt
bla_2.txt
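
Running for file in bla_*; do mv "$file" "${file#bla_}"; done renames them to 1.txt and 2.txt.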
@eliasah
eliasah / custom_s3_endpoint_in_spark.md
Created August 30, 2019 08:54 — forked from tobilg/custom_s3_endpoint_in_spark.md
How to use a custom S3 endpoint (like Rados Gateway for Ceph)

Custom S3 endpoints with Spark

To use custom endpoints with the latest Spark distribution, you need to add the external hadoop-aws package. Custom endpoints can then be configured according to the docs.

Use the hadoop-aws package

bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2

SparkContext configuration
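
The preview stops at this heading; a minimal sketch of the configuration in spark-shell, assuming a placeholder Rados Gateway endpoint and credentials (fs.s3a.* are the standard Hadoop S3A keys):

// set the S3A connector options on the SparkContext's Hadoop configuration;
// endpoint and credentials below are placeholders
sc.hadoopConfiguration.set("fs.s3a.endpoint", "http://rgw.example.com:7480")
sc.hadoopConfiguration.set("fs.s3a.access.key", "ACCESS_KEY")
sc.hadoopConfiguration.set("fs.s3a.secret.key", "SECRET_KEY")
// non-AWS S3 implementations usually need path-style access
sc.hadoopConfiguration.set("fs.s3a.path.style.access", "true")

// then read through the s3a:// scheme
val lines = sc.textFile("s3a://some-bucket/some-file.txt")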

# Strip the bulky components from a fitted glm object so it can be cached or
# serialized with a much smaller footprint; the coefficients are kept, so the
# stripped model can typically still be used with predict() on new data.
strip_glm <- function(cm) {
  cm$y = c()
  cm$model = c()
  cm$residuals = c()
  cm$fitted.values = c()
  cm$effects = c()
  cm$qr$qr = c()
  cm$linear.predictors = c()
  cm$weights = c()
  # remaining fields are cut off in the preview; return the stripped model
  cm
}
@eliasah
eliasah / bootstrap-install-zeppelin-0.8-aws-linux.sh
Created November 21, 2018 14:16 — forked from vak/bootstrap-install-zeppelin-0.8-aws-linux.sh
Custom bootstrap script to install Zeppelin 0.8 on AWS EMR (tested on EMR 5.16.0)
#!/bin/bash -ex
# ATTENTION:
#
# 1. Ensure you have about 1 GB of free space under /usr/lib/ for the large
#    Zeppelin bundle chosen by default below, or pick a smaller bundle from
#    the Zeppelin website.
#
# 2. Adjust the values of ZEPPELIN_NOTEBOOK_S3_BUCKET and
#    ZEPPELIN_NOTEBOOK_S3_USER if you want your Zeppelin notebooks persisted
#    to your S3 bucket; otherwise remove the last three export lines starting
#    with 'export ZEPPELIN_NOTEBOOK_S'.
@eliasah
eliasah / terminal-git-branch-name.md
Created October 2, 2018 08:28 — forked from joseluisq/terminal-git-branch-name.md
Add Git Branch Name to Terminal Prompt (Mac)


Open ~/.bash_profile in your favorite editor and add the following content to the bottom.

# Git branch in prompt.

parse_git_branch() {
    git branch 2> /dev/null | sed -e '/^[^*]/d' -e 's/* \(.*\)/ (\1)/'
}
export PS1="\u@\h \[\033[32m\]\w\[\033[33m\]\$(parse_git_branch)\[\033[00m\] $ "
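
Save the file and reload it with source ~/.bash_profile (or open a new terminal) for the prompt change to take effect.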
@eliasah
eliasah / minikube.md
Last active May 23, 2018 08:54 — forked from codesword/minikube.md
Installing minikube using xhyve driver

Install docker-machine-driver-xhyve

docker-machine-driver-xhyve is a Docker Machine driver plugin for xhyve, the lightweight native OS X hypervisor. In my opinion, it's a far better option than VirtualBox for running minikube.

Brew

On macOS Sierra, install the latest version with:

brew install docker-machine-driver-xhyve --HEAD
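
Note that Homebrew's post-install caveats ask you to give the driver binary root:wheel ownership and the setuid bit, since xhyve requires superuser privileges.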
@eliasah
eliasah / job_submit.sh
Last active May 22, 2018 15:07
PySpark hidden REST API example - adapted from https://gist.github.com/arturmkrtchyan/5d8559b2911ac951d34a
#!/bin/bash
# Submits spark_pi.py through Spark's hidden standalone REST API on port 6066.
# Fields after "environmentVariables" were cut off; the values are assumed.
curl -X POST http://[spark-cluster-ip]:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
"action":"CreateSubmissionRequest",
"appArgs":[
"/home/eliasah/Desktop/spark_pi.py"
],
"appResource":"file:/home/eliasah/Desktop/spark_pi.py",
"clientSparkVersion":"2.2.1",
"environmentVariables":{
"SPARK_ENV_LOADED":"1"
},
"mainClass":"org.apache.spark.deploy.SparkSubmit",
"sparkProperties":{
"spark.master":"spark://[spark-cluster-ip]:6066"
}
}'
@eliasah
eliasah / installing_keras_tf.md
Last active October 8, 2018 11:52
Install Keras 2.0.5 with the TensorFlow 1.2.1 backend using Anaconda

create conda env

conda create --name keras
source activate keras

installing utilities

conda install python=3.5 numpy scikit-learn=0.18.1 jupyter matplotlib pip
conda install pandas h5py pillow lxml
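
The preview cuts off here; given the versions in the title, the remaining step presumably pins both packages via pip:

pip install tensorflow==1.2.1
pip install keras==2.0.5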