Mike Dusenberry dusenberrymw

@dusenberrymw
dusenberrymw / systemml_committer_guide.md
Last active January 9, 2017 22:13
Apache SystemML Committer Git Guide

SystemML Git Guide

Setup Git repo locally

  • Fork Apache SystemML to your personal GitHub account by browsing to [https://github.com/apache/incubator-systemml] and clicking "Fork".
  • Clone your personal GitHub fork of Apache SystemML:
    • git clone git@github.com:USERNAME/incubator-systemml.git // assuming the use of SSH keys with GitHub
  • Add GitHub (read-only mirror) and Apache-owned (committer writeable) Git repositories as remotes:
    • cd incubator-systemml
    • git remote add apache-github https://github.com/apache/incubator-systemml.git
    • git remote add apache https://git-wip-us.apache.org/repos/asf/incubator-systemml.git
dusenberrymw / custom.css
Last active February 13, 2019 14:09
Jupyter Solarized Dark Custom CSS (~/.jupyter/custom/custom.css)
/*
Name: Base16 Solarized Dark
Author: Ethan Schoonover (http://ethanschoonover.com/solarized)
CodeMirror template adapted for IPython Notebook by Nikhil Sonnad (https://github.com/nsonnad/base16-ipython-notebook)
CodeMirror template by Jan T. Sott (https://github.com/idleberg/base16-chrome-devtools)
Original Base16 color scheme by Chris Kempson (https://github.com/chriskempson/base16)
*/
{"paragraphs":[{"text":"%md\n## Quick Setup","dateUpdated":"2016-08-01T12:09:19-0700","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true,"editorMode":"ace/mode/markdown","editorHide":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1470078505348_-349631352","id":"20160801-120825_354862280","result":{"code":"SUCCESS","type":"HTML","msg":"<h2>Quick Setup</h2>\n"},"dateCreated":"2016-08-01T12:08:25-0700","dateStarted":"2016-08-01T12:09:17-0700","dateFinished":"2016-08-01T12:09:17-0700","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:1214"},{"text":"import org.apache.sysml.api.mlcontext._\nimport org.apache.sysml.api.mlcontext.ScriptFactory._\n\n// Create a SystemML MLContext object\nval ml = new MLContext(sc)","dateUpdated":"2016-08-01T12:12:09-0700","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled"
# Magics
%matplotlib inline
%load_ext autoreload
%autoreload 2
# Imports
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
dusenberrymw / notebook.json
Created September 5, 2016 23:23
Jupyter 2 space indent (~/.jupyter/nbconfig/notebook.json)
{
"CodeCell": {
"cm_config": {
"indentUnit": 2
}
}
}
dusenberrymw / install_OpenBLAS_for_Spark.md
Last active October 11, 2018 11:34
Install OpenBLAS for use with Spark.
  • Install OpenBLAS
    • yum install libgfortran
    • Note: OpenBLAS contains both BLAS and LAPACK routines, so in the end we can create symlinks to it at the standard BLAS and LAPACK shared library locations.
    • Download from GitHub: https://github.com/xianyi/OpenBLAS
    • make
    • make install
    • Create links:
      ln -sf /opt/OpenBLAS/lib/libopenblas.so /opt/OpenBLAS/lib/libblas.so
      ln -sf /opt/OpenBLAS/lib/libopenblas.so /opt/OpenBLAS/lib/libblas.so.3
      
dusenberrymw / spark_tips_and_tricks.md
Last active February 8, 2023 05:11
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, thus decreasing the size of the saved DataFrame by a factor of 8 (assuming the values would otherwise be stored as 8-byte numbers).
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  • Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually results in a DataFrame with a [much] larger number of rows, yet the number of partitions will remain the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
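The partition-sizing advice above can be sketched as simple arithmetic. This is a minimal illustration, not part of the original gist: the helper names `num_partitions` and `partitions_after_expansion` are hypothetical, and the per-row byte estimate is an assumption you would need to measure for your own data.

```python
# Hypothetical helpers illustrating the ~128MB-per-partition rule of thumb.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024  # ~128MB target (empirical finding above)


def num_partitions(total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Return a partition count that keeps each partition at or below the target size."""
    # Ceiling division: always err on the higher side w.r.t. number of partitions.
    return max(1, (total_bytes + target_bytes - 1) // target_bytes)


def partitions_after_expansion(num_rows, bytes_per_row_after,
                               target_bytes=TARGET_PARTITION_BYTES):
    """Size partitions for the op that follows a flatMap.

    `bytes_per_row_after` is an assumed estimate of the row size AFTER the
    memory-expanding op (e.g. once indices have become large Vectors).
    """
    return num_partitions(num_rows * bytes_per_row_after, target_bytes)
```

In PySpark, the resulting count could then be passed to `DataFrame.repartition` on the output of the flatMap, before the memory-heavy op runs.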
dusenberrymw / proxy.pac
Last active February 7, 2024 23:50
Proxy PAC file template for selective SSH SOCKS proxies, plus a [re]installation script.
// Proxy PAC File
// - Used to redirect certain addresses to the server through the SOCKS ssh port (1280 for this file), i.e.
// tunnel traffic through server.
// - Useful for easily accessing webpages from services running on a server (Jupyter notebooks, TensorBoard, Spark UI, etc.)
// that is otherwise locked down by a firewall.
// - To install on OS X/macOS, go to "Settings->Network->Advanced->Proxies->Automatic Proxy Configuration"
// and paste the local file url (`file:///absolute/path/to/proxy.pac`).
// - Alternatively, use `./reinstall_proxy.sh`.
// - SSH to the server with `ssh -D 1280 ....`.
function FindProxyForURL(url, host) {
dusenberrymw / setup_drives.sh
Last active January 5, 2017 22:47
Setup new hard drives for HDFS.
# sign in as root: `sudo -i -u root`
# use `df` to see disk usage
# use `lsblk` to see raw disks
# label each raw disk (/dev/sdb through /dev/sdm) with a GPT partition table
for d in b c d e f g h i j k l m; do parted -s /dev/sd$d mklabel gpt; done
# create one primary partition spanning each disk
for d in b c d e f g h i j k l m; do parted -s /dev/sd$d mkpart primary 1 -- -1; done
mkdir /disk1;mkdir /disk2;mkdir /disk3;mkdir /disk4
dusenberrymw / caffeinate.sh
Last active January 4, 2017 21:49
Keep OS X / macOS from sleeping, even while locked.
#!/usr/bin/env bash
caffeinate -di