Apache SystemML Committer Git Guide

SystemML Git Guide

Setup Git repo locally

  • Fork Apache SystemML to your personal GitHub account by browsing to [] and clicking "Fork".
  • Clone your personal GitHub fork of Apache SystemML:
    • git clone // assuming the use of SSH keys with GitHub
  • Add the GitHub (read-only mirror) and Apache-owned (committer-writable) Git repositories as remotes:
    • cd incubator-systemml
    • git remote add apache-github
    • git remote add apache
dusenberrymw / custom.css
Last active February 13, 2019 14:09
Jupyter Solarized Dark Custom CSS (~/.jupyter/custom/custom.css)
Name: Base16 Solarized Dark
Author: Ethan Schoonover (
CodeMirror template adapted for IPython Notebook by Nikhil Sonnad (
CodeMirror template by Jan T. Sott (
Original Base16 color scheme by Chris Kempson (
{"paragraphs":[{"text":"%md\n## Quick Setup","dateUpdated":"2016-08-01T12:09:19-0700","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled":true,"editorMode":"ace/mode/markdown","editorHide":true},"settings":{"params":{},"forms":{}},"jobName":"paragraph_1470078505348_-349631352","id":"20160801-120825_354862280","result":{"code":"SUCCESS","type":"HTML","msg":"<h2>Quick Setup</h2>\n"},"dateCreated":"2016-08-01T12:08:25-0700","dateStarted":"2016-08-01T12:09:17-0700","dateFinished":"2016-08-01T12:09:17-0700","status":"FINISHED","progressUpdateIntervalMs":500,"$$hashKey":"object:1214"},{"text":"import org.apache.sysml.api.mlcontext._\nimport org.apache.sysml.api.mlcontext.ScriptFactory._\n\n// Create a SystemML MLContext object\nval ml = new MLContext(sc)","dateUpdated":"2016-08-01T12:12:09-0700","config":{"colWidth":12,"graph":{"mode":"table","height":300,"optionOpen":false,"keys":[],"values":[],"groups":[],"scatter":{}},"enabled"
# Magics
%matplotlib inline
%load_ext autoreload
%autoreload 2
# Imports
import math
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
dusenberrymw / notebook.json
Created September 5, 2016 23:23
Jupyter 2-space indent (~/.jupyter/nbconfig/notebook.json)
{
  "CodeCell": {
    "cm_config": {"indentUnit": 2}
  }
}
dusenberrymw /
Last active October 11, 2018 11:34
Install OpenBLAS for use with Spark.
  • Install OpenBLAS
    • yum install libgfortran
    • Note: OpenBLAS provides both BLAS and LAPACK routines, so the standard BLAS and LAPACK shared library locations can simply be symlinked to the OpenBLAS library.
    • Download from GitHub:
    • make
    • make install
    • Create links:
      ln -sf /opt/OpenBLAS/lib/ /opt/OpenBLAS/lib/
      ln -sf /opt/OpenBLAS/lib/ /opt/OpenBLAS/lib/
dusenberrymw /
Last active June 28, 2024 12:37
Tips and tricks for Apache Spark.

Spark Tips & Tricks

Misc. Tips & Tricks

  • If all values are integers in [0, 255], Parquet will automatically compress them to 1-byte unsigned integers, decreasing the size of the saved DataFrame by a factor of 8 (compared with 8-byte values).
  • Partition DataFrames to have evenly-distributed, ~128MB partition sizes (empirical finding). Always err on the higher side w.r.t. number of partitions.
  Pay particular attention to the number of partitions when using flatMap, especially if the following operation will result in high memory usage. The flatMap op usually produces a DataFrame with a [much] larger number of rows, yet the number of partitions remains the same. Thus, if a subsequent op causes a large expansion of memory usage (e.g. converting a DataFrame of indices to a DataFrame of large Vectors), the memory usage per partition may become too high. In this case, it is beneficial to repartition the output of flatMap to a number of partitions that will safely allow for appropriate partition memory sizes, based upon the
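The partition-sizing tip above can be sketched as a small helper. This is only an illustration: the function name `suggest_num_partitions` and the 10 GiB size estimate are hypothetical, and the total size of the data has to be estimated separately. Rounding up follows the advice to err on the higher side for the number of partitions.

```python
import math

# ~128MB target partition size, taken from the tip above.
TARGET_PARTITION_BYTES = 128 * 1024 * 1024


def suggest_num_partitions(estimated_total_bytes, target_bytes=TARGET_PARTITION_BYTES):
    """Return a partition count that keeps each partition at or below the target size.

    Hypothetical helper: the caller supplies an estimate of the total data size.
    The division is rounded up, so the result errs toward more partitions.
    """
    if estimated_total_bytes <= 0:
        return 1
    return max(1, math.ceil(estimated_total_bytes / target_bytes))


# Assumed example: the output of a flatMap is estimated at 10 GiB.
n = suggest_num_partitions(10 * 1024**3)
print(n)
```

With PySpark, the result could then be passed to `df.repartition(n)` on the output of the flatMap, before the memory-heavy operation runs.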
dusenberrymw / proxy.pac
Last active February 7, 2024 23:50
Proxy PAC file template for selective SSH SOCKS proxies, plus a [re]installation script.
// Proxy PAC File
// - Used to redirect certain addresses to the server through the SOCKS ssh port (1280 for this file), i.e.
// tunnel traffic through server.
// - Useful for easily accessing webpages from services running on a server (Jupyter notebooks, TensorBoard, Spark UI, etc.)
// that is otherwise locked down by a firewall.
// - To install on OS X/macOS, go to "Settings->Network->Advanced->Proxies->Automatic Proxy Configuration"
// and paste the local file url (`file:///absolute/path/to/proxy.pac`).
// - Alternatively, use `./`.
// - SSH to the server with `ssh -D 1280 ....`.
function FindProxyForURL(url, host) {
dusenberrymw /
Last active January 5, 2017 22:47
Setup new hard drives for HDFS.
# sign in as root: `sudo -i -u root`
# use `df` to see disk usage
# use `lsblk` to see raw disks
for d in b c d e f g h i j k l m; do parted -s "/dev/sd${d}" mklabel gpt; done
for d in b c d e f g h i j k l m; do parted -s "/dev/sd${d}" mkpart primary 1 -- -1; done
mkdir /disk1 /disk2 /disk3 /disk4
dusenberrymw /
Last active January 4, 2017 21:49
Keep OS X / macOS from sleeping, even while locked.
#!/usr/bin/env bash
caffeinate -di