
# Tmux Configuration
# By Alex Gaudio
# April 15, 2012
# HACKS
#######
#######
## Unfortunately, the OS X system clipboard integration sucks. But there's a well-documented workaround.
# OS X: pbcopy and pbpaste workaround from ChrisJohnsen
# code available at: https://github.com/ChrisJohnsen/tmux-MacOSX-pasteboard.git
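The workaround wires tmux through ChrisJohnsen's wrapper binary. A minimal ~/.tmux.conf line, assuming the wrapper is on your PATH and that you use bash (both assumptions on my part), looks roughly like:

```
# launch every tmux pane through the pasteboard-aware wrapper
set-option -g default-command "reattach-to-user-namespace -l bash"
```

With that in place, pbcopy and pbpaste behave normally inside tmux panes.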
@adgaudio
adgaudio / add_to_security_group.py
Last active December 15, 2015 05:19 — forked from robbyt/secgroup.py
Inspired by https://gist.github.com/robbyt/2493423. This StarCluster plugin grants all tcp, udp and icmp privileges between the current cluster's security group and the given security group, in both directions, for the CIDR block 10.0.0.0/8. This is particularly useful for running StarCluster within an Amazon VPC.
"""
Based on https://gist.github.com/robbyt/2493423
This StarCluster plugin grants all tcp, udp and icmp privileges for 10.0.0.0/8
between the current cluster's security group and the given security group,
in both directions
"""
from starcluster.clustersetup import ClusterSetup
from starcluster.logger import log
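Before touching any AWS API, it helps to see the rule set such a plugin grants. Below is a hypothetical, boto-free sketch of those rules; `cross_group_rules` and the tuple layout are illustrative names of mine, not StarCluster's API:

```python
# Sketch of the permission rules the plugin would grant, in both directions.
CIDR = "10.0.0.0/8"
PROTOCOLS = ["tcp", "udp", "icmp"]

def cross_group_rules(group_a, group_b, cidr=CIDR):
    """Build (protocol, src_group, dst_group, port_range, cidr) rules."""
    rules = []
    for proto in PROTOCOLS:
        # icmp uses type/code -1..-1 ("all"); tcp/udp use the full port range.
        ports = (-1, -1) if proto == "icmp" else (0, 65535)
        rules.append((proto, group_a, group_b, ports, cidr))
        rules.append((proto, group_b, group_a, ports, cidr))
    return rules
```

Each tuple would then map onto one authorize-ingress call against the respective group.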
@adgaudio
adgaudio / ssh_tunnel.py
Last active December 15, 2015 07:18
SSH tunnel through a gateway to another machine. I know there are plenty of implementations, but none I found just worked and returned a "localhost:port" string like this one does. I've been using this successfully for several months with no problems. However, I have noticed that sometimes --encrypted is required (this may have something to do with…
"""SSH tunnel through a gateway to another machine.
USAGE:
python ./ssh_tunnel.py -h
or
>>> import ssh_tunnel
>>> ssh_tunnel.main('gateway_username', 'dest_host_addr')
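The core of a tunnel like this is assembling an `ssh -N -L` command and reporting the chosen local port back as a "localhost:port" string. A minimal sketch, assuming OpenSSH; `tunnel_command` and `free_local_port` are illustrative names of mine, and the real gist also manages the subprocess lifetime:

```python
import socket

def free_local_port():
    """Ask the OS for an unused local port."""
    s = socket.socket()
    s.bind(("127.0.0.1", 0))
    port = s.getsockname()[1]
    s.close()
    return port

def tunnel_command(gateway_user, gateway, dest_host, dest_port=22, local_port=None):
    """Build the ssh argv for a local forward through the gateway.

    Returns (argv, "localhost:port"); starting the process is left to the caller."""
    local_port = local_port or free_local_port()
    argv = ["ssh", "-N",
            "-L", "%d:%s:%d" % (local_port, dest_host, dest_port),
            "%s@%s" % (gateway_user, gateway)]
    return argv, "localhost:%d" % local_port
```

Anything that then connects to the returned "localhost:port" address is transparently forwarded to the destination machine.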
@adgaudio
adgaudio / starcluster_dns_update.py
Last active August 29, 2015 13:56
StarCluster Plugin: Configure Nodes with Dynamic DNS
"""A starcluster plugin that registers starcluster nodes in DNS
It assumes that the nsupdate command can be run from the same machine where you run this plugin.
"""
from starcluster import clustersetup
import subprocess
from os.path import join
# You should configure these to your needs
DNS_ZONE = "example.com"
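The DNS work ultimately reduces to feeding a small script to nsupdate's stdin. A hedged sketch of what that input might look like for one node; `nsupdate_script` and the TTL default are my assumptions, not the gist's exact code:

```python
def nsupdate_script(zone, hostname, ip, ttl=60):
    """Return the stdin text for an nsupdate run that (re)registers one node."""
    fqdn = "%s.%s" % (hostname, zone)
    return "\n".join([
        "zone %s" % zone,
        "update delete %s A" % fqdn,                # drop any stale record first
        "update add %s %d A %s" % (fqdn, ttl, ip),  # register the node's address
        "send",
    ]) + "\n"
```

The plugin would pipe this text into something like `nsupdate -k <keyfile>` via subprocess for each node as it joins the cluster.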
@adgaudio
adgaudio / sparkR_install_notes.sh
Created April 26, 2014 05:00
SparkR install notes
# My SparkR install notes. SparkR gives R access to Apache Spark.
#
# For details about SparkR, see their site:
# http://amplab-extras.github.io/SparkR-pkg/
#
# Author: Alex Gaudio <adgaudio@gmail.com>
#
# Note the awful hack where I symlink libjvm.so into /usr/lib. I did that to get rJava installed.
@adgaudio
adgaudio / Demo - R Python Spark.ipynb
Last active August 29, 2015 14:01
Demo - R Python Spark.ipynb
@adgaudio
adgaudio / spark_serialization_demo.py
Last active October 24, 2016 23:25
This gist demonstrates that spark 0.9.1 (and I'm guessing also 1.0.0) doesn't serialize a logger instance properly when code runs on workers
"""
This gist demonstrates that spark 1.0.0 and 0.9.1
don't serialize a logger instance properly when code runs on workers.
run this code via:
spark-submit spark_serialization_demo.py
- or -
pyspark spark_serialization_demo.py
"""
import pyspark
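The usual fix is to stop shipping the logger through pickle at all and construct it lazily on each worker instead. A sketch of that workaround, with illustrative names that are not taken from the gist:

```python
import logging

def get_worker_logger(name="worker"):
    """Create (or fetch) a logger on the worker itself; never ship one via pickle."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

def process_record(x):
    # constructed on the executor, not captured from the driver's closure
    log = get_worker_logger()
    log.info("processing %r", x)
    return x * 2
```

Because `logging.getLogger` caches by name, repeated calls on one worker reuse the same instance, so nothing mutable ever crosses the driver/worker pickle boundary.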
@adgaudio
adgaudio / distributed_percentile_algorithm.py
Last active August 29, 2015 14:05
Distributed Percentile and Distributed Median - a proof of concept and example
"""
This example demonstrates a distributed algorithm to identify the
percentile of a distributed data set.
Because this is a toy implementation, the data isn't actually
distributed across multiple machines.
"""
import numpy as np
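The idea behind the gist can be sketched as a binary search over the value range, where each "machine" only ever reports a count back to the coordinator. A toy, single-machine sketch; the function name and iteration count are my own choices:

```python
import numpy as np

def distributed_percentile(partitions, q, iters=60):
    """Approximate the q-th percentile (0 < q < 100) of data split across partitions.

    Each partition is a numpy array standing in for one machine's shard."""
    lo = min(float(p.min()) for p in partitions)
    hi = max(float(p.max()) for p in partitions)
    n = sum(p.size for p in partitions)
    target = q / 100.0 * n
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        # each machine contributes a single number: how many values <= mid
        count = sum(int((p <= mid).sum()) for p in partitions)
        if count < target:
            lo = mid
        else:
            hi = mid
    return hi
```

Per round, only one scalar per machine crosses the network, so the communication cost is O(machines * iterations) regardless of data size.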
// these double forward slashes are comments. you can write whatever you
// want in the text after the double slashes.
x = 10;
y = 20;
z = 5;
cube([x,y,z], center=true);
cube([x,y,z], center=false);
@adgaudio
adgaudio / DecayCounter.java
Last active December 19, 2017 20:16
A weighted counter that remembers the most frequent and recent pairs on a 2-color graph.
import java.util.HashMap;
/* A weighted counter that remembers most frequent and recent pairs on a 2-color graph, where:
 * - any pair (a_i, b_i) contains an element a_i from set A and an element b_i from set B. A and B are disjoint.
*
* This counter basically implements a recurrence relation to maintain scores for each pair:
* score = memory * prev_score + (1-memory) * (+/-)1
*
 * "memory" is a value between 0 and 1 that chooses how much history to take into account.
*
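The recurrence above is compact enough to sketch in Python alongside the Java original; the class and method names here are illustrative, not the gist's:

```python
class DecayCounter:
    """score = memory * prev_score + (1 - memory) * (+/-)1, per pair."""

    def __init__(self, memory=0.9):
        assert 0.0 < memory < 1.0
        self.memory = memory
        self.scores = {}  # (a, b) pair -> decayed score

    def update(self, pair, seen=True):
        """Reward a pair that was just seen (+1) or punish one that wasn't (-1)."""
        prev = self.scores.get(pair, 0.0)
        signal = 1.0 if seen else -1.0
        self.scores[pair] = self.memory * prev + (1 - self.memory) * signal

    def best(self):
        """The pair with the highest (i.e. most frequent and recent) score."""
        return max(self.scores, key=self.scores.get)
```

With memory close to 1 the counter changes slowly and favors long-run frequency; close to 0 it chases whatever pair was seen most recently.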