sorenmacbeth / ambrose.clj
Last active August 29, 2015 13:55
cascalog ambrose integration
View ambrose.clj
(defn ambrose?-
  [& bindings]
  (let [[name bindings] (flow/parse-exec-args bindings)
        bindings (mapcat (partial apply normalize-sink-connection)
                         (partition 2 bindings))
        flow (-> (apply compile-flow name bindings))
        server (EmbeddedAmbroseCascadingNotifier.)]
    (.addListener flow server)
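
# For every jar on the Leiningen classpath, print entries matching the pattern in $1, prefixed with the jar path
lein cp | tr ':' '\n' > "$TMPCP"
while IFS= read -r line; do
  find "$line" -name "*.jar" -exec sh -c 'jar -tf "$1" | grep -H --label "$1" "$2"' sh {} "$1" \;
done < "$TMPCP"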

Keybase proof

I hereby claim:

  • I am sorenmacbeth on github.
  • I am sorenmacbeth on keybase.
  • I have a public key whose fingerprint is 09DB D06A E0D1 1A0F 8E64 0E02 5819 BF09 B48F 7899

To claim this, I am signing this object:

View patch-edid.rb
# Create display override file to force Mac OS X to use RGB mode for Display
# see
require 'base64'
data=`ioreg -l -d0 -w 0 -r -c AppleDisplay`
View tuning_storm_trident.asciidoc

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination

from redis import Redis
import simplejson

class Resque(object):
    """Dirt simple Resque client in Python. Can be used to create jobs."""
    redis_server = 'localhost:6379'

    def __init__(self):
        host, port = self.redis_server.split(':')
        self.redis = Redis(host=host, port=int(port))
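
The preview above stops before any job is created. As a rough sketch of what creating a job could look like, assuming the usual Resque layout in Redis (a `resque:queues` set plus one `resque:queue:<name>` list of JSON payloads with `class` and `args` keys): the `enqueue` and `build_payload` helpers, the `InMemoryRedis` stand-in, and the `SendEmail` job name are all hypothetical and not taken from the original gist, and the standard-library `json` module is used here in place of `simplejson`.

```python
import json


def build_payload(job_class, *args):
    """Serialize a job as a JSON object with "class" and "args" keys."""
    return json.dumps({"class": job_class, "args": list(args)})


def enqueue(client, queue, job_class, *args):
    """Register the queue name, then append the job to that queue's list.

    `client` is any object with redis-py style sadd/rpush methods.
    """
    client.sadd("resque:queues", queue)
    client.rpush("resque:queue:" + queue, build_payload(job_class, *args))


class InMemoryRedis:
    """Hypothetical stand-in for a Redis client so the sketch runs without a server."""

    def __init__(self):
        self.sets = {}
        self.lists = {}

    def sadd(self, name, *values):
        self.sets.setdefault(name, set()).update(values)

    def rpush(self, name, *values):
        self.lists.setdefault(name, []).extend(values)


client = InMemoryRedis()
enqueue(client, "default", "SendEmail", "user@example.com", 42)
```

Passing the client in as a parameter keeps the payload logic testable; with a real server, a `redis.Redis` instance such as the `self.redis` attribute above could presumably be passed instead.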
View tunkrank.rb
class TwitterUser
  def calculate_tunkrank(p=0.05)
    self.followers.inject(0.0) do |sum, follower|
      sum + ((1.0 + (p * follower.tunkrank_score)) / (1.0 + follower.num_friends))
    end
  end
end
View gist:416502
SELECT 1.0 + SUM((1.0 + #{p} * tunkrank_score) / (1.0 + num_friends)) AS tunkrank_score
FROM twitter_users
INNER JOIN twitter_id_follows ON (twitter_users.twitter_id = twitter_id_follows.follower_twitter_id)
WHERE twitter_id_follows.user_twitter_id = #{twitter_id};
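
Both snippets above sum the same per-follower term, `(1 + p * tunkrank_score) / (1 + num_friends)`. A minimal Python sketch of that single pass follows, mirroring the Ruby version (which starts the sum at 0.0, whereas the SQL query adds 1.0 to the total). The `Follower` dataclass and the sample values in the usage lines are illustrative assumptions, not data from the original gists.

```python
from dataclasses import dataclass


@dataclass
class Follower:
    """Hypothetical record holding the two fields the formula reads."""
    tunkrank_score: float
    num_friends: int


def calculate_tunkrank(followers, p=0.05):
    """Sum each follower's contribution: (1 + p * score) / (1 + num_friends)."""
    return sum(
        (1.0 + p * f.tunkrank_score) / (1.0 + f.num_friends)
        for f in followers
    )


# Illustrative input: one follower who follows nobody else, one who follows four accounts.
followers = [Follower(tunkrank_score=0.0, num_friends=0),
             Follower(tunkrank_score=10.0, num_friends=4)]
score = calculate_tunkrank(followers)
```

Each score depends on the followers' current scores, so a single call is one update step; repeating it across all users until the values settle would be the natural way to use it.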
sorenmacbeth / gist:827971
Created February 15, 2011 18:37 — forked from michaelmontano/gist:535794
updated to whirr-0.3.0
View gist:827971
diff -Naur whirr-0.3.0-incubating/contrib/python/src/py/hadoop/cloud/ whirr-0.3.0-incubating-backtype/contrib/python/src/py/hadoop/cloud/
--- whirr-0.3.0-incubating/contrib/python/src/py/hadoop/cloud/ 2011-01-15 23:03:44.000000000 -0800
+++ whirr-0.3.0-incubating-backtype/contrib/python/src/py/hadoop/cloud/ 2011-02-15 11:51:49.000000000 -0800
@@ -296,7 +296,7 @@
opt.get('availability_zone'), opt.get('user_packages'),
opt.get('auto_shutdown'), opt.get('env'),
- opt.get('security_group'))
+ opt.get('security_group'), opt.get('spot_price'))
service.launch_master(template, config_dir, opt.get('client_cidr'))
View globhfs.clj
(ns gist.globhfs
(:import [cascading.tap GlobHfs]))
;; ### Bucket to Cluster
;; To get tuples back out of our directory structure on S3, we employ
;; Cascading's [GlobHFS] tap, along with an
;; interface tailored for datasets stored in the MODIS sinusoidal
;; projection. For details on the globbing syntax, see
;; [here](