Skip to content

Instantly share code, notes, and snippets.

@sorenmacbeth
sorenmacbeth / ambrose.clj
Last active August 29, 2015 13:55
cascalog ambrose integration
(defn ambrose?-
[& bindings]
(let [[name bindings] (flow/parse-exec-args bindings)
bindings (mapcat (partial apply normalize-sink-connection)
(partition 2 bindings))
flow (-> (apply compile-flow name bindings)
flow/flow-def
flow/compile-hadoop)
server (EmbeddedAmbroseCascadingNotifier.)]
(.addListener flow server)
#!/bin/sh
TMPCP=/tmp/badcp.txt
lein cp | tr ':' '\n' > $TMPCP
while read line; do
find "$line" -name "*.jar" -exec sh -c 'jar -tf {}| grep -H --label {} '$1'' \;
done < "$TMPCP"

Keybase proof

I hereby claim:

  • I am sorenmacbeth on github.
  • I am sorenmacbeth (https://keybase.io/sorenmacbeth) on keybase.
  • I have a public key whose fingerprint is 09DB D06A E0D1 1A0F 8E64 0E02 5819 BF09 B48F 7899

To claim this, I am signing this object:

#!/usr/bin/ruby
# Create display override file to force Mac OS X to use RGB mode for Display
# see http://embdev.net/topic/284710
require 'base64'
data=`ioreg -l -d0 -w 0 -r -c AppleDisplay`
edids=data.scan(/IODisplayEDID.*?<([a-z0-9]+)>/i).flatten
vendorids=data.scan(/DisplayVendorID.*?([0-9]+)/i).flatten

Tuning Storm+Trident

Tuning a dataflow system is easy:

The First Rule of Dataflow Tuning:
* Ensure each stage is always ready to accept records, and
* Deliver each processed record promptly to its destination
from redis import Redis
import simplejson
class Resque(object):
"""Dirt simple Resque client in Python. Can be used to create jobs."""
redis_server = 'localhost:6379'
def __init__(self):
host, port = self.redis_server.split(':')
self.redis = Redis(host=host, port=int(port))
class TwitterUser
def calculate_tunkrank(p=0.05)
self.followers.inject(0.0) do |sum, follower|
sum + ((1.0 + (p * follower.tunkrank_score)) / (1.0 + follower.num_friends))
end
end
end
SELECT 1.0 + SUM((1.0 + #{p} * tunkrank_score) / (1.0 + num_friends)) AS tunkrank_score
FROM twitter_users
INNER JOIN twitter_id_follows ON (twitter_users.twitter_id = twitter_id_follows.follower_twitter_id)
WHERE twitter_id_follows.user_twitter_id = #{twitter_id};
@sorenmacbeth
sorenmacbeth / gist:827971
Created February 15, 2011 18:37 — forked from michaelmontano/gist:535794
updated to whirr-0.3.0
diff -Naur whirr-0.3.0-incubating/contrib/python/src/py/hadoop/cloud/cli.py whirr-0.3.0-incubating-backtype/contrib/python/src/py/hadoop/cloud/cli.py
--- whirr-0.3.0-incubating/contrib/python/src/py/hadoop/cloud/cli.py 2011-01-15 23:03:44.000000000 -0800
+++ whirr-0.3.0-incubating-backtype/contrib/python/src/py/hadoop/cloud/cli.py 2011-02-15 11:51:49.000000000 -0800
@@ -296,7 +296,7 @@
opt.get('user_data_file'),
opt.get('availability_zone'), opt.get('user_packages'),
opt.get('auto_shutdown'), opt.get('env'),
- opt.get('security_group'))
+ opt.get('security_group'), opt.get('spot_price'))
service.launch_master(template, config_dir, opt.get('client_cidr'))
(ns gist.globhfs
(:import [cascading.tap GlobHfs]))
;; ### Bucket to Cluster
;;
;;; To get tuples back out of our directory structure on S3, we employ
;; Cascading's [GlobHFS] (http://goo.gl/1Vwdo) tap, along with an
;; interface tailored for datasets stored in the MODIS sinusoidal
;; projection. For details on the globbing syntax, see
;; [here](http://goo.gl/uIEzu).