samklr / Job.scala
Created Jun 27, 2017
Spark Streaming Websocket Receiver
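// A receiver occupies one thread, so a local run needs at least two: use "local[2]", not "local"
val ssc = new StreamingContext("local[2]", "datastream", Seconds(15))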
// create InputDStream
// interact with stream
Created Apr 20, 2015
Install Protobuf debian ...
#! /bin/bash
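# Install the compiler toolchain before building
sudo apt-get update
sudo apt-get install -y build-essential
# Unpack and build; only the install steps need root
tar xzf protobuf-2.6.1.tar.gz
cd protobuf-2.6.1
./configure
make
make check
sudo make install
sudo ldconfig # refresh the shared library cache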
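Once the install step has finished, a quick sanity check can confirm that the expected compiler release is the one found on the PATH. The sketch below is not part of the original script: `check_protoc_version` and `EXPECTED_VERSION` are hypothetical names, and it assumes `protoc --version` prints a line of the form `libprotoc 2.6.1`.

```shell
#!/bin/bash
# Hypothetical post-install check (not part of the original gist).
EXPECTED_VERSION="2.6.1"

# Succeeds when the given `protoc --version` output names the expected release.
check_protoc_version() {
  [ "${1#libprotoc }" = "$EXPECTED_VERSION" ]
}

if command -v protoc >/dev/null 2>&1; then
  if check_protoc_version "$(protoc --version)"; then
    echo "protoc $EXPECTED_VERSION is installed"
  else
    echo "unexpected version: $(protoc --version)"
  fi
else
  echo "protoc not found on PATH"
fi
```

If `protoc` is found but fails to start with a shared-library error, the linker cache is probably stale; `sudo ldconfig` refreshes it.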
brew install redis
ln -sfv /usr/local/opt/redis/*.plist ~/Library/LaunchAgents # Enable Redis autostart
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.redis.plist # load Redis server
# homebrew.mxcl.redis.plist contains reference to redis.conf file location: /usr/local/etc/redis.conf
redis-server /usr/local/etc/redis.conf # Start Redis server using configuration file, Ctrl+C to stop
Last active Mar 31, 2020
Offset Management on HBase
Save offsets for each batch into HBase
def saveOffsets(TOPIC_NAME: String, GROUP_ID: String, offsetRanges: Array[OffsetRange],
                hbaseTableName: String, batchTime: org.apache.spark.streaming.Time) = {
  val hbaseConf = HBaseConfiguration.create()
  val conn = ConnectionFactory.createConnection(hbaseConf)
  val table = conn.getTable(TableName.valueOf(hbaseTableName))
  val rowKey = TOPIC_NAME + ":" + GROUP_ID + ":" + String.valueOf(batchTime.milliseconds)
Last active Mar 26, 2020
Install IPython Notebook on a VM and launch it as a server on a cloud platform (here, Google Compute Engine).
##### Install a lot of stuff first #####
$ sudo apt-get update
##install python
$ wget
$ sudo bash
##install necessary libs
$ sudo apt-get install -y python-matplotlib python-tornado ipython ipython-notebook python-setuptools python-pip
Last active Jan 11, 2020
Ansible GCE setup
#! /bin/sh
### Requires the Google Cloud SDK (gcloud) to be installed and configured
### Create a micro instance as the Ansible master
gcloud compute --project $PROJECT_NAME instances create "ansible" --zone "us-central1-b" --machine-type "f1-micro" --network "default" --maintenance-policy "MIGRATE" --scopes "" "" "" --tags "http-server" "https-server" --no-boot-disk-auto-delete
### or a CentOS instance, as in the tutorial
gcloud compute --project $PROJECT_NAME instances create "ansible-master" --zone "us-central1-b" --machine-type "g1-small" --network "default" --maintenance-policy "MIGRATE" --scopes "" "" "" --tags "http-server" "https-server" --image "" --no-boot-disk-auto-de
play.modules.enabled += "com.samklr.KamonModule"
kamon {
environment {
service = "my-svc"
jaeger {
Last active Dec 5, 2019
Set up HDFS on Mesos, run Spark Cluster dispatcher via Marathon

Set up Mesos-DNS

Scripts for setting up

sudo mkdir /etc/mesos-dns
sudo vi /etc/mesos-dns/config.json
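
As a rough illustration of what a minimal config.json might hold, the sketch below writes one non-interactively instead of editing it by hand. Every value (ZooKeeper address, master address, resolver, email) is a placeholder and not taken from the original; `CONF_DIR` is a hypothetical variable that defaults to a scratch directory so the sketch runs without root. The field names follow the Mesos-DNS configuration documentation, but they should be checked against the version actually deployed.

```shell
#!/bin/bash
# Sketch only: the addresses below are placeholders, not values from the original gist.
CONF_DIR="${CONF_DIR:-$(mktemp -d)}"
mkdir -p "$CONF_DIR"

# Quoted heredoc delimiter so nothing inside the JSON is expanded by the shell.
cat > "$CONF_DIR/config.json" <<'EOF'
{
  "zk": "zk://10.0.0.1:2181/mesos",
  "masters": ["10.0.0.1:5050"],
  "refreshSeconds": 60,
  "ttl": 60,
  "domain": "mesos",
  "port": 53,
  "resolvers": ["8.8.8.8"],
  "timeout": 5,
  "email": "root.mesos-dns.mesos"
}
EOF

echo "wrote $CONF_DIR/config.json"
```

Setting `CONF_DIR=/etc/mesos-dns` and running as root targets the directory created with `mkdir` above.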
