Marc Limotte mlimotte

## test_jobdef.clj
;; Based on https://gist.github.com/gareth625/5d69cd883b3a154f0fa7
;; Run it with `lemur run test_jobdef.clj`

(catch-args
[:run-step
 "Set as the name of the step"
 "lemur-is-awesome"])

(defcluster the-cluster
    :app "AnApp"

## GsonJson.scala
import com.google.gson.Gson
import scala.collection.JavaConversions

val gson = new Gson()

val mapPrototype = new java.util.HashMap[String,Any]()
def parseJson(json: String): Map[String,Any] = {
  // Note: mapAsScalaMap only wraps the Java map (no copy); the copy into an immutable Map happens in .toMap
  scala.collection.JavaConversions.mapAsScalaMap(gson.fromJson(json, mapPrototype.getClass)).toMap
}

## MemoryJoin
package foo.cascalog;

import cascading.flow.FlowProcess;
import cascading.flow.hadoop.HadoopFlowProcess;
import cascading.operation.FunctionCall;
import cascading.operation.OperationCall;
import cascading.tuple.Tuple;
import cascading.tuple.TupleEntry;
import cascalog.CascalogFunction;
import org.apache.hadoop.conf.Configuration;

## ClojureFilterFP
/**
 * The majority of this class is copied from the Cascalog source (1.7.0-SNAPSHOT as of 9/17/2011).
 * This is a filter operation, where the FlowProcess object is exposed
 */

package com.weatherbill.hadoop;

import cascading.operation.Filter;
import cascading.operation.FilterCall;
import cascading.flow.FlowProcess;

## Streaming Lemur jobdef
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; Sample of a Jobdef for a Streaming job
;;;
;;; Example of common usage:
;;; lemur run strm-jobdef.clj --bucket my-bucket-name
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

(catch-args
  [:bucket "An s3 bucket, e.g. 'com.myco.bucket1'"]
  )

## keybase.md

      
Created February 11, 2017 20:00

Keybase proof

I hereby claim:

I am mlimotte on github.
I am mlimotte (https://keybase.io/mlimotte) on keybase.
I have a public key ASCPpX8cibderVDoBlFGbVy0_lZQQmxZKpKSBE4BzBNKqgo

To claim this, I am signing this object:

  
## merge-with-key.clj
(ns mlimotte.util)

; A variation on clojure.core/merge-with

(defn merge-with-key
  "Returns a map that consists of the rest of the maps conj-ed onto
  the first.  If a key occurs in more than one map, the mapping(s)
  from the latter (left-to-right) will be combined with the mapping in
  the result by calling (f key val-in-result val-in-latter)."
  [f & maps]
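
The docstring above describes a merge where the combining function also receives the key. As a rough illustration of that contract only (not the author's Clojure implementation, which is truncated here), the same behavior can be sketched in Java, the language used by other snippets on this page. The names `MergeWithKeySketch`, `KeyedMerger`, and `mergeWithKey` are hypothetical, and the sketch assumes Java 9+ for `Map.of`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MergeWithKeySketch {

    /** Combiner called as f(key, value already in result, value from the later map). */
    @FunctionalInterface
    interface KeyedMerger<K, V> {
        V apply(K key, V inResult, V inLatter);
    }

    /**
     * Folds the maps left to right into a new map. A key seen for the first
     * time is copied as-is; a key already present is combined through f.
     */
    @SafeVarargs
    static <K, V> Map<K, V> mergeWithKey(KeyedMerger<K, V> f, Map<K, V>... maps) {
        Map<K, V> result = new LinkedHashMap<>();
        for (Map<K, V> m : maps) {
            for (Map.Entry<K, V> e : m.entrySet()) {
                K key = e.getKey();
                if (result.containsKey(key)) {
                    result.put(key, f.apply(key, result.get(key), e.getValue()));
                } else {
                    result.put(key, e.getValue());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> first = Map.of("x", 1, "y", 2);
        Map<String, Integer> second = Map.of("y", 10, "z", 3);
        // Sum colliding values, except for key "y", where the earlier value wins.
        Map<String, Integer> merged = mergeWithKey(
            (k, v1, v2) -> k.equals("y") ? v1 : v1 + v2, first, second);
        System.out.println(merged);
    }
}
```

Passing the key to the combiner is what distinguishes this from a plain `merge-with`: the collision rule can differ per key, as the `"y"` case in `main` shows.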

## s3-pusher.sh
#!/bin/bash -e

# 2010-09-19 Marc Limotte

# Run continuously (every 30 minutes) as a cron.
#
# Looks for directories in HDFS matching a certain pattern and moves them to S3, using Amazon's new
# distcp replacement, S3DistCp.
#
# It creates marker files (_directory_.done and _directory_.processing) at the S3 destination, so

## vault-aws.sh
#!/bin/bash

function vault-aws () {
  VAULT_PATH=$1
  if [ -z "$VAULT_PATH" ]; then
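    # printf interprets \n; single quotes keep the example from running as a command substitution
    printf 'Missing VAULT_PATH argument.\nExample: vault-aws documents-store\n' >&2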
    exit 1
  fi
  if [ -z "$VAULT_ADDR" ]; then
    echo "Missing VAULT_ADDR env variable"

## aws_client_vpc_endpoint_setup_notes.md

      
Last active June 15, 2022 02:54

AWS Client VPN Endpoint Setup tips and checklist

Overview

We have remote developers who occasionally need access to AWS servers and to the QA and Staging databases (RDS MySQL instances). The AWS servers (EC2, Fargate) are in a private VPC. The RDS databases are in different VPCs. They have the "publicly accessible" attribute set, which means they get a public DNS name, but only a handful of IPs are whitelisted for that access; developers should get access over a VPN.
This is summarized as:
laptop --ClientVPN--> VPC _A_ --VPC Peer--> RDS in VPC _B_

I chose the Client VPN Endpoint so that AWS would manage the remote side of the tunnel. I chose Viscosity (on a Mac) as our VPN client because it's easy to use and supports split DNS and split routing. It's affordable, but not free. Split DNS is important so that Amazon hostnames can be resolved to their internal IP addresses. Split routing is important so that only the AWS-destined traffic goes over the VPN tunnel and other internet traffic can go directly to the internet.
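
To make the split-routing idea concrete, here is a minimal Java sketch of the decision a client effectively makes: send a packet through the tunnel only when its destination falls inside one of the VPC address ranges. This is an illustration, not how Viscosity or the AWS Client VPN endpoint is implemented. The class and method names are hypothetical, the CIDR blocks in `main` are made-up examples, and only IPv4 is handled.

```java
import java.util.List;

public class SplitRouteSketch {

    /** Converts a dotted-quad IPv4 literal such as "10.0.5.9" to a 32-bit int. */
    static int toInt(String ipv4) {
        String[] parts = ipv4.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Not an IPv4 address: " + ipv4);
        }
        int value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("Bad octet in: " + ipv4);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    /** True when ip lies inside the CIDR block, e.g. "10.0.0.0/16". */
    static boolean inCidr(String ip, String cidr) {
        String[] parts = cidr.split("/");
        int prefix = Integer.parseInt(parts[1]);
        // A shift by 32 is a no-op on Java ints, so a /0 prefix is handled explicitly.
        int mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        return (toInt(ip) & mask) == (toInt(parts[0]) & mask);
    }

    /** Split routing: only destinations inside a VPC range use the tunnel. */
    static boolean viaTunnel(String destination, List<String> vpcCidrs) {
        return vpcCidrs.stream().anyMatch(cidr -> inCidr(destination, cidr));
    }

    public static void main(String[] args) {
        // Hypothetical ranges standing in for VPC A and the peered VPC B.
        List<String> vpcCidrs = List.of("10.0.0.0/16", "10.1.0.0/16");
        for (String dest : List.of("10.1.2.3", "8.8.8.8")) {
            System.out.println(dest + " via tunnel: " + viaTunnel(dest, vpcCidrs));
        }
    }
}
```

In a setup like the one described, the list of ranges would need to cover both the VPC the endpoint attaches to and the peered VPC that holds the RDS instances, since traffic to either should take the tunnel.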