Skip to content

Instantly share code, notes, and snippets.

@bernhardschaefer
bernhardschaefer / spark-submit-streaming-yarn.sh
Last active March 21, 2022 05:04
spark-submit template for running Spark Streaming on YARN (referenced in https://www.inovex.de/blog/247-spark-streaming-on-yarn-in-production/)
#!/bin/bash
# Minimum TODOs on a per job basis:
# 1. define name, application jar path, main class, queue and log4j-yarn.properties path
# 2. remove properties not applicable to your Spark version (Spark 1.x vs. Spark 2.x)
# 3. tweak num_executors, executor_memory (+ overhead), and backpressure settings
# the two most important settings:
num_executors=6
executor_memory=3g
#!/bin/sh
BOOT2DOCKER_CERTS_DIR=/var/lib/boot2docker/certs
CERTS_DIR=/etc/ssl/certs
CAFILE=${CERTS_DIR}/ca-certificates.crt
for cert in $(/bin/ls -1 ${BOOT2DOCKER_CERTS_DIR}); do
SRC_CERT_FILE=${BOOT2DOCKER_CERTS_DIR}/${cert}
CERT_FILE=${CERTS_DIR}/${cert}
HASH_FILE=${CERTS_DIR}/$(/usr/local/bin/openssl x509 -noout -hash -in ${SRC_CERT_FILE} 2>/dev/null)
@samklr
samklr / DotProduct.scala
Created August 30, 2013 21:21
DotProduct matrix in scala and on spark
def dotProduct(vector: Array[Int], matrix: Array[Array[Int]]): Array[Int] = {
// ignore dimensionality checks for simplicity of example
(0 to (matrix(0).size - 1)).toArray.map( colIdx => {
val colVec: Array[Int] = matrix.map( rowVec => rowVec(colIdx) )
val elemWiseProd: Array[Int] = (vector zip colVec).map( entryTuple => entryTuple._1 * entryTuple._2 )
elemWiseProd.sum
} )
}

Setting up Flume NG, listening to syslog over UDP, with an S3 Sink

My goal was to set up Flume on my web instances, and write all events into s3, so I could easily use other tools like Amazon Elastic Map Reduce, and Amazon Red Shift.

I didn't want to have to deal with log rotation myself, so I setup Flume to read from a syslog UDP source. In this case, Flume NG acts as a syslog server, so as long as Flume is running, my web application can simply write to it in syslog format on the specified port. Most languages have plugins for this.

At the time of this writing, I've been able to get Flume NG up and running on 3 ec2 instances, and all writing to the same bucket.

Install Flume NG on instances

@drewolson
drewolson / reflection.go
Last active November 20, 2023 09:39
Golang Reflection Example
package main
import (
"fmt"
"reflect"
)
type Foo struct {
FirstName string `tag_name:"tag 1"`
LastName string `tag_name:"tag 2"`
@benelog
benelog / README.md
Created October 19, 2012 16:49
Open API client 개발
@keesun
keesun / AsyncServlet.java
Created January 16, 2012 16:32
Servlet 3.0's Asynchronous Support sample
package me.whiteship.board.modules.servlet.async;
import javax.servlet.AsyncContext;
import javax.servlet.AsyncEvent;
import javax.servlet.AsyncListener;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
@spudtrooper
spudtrooper / jar2java
Created October 31, 2011 13:45
Generates java source from a jar file (e.g. for use in javadoc)
#!/bin/sh
#
# Transforms a jar file into source and outputs source files to 'src'.
# Example:
#
# jar2java foo.jar ; --> source to 'src'
#
# After you can run javadoc on these files, e.g.
#
# javadoc -classpath foo.jar -d api `find src -name "*.java"`