Zoltan C. Toth zoltanctoth

@zoltanctoth
zoltanctoth / gist:5528402
Last active April 9, 2018 11:30
How to install Twitter's elephant-bird on EMR
# Get a proper Maven
wget http://xenia.sote.hu/ftp/mirrors/www.apache.org/maven/maven-3/3.0.5/binaries/apache-maven-3.0.5-bin.tar.gz
tar xzf apache-maven-3.0.5-bin.tar.gz
export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH
echo 'export PATH=/home/hadoop/apache-maven-3.0.5/bin:$PATH' >> ~/.bash_profile
# Install a supported version of protobuf
sudo apt-get remove protobuf-compiler
wget https://protobuf.googlecode.com/files/protobuf-2.4.1.tar.gz
tar xzf protobuf-2.4.1.tar.gz
@zoltanctoth
zoltanctoth / OverwriteOutputDirTextOutputFormat.java
Created July 23, 2013 08:40
How to overwrite output files in a Java MapReduce application
package com.prezi.hadoop;
import org.apache.hadoop.fs.FileAlreadyExistsException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.mapreduce.JobContext;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import java.io.IOException;
/*
@zoltanctoth
zoltanctoth / ggplot2-demo.R
Last active January 5, 2016 05:02
Learn ggplot2 by example. This tutorial is especially useful and easy to follow if you went through Hadley Wickham's article on the Layered Grammar of Graphics. https://www.dropbox.com/s/enzoi6b5yfwpvhm/layered-grammar.pdf
library(ggplot2)
# Take a look at our example dataset
head(diamonds)
# Make a chart from scratch
x = ggplot() +
layer(
data = diamonds, mapping = aes(x=carat,y=price),
stat='identity', position="identity", geom="point"
@zoltanctoth
zoltanctoth / sparkR-RStudio-parallelize.R
Created September 1, 2015 12:44
Getting SparkR to work in RStudio, plus a workaround for getting parallelize() to work in SparkR
# Install Spark and SparkR
SPARK_INSTALL_DIR="/tmp/spark-1.5"
SNAPSHOT_NAME="spark-1.5.0-SNAPSHOT-bin-hadoop2.6"
if (Sys.getenv("SPARK_HOME") == ""){
if(!dir.exists(SPARK_INSTALL_DIR)){
dir.create(SPARK_INSTALL_DIR)
download.file(paste("http://people.apache.org/~pwendell/spark-nightly/spark-master-bin/latest/",SNAPSHOT_NAME,".tgz",sep=""),
paste(SPARK_INSTALL_DIR,"/",SNAPSHOT_NAME,".tgz",sep=""))
wd = getwd()
setwd(SPARK_INSTALL_DIR)
@zoltanctoth
zoltanctoth / pyspark-udf.py
Last active July 15, 2023 13:23
Writing a UDF for withColumn in PySpark
from pyspark.sql.types import StringType
from pyspark.sql.functions import udf
maturity_udf = udf(lambda age: "adult" if age >=18 else "child", StringType())
df = spark.createDataFrame([{'name': 'Alice', 'age': 1}])
# withColumn returns a new DataFrame, so reassign it before calling show()
df = df.withColumn("maturity", maturity_udf(df.age))
df.show()
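A minimal sketch of a null-safe variant of the UDF above, assuming the `age` column may contain missing values (the original lambda would raise a `TypeError` when comparing `None` with `18`). The names `maturity` and `add_maturity_column` are hypothetical and not part of the gist. The PySpark imports are deferred into the helper so the classification logic can be checked without a Spark installation.

```python
from typing import Optional


def maturity(age: Optional[int]) -> Optional[str]:
    """Classify an age; return None for a missing age instead of raising."""
    if age is None:
        return None
    return "adult" if age >= 18 else "child"


def add_maturity_column(df):
    """Return a new DataFrame with a 'maturity' column (requires pyspark)."""
    # Imported lazily so that maturity() can be tested without Spark installed.
    from pyspark.sql.functions import udf
    from pyspark.sql.types import StringType

    maturity_udf = udf(maturity, StringType())
    # withColumn returns a new DataFrame; the input df is left unchanged.
    return df.withColumn("maturity", maturity_udf(df["age"]))
```

With a running session, `add_maturity_column(df).show()` would display the extra column; keeping the plain function separate from the UDF wrapper also makes the rule easy to unit test.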
@zoltanctoth
zoltanctoth / move-wordpress-to-different-domain.sh
Last active September 25, 2015 06:26
Moving WordPress to another domain can be a hassle. Here is a script that does it without the pain.
#!/bin/bash -xeu
# This script moves your WordPress site to a different domain
# Zoltan C. Toth
export HISTCONTROL=ignorespace
ORIGIN_DOMAIN=teszt2.gyulahus.hu
TARGET_DOMAIN=teszt.gyulahus.hu
ORIGIN_DIR=/home/gyulahus/public_html/$ORIGIN_DOMAIN
TARGET_DIR=/home/gyulahus/public_html/$TARGET_DOMAIN
TARGET_DB=teszt2_gyh
@zoltanctoth
zoltanctoth / h2o-sparkling-water-deep-learning.scala
Created September 13, 2016 20:09
This is a Spark <-> H2O / Sparkling Water deep learning prototype.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.h2o.{H2OContext, H2OFrame}
import org.apache.spark.sql.DataFrame
import hex.deeplearning.DeepLearning
import water.app.SparkContextSupport
import hex.deeplearning.DeepLearningParameters
import hex.deeplearning.DeepLearningParameters.Activation
import org.apache.spark.h2o.DoubleHolder
@zoltanctoth
zoltanctoth / spark-kafka.scala
Created February 6, 2017 20:09
How to use the Direct Kafka Source in Scala
import org.apache.spark.sql.SparkSession
object Anomymizer extends App {
val spark = SparkSession.builder
.master("local[3]")
.appName("Anonimizer")
.getOrCreate()
val salt = "SAALT"
def anonimizeStr(a:Any) = {
a match {