Drake Youngkun Min minyk

minyk / custom_s3_endpoint_in_spark.md
Last active December 28, 2021 08:03 — forked from tobilg/custom_s3_endpoint_in_spark.md
Description on how to use a custom S3 endpoint (like Rados Gateway for Ceph)

Custom S3 endpoints with Spark

To use a custom S3 endpoint with a recent Spark distribution, add the external hadoop-aws package. Custom endpoints can then be configured as described in the hadoop-aws documentation.

Use the hadoop-aws package

bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2

SparkContext configuration
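
The configuration itself is not shown here, so the following is only a minimal sketch of what it might look like in spark-shell. It assumes the `sc` value that the shell provides and uses the s3a connector properties `fs.s3a.endpoint`, `fs.s3a.access.key`, and `fs.s3a.secret.key` from hadoop-aws. The endpoint host, the environment variable names, and the bucket path are hypothetical placeholders, not values from the original gist.

```scala
// Sketch for spark-shell, where `sc` is the SparkContext created by the shell.
// All concrete values below are placeholders (assumptions), not real settings.
val endpoint  = "rgw.example.com:7480"  // hypothetical Rados Gateway host and port
val accessKey = sys.env.getOrElse("S3_ACCESS_KEY", "<ACCESS_KEY>")  // hypothetical variable name
val secretKey = sys.env.getOrElse("S3_SECRET_KEY", "<SECRET_KEY>")  // hypothetical variable name

// Point the s3a connector at the custom endpoint and supply credentials.
sc.hadoopConfiguration.set("fs.s3a.endpoint", endpoint)
sc.hadoopConfiguration.set("fs.s3a.access.key", accessKey)
sc.hadoopConfiguration.set("fs.s3a.secret.key", secretKey)

// Read an object through the s3a:// scheme; bucket and key are hypothetical.
val lines = sc.textFile("s3a://my-bucket/path/to/file.txt")
println(lines.count())
```

Reading the credentials from the environment keeps them out of shell history and source files. Whether the connector talks to the endpoint over HTTP or HTTPS depends on the deployment, so check the hadoop-aws documentation for the version in use.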

minyk / SparkUtils.scala
Last active September 16, 2015 12:36 — forked from ibuenros/SparkUtils.scala
Spark productionizing utilities developed by Ooyala, shown in Spark Summit 2014
//==================================================================
// SPARK INSTRUMENTATION
//==================================================================
import com.codahale.metrics.{MetricRegistry, Meter, Gauge}
import org.apache.spark.{SparkEnv, Accumulator}
import org.apache.spark.metrics.source.Source
import org.joda.time.DateTime
import scala.collection.mutable