Branden Smith sumitsu

sumitsu / HadoopConfUseNonChunkedDefaultS3ClientFactory.scala
Created December 18, 2019 22:17
Use NonChunkedDefaultS3ClientFactory for Hadoop configuration "fs.s3a.s3.client.factory.impl"
hadoopConf.set("fs.s3a.s3.client.factory.impl", classOf[NonChunkedDefaultS3ClientFactory].getName)
sumitsu / NonChunkedDefaultS3ClientFactory.java
Created December 18, 2019 22:09
Extension of S3A DefaultS3ClientFactory which disables chunked encoding for AmazonS3 instances it builds
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.S3ClientOptions;
import org.apache.hadoop.fs.s3a.DefaultS3ClientFactory;
import java.io.IOException;
import java.net.URI;
public class NonChunkedDefaultS3ClientFactory extends DefaultS3ClientFactory {
    @Override
sumitsu / SampleSparkAppS3Spec.scala
Last active December 14, 2019 23:29
Setting AWS S3A properties on Hadoop Configuration associated with SparkSession for local unit testing with mock S3 server
import org.apache.hadoop.fs.s3a.S3AFileSystem
private val hadoopConf = spark.sparkContext.hadoopConfiguration
hadoopConf.set("fs.s3.impl", classOf[S3AFileSystem].getName)
hadoopConf.set("fs.s3a.endpoint", "http://localhost:9999")
hadoopConf.set("fs.s3a.access.key", "abc")
hadoopConf.set("fs.s3a.secret.key", "xyz")
hadoopConf.set("fs.s3a.attempts.maximum", "3")
hadoopConf.set("fs.s3a.path.style.access", "true")
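The settings above can also be gathered into a single map, so that the mock endpoint and credentials are passed in rather than hard-coded. This is only a minimal sketch and not part of the original gist: the object name S3ATestProps, the method s3aMockProperties, and its parameters are hypothetical. Only the property keys and the S3AFileSystem class name are taken from the snippet above.

```scala
// Hypothetical helper (not from the original gist): collects the S3A settings
// shown above into an immutable Map so they can be applied in one pass.
object S3ATestProps {
  def s3aMockProperties(
      endpoint: String,
      accessKey: String,
      secretKey: String
  ): Map[String, String] = Map(
    // Route s3:// URIs through the S3A implementation (the class name as a
    // string, equivalent to classOf[S3AFileSystem].getName).
    "fs.s3.impl" -> "org.apache.hadoop.fs.s3a.S3AFileSystem",
    "fs.s3a.endpoint" -> endpoint,
    "fs.s3a.access.key" -> accessKey,
    "fs.s3a.secret.key" -> secretKey,
    "fs.s3a.attempts.maximum" -> "3",
    // Path-style access puts the bucket in the URL path rather than the host name.
    "fs.s3a.path.style.access" -> "true"
  )
}
```

Applying the map to the Hadoop configuration would then look like: S3ATestProps.s3aMockProperties("http://localhost:9999", "abc", "xyz").foreach { case (k, v) => hadoopConf.set(k, v) }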
sumitsu / SparkSpec.scala
Created December 14, 2019 23:13
Creating a SparkSession for unit testing
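import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

val spark: SparkSession = {
  // Bind the driver to loopback and run locally with two worker threads
  val sparkConf = new SparkConf()
    .set("spark.driver.host", "127.0.0.1")
    .setMaster("local[2]")
    .setAppName("TestSparkApp")
  val sparkSession = SparkSession.builder().config(sparkConf).getOrCreate()
  // Treat column names as case-sensitive in Spark SQL
  sparkSession.sql("set spark.sql.caseSensitive=true")
  sparkSession
}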
sumitsu / AmazonS3TestUtil.scala
Created December 14, 2019 02:11
Creating a local-mock AmazonS3Client via AmazonS3ClientBuilder
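import java.util.UUID
import com.amazonaws.services.s3.AmazonS3

val MockS3ServerPortEnvVar: String = "MOCK_SERVER_PORT"
val MockS3ServerPortDefault: Int = 9999
// Endpoint of the local mock S3 server; the port comes from the environment when set
val AwsEndpointUriStr: String =
  s"http://localhost:${Option(System.getenv(MockS3ServerPortEnvVar)).getOrElse(MockS3ServerPortDefault)}/"
// Randomized bucket name, so separate test runs do not collide
val TestBucketName: String =
  s"s3-mocktest-demo-${UUID.randomUUID.toString}"
val MockAWSAccessKey: String = "abc"
val MockAWSSecretKey: String = "zyx"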
def buildLocalMockTestS3Client(): AmazonS3 = {
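
The AwsEndpointUriStr interpolation above passes the environment variable through unchecked, so a malformed MOCK_SERVER_PORT value would end up in the endpoint URI. One possible variant, sketched here under assumptions, validates the port and falls back to the default. The object MockEndpoint and the methods resolvePort and endpointUri are hypothetical names and do not appear in the original gist.

```scala
import scala.util.Try

// Hypothetical helper (not from the original gist): resolves the mock server
// port from an optional raw value, falling back to a default when the value
// is absent, non-numeric, or outside the valid TCP port range.
object MockEndpoint {
  def resolvePort(raw: Option[String], default: Int): Int =
    raw
      .flatMap(s => Try(s.trim.toInt).toOption) // non-numeric input becomes None
      .filter(p => p > 0 && p <= 65535)         // out-of-range input becomes None
      .getOrElse(default)

  // Builds the endpoint URI in the same shape as AwsEndpointUriStr above.
  def endpointUri(raw: Option[String], default: Int): String =
    s"http://localhost:${resolvePort(raw, default)}/"
}
```

With the constants from the snippet above, the call would be: MockEndpoint.endpointUri(Option(System.getenv(MockS3ServerPortEnvVar)), MockS3ServerPortDefault)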