To use custom endpoints with the latest Spark distribution, you need to add an external package (hadoop-aws). Then custom endpoints can be configured according to the docs.
bin/spark-shell --packages org.apache.hadoop:hadoop-aws:2.7.2
Add this to your application, or in the spark-shell:
sc.hadoopConfiguration.set("fs.s3a.endpoint", "<<ENDPOINT>>");
sc.hadoopConfiguration.set("fs.s3a.access.key","<<ACCESS_KEY>>");
sc.hadoopConfiguration.set("fs.s3a.secret.key","<<SECRET_KEY>>");
If your endpoint doesn't support HTTPS, then you'll need the following:
sc.hadoopConfiguration.set("fs.s3a.connection.ssl.enabled", "false");
You can then use s3a URLs like this:
s3a://<<BUCKET>>/<<FOLDER>>/<<FILE>>
Alternatively, it is possible to embed the credentials in the path:
s3a://<<ACCESS_KEY>>:<<SECRET_KEY>>@<<BUCKET>>/<<FOLDER>>/<<FILE>>
I am getting this error when reading from a local S3 instance.
The confs I have set are:
val sparkConf = new SparkConf().setAppName("testing")
  .set("fs.s3a.endpoint", "http://127.0.0.1:9090")
  .set("fs.s3a.multipart.size", "104857600")
  .set("fs.s3a.connection.ssl.enabled", "false")
  .set("fs.s3a.access.key", "adfadf")
  .set("fs.s3a.secret.key", "qerqer")
java.lang.IllegalArgumentException
at java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1293)
at java.base/java.util.concurrent.ThreadPoolExecutor.<init>(ThreadPoolExecutor.java:1215)
at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:280)