@vaskokj
Created January 31, 2023 17:06
Spark Command: /usr/lib/jvm/java-1.11.0-openjdk-amd64/bin/java -cp spark-3.3.1-bin-hadoop3/conf/:spark-3.3.1-bin-hadoop3/jars/* -Dcom.amazonaws.services.s3.enableV4=true -Xmx1g org.apache.spark.deploy.worker.Worker --webui-port 8081 localhost:7077
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=cc8f0b1c-a563-48fe-981e-c3004e7e7bd6/commits.csv: com.amazonaws.services.s3.model.AmazonS3Exception: The authorization header is malformed; the region 'vpce' is wrong; expecting 'us-gov-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: <redacted>; S3 Extended Request ID: <redacted>; Proxy: null), S3 Extended Request ID: <redacted>=:AuthorizationHeaderMalformed: The authorization header is malformed; the region 'vpce' is wrong; expecting 'us-gov-west-1' (Service: Amazon S3; Status Code: 400; Error Code: AuthorizationHeaderMalformed; Request ID: <redacted>; S3 Extended Request ID: <redacted>=; Proxy: null)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3348)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:537)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:443)
at io.treeverse.clients.GarbageCollector.getCommitsDF(GarbageCollector.scala:95)
at io.treeverse.clients.GarbageCollector.getExpiredAddresses(GarbageCollector.scala:193)
at io.treeverse.clients.GarbageCollector$.markAddresses(GarbageCollector.scala:456)
at io.treeverse.clients.GarbageCollector$.main(GarbageCollector.scala:350)
at io.treeverse.clients.GarbageCollector.main(GarbageCollector.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
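The AuthorizationHeaderMalformed failure above looks like a SigV4 signing-region mismatch: with fs.s3a.endpoint pointed at the VPC interface endpoint, the AWS SDK appears to derive the signing region from the hostname and picks up the "vpce" label instead of us-gov-west-1, so S3 rejects the Authorization header. Hadoop 3.3.1 (the hadoop-aws version used in the command below) added fs.s3a.endpoint.region for pinning the signing region explicitly; that this single property resolves this particular VPC-endpoint setup is an assumption, but it is the natural first thing to try. A minimal sketch of the extra flag to pass to spark-submit:

-c spark.hadoop.fs.s3a.endpoint.region=us-gov-west-1 \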
./spark-3.3.1-bin-hadoop3/bin/spark-submit --class io.treeverse.clients.GarbageCollector \
--packages org.apache.hadoop:hadoop-aws:3.3.1 \
--master spark://localhost:7077 \
-c spark.hadoop.lakefs.api.url=http://lakefs:8000/api/v1 \
-c spark.hadoop.lakefs.api.access_key=<lakeFS credentials> \
-c spark.hadoop.lakefs.api.secret_key=<lakeFS credentials> \
-c spark.hadoop.fs.s3a.access.key=<AWS console credentials> \
-c spark.hadoop.fs.s3a.secret.key=<AWS console credentials> \
-c spark.hadoop.fs.s3a.session.token=<AWS console credentials> \
-c spark.hadoop.fs.s3a.endpoint=http://vpce-<myvpcID>.s3.us-gov-west-1.vpce.amazonaws.com \
lakefs-spark-client-312-hadoop3-assembly-0.6.0.jar \
project us-gov-west-1
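
For reference, a hedged rewrite of the same submission with the signing region pinned (credential and VPC-ID placeholders as in the original; the only change is the added fs.s3a.endpoint.region line):

./spark-3.3.1-bin-hadoop3/bin/spark-submit --class io.treeverse.clients.GarbageCollector \
--packages org.apache.hadoop:hadoop-aws:3.3.1 \
--master spark://localhost:7077 \
-c spark.hadoop.lakefs.api.url=http://lakefs:8000/api/v1 \
-c spark.hadoop.lakefs.api.access_key=<lakeFS credentials> \
-c spark.hadoop.lakefs.api.secret_key=<lakeFS credentials> \
-c spark.hadoop.fs.s3a.access.key=<AWS console credentials> \
-c spark.hadoop.fs.s3a.secret.key=<AWS console credentials> \
-c spark.hadoop.fs.s3a.session.token=<AWS console credentials> \
-c spark.hadoop.fs.s3a.endpoint=http://vpce-<myvpcID>.s3.us-gov-west-1.vpce.amazonaws.com \
-c spark.hadoop.fs.s3a.endpoint.region=us-gov-west-1 \
lakefs-spark-client-312-hadoop3-assembly-0.6.0.jar \
project us-gov-west-1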