
@vaskokj
Last active January 31, 2023 15:48
23/01/30 15:47:52 INFO MetricsSystemImpl: s3a-file-system metrics system started
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:54 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv.
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv: com.amazonaws.services.s3.model.AmazonS3Exception: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>:InvalidToken: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3348)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
at scala.Option.getOrElse(Option.scala:189)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:537)
at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:443)
at io.treeverse.clients.GarbageCollector.getCommitsDF(GarbageCollector.scala:95)
at io.treeverse.clients.GarbageCollector.getExpiredAddresses(GarbageCollector.scala:193)
at io.treeverse.clients.GarbageCollector$.markAddresses(GarbageCollector.scala:456)
at io.treeverse.clients.GarbageCollector$.main(GarbageCollector.scala:350)
at io.treeverse.clients.GarbageCollector.main(GarbageCollector.scala)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5167)
at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:963)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$7(S3AFileSystem.java:2116)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2107)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3322)
... 27 more
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>:400 Bad Request: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3286)
at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:784)
at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782)
at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
at scala.util.Success.$anonfun$map$1(Try.scala:255)
at scala.util.Success.map(Try.scala:213)
at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <redacted>; S3 Extended Request ID: <redacted>; Proxy: null), S3 Extended Request ID: <redacted>
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1360)
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2066)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2056)
at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2032)
at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3273)
... 20 more
23/01/30 15:47:54 INFO SparkContext: Invoking stop() from shutdown hook
23/01/30 15:47:54 INFO SparkUI: Stopped Spark web UI at http://<redacted>:4040
23/01/30 15:47:54 INFO StandaloneSchedulerBackend: Shutting down all executors
23/01/30 15:47:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/01/30 15:47:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/30 15:47:54 INFO MemoryStore: MemoryStore cleared
23/01/30 15:47:54 INFO BlockManager: BlockManager stopped
23/01/30 15:47:54 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/30 15:47:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/30 15:47:54 INFO SparkContext: Successfully stopped SparkContext
23/01/30 15:47:54 INFO ShutdownHookManager: Shutdown hook called
23/01/30 15:47:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-9a7dfb52-1574-4339-832b-07f8112e41bc
23/01/30 15:47:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-f7821f0f-2528-42f7-a9a8-1ac55e362f8a
23/01/30 15:47:54 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
23/01/30 15:47:54 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
23/01/30 15:47:54 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
./spark-3.3.1-bin-hadoop3/bin/spark-submit --class io.treeverse.clients.GarbageCollector \
--packages org.apache.hadoop:hadoop-aws:3.3.1 \
--master spark://localhost:7077 \
-c spark.hadoop.lakefs.api.url=http://lakefs-appserver:8000/api/v1 \
-c spark.hadoop.lakefs.api.access_key=<MyLakeFS Access Key> \
-c spark.hadoop.lakefs.api.secret_key=<MyLakeFS Secret Key> \
-c spark.hadoop.fs.s3a.access.key=<My AWS Console Access Key> \
-c spark.hadoop.fs.s3a.secret.key=<My AWS Console secret key> \
-c spark.hadoop.fs.s3a.session.token=<My AWS session token> \
lakefs-spark-client-312-hadoop3-assembly-0.6.0.jar \
project us-gov-west-1
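The `InvalidToken` / `400 Bad Request` failures in the log above are consistent with s3a never sending the session token: by default the connector authenticates with only the access/secret key pair, and `fs.s3a.session.token` is honored only when `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider` is selected. A hedged sketch of the same invocation with that provider set — the explicit GovCloud endpoint is an additional assumption about this deployment, not something confirmed by the log:

```shell
# Sketch only: same command as above, plus two extra -c flags.
# The credentials.provider flag makes s3a send the STS session token;
# the endpoint value assumes the bucket lives in us-gov-west-1.
./spark-3.3.1-bin-hadoop3/bin/spark-submit --class io.treeverse.clients.GarbageCollector \
--packages org.apache.hadoop:hadoop-aws:3.3.1 \
--master spark://localhost:7077 \
-c spark.hadoop.lakefs.api.url=http://lakefs-appserver:8000/api/v1 \
-c spark.hadoop.lakefs.api.access_key=<MyLakeFS Access Key> \
-c spark.hadoop.lakefs.api.secret_key=<MyLakeFS Secret Key> \
-c spark.hadoop.fs.s3a.access.key=<My AWS Console Access Key> \
-c spark.hadoop.fs.s3a.secret.key=<My AWS Console secret key> \
-c spark.hadoop.fs.s3a.session.token=<My AWS session token> \
-c spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
-c spark.hadoop.fs.s3a.endpoint=s3.us-gov-west-1.amazonaws.com \
lakefs-spark-client-312-hadoop3-assembly-0.6.0.jar \
project us-gov-west-1
```

If the token is being loaded from a shared AWS config file instead, the repeated `BasicProfileConfigLoader` warnings suggest the profile name includes a `profile ` prefix, which the Java SDK treats as part of the name — worth checking before changing anything else.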