Last active: January 31, 2023 15:48
23/01/30 15:47:52 INFO MetricsSystemImpl: s3a-file-system metrics system started
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:53 WARN BasicProfileConfigLoader: Your profile name includes a 'profile ' prefix. This is considered part of the profile name in the Java SDK, so you will need to include this prefix in your profile name when you reference this profile from your Java code.
23/01/30 15:47:54 WARN FileStreamSink: Assume no metadata directory. Error while looking for metadata directory in the path: s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv.
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv: com.amazonaws.services.s3.model.AmazonS3Exception: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>:InvalidToken: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3348)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:4277)
	at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:54)
	at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:370)
	at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:228)
	at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:210)
	at scala.Option.getOrElse(Option.scala:189)
	at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:210)
	at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:537)
	at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:443)
	at io.treeverse.clients.GarbageCollector.getCommitsDF(GarbageCollector.scala:95)
	at io.treeverse.clients.GarbageCollector.getExpiredAddresses(GarbageCollector.scala:193)
	at io.treeverse.clients.GarbageCollector$.markAddresses(GarbageCollector.scala:456)
	at io.treeverse.clients.GarbageCollector$.main(GarbageCollector.scala:350)
	at io.treeverse.clients.GarbageCollector.main(GarbageCollector.scala)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:958)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1046)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1055)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: The provided token is malformed or otherwise invalid. (Service: Amazon S3; Status Code: 400; Error Code: InvalidToken; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5167)
	at com.amazonaws.services.s3.AmazonS3Client.listObjectsV2(AmazonS3Client.java:963)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$listObjects$7(S3AFileSystem.java:2116)
	at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:499)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.listObjects(S3AFileSystem.java:2107)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3322)
	... 27 more
Exception in thread "main" org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on s3a://bucket/lakefs/projects/project/_lakefs/retention/gc/commits/run_id=74a7918d-2031-4a4d-b4f5-aac2d7d523d9/commits.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>:400 Bad Request: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:243)
	at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:170)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3286)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3185)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3053)
	at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1760)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:4263)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4(DataSource.scala:784)
	at org.apache.spark.sql.execution.datasources.DataSource$.$anonfun$checkAndGlobPathIfNecessary$4$adapted(DataSource.scala:782)
	at org.apache.spark.util.ThreadUtils$.$anonfun$parmap$2(ThreadUtils.scala:372)
	at scala.concurrent.Future$.$anonfun$apply$1(Future.scala:659)
	at scala.util.Success.$anonfun$map$1(Try.scala:255)
	at scala.util.Success.map(Try.scala:213)
	at scala.concurrent.Future.$anonfun$map$1(Future.scala:292)
	at scala.concurrent.impl.Promise.liftedTree1$1(Promise.scala:33)
	at scala.concurrent.impl.Promise.$anonfun$transform$1(Promise.scala:33)
	at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:64)
	at java.base/java.util.concurrent.ForkJoinTask$RunnableExecuteAction.exec(ForkJoinTask.java:1426)
	at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:290)
	at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1020)
	at java.base/java.util.concurrent.ForkJoinPool.scan(ForkJoinPool.java:1656)
	at java.base/java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1594)
	at java.base/java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:183)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Bad Request (Service: Amazon S3; Status Code: 400; Error Code: 400 Bad Request; Request ID: <Redacted>; S3 Extended Request ID: <Redacted>; Proxy: null), S3 Extended Request ID: <Redacted>
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1828)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleServiceErrorResponse(AmazonHttpClient.java:1412)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1374)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1145)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:802)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:770)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:744)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:686)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:550)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:530)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5227)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5173)
	at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1360)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$6(S3AFileSystem.java:2066)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:412)
	at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:375)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2056)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:2032)
	at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:3273)
	... 20 more
23/01/30 15:47:54 INFO SparkContext: Invoking stop() from shutdown hook
23/01/30 15:47:54 INFO SparkUI: Stopped Spark web UI at http://<redacted>:4040
23/01/30 15:47:54 INFO StandaloneSchedulerBackend: Shutting down all executors
23/01/30 15:47:54 INFO CoarseGrainedSchedulerBackend$DriverEndpoint: Asking each executor to shut down
23/01/30 15:47:54 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/01/30 15:47:54 INFO MemoryStore: MemoryStore cleared
23/01/30 15:47:54 INFO BlockManager: BlockManager stopped
23/01/30 15:47:54 INFO BlockManagerMaster: BlockManagerMaster stopped
23/01/30 15:47:54 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/01/30 15:47:54 INFO SparkContext: Successfully stopped SparkContext
23/01/30 15:47:54 INFO ShutdownHookManager: Shutdown hook called
23/01/30 15:47:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-9a7dfb52-1574-4339-832b-07f8112e41bc
23/01/30 15:47:54 INFO ShutdownHookManager: Deleting directory /tmp/spark-f7821f0f-2528-42f7-a9a8-1ac55e362f8a
23/01/30 15:47:54 INFO MetricsSystemImpl: Stopping s3a-file-system metrics system...
23/01/30 15:47:54 INFO MetricsSystemImpl: s3a-file-system metrics system stopped.
23/01/30 15:47:54 INFO MetricsSystemImpl: s3a-file-system metrics system shutdown complete.
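The `InvalidToken` (400) above suggests S3A is sending the session token with the wrong credential provider: by default S3A authenticates with a plain access-key/secret-key provider, and `fs.s3a.session.token` is only honored by `org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider`. A possible fix, sketched below as an extra `spark-submit` configuration flag (an assumption based on the Hadoop S3A documentation, not something confirmed by this log), is to select that provider explicitly:

```shell
# Sketch: make S3A use temporary credentials (access key + secret key +
# session token). This provider class ships with hadoop-aws; the property
# is a standard S3A key, passed to Spark via the spark.hadoop. prefix.
-c spark.hadoop.fs.s3a.aws.credentials.provider=org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider \
```

Without this, the token value can be mis-signed into the request, which would match the "provided token is malformed or otherwise invalid" message.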
./spark-3.3.1-bin-hadoop3/bin/spark-submit --class io.treeverse.clients.GarbageCollector \
  --packages org.apache.hadoop:hadoop-aws:3.3.1 \
  --master spark://localhost:7077 \
  -c spark.hadoop.lakefs.api.url=http://lakefs-appserver:8000/api/v1 \
  -c spark.hadoop.lakefs.api.access_key=<MyLakeFS Access Key> \
  -c spark.hadoop.lakefs.api.secret_key=<MyLakeFS Secret Key> \
  -c spark.hadoop.fs.s3a.access.key=<My AWS Console Access Key> \
  -c spark.hadoop.fs.s3a.secret.key=<My AWS Console secret key> \
  -c spark.hadoop.fs.s3a.session.token=<My AWS session token> \
  lakefs-spark-client-312-hadoop3-assembly-0.6.0.jar \
  project us-gov-west-1
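Since the target region is `us-gov-west-1` (a GovCloud partition), the generic `400 Bad Request` in the second stack trace may also come from S3A defaulting to the commercial `s3.amazonaws.com` endpoint. A hedged sketch of extra flags that pin the endpoint and region (the endpoint hostname follows the documented AWS GovCloud pattern, but verify it for your partition; `fs.s3a.endpoint.region` is available in recent Hadoop 3.3.x releases):

```shell
# Sketch: point S3A at the GovCloud S3 endpoint instead of the default
# commercial endpoint, so requests are signed for the right region.
-c spark.hadoop.fs.s3a.endpoint=s3.us-gov-west-1.amazonaws.com \
-c spark.hadoop.fs.s3a.endpoint.region=us-gov-west-1 \
```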