ERROR [Thread-123]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:123,dbname:hive_acid,tableName:hive_acid_table,partName:hive_acid_part=part_name,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0. Marking failed to avoid repeated failures, java.io.IOException: Minor compactor job failed for Hadoop JobId:job_XXXXXX_XXXX at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
The underlying MapReduce job fails with:
FATAL [IPC Server handler 11 on 12345] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_XXXXX - exited : org.apache.hadoop.fs.FileAlreadyExistsException: XXXXXXXXXXXXXXXXXX/base_00000XX/bucket_00000 for client already exists
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2811)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2698)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2582)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
Reason
During compaction, Hive triggers a MapReduce job that writes a base or delta file at TMP_LOCATION, depending on which type of compaction is running. Writing ORC is a memory-intensive operation, and the situation gets worse when writing a wide ORC table (one with many columns): wide ORC writes need more memory, and a small YARN container is not big enough. In the scenario above, the user ran the MapReduce job with a 2 GB container, which was not enough. As memory usage grew past the 2 GB container limit, the YARN physical-memory checker killed the container, giving it no chance to clean up TMP_LOCATION. Subsequent task attempts then failed with FileAlreadyExistsException because the leftover bucket file was still present.
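One way to confirm this failure mode is to look for the physical-memory kill message in the YARN logs, and then clear the leftover bucket file so retries stop hitting FileAlreadyExistsException. This is a sketch: the application ID and the warehouse path below are placeholders for your environment, and you should verify the directory really is a failed-compaction leftover before removing anything.

```shell
# Look for YARN's physical-memory kill message for the compactor's MR job
# (application_XXXX_YYYY is a placeholder for the real application ID)
yarn logs -applicationId application_XXXX_YYYY \
  | grep -i "running beyond physical memory limits"

# Inspect the partition for an orphaned base/delta directory left behind
# by the killed container (placeholder path)
hdfs dfs -ls /warehouse/hive_acid.db/hive_acid_table/hive_acid_part=part_name/

# Only after confirming it belongs to the failed compaction, remove it
hdfs dfs -rm -r /warehouse/hive_acid.db/hive_acid_table/hive_acid_part=part_name/base_0000001
```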
Resolution
Try running the compaction with a bigger YARN container size:
ALTER TABLE TABLENAME PARTITION (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096')
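A bigger container alone is not always enough; the mapper JVM heap inside the container usually has to grow with it. The sketch below assumes that, as with `compactor.mapreduce.map.memory.mb` above, Hive strips the `compactor.` prefix and passes the rest to the compaction MapReduce job, so `compactor.mapreduce.map.java.opts` should set the mapper heap; verify the property name against your Hive version. The 80% heap-to-container ratio is a common rule of thumb, not a requirement.

```sql
-- Run the compaction with a 4 GB container and (assumption) a mapper
-- JVM heap of roughly 80% of the container size.
ALTER TABLE TABLENAME PARTITION (PART_NAME='PART_VALUE') COMPACT 'MINOR'
WITH OVERWRITE TBLPROPERTIES (
  'compactor.mapreduce.map.memory.mb'='4096',
  'compactor.mapreduce.map.java.opts'='-Xmx3276m'
);
```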
Bonus
-- run the compaction job in debug mode
ALTER TABLE TABLENAME PARTITION (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.log.level'='DEBUG', 'compactor.yarn.app.mapreduce.am.log.level'='DEBUG');
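After queueing a compaction, its progress and any failure state can be checked from Hive itself. `SHOW COMPACTIONS` is standard HiveQL on transactional tables and lists each request's database, table, partition, type, and state, including the "failed" state the worker sets in the error above:

```sql
-- List queued, running, and recently finished compactions,
-- including ones marked failed by the compactor worker.
SHOW COMPACTIONS;
```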