@rajkrrsingh
Created June 27, 2018 23:56
Hive compaction failing with FileAlreadyExistsException

ENV

HDP 2.6.3

Exception

Client
ERROR [Thread-123]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:123,dbname:hive_acid,tableName:hive_acid_table,partName:hive_acid_part=part_name,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures
java.io.IOException: Minor compactor job failed for Hadoop JobId:job_XXXXXX_XXXX
     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
     at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
     at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
     

Exception in the MapReduce job

FATAL [IPC Server handler 11 on 12345] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_XXXXX - exited : org.apache.hadoop.fs.FileAlreadyExistsException: XXXXXXXXXXXXXXXXXX/base_00000XX/bucket_00000 for client  already exists
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2811)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2698)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2582)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)

Reason

During compaction, Hive launches a MapReduce job that writes a new base or delta file under TMP_LOCATION, depending on the type of compaction being run.
Writing ORC is a memory-intensive operation, and it gets worse when the ORC table is wide (has many columns): wide ORC writes need more memory, and a
small YARN container size is not enough. In the scenario above, the compaction MapReduce job was running with a 2G container size, which was insufficient. As
memory usage grew past the 2G container limit, YARN's physical memory checker killed the container before it could clean up TMP_LOCATION, so the retried
task attempt found the partially written bucket file left behind and failed with FileAlreadyExistsException.
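
Before retrying, it can help to confirm the compaction's state from Hive. SHOW COMPACTIONS lists queued, running, and completed compaction requests for the warehouse, and the attempt above should typically appear with a failed state:

// list compaction requests and check their current state
SHOW COMPACTIONS;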

Resolution

Try running the compaction with a larger YARN container size:
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096');
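
If compactions of this table regularly need the extra memory, the same compactor.* override can also be stored as a table property so that automatically triggered compactions pick it up as well (TABLENAME is the same placeholder as above):

// persist the larger map container size for all future compactions of this table
ALTER TABLE TABLENAME SET TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096');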

Bonus

// run the compaction job in debug mode
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.log.level"="DEBUG","compactor.yarn.app.mapreduce.am.log.level"="DEBUG");
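
These per-compaction overrides can be combined; for example, the memory bump from the resolution above and the debug log levels can be requested in a single statement (table and partition names are placeholders as before):

// bump the map container size and enable debug logging for one compaction run
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096','compactor.mapreduce.map.log.level'='DEBUG','compactor.yarn.app.mapreduce.am.log.level'='DEBUG');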