@rajkrrsingh
Created June 27, 2018 23:56
Hive compaction failing with FileAlreadyExistsException

ENV

HDP 2.6.3

Exception

Client
ERROR [Thread-123]: compactor.Worker (Worker.java:run(191)) - Caught exception while trying to compact id:123,dbname:hive_acid,tableName:hive_acid_table,partName:hive_acid_part=part_name,state:^@,type:MAJOR,properties:null,runAs:null,tooManyAborts:false,highestTxnId:0.  Marking failed to avoid repeated failures
java.io.IOException: Minor compactor job failed for Hadoop JobId:job_XXXXXX_XXXX
     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.launchCompactionJob(CompactorMR.java:314)
     at org.apache.hadoop.hive.ql.txn.compactor.CompactorMR.run(CompactorMR.java:269)
     at org.apache.hadoop.hive.ql.txn.compactor.Worker$1.run(Worker.java:175)
     at java.security.AccessController.doPrivileged(Native Method)
     at javax.security.auth.Subject.doAs(Subject.java:422)
     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
     at org.apache.hadoop.hive.ql.txn.compactor.Worker.run(Worker.java:172)
     

Exception in the MapReduce job

FATAL [IPC Server handler 11 on 12345] org.apache.hadoop.mapred.TaskAttemptListenerImpl: Task: attempt_XXXXX - exited : org.apache.hadoop.fs.FileAlreadyExistsException: XXXXXXXXXXXXXXXXXX/base_00000XX/bucket_00000 for client  already exists
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2811)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2698)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2582)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:736)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:409)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:640)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:982)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2351)
    at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2347)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2347)

    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)

Reason

During compaction, Hive launches a MapReduce job that writes a new base or delta file under TMP_LOCATION, depending on the type of compaction being run.
Writing ORC is a memory-intensive operation, and it gets worse when the ORC table is wide (has many columns): wide ORC writes need more memory, and a
small YARN container size is not enough. In the scenario above, the compaction MapReduce job was running with a 2G container size, which was insufficient. As
memory usage grew past the 2G container limit, YARN's physical memory checker killed the container before it could clean up TMP_LOCATION, so the retried
task attempt found the partially written bucket file left behind and failed with FileAlreadyExistsException.
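
Before retrying, it can help to confirm the compaction's state from Hive. SHOW COMPACTIONS lists queued, running, and completed compaction requests for the warehouse, and the attempt above should typically appear with a failed state:

// list compaction requests and check their current state
SHOW COMPACTIONS;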

Resolution

Try running the compaction with a larger YARN container size:
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096');
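
If compactions of this table regularly need the extra memory, the same compactor.* override can also be stored as a table property so that automatically triggered compactions pick it up as well (TABLENAME is the same placeholder as above):

// persist the larger map container size for all future compactions of this table
ALTER TABLE TABLENAME SET TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096');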

Bonus

// run the compaction job in debug mode
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ("compactor.mapreduce.map.log.level"="DEBUG","compactor.yarn.app.mapreduce.am.log.level"="DEBUG");
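
These per-compaction overrides can be combined; for example, the memory bump from the resolution above and the debug log levels can be requested in a single statement (table and partition names are placeholders as before):

// bump the map container size and enable debug logging for one compaction run
ALTER TABLE TABLENAME partition (PART_NAME='PART_VALUE') COMPACT 'MINOR' WITH OVERWRITE TBLPROPERTIES ('compactor.mapreduce.map.memory.mb'='4096','compactor.mapreduce.map.log.level'='DEBUG','compactor.yarn.app.mapreduce.am.log.level'='DEBUG');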