Why does this simple Hive query consume huge YARN resources?

This is not a partitioned table.

Query

select * from lyr1_raw.CI_CUSTMAST_HS;
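
Before tuning memory, it can help to confirm what Hive actually plans for this statement. A minimal sketch using standard Hive statements; only the table name is taken from the query above:

-- Show the query plan: a plain SELECT * with no filters should be map-only (or fetch-only), with no reducers.
EXPLAIN SELECT * FROM lyr1_raw.CI_CUSTMAST_HS;

-- Show the storage format, location, and table parameters Hive has recorded for this table.
DESCRIBE FORMATTED lyr1_raw.CI_CUSTMAST_HS;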

Total Records

11,957,465

File Format

Avro

HDFS Size (total 102.3 MB)

  • part-m-00000.avro 11.8 MB
  • part-m-00001.avro 12.3 MB
  • part-m-00002.avro 13.0 MB
  • part-m-00003.avro 13.0 MB
  • part-m-00004.avro 13.1 MB
  • part-m-00005.avro 13.1 MB
  • part-m-00006.avro 13.0 MB
  • part-m-00007.avro 13.0 MB
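
For reference, the per-file sizes above can be re-checked from a Hive session with a dfs command; the warehouse path below is hypothetical, so substitute the Location reported by DESCRIBE FORMATTED:

-- Hypothetical path, for illustration only.
dfs -du -h /user/hive/warehouse/lyr1_raw.db/ci_custmast_hs;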

When we run the above query with the configuration below (a per-session sketch follows the list):

  • mapreduce.map.memory.mb=8192
  • mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx6442450944 #6G
  • mapreduce.reduce.memory.mb=16384
  • mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx12884901888 #12G
  • yarn.scheduler.minimum-allocation-mb=2048
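
Assuming Hive launches one map task per Avro file, these settings make each of the 8 map containers request 8 GB from YARN (mapreduce.map.memory.mb=8192), so the job reserves on the order of 8 x 8 GB = 64 GB of cluster memory for a 102.3 MB table; the reducer settings are largely irrelevant here, since a plain SELECT * is map-only. A hedged per-session sketch of smaller overrides that could be tried for this query (values are illustrative, not a recommendation):

-- Per-session overrides (illustrative values only), applied before re-running the query.
SET mapreduce.map.memory.mb=2048;
SET mapreduce.map.java.opts=-Djava.net.preferIPv4Stack=true -Xmx1536m;
SET mapreduce.reduce.memory.mb=2048;
SET mapreduce.reduce.java.opts=-Djava.net.preferIPv4Stack=true -Xmx1536m;
SELECT * FROM lyr1_raw.CI_CUSTMAST_HS;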