
@kzhangkzhang
Last active November 6, 2018 03:24
Common Hive settings cheat sheet

Common Hive settings, grouped by category

Dynamic Partition

Enable/disable dynamic partition inserts

  • hive.exec.dynamic.partition=true

    ==> Whether to allow dynamic partitions in DML/DDL.

Use strict mode when in doubt

  • hive.exec.dynamic.partition.mode=strict

    ==> In strict mode, the user must specify at least one static partition, which guards against accidentally overwriting all partitions.

    ==> In nonstrict mode, all partition columns are allowed to be dynamic.
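The strict-mode rule can be modeled in a few lines. This is a minimal Python sketch, not Hive's code: `check_partition_spec` is a hypothetical helper, and a spec such as `{"country": "US", "dt": None}` stands for a clause like `PARTITION (country='US', dt)` in an `INSERT OVERWRITE TABLE ... PARTITION (...) SELECT ...` statement, where `None` marks a column whose value comes from the SELECT. The sketch also encodes Hive's requirement that static partition columns come before dynamic ones.

```python
# Hypothetical model of the strict-mode rule for dynamic partition inserts.
# Not Hive's implementation: a spec maps each partition column, in
# PARTITION (...) order, to a static value, or to None when the value is
# supplied dynamically by the SELECT.
from typing import Mapping, Optional


def check_partition_spec(spec: Mapping[str, Optional[str]], mode: str = "strict") -> None:
    """Raise ValueError if the partition spec is not allowed in the given mode."""
    if mode not in ("strict", "nonstrict"):
        raise ValueError(f"unknown mode: {mode!r}")
    seen_dynamic = False
    for column, value in spec.items():
        if value is None:
            seen_dynamic = True
        elif seen_dynamic:
            # Static columns must precede dynamic ones in the PARTITION clause.
            raise ValueError(f"dynamic partition cannot be the parent of static column {column!r}")
    has_static = any(value is not None for value in spec.values())
    if mode == "strict" and not has_static:
        raise ValueError("strict mode requires at least one static partition column")


# PARTITION (country='US', dt): one static column, allowed in strict mode.
check_partition_spec({"country": "US", "dt": None}, mode="strict")

# PARTITION (country, dt): fully dynamic, allowed only in nonstrict mode.
check_partition_spec({"country": None, "dt": None}, mode="nonstrict")
```

Calling the second spec with `mode="strict"` raises, which is exactly the accidental "overwrite everything" case strict mode is meant to catch.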

Maximum number of dynamic partitions (defaults: 1000 in total, 100 per node)

  • hive.exec.max.dynamic.partitions

    ==> Maximum number of dynamic partitions allowed to be created in total.

  • hive.exec.max.dynamic.partitions.pernode

    ==> Maximum number of dynamic partitions allowed to be created in each mapper/reducer node
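The difference between the two limits is easiest to see with a small model. The Python sketch below is a simplification, not Hive's enforcement code: `count_dynamic_partitions` is a hypothetical helper, each row is a `(task_id, partition)` pair, and `task_id` stands in for the mapper or reducer that writes the row.

```python
# Simplified model of the two dynamic-partition limits. Each input row is
# (task_id, partition_value); task_id stands in for the mapper or reducer
# that writes the row. Defaults mirror Hive's documented defaults.
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

MAX_DYNAMIC_PARTITIONS = 1000         # hive.exec.max.dynamic.partitions
MAX_DYNAMIC_PARTITIONS_PERNODE = 100  # hive.exec.max.dynamic.partitions.pernode


def count_dynamic_partitions(
    rows: Iterable[Tuple[str, Hashable]],
    max_total: int = MAX_DYNAMIC_PARTITIONS,
    max_per_node: int = MAX_DYNAMIC_PARTITIONS_PERNODE,
) -> int:
    """Return the number of distinct partitions created, or raise RuntimeError
    as soon as one task or the whole job exceeds its limit."""
    per_task: Dict[str, Set[Hashable]] = defaultdict(set)
    all_partitions: Set[Hashable] = set()
    for task_id, partition in rows:
        per_task[task_id].add(partition)
        all_partitions.add(partition)
        if len(per_task[task_id]) > max_per_node:
            raise RuntimeError(f"task {task_id} exceeded {max_per_node} dynamic partitions")
        if len(all_partitions) > max_total:
            raise RuntimeError(f"job exceeded {max_total} dynamic partitions in total")
    return len(all_partitions)


# Two mappers writing 60 daily partitions each, 30 of them shared: 90 in total.
rows = [("m1", day) for day in range(60)] + [("m2", day) for day in range(30, 90)]
print(count_dynamic_partitions(rows))  # 90
```

By contrast, a single task writing 150 partitions fails the per-node check even though the job total (150) is far below 1000.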

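Increase the maximum number of files a DataNode can serve at any one time (set in hdfs-site.xml; in Hadoop 2 and later this property is deprecated in favor of dfs.datanode.max.transfer.threads)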

  • dfs.datanode.max.xcievers=4096
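Each hdfs-site.xml entry is a `<property>` element with `<name>` and `<value>` children inside `<configuration>`. The Python sketch below shows the shape of that entry by adding it to an in-memory `<configuration>` element; `set_property` is a hypothetical helper built on the standard library's `xml.etree.ElementTree`, and it does not read or write a real file.

```python
# Sketch: set a property in a Hadoop-style <configuration> document such as
# hdfs-site.xml. "xcievers" is the property's real (historically misspelled) name.
import xml.etree.ElementTree as ET


def set_property(config: ET.Element, name: str, value: str) -> None:
    """Set name=value, updating an existing <property> entry if there is one."""
    for prop in config.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(config, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


config = ET.Element("configuration")
set_property(config, "dfs.datanode.max.xcievers", "4096")
print(ET.tostring(config, encoding="unicode"))
# <configuration><property><name>dfs.datanode.max.xcievers</name><value>4096</value></property></configuration>
```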

Hive Join Configuration

Map Join

Map join is a Hive feature that speeds up queries joining a large table with a small one. The smaller table is loaded into memory, and the join is performed in the map phase of the MapReduce job, inside each mapper, without a reduce step. Because no reducers (and therefore no shuffle) are needed, map joins are usually much faster than regular joins, so queries that frequently join against small tables benefit the most.
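Conceptually, a map join is an in-memory hash join: build a hash table from the small table once, then let every mapper stream its split of the large table and probe that table row by row. The Python sketch below illustrates only that idea for an inner join; the function names and sample tables are hypothetical, and Hive's own operators (which also handle serialization and distributing the hash table to mappers) are not modeled.

```python
# Sketch of the idea behind a map join: the small table is loaded into an
# in-memory hash table, and each mapper streams rows of the large table and
# probes it, so no shuffle or reduce phase is needed. Inner join only.
from collections import defaultdict
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def build_hash_table(small_rows: Iterable[Row], key: str) -> Dict[Any, List[Row]]:
    """Build phase: index the small table by join key (done once, in memory)."""
    table: Dict[Any, List[Row]] = defaultdict(list)
    for row in small_rows:
        table[row[key]].append(row)
    return table


def map_side_join(big_rows: Iterable[Row], hash_table: Dict[Any, List[Row]], key: str) -> Iterator[Row]:
    """Probe phase: what each mapper does for its split of the big table."""
    for row in big_rows:
        for match in hash_table.get(row[key], []):
            yield {**row, **match}


countries = [{"code": "US", "name": "United States"}, {"code": "FR", "name": "France"}]
orders = [{"order_id": 1, "code": "US"}, {"order_id": 2, "code": "DE"}, {"order_id": 3, "code": "FR"}]
joined = list(map_side_join(orders, build_hash_table(countries, "code"), "code"))
print(joined)
# [{'order_id': 1, 'code': 'US', 'name': 'United States'}, {'order_id': 3, 'code': 'FR', 'name': 'France'}]
```

The design only works while the small table fits in memory, which is why Hive gates the conversion on table-size thresholds.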

  • hive.auto.convert.join=true

    ==> When enabled, a join in which one table is smaller than hive.mapjoin.smalltable.filesize (default 25000000 bytes, about 25 MB) is converted to a map join.

  • hive.auto.convert.join.noconditionaltask=true
  • hive.auto.convert.join.noconditionaltask.size=10000000

    ==> Applies when three or more tables take part in a join. With hive.auto.convert.join alone, Hive generates three or more separate map-side joins, on the assumption that all the tables are small. With hive.auto.convert.join.noconditionaltask enabled, Hive combines them into a single map-side join if the combined size of the n-1 smaller tables is less than hive.auto.convert.join.noconditionaltask.size (default 10000000 bytes, about 10 MB).
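The two thresholds answer different questions: whether one small table fits in memory, and whether all the small tables of an n-way join fit together. The Python sketch below models only those two size checks as described here; the helper names are hypothetical, and real Hive also considers the join type and table statistics, which this ignores.

```python
# Simplified model of the two size checks described above; sizes are in bytes.
# Real Hive also takes the join type into account (e.g. which side of an outer
# join may be streamed) and relies on table statistics; this sketch ignores both.
from typing import Sequence

SMALLTABLE_FILESIZE = 25_000_000     # hive.mapjoin.smalltable.filesize default
NOCONDITIONALTASK_SIZE = 10_000_000  # hive.auto.convert.join.noconditionaltask.size default


def is_map_join_candidate(small_table_size: int, threshold: int = SMALLTABLE_FILESIZE) -> bool:
    """Two-table join: can the smaller side be loaded into memory?"""
    return small_table_size < threshold


def fits_single_map_join(table_sizes: Sequence[int], threshold: int = NOCONDITIONALTASK_SIZE) -> bool:
    """n-way join: stream the largest table; the other n-1 tables together
    must stay under the threshold to be merged into one map join."""
    if len(table_sizes) < 2:
        raise ValueError("a join needs at least two tables")
    smaller_tables = sorted(table_sizes)[:-1]  # everything except the largest
    return sum(smaller_tables) < threshold


MB = 1_000_000
fact, dim_a, dim_b = 50_000 * MB, 3 * MB, 8 * MB
print(fits_single_map_join([fact, dim_a, 4 * MB]))  # True: 3 MB + 4 MB < 10 MB
print(fits_single_map_join([fact, dim_a, dim_b]))   # False: 3 MB + 8 MB > 10 MB
print(is_map_join_candidate(dim_b))                 # True: 8 MB < 25 MB on its own
```

The second case shows why both settings exist: each dimension table passes the 25 MB test on its own, but together they exceed the 10 MB budget, so they would not be merged into a single map join.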