Execution engine - switch to Tez or Spark (set hive.execution.engine=tez or spark), with the Tez/Spark client jars added to HADOOP_CLASSPATH.
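Partitioning - the PARTITIONED BY clause divides the table into partitions (one directory per partition-column value), so queries that filter on the partition column read only the matching partitions.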
Bucketing - the CLUSTERED BY clause divides the table (or each partition) into a fixed number of buckets by hashing the bucketing column.
Map-side joins - map join, bucket map join, and sort-merge bucket (SMB) map join, which perform the join in the mappers and avoid a reduce phase.
File format - use a suitable file format such as ORC (Optimized Row Columnar).
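Vectorization along with ORC - set hive.vectorized.execution.enabled=true so operators process rows in batches (1024 rows by default) instead of one row at a time.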
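To make the partitioning tip above concrete, here is a toy Python model (not Hive code) of how partitions map to directories and how an equality filter on the partition column lets the scan skip every other directory. The column name dt and the sample rows are hypothetical.

```python
from collections import defaultdict

def partition_rows(rows, partition_col):
    """Group rows into 'directories' named <col>=<value>, the way
    PARTITIONED BY creates one subdirectory per partition value."""
    partitions = defaultdict(list)
    for row in rows:
        key = f"{partition_col}={row[partition_col]}"
        # The partition value lives in the directory name, not in the data file.
        partitions[key].append({k: v for k, v in row.items() if k != partition_col})
    return dict(partitions)

def scan_with_pruning(partitions, partition_col, wanted_value):
    """Read only the directory matching an equality filter.
    Returns (rows, directories_read)."""
    target = f"{partition_col}={wanted_value}"
    dirs_read = [target] if target in partitions else []
    rows = [r for d in dirs_read for r in partitions[d]]
    return rows, dirs_read
```

For example, with rows for dt values 2024-01-01 and 2024-01-02, a filter on dt = '2024-01-01' touches a single directory. That is why partition columns should be ones queries commonly filter on.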
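The bucketing tip above relies on a deterministic hash of the bucketing column. The sketch below assumes Hive's classic scheme (bucketing_version 1) for an INT column, which is (hashCode & Integer.MAX_VALUE) % numBuckets, where an int's hashCode is the value itself. Hive 3 defaults to a Murmur3-based hash, so real bucket numbers can differ. The point is only that equal keys always land in the same bucket.

```python
INT_MAX = 0x7FFFFFFF  # Java Integer.MAX_VALUE

def bucket_for_int(key: int, num_buckets: int) -> int:
    """Bucket id for an INT key under the assumed classic Hive scheme."""
    # Wrap to a signed 32-bit value, as a Java int would overflow.
    java_int = ((key + 2**31) % 2**32) - 2**31
    # Mask off the sign bit so the modulo result is never negative.
    return (java_int & INT_MAX) % num_buckets

def bucketize(keys, num_buckets):
    """Distribute keys into buckets 0..num_buckets-1."""
    buckets = {b: [] for b in range(num_buckets)}
    for k in keys:
        buckets[bucket_for_int(k, num_buckets)].append(k)
    return buckets
```

Because two tables bucketed on the same join key with the same bucket count place each key in the same bucket number, bucket i of one table only ever needs to meet bucket i of the other. The bucketed join variants build on this.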
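The three join variants listed above can be modelled in a few lines of Python. This is a simplified sketch, not Hive's implementation. A map join loads the small table into an in-memory hash table and streams the big table past it. A bucket map join does the same per bucket pair. An SMB join merges pre-sorted matching buckets without building a hash table at all. The row shape (key, payload) is hypothetical, and the sketch assumes both tables have the same bucket count (Hive also allows counts that are multiples of each other).

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # hypothetical (join_key, payload) row

def map_side_join(big: List[Row], small: List[Row]) -> List[Tuple[int, str, str]]:
    """Broadcast hash join: hash the small side, stream the big side."""
    table: Dict[int, List[str]] = {}
    for key, val in small:
        table.setdefault(key, []).append(val)
    return [(key, val, m) for key, val in big for m in table.get(key, [])]

def bucket_map_join(big_buckets: List[List[Row]], small_buckets: List[List[Row]]):
    """Only bucket i of the small table is hashed while joining bucket i of the big one."""
    out = []
    for big_b, small_b in zip(big_buckets, small_buckets):
        out.extend(map_side_join(big_b, small_b))
    return out

def sort_merge_join(left: List[Row], right: List[Row]):
    """Merge join of two inputs already sorted on the key."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right, pair each equal left row with it.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            while i < len(left) and left[i][0] == lk:
                out.extend((lk, left[i][1], r[1]) for r in right[j:j_end])
                i += 1
            j = j_end
    return out

def smb_join(left_buckets: List[List[Row]], right_buckets: List[List[Row]]):
    """Sort-merge bucket join: merge each pair of sorted matching buckets."""
    out = []
    for lb, rb in zip(left_buckets, right_buckets):
        out.extend(sort_merge_join(lb, rb))
    return out
```

The design trade-off shows up directly: map_side_join needs the whole small side in memory, bucket_map_join needs only one bucket of it at a time, and smb_join needs no hash table but depends on the data being bucketed and sorted on the join key.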
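One reason ORC helps, per the file-format tip above, is that it stores each column separately and keeps min/max statistics for chunks of rows (row groups, and also stripes). A filter can then skip chunks whose statistics prove nothing matches. The toy model below shows that skipping for a single integer column. RowGroup, the group size, and the filter are illustrative assumptions, not ORC's actual format or API.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class RowGroup:
    """Toy stand-in for an ORC row group: one column's values plus min/max stats."""
    values: List[int]
    min_val: int
    max_val: int

def write_column(values: List[int], group_size: int) -> List[RowGroup]:
    """Split a column into fixed-size row groups, recording min/max for each."""
    groups = []
    for start in range(0, len(values), group_size):
        chunk = values[start:start + group_size]
        groups.append(RowGroup(chunk, min(chunk), max(chunk)))
    return groups

def scan_greater_than(groups: List[RowGroup], threshold: int):
    """Evaluate `value > threshold`, skipping groups whose max rules out any match.
    Returns (matches, groups_read)."""
    matches, groups_read = [], 0
    for g in groups:
        if g.max_val <= threshold:
            continue  # statistics prove no row in this group can match
        groups_read += 1
        matches.extend(v for v in g.values if v > threshold)
    return matches, groups_read
```

For example, splitting the values 0 through 9 into groups of 3 and filtering on value > 7 reads only the last two groups. Skipping works best when the data is sorted or clustered on the filtered column, so that min/max ranges are narrow.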
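The vectorization tip above changes the unit of work from one row to a batch of column values. The Python sketch contrasts the two control flows for a filter-then-sum query. The qty/price columns and the threshold are hypothetical, and BATCH_SIZE mirrors Hive's default of 1024 rows per batch. Python itself gains no speed from this. In Hive the benefit comes from tight loops over primitive column arrays.

```python
from typing import Dict, List

BATCH_SIZE = 1024  # Hive's default number of rows per vectorized batch

def filter_sum_row_at_a_time(rows: List[Dict[str, int]], threshold: int) -> int:
    """Row mode: each row passes through the filter and the aggregate individually."""
    total = 0
    for row in rows:
        if row["qty"] > threshold:
            total += row["price"]
    return total

def filter_sum_vectorized(qty: List[int], price: List[int], threshold: int,
                          batch_size: int = BATCH_SIZE) -> int:
    """Vectorized mode: work on column slices one batch at a time.
    A selection vector lists the positions in the batch that pass the filter,
    and the aggregate then runs over only those positions."""
    total = 0
    for start in range(0, len(qty), batch_size):
        qty_batch = qty[start:start + batch_size]
        price_batch = price[start:start + batch_size]
        selected = [i for i, q in enumerate(qty_batch) if q > threshold]
        total += sum(price_batch[i] for i in selected)
    return total
```

Both functions return the same result. The vectorized one just organizes the work as a few large per-column loops rather than many small per-row steps.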