@thanoojgithub
thanoojgithub / hiveQueryOptimizationTechniques.txt
Last active October 28, 2023 11:52
hive query optimization techniques
https://github.com/Thomas-George-T/Movies-Analytics-in-Spark-and-Scala
Change the execution engine to Tez or Spark (add the Tez/Spark client jars to HADOOP_CLASSPATH)
Partitioning - PARTITIONED BY clause is used to divide the table into partitions, one directory per distinct partition-column value.
Bucketing - CLUSTERED BY clause is used to divide the table (or each partition) into a fixed number of buckets by hashing the bucketing column.
Map-Side join, Bucket-Map-Side join, Sorted Bucket-Map-Side join (also called sort-merge bucket, SMB, join)
Usage of a suitable file format = ORC (Optimized Row Columnar) file format
Indexing (applies to Hive versions before 3.0; index support was removed in Hive 3.0)
Vectorization along with ORC
CBO (cost-based optimizer; relies on table and column statistics)
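
A minimal HiveQL sketch of the settings and DDL behind the techniques listed above. The table and column names (sales, id, amount, sale_date) and the bucket count of 8 are hypothetical, chosen only for illustration. The property names are standard Hive configuration keys, but defaults and availability vary by Hive version, so treat this as a starting point rather than a tuned configuration.

```sql
-- Execution engine: valid values are mr, tez, spark (the engine must be installed).
SET hive.execution.engine=tez;

-- Partitioned and bucketed table stored as ORC.
-- Hypothetical schema: one directory per sale_date, 8 bucket files per partition.
CREATE TABLE sales (
  id     INT,
  amount DOUBLE
)
PARTITIONED BY (sale_date STRING)
CLUSTERED BY (id) SORTED BY (id) INTO 8 BUCKETS
STORED AS ORC;

-- Map-side join: let Hive convert joins against small tables automatically.
SET hive.auto.convert.join=true;

-- Bucket map join: both tables must be bucketed on the join key.
SET hive.optimize.bucketmapjoin=true;

-- Sort-merge bucket join: both tables must also be sorted on the join key.
SET hive.optimize.bucketmapjoin.sortedmerge=true;
SET hive.auto.convert.sortmerge.join=true;

-- Vectorized execution (processes rows in batches; works with ORC).
SET hive.vectorized.execution.enabled=true;

-- Cost-based optimizer, plus the statistics it depends on.
SET hive.cbo.enable=true;
ANALYZE TABLE sales PARTITION (sale_date) COMPUTE STATISTICS;
ANALYZE TABLE sales PARTITION (sale_date) COMPUTE STATISTICS FOR COLUMNS;
```

A query that filters on sale_date reads only the matching partition directories, and a join on id between two tables bucketed the same way can be resolved bucket by bucket.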
@fabioperrella
fabioperrella / mongos
Created March 19, 2012 12:12
MongoDB init script for Mongos in Debian
#!/bin/sh
#
# init.d script with LSB support.
#
# Copyright (c) 2007 Javier Fernandez-Sanguino <jfs@debian.org>
#
# This is free software; you may redistribute it and/or modify
# it under the terms of the GNU General Public License as
# published by the Free Software Foundation; either version 2,
# or (at your option) any later version.