anjijava16/Hive_Best_practise.txt

## Hive_Best_practise.txt
•	DISTINCT and GROUP BY - Use them only when necessary. Both add a shuffle and reduce phase, which degrades performance on large tables.
•	PARTITION - Partition large tables. Filtering on the partition column lets Hive read only the matching partitions, which improves performance.
•	Rewrite - Do not reuse a query unchanged from an RDBMS. Rewrite it for Hive to improve performance.
•	Map Split Size - Try reducing the map split size. Smaller splits create more map tasks and can shorten the query, although very small splits add task overhead.
•	Map Join - Map join small tables so that joining them with a large table takes less time. The small table is loaded into memory, so the large table does not need to be shuffled for the join.
•	Memory - Adjust the memory settings to suit the queries being run.
•	Format ORC - Keep tables in ORC format where possible; it speeds up queries on those tables.
•	Hive Execute parallel - Set hive.exec.parallel=true so that independent stages of a query run in parallel.
•	CTAS - When creating tables with CTAS (CREATE TABLE AS SELECT), create managed tables rather than external tables.
•	Data Explosion - Filter each data set before joining it. Make sure there is no cross join between large data sets.
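
Several of the tips above can be sketched in HiveQL as follows. This is only an illustration under stated assumptions: the table names (sales, stores, daily_store_totals), the columns, the date, and every setting value are hypothetical and should be tuned per cluster. The split-size and memory properties shown apply to the MapReduce engine; Tez uses different properties.

```sql
-- Session settings. All values are illustrative assumptions.
-- Run independent stages of one query concurrently.
SET hive.exec.parallel=true;
-- Let Hive convert joins against a small table into map joins.
SET hive.auto.convert.join=true;
-- Cap each input split at 128 MB (MapReduce engine).
SET mapreduce.input.fileinputformat.split.maxsize=134217728;
-- Container memory per map task, in MB (MapReduce engine).
SET mapreduce.map.memory.mb=4096;

-- Hypothetical fact table: partitioned and stored as ORC.
CREATE TABLE IF NOT EXISTS sales (
  store_id INT,
  amount   DECIMAL(10,2)
)
PARTITIONED BY (sale_date STRING)
STORED AS ORC;

-- Hypothetical small dimension table, a candidate for a map join.
CREATE TABLE IF NOT EXISTS stores (
  store_id INT,
  region   STRING
)
STORED AS ORC;

-- Filter on the partition column first so only one partition is read,
-- then join the reduced set on an explicit key so no cross join occurs.
SELECT st.region, SUM(s.amount) AS total_amount
FROM (
  SELECT store_id, amount
  FROM sales
  WHERE sale_date = '2020-01-01'
) s
JOIN stores st
  ON s.store_id = st.store_id
GROUP BY st.region;

-- CTAS: materialize a filtered result as a managed ORC table.
CREATE TABLE daily_store_totals
STORED AS ORC
AS
SELECT store_id, SUM(amount) AS total_amount
FROM sales
WHERE sale_date = '2020-01-01'
GROUP BY store_id;
```

Comments are kept on their own lines because some Hive clients mishandle a trailing comment on the same line as a statement. The GROUP BY here is needed for the aggregation, so it is a case where the first tip's "only when necessary" condition is met.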