- Prerequisite: a working Hadoop installation (this guide assumes Hadoop 3.2.2 under /home/hadoop/hadoop-3.2.2)
(1) Download the Hive tarball:
wget http://archive.apache.org/dist/hive/h...
(2) Extract the tar file
tar -xzf apache-hive-3.1.2-bin.tar.gz
(3) Create the Hive directories in HDFS. The 'warehouse' directory is where Hive stores its tables and related data.
hdfs dfs -mkdir -p /user/hive/warehouse
hdfs dfs -mkdir /tmp
(4) Grant the group write permission on these directories.
hdfs dfs -chmod g+w /user/hive/warehouse
hdfs dfs -chmod g+w /tmp
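hdfs dfs -chmod accepts the same symbolic modes as POSIX chmod, so what g+w grants can be sketched locally on a throwaway directory (no HDFS needed):

```shell
# Local sketch of the g+w mode used in step (4), on a temp directory.
dir=$(mktemp -d)
chmod g-w "$dir"          # start without group write
chmod g+w "$dir"          # same symbolic mode as the hdfs dfs -chmod call
# Print the permission string; the 6th character (group write) should now be 'w'.
ls -ld "$dir" | cut -c1-10
rmdir "$dir"
```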
(5) Set the Hadoop path in hive-config.sh and hive-env.sh
cd /home/hadoop/apache-hive-3.1.2-bin/bin
sudo nano hive-config.sh
Add the following line:
export HADOOP_HOME=/home/hadoop/hadoop-3.2.2
Repeat the same in the conf directory (copy the template first so Hive actually picks it up):
cd /home/hadoop/apache-hive-3.1.2-bin/conf/
cp hive-env.sh.template hive-env.sh
sudo nano hive-env.sh
Add the following lines
export HADOOP_HOME=/home/hadoop/hadoop-3.2.2
HIVE_CONF_DIR="${HIVE_CONF_DIR:-$HIVE_HOME/conf}"
export HADOOP_HEAPSIZE=${HADOOP_HEAPSIZE:-256}
export HIVE_CONF_DIR=$HIVE_CONF_DIR
export HIVE_AUX_JARS_PATH=$HIVE_AUX_JARS_PATH
(6) By default, Hive uses the embedded Derby database. Initialize it:
/home/hadoop/apache-hive-3.1.2-bin/bin/schematool -initSchema -dbType derby
This will throw an error because Hadoop and Hive ship incompatible versions of the Guava library. Fix it by replacing Hive's Guava jar with the one from Hadoop:
rm /home/hadoop/apache-hive-3.1.2-bin/lib/guava-19.0.jar
cp /home/hadoop/hadoop-3.2.2/share/hadoop/hdfs/lib/guava-27.0-jre.jar /home/hadoop/apache-hive-3.1.2-bin/lib/
Now initialize the Derby database again:
/home/hadoop/apache-hive-3.1.2-bin/bin/schematool -initSchema -dbType derby
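The Guava fix above is just "delete Hive's old jar, copy in the one Hadoop ships". A self-contained dry-run of that pattern with placeholder files (on a real install, use the actual paths from step (6)):

```shell
# Dry-run of the jar swap using placeholder files in a temp directory.
root=$(mktemp -d)
mkdir -p "$root/hadoop/lib" "$root/hive/lib"
touch "$root/hadoop/lib/guava-27.0-jre.jar" "$root/hive/lib/guava-19.0.jar"

rm "$root/hive/lib/guava-19.0.jar"                          # drop Hive's old Guava
cp "$root/hadoop/lib/guava-27.0-jre.jar" "$root/hive/lib/"  # copy Hadoop's Guava in

ls "$root/hive/lib"       # should list only guava-27.0-jre.jar
rm -r "$root"
```

On a real cluster you can confirm the metastore initialized correctly with schematool -info -dbType derby.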
(7) Start Hive
Add Hive to the PATH:
sudo nano ~/.bashrc
Append the following lines:
export HIVE_HOME=/home/hadoop/apache-hive-3.1.2-bin
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bashrc
hive
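The two export lines work purely through PATH lookup, which can be sanity-checked without a real Hive install; a minimal sketch using a stand-in hive script (all paths here are placeholders):

```shell
# Simulate adding $HIVE_HOME/bin to PATH, with a stub standing in for hive.
HIVE_HOME=$(mktemp -d)       # placeholder for /home/hadoop/apache-hive-3.1.2-bin
mkdir -p "$HIVE_HOME/bin"
printf '#!/bin/sh\necho "hive stub"\n' > "$HIVE_HOME/bin/hive"
chmod +x "$HIVE_HOME/bin/hive"

export PATH="$PATH:$HIVE_HOME/bin"   # same pattern as the ~/.bashrc lines above
command -v hive                      # resolves to $HIVE_HOME/bin/hive
hive                                 # runs the stub, printing: hive stub
rm -r "$HIVE_HOME"
```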