@ProtractorNinja
Created November 16, 2014 05:09
# Notes on Hive and Shark
## Usage Instructions
1. Unpack .tar to home directory (not ready yet)
2. `qsub hive-and-shark.pbs`
3. `ssh <job root node>`
4. `cd ~/hive-and-shark`
5. `source ./start-shark.sh` or `source ./start-hive.sh`
- Make sure to use `source` rather than executing the script directly. `source` runs the script in your current shell, so the environment variables it sets stay in your session.
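
The difference between sourcing and executing can be seen in a minimal sketch. The script and the `DEMO_HOME` variable below are hypothetical stand-ins, not part of this setup; `.` is the POSIX spelling of `source`.

```shell
# Hypothetical stand-in for start-shark.sh: it only exports one variable.
demo_script="$(mktemp)"
echo 'export DEMO_HOME=/tmp/demo' > "$demo_script"

# Executing the script runs it in a child shell; the export is lost on exit.
unset DEMO_HOME
sh "$demo_script"
after_exec="${DEMO_HOME:-unset}"

# Sourcing runs it in the current shell; the export persists.
. "$demo_script"
after_source="${DEMO_HOME:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
rm -f "$demo_script"
```

Only the sourced run leaves `DEMO_HOME` set in the calling shell, which is why the start scripts need `source`.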
## Files I had to edit
This list leaves out the steps already outlined in [Running Shark on a Cluster](https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster).
1. `hive-and-shark.pbs`
2. `start-hive.sh` and `start-shark.sh`
3. `etc/core-site.xml` (fixed hadoop.tmp.dir value)
4. `spark-0.8.0/conf/spark-env.sh` (point to scala library)
5. `shark-0.8.0/conf/shark-env.sh` (environment variables, etc.)
6. `hive/conf/hive-env.sh` (point to configuration directory)
## Files that need to be updated per run
1. `spark-0.8.0/conf/spark-env.sh` (memory setting, updated with `sed`)
2. `shark-0.8.0/conf/shark-env.sh` (memory setting, updated with `sed`)
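
A per-run `sed` update might look roughly like the sketch below. It is not the actual command from the job script: it assumes the memory setting is an `export SPARK_MEM=...` line, and the file contents, the `new_mem` value, and the `/path/to/scala` placeholder are all made up for illustration.

```shell
# Hypothetical stand-in for spark-0.8.0/conf/spark-env.sh.
env_file="$(mktemp)"
printf '%s\n' 'export SCALA_HOME=/path/to/scala' 'export SPARK_MEM=1g' > "$env_file"

# Assumed per-run value; a real job would derive this from its allocation.
new_mem="8g"

# Rewrite only the SPARK_MEM line. Writing to a copy and moving it back
# avoids `sed -i`, whose syntax differs between GNU and BSD sed.
sed "s/^export SPARK_MEM=.*/export SPARK_MEM=${new_mem}/" "$env_file" > "${env_file}.new"
mv "${env_file}.new" "$env_file"

updated_line="$(grep '^export SPARK_MEM=' "$env_file")"
echo "$updated_line"
rm -f "$env_file"
```

The same pattern would apply to `shark-env.sh`, with whichever variable name that file actually uses.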