Cloudera's QuickStart VM Setup
- If needed, see Cloudera's "Setting up the QuickStart VM"
- Check CentOS distribution
$ cat /etc/redhat-release
- Determine the IP address of the VM (make sure the network is configured on VirtualBox first; see "Network" below)
$ ifconfig
This assumes that the VM is accessible from the host via SSH. Configure this in VirtualBox under "Network" -- "Attached To" -- "Host-only Adapter". Ref. on VirtualBox's networking.
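To pick the addresses out of the ifconfig output, a small filter can help. This is only a sketch: list_ipv4 is a hypothetical helper name, and it assumes the output uses either the older "inet addr:1.2.3.4" form or the newer "inet 1.2.3.4" form.

```shell
# Print the IPv4 addresses found in ifconfig-style output read from stdin,
# skipping the loopback address. list_ipv4 is a hypothetical helper.
list_ipv4() {
  awk '$1 == "inet" {
         addr = $2
         sub(/^addr:/, "", addr)   # strip the "addr:" prefix of older net-tools
         if (addr != "127.0.0.1") print addr
       }'
}
```

On the VM it would be used as `ifconfig | list_ipv4`. With the two-adapter setup, expect one address per adapter; the host-only one is the address to SSH to.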
- Create the public-private key pair for accessing the server using ssh-keygen in the directory you specify (typically, this will be within ~/.ssh/; see the next step).
- Create (or append to) an SSH config file, which is useful for managing multiple public-private key pairs for multiple services (ref. 1, ref. 2).
$ touch ~/.ssh/config
Append the following to the file, giving the path of the relevant private key. You can then use ssh nyubigdata instead of ssh cloudera@192.168.99.100.
Host nyubigdata
    # nyubigdata's IP address
    HostName 192.168.99.100
    User cloudera
    IdentityFile ~/.ssh/path/to/pair/id_rsa
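If the entry is added from a script rather than by hand, it helps to avoid appending it twice. The following is a minimal sketch; add_ssh_host is a hypothetical helper (not part of OpenSSH), and its arguments mirror the values used above.

```shell
# Append a Host entry to an SSH config file unless the alias already exists.
# add_ssh_host is a hypothetical helper, not part of OpenSSH.
add_ssh_host() {
  local config="$1" host_alias="$2" address="$3" user="$4" key="$5"

  mkdir -p "$(dirname "$config")"
  touch "$config"
  chmod 600 "$config"   # keep the config private to its owner

  # Do nothing if a "Host <alias>" line is already present.
  if grep -Eiq "^[[:space:]]*host[[:space:]]+${host_alias}[[:space:]]*\$" "$config"; then
    return 0
  fi

  {
    printf 'Host %s\n' "$host_alias"
    printf '    HostName %s\n' "$address"
    printf '    User %s\n' "$user"
    printf '    IdentityFile %s\n' "$key"
  } >> "$config"
}
```

A call such as `add_ssh_host ~/.ssh/config nyubigdata 192.168.99.100 cloudera ~/.ssh/path/to/pair/id_rsa` would produce the entry shown above; running it a second time leaves the file unchanged.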
- Check the permissions on the keys using ls -l. Typically, they are as follows.
-rw------- 1 mamigot staff 3326 Aug 25 18:07 id_rsa
-rw-r--r-- 1 mamigot staff 749 Aug 25 18:07 id_rsa.pub
- Copy the new public key to Cloudera's VM.
$ ssh-copy-id -i ~/.ssh/nyu_big_data_vbox/id_rsa cloudera@192.168.99.100
- Set the permissions on the VM (ref. 1)
$ chmod g-w /home/cloudera/
$ chmod 700 ~/.ssh
$ chmod 600 ~/.ssh/authorized_keys
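Where ssh-copy-id is not available, the copy and the permission fixes can be combined by hand on the VM. The sketch below is one way to do it; install_pubkey is a hypothetical helper, and it assumes the public key file has already been transferred to the VM by some other means.

```shell
# Append a public key to <home>/.ssh/authorized_keys (only if it is not
# already listed) and apply the permissions from the step above.
# install_pubkey is a hypothetical helper; run it on the VM.
install_pubkey() {
  local pubkey_file="$1" home_dir="$2"
  local ssh_dir="$home_dir/.ssh"
  local auth_keys="$ssh_dir/authorized_keys"

  mkdir -p "$ssh_dir"
  touch "$auth_keys"

  # -F: fixed string, -x: whole-line match, so the key is added only once.
  if ! grep -Fxq -e "$(cat "$pubkey_file")" "$auth_keys"; then
    cat "$pubkey_file" >> "$auth_keys"
  fi

  chmod g-w "$home_dir"
  chmod 700 "$ssh_dir"
  chmod 600 "$auth_keys"
}
```

For the setup described here, the call would look like `install_pubkey id_rsa.pub /home/cloudera`.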
- Edit the SSH server configuration on the VM (ref. 1)
$ sudo nano /etc/ssh/sshd_config
$
$ # Apply the changes
$ sudo service sshd restart
$
$ # To debug, stop the service and run sshd in the foreground in debug mode,
$ # then watch its log output while SSH-ing in from another session
$ sudo service sshd stop
$ sudo /usr/sbin/sshd -d
$
$ # Bring the service back up once done debugging
$ sudo service sshd restart
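The notes do not list which settings to change in sshd_config. As an assumption, the directives most often relevant to key-based login are PubkeyAuthentication, AuthorizedKeysFile, PasswordAuthentication and StrictModes (the last one is why the permissions above matter). The sketch below prints the active values of these; key_auth_settings is a hypothetical helper.

```shell
# Print the active (uncommented) sshd_config directives related to
# key-based login, reading the configuration from stdin.
# key_auth_settings is a hypothetical helper; the directive names are
# standard sshd_config keywords.
key_auth_settings() {
  grep -Ei '^[[:space:]]*(PubkeyAuthentication|AuthorizedKeysFile|PasswordAuthentication|StrictModes)[[:space:]]'
}
```

On the VM this could be run as `sudo cat /etc/ssh/sshd_config | key_auth_settings`. Before restarting the service, `sudo /usr/sbin/sshd -t` checks the configuration file for errors without starting the daemon.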
To share a folder between the host and the VM, see this post on Stack Overflow.
Keep in mind that the mount has to be redone every time the VM is rebooted, so it is convenient to run the command automatically by placing the following script, mount_shared.sh, in /etc/profile.d/ (scripts there are sourced at login).
$ # "dev" is the name of the host directory I want to share (see "Settings" -- "Shared Folders" in VirtualBox)
$ # "~/share/" is where "dev/" will be mounted on the guest VM
$ sudo mount -t vboxsf dev ~/share/
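Because scripts in /etc/profile.d/ are sourced at every login, the mount command would be attempted again each time. A guarded variant is sketched below; is_mounted and mount_shared are hypothetical helper names, and the check assumes the guest exposes its mount table at /proc/mounts.

```shell
# Return success if something is mounted at the given mount point.
# The optional second argument (a mounts table) defaults to /proc/mounts
# and exists mainly so the check can be tried against a sample file.
is_mounted() {
  awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Mount a VirtualBox shared folder unless it is already mounted.
mount_shared() {
  local share_name="$1" mount_point="$2"
  mkdir -p "$mount_point"
  if ! is_mounted "$mount_point"; then
    sudo mount -t vboxsf "$share_name" "$mount_point"
  fi
}
```

mount_shared.sh would then end with a call such as `mount_shared dev "$HOME/share"`.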
Two network adapters are needed: one to connect to the VM guest from the host (e.g. via SSH) and another to give the VM access to the Internet.
In VirtualBox, under "Settings" -- "Network":
- Adapter 1 (connect to the guest from the host)
  - Attached to: Host-only Adapter
  - Name: vboxnet0
- Adapter 2 (connect to the Internet)
  - Attached to: NAT
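The same adapter settings can be applied from the host's command line with VBoxManage. The sketch below is illustrative only: configure_vm_network is a hypothetical helper, the VM name is whatever VirtualBox lists for the QuickStart VM, and the flag spellings are those of VirtualBox 4.x/5.x (newer releases may spell them differently). The VM must be powered off when it runs.

```shell
# Configure adapter 1 as host-only and adapter 2 as NAT for the given VM.
# configure_vm_network is a hypothetical helper. VBOXMANAGE can be
# overridden (e.g. VBOXMANAGE=echo) to preview the command without running it.
configure_vm_network() {
  local vm_name="$1" hostonly_if="${2:-vboxnet0}"
  "${VBOXMANAGE:-VBoxManage}" modifyvm "$vm_name" \
    --nic1 hostonly --hostonlyadapter1 "$hostonly_if" \
    --nic2 nat
}
```

`VBoxManage list vms` shows the registered VM names to pass as the first argument.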
The VM's CentOS comes with Python 2.6 only. See DigitalOcean's guide to install Python 2.7 (follow a similar process, but for Python 2.7.10 rather than Python 2.7.6). This installs Python to /usr/local/bin/python2.7.
As for pip, install it for the system's default Python using the following (it will be saved in /usr/bin/pip):
$ sudo yum install python-pip
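With two interpreters installed, it is easy to lose track of which one a command uses. The following sketch reports the version behind each path; report_pythons is a hypothetical helper, and the paths passed to it are the ones mentioned above.

```shell
# Print the version reported by each interpreter path, or note its absence.
# report_pythons is a hypothetical helper.
report_pythons() {
  local py
  for py in "$@"; do
    if [ -x "$py" ]; then
      # Python 2 prints its version to stderr, hence 2>&1.
      printf '%s: %s\n' "$py" "$("$py" -V 2>&1)"
    else
      printf '%s: not found\n' "$py"
    fi
  done
}
```

For example, `report_pythons /usr/bin/python /usr/local/bin/python2.7`. Similarly, `pip --version` prints which Python installation pip is attached to.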