
Connect to Unibo Hadoop Cluster via Unix terminal, for lab sessions

During our Big Data lab sessions, we use PuTTY and WinSCP to connect to the Hadoop cluster. This setup is not available to students with a Linux/macOS device, as these programs only run on Windows.

During live sessions it is easy to work around this using Guacamole; but if you want to use Spark outside of lab hours, a different solution is required.

As I'm following the lessons on a macOS device, I was looking for a quick way to do this without installing extra software. I found a workable approach using ssh and scp from the terminal (tested with bash and zsh), which are pre-installed on most macOS and Linux systems.

TLDR

You can directly connect to the cluster with this one-liner:

$ ssh -t 'riccardo.maldini2@studio.unibo.it'@isi-alfa.csr.unibo.it "ssh rmaldini@isi-vclust4.csr.unibo.it"

Use your Unibo and cluster credentials, and enter the respective passwords when asked. From there you can use, for example, spark2-shell and spark2-submit.

To use the Hue web interface and Cloudera Manager in your browser, open another connection with this command, then configure your browser's proxy settings to use a SOCKS proxy with host name 127.0.0.1 and port 8080:

$ ssh -D 8080 'riccardo.maldini2@studio.unibo.it'@isi-alfa.csr.unibo.it

A more detailed explanation of what is happening here is given below.

Connect to a remote session

You can do it in two steps. First, connect to the Unibo servers with your credentials via ssh. I use this command from my terminal; replace my address with your institutional one:

$ ssh 'riccardo.maldini2@studio.unibo.it'@isi-alfa.csr.unibo.it

Sometimes the server refuses the connection, so you may have to retry the command. If the connection is successful, the server asks for your password (enter your institutional credentials)... and you are in! You should see a Debian bash shell, pointing to your personal Unibo server space, with a prompt like STUDENTI\username:~$.

From here you can connect to the cluster used for Hadoop operations by opening another SSH connection from the inner bash shell. Try this command, using your assigned node credentials:

STUDENTI\username:~$ ssh rmaldini@isi-vclust4.csr.unibo.it

Aaaand you're done again! If you see a bash prompt like [rmaldini@isi-vclust4 ~]$, you're in. Here you can use the spark2-shell and spark2-submit commands. Use :paste mode to paste your code directly into the Spark shell.
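For example, a :paste session looks roughly like this (the tiny Scala snippet is just an illustration I made up, and the shell messages are only indicative):

[rmaldini@isi-vclust4 ~]$ spark2-shell
scala> :paste
// Entering paste mode (ctrl-D to finish)

// hypothetical example job: sums the numbers 1..100
val rdd = sc.parallelize(1 to 100)
println(rdd.sum())

// Exiting paste mode, now interpreting.

Paste your code, then press Ctrl-D to have the shell interpret the whole block at once.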

The one-liner reported in the first section just executes the two steps together.
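If you connect often and your ssh client is OpenSSH 7.3 or later, you can optionally save the two hops in ~/.ssh/config using ProxyJump. This is only a sketch: the aliases unibo-jump and unibo-cluster are names I invented, and you should of course use your own usernames:

# made-up alias for the Unibo jump host
Host unibo-jump
    HostName isi-alfa.csr.unibo.it
    User riccardo.maldini2@studio.unibo.it

# made-up alias for the assigned cluster node, reached through the jump host
Host unibo-cluster
    HostName isi-vclust4.csr.unibo.it
    User rmaldini
    ProxyJump unibo-jump

With this in place, ssh unibo-cluster opens the two connections in sequence (asking for both passwords), and scp myfile.jar unibo-cluster:. copies a file straight to the cluster node, skipping the intermediate copy described below.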

How to load a file onto the server

If you want to use spark2-submit with a jar built on your laptop, things get a bit trickier. At least, the solution I found is. I'm sure there are better and more direct ways, so help me improve the gist if you find something better.

In my solution, I use scp (Secure Copy) from my laptop terminal:

$ scp ~/myfile.jar 'riccardo.maldini2@studio.unibo.it'@isi-alfa.csr.unibo.it:.

This copies the file into your personal Unibo server space. If you want to use it with Spark, though, you also have to move it into the cluster node space. So copy the file again, this time from the remote server terminal (not the cluster one):

STUDENTI\user:~$ scp myfile.jar rmaldini@isi-vclust4.csr.unibo.it:.

Now your file should be ready to be used from the cluster node with spark2-submit.
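A typical launch then looks like the line below. It's just a sketch: it.example.MyApp is a made-up main class that you should replace with the one in your jar, and you may need to add other options (e.g. --master) depending on the lab assignment:

[rmaldini@isi-vclust4 ~]$ spark2-submit --class it.example.MyApp myfile.jar   # replace it.example.MyApp with your jar's main class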

Hue Web Interface and Cloudera Manager

You have to log in to the remote machine using:

$ ssh -D 8080 'riccardo.maldini2@studio.unibo.it'@isi-alfa.csr.unibo.it

Leave the terminal open; now go to your browser's proxy settings and configure it to use a SOCKS proxy with host name 127.0.0.1 and port 8080. From now on, all pages you load in your web browser will be tunnelled through the SSH connection, and you should be able to access the private web pages in the same way you would from the remote host.

This means you can access:

  • Hue web interface, at http://isi-vclust0.csr.unibo.it:8889;
  • Cloudera Manager, at http://137.204.72.233:7180;
  • Spark Manager, at http://isi-vclust0.csr.unibo.it:18089.
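If you prefer not to change your browser's global proxy settings, a Chromium-based browser (assuming you have one installed) can be pointed at the tunnel for a single session with the --proxy-server flag, for example:

$ chromium --proxy-server="socks5://127.0.0.1:8080"

The same flag works for Google Chrome when launched from the command line; close it again when you're done, so your normal browsing doesn't go through the tunnel.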