An NYU HPC Hadoop cluster (Dumbo) is available for homework and projects. Students registered for the course can use it at no charge.
Support
The NYU HPC IT team supports Dumbo. Contact them at hpc@nyu.edu for help with the cluster. You can also use our class Forum on NYU Classes to get help.
Getting an account
To get an account, follow these instructions (you can select your course professor as your sponsor): https://wikis.nyu.edu/display/NYUHPC/Getting+or+renewing+an+HPC+account
Logging In
Once you have an account, instructions for logging in are here: https://wikis.nyu.edu/display/NYUHPC/Clusters+-+Dumbo#Clusters-Dumbo-LOGGING_INLoggingIn
More Information
You can read more about Dumbo here: https://wikis.nyu.edu/display/NYUHPC/Clusters+-+Dumbo
If you want to try Dumbo, here are the steps I've used to log in. Use the Forum if you run into any difficulties.

Execute these two steps to log into Dumbo, replacing 'yourNetID' with your own NetID.
Step 1: log into the HPC gateway. (You can skip this step if you are connected to the NYU VPN, vpn.nyu.edu.)
ssh yourNetID@gw.hpc.nyu.edu
Step 2: log into Dumbo (the -Y option enables trusted X11 forwarding):
ssh -Y yourNetID@dumbo.es.its.nyu.edu

Use an editor such as vi to create a text file in the local (non-HDFS) file system:
vi myTestData.txt

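Next, put your data file into HDFS by running these commands one at a time:
hdfs dfs -ls /                                          # list the HDFS root directory
hdfs dfs -ls /user                                      # list the user home directories
hdfs dfs -ls /user/yourNetID                            # list your HDFS home directory
hdfs dfs -mkdir /user/yourNetID/class1                  # create a directory for this class
hdfs dfs -put myTestData.txt /user/yourNetID/class1     # copy the local file into HDFS
hdfs dfs -cat /user/yourNetID/class1/myTestData.txt     # print the file's contents from HDFS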
If the last command prints the contents of your file, your Dumbo Hadoop account is ready to use.
Reference (HDFS shell commands): http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

In the same terminal window, type the following command to start the Spark shell. You shouldn't see any errors (warnings can be ignored):
$ spark-shell
  Starts the Scala version of the Spark REPL.
After some output from the shell, you should see a scala> prompt.

Some commands you can try:
scala> :help
  In the Spark shell, try the help command.
scala> sc.[TAB]
  View the commands available in the SparkContext (sc).
scala> sc.version
  View the version of Spark running in the shell.
scala> val myConstant: Int = 2016
scala> myConstant
scala> my[TAB]
scala> myConstant.[TAB]
scala> myConstant.to[TAB]
scala> myConstant.toFloat
scala> myConstant
  Note that myConstant has not changed; it's still an Int.
scala> myConstant.toFloat.toInt
scala> val myString = myConstant
  Note the type inferred for myString: despite its name, it is an Int.
scala> :type val myString2 = myConstant
  Use the :type command to view the type that is inferred for myString2.
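The value and type experiments above can also be collected into a small standalone Scala program, which may be handy if you want to rerun them outside the shell. This is a minimal sketch, not part of the course material: the object name TypeDemo and the names asFloat, roundTrip and stillAnInt are hypothetical, and it uses only the Scala standard library, so the sc commands (which need a running SparkContext) are left out.

```scala
// Hypothetical standalone version of the REPL session above.
object TypeDemo {
  // Explicitly typed value; `val` means it can never be reassigned
  val myConstant: Int = 2016

  // toFloat returns a NEW Float value; myConstant itself is unchanged
  val asFloat: Float = myConstant.toFloat

  // Converting the Float back yields an Int again
  val roundTrip: Int = myConstant.toFloat.toInt

  // No type annotation: the compiler infers Int from the right-hand side,
  // even though the name suggests a String
  val myString = myConstant

  // This line compiles only because myString's inferred type is Int
  val stillAnInt: Int = myString

  def main(args: Array[String]): Unit = {
    println(myConstant) // 2016
    println(asFloat)    // 2016.0
    println(roundTrip)  // 2016
    println(myString)   // 2016
  }
}
```

To try it inside spark-shell, you can enter :paste, paste the object, press Ctrl-D, and then run TypeDemo.main(Array()).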