satyajeetmaharana/Readme.md

# How to Run Spark on the NYU Dumbo Cluster

## NYU’s Hadoop Cluster, Dumbo

NYU HPC runs a Hadoop cluster (Dumbo) that students registered for the course can use for homework and projects at no charge.

### Support

The NYU HPC IT team provides support for Dumbo. You can reach them at hpc@nyu.edu for assistance with the cluster; you can also use our class Forum on NYU Classes to get help.

### Getting an account

To get an account, follow these instructions (you can select your course professor as sponsor): https://wikis.nyu.edu/display/NYUHPC/Getting+or+renewing+an+HPC+account

### Logging In

Once you have an account, follow the login instructions here: https://wikis.nyu.edu/display/NYUHPC/Clusters+-+Dumbo#Clusters-Dumbo-LOGGING_INLoggingIn

### More Information

You can read more about Dumbo here: https://wikis.nyu.edu/display/NYUHPC/Clusters+-+Dumbo

## Dumbo - Logging In, Testing HDFS

If you want to try Dumbo, here are the steps I've used to log in and test HDFS.
Use the Forum if you encounter any difficulties.


Execute these two steps to log into Dumbo. Remember to replace 'yourNetID' with your own NetID.

First, connect to the HPC gateway. You can skip this step if you are logged into the VPN (vpn.nyu.edu):

    ssh yourNetID@gw.hpc.nyu.edu

Then connect to Dumbo itself:

    ssh -Y yourNetID@dumbo.es.its.nyu.edu
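
If your OpenSSH client supports the `-J` (ProxyJump) option, which was added in OpenSSH 7.3, the two hops above can be combined into a single command. The sketch below defines a hypothetical helper, `dumbo_login_cmd`, that only prints the combined command for a given NetID so you can check it before running it. The helper name is an assumption made for illustration and is not part of any NYU tooling.

```shell
# Hypothetical helper (not part of any NYU tooling): print one ssh command
# that reaches Dumbo through the gateway by using ProxyJump (-J).
# Assumes an OpenSSH client that supports -J (version 7.3 or later).
dumbo_login_cmd() {
    netid="${1:-}"
    if [ -z "$netid" ]; then
        echo "usage: dumbo_login_cmd yourNetID" >&2
        return 1
    fi
    # -J names the gateway hop; -Y is kept from the second login step.
    printf 'ssh -J %s@gw.hpc.nyu.edu -Y %s@dumbo.es.its.nyu.edu\n' "$netid" "$netid"
}
```

Running `dumbo_login_cmd yourNetID` prints the command, which you can then paste into your terminal. If you are already on the VPN, the plain second step is still the simpler choice.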


Use an editor, such as vi, to create a text file in the local (non-HDFS) file system:

    vi myTestData.txt


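Next, put your data file into HDFS:

    # Look around HDFS: the root, the user directories, then your own home directory
    hdfs dfs -ls /
    hdfs dfs -ls /user
    hdfs dfs -ls /user/yourNetID
    # Create a directory for the class and copy the local file into it
    hdfs dfs -mkdir /user/yourNetID/class1
    hdfs dfs -put myTestData.txt /user/yourNetID/class1
    # Print the file back from HDFS to confirm the upload
    hdfs dfs -cat /user/yourNetID/class1/myTestData.txt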


If the above steps worked, your Dumbo Hadoop account is ready to use.
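
If you expect to repeat the upload for other files, the same steps can be wrapped in a small shell function. The following is a minimal sketch: the function name `hdfs_put_and_check`, its arguments, and the default directory name `class1` are assumptions made for illustration. It uses only `hdfs dfs -mkdir -p`, `-put`, and `-cat`, all of which are described in the FileSystemShell reference.

```shell
# Hypothetical helper: repeat the HDFS upload steps for any local file.
# The name, the argument order, and the default "class1" are illustrative.
hdfs_put_and_check() {
    netid="${1:-}"           # your NetID
    local_file="${2:-}"      # a file in the local (non-HDFS) file system
    subdir="${3:-class1}"    # directory under /user/yourNetID
    if [ -z "$netid" ] || [ -z "$local_file" ]; then
        echo "usage: hdfs_put_and_check yourNetID localFile [subdir]" >&2
        return 1
    fi
    target="/user/$netid/$subdir"

    # -p creates missing parents and does not fail if the directory exists
    hdfs dfs -mkdir -p "$target" || return 1
    # Copy the local file into HDFS; this fails if the file is already there
    hdfs dfs -put "$local_file" "$target" || return 1
    # Print the file back from HDFS to confirm the upload
    hdfs dfs -cat "$target/$(basename "$local_file")"
}
```

For example, `hdfs_put_and_check yourNetID myTestData.txt` reproduces the commands shown above, except that `-mkdir -p` lets you rerun it when the directory already exists.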

### Reference

http://hadoop.apache.org/docs/r2.6.0/hadoop-project-dist/hadoop-common/FileSystemShell.html

## Using the Spark REPL


In the terminal window you already have open, type the following command to start the Scala version of the Spark REPL. You shouldn't see any errors (warnings can be ignored):

    $ spark-shell

After some output from the shell, you should see a scala> prompt.


Some commands you can try:

    // In the Spark shell, try the help command
    scala> :help
    // View the commands available in the Spark Context (sc)
    scala> sc.[TAB]
    // View the version of Spark that is running in the shell
    scala> sc.version
    scala> val myConstant: Int = 2016
    scala> myConstant
    scala> my[TAB]
    scala> myConstant.[TAB]
    scala> myConstant.to[TAB]
    scala> myConstant.toFloat
    // Note that myConstant has not changed; it’s still an Int
    scala> myConstant
    scala> myConstant.toFloat.toInt
    // Note the type inferred for myString
    scala> val myString = myConstant
    // Use the :type command to view the type that is inferred for myString2
    scala> :type val myString2 = myConstant