@csbond007
Last active October 24, 2016 18:21
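Reference: http://www.nodalpoint.com/development-and-deployment-of-spark-applications-with-scala-eclipse-and-sbt-part-1-installation-configuration/
Create a working directory and run the steps below inside it:
mkdir -p ~/spark_sbt_eclipse_cassandra
cd ~/spark_sbt_eclipse_cassandra
SBT installation (RPM-based distributions such as CentOS, RHEL, Fedora): http://www.scala-sbt.org/0.13/docs/Installing-sbt-on-Linux.html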
curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
sudo yum install sbt
///////////////////////////////////////////////////////////////////////////////////////
sbteclipse plugin + sbt-assembly plugin
mkdir -p ~/.sbt/0.13/plugins
sudo gedit ~/.sbt/0.13/plugins/plugins.sbt
addSbtPlugin("com.typesafe.sbteclipse" % "sbteclipse-plugin" % "4.0.0")
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.13.0")
//////////////////////////////////////////////////////////////////////////////////////
SCALA IDE
sudo wget http://downloads.typesafe.com/scalaide-pack/4.1.1-vfinal-luna-211-20150728/scala-SDK-4.1.1-vfinal-2.11-linux.gtk.x86_64.tar.gz
sudo gunzip scala-SDK-4.1.1-vfinal-2.11-linux.gtk.x86_64.tar.gz
sudo tar -xvf scala-SDK-4.1.1-vfinal-2.11-linux.gtk.x86_64.tar
To run Eclipse (Luna):
cd eclipse
./eclipse
//////////////////////////////////////////////////////////////////////////////////
SCALA 2.11.8
sudo wget http://www.scala-lang.org/files/archive/scala-2.11.8.tgz
sudo tar xvf scala-2.11.8.tgz
sudo mv scala-2.11.8 /usr/lib
sudo ln -s /usr/lib/scala-2.11.8 /usr/lib/scala
export PATH=$PATH:/usr/lib/scala/bin
(add the export line above to ~/.bashrc so that new shells pick it up)
scala -version
///////////////////////////////////////////////////////////////////////////////////////
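SPARK 2.0.1
Download Spark 2.0.1 pre-built for Hadoop 2.7 from http://spark.apache.org/downloads.html
and extract it into ~/spark_sbt_eclipse_cassandra (e.g. with Archive Manager)
cd ~/spark_sbt_eclipse_cassandra/spark-2.0.1-bin-hadoop2.7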
export PATH=$PATH:/home/ksaha/spark_sbt_eclipse_cassandra/spark-2.0.1-bin-hadoop2.7/bin
//////// Sample SBT Project ///////////////////////////////////////////////////////
cd ~/spark_sbt_eclipse_cassandra
mkdir SampleApp
cd SampleApp
mkdir -p src/main/scala
In the directory SampleApp/src/main/scala, create the following Scala file SampleApp.scala (using just a text editor for now):
cd src/main/scala
sudo gedit SampleApp.scala
/* SampleApp.scala:
   This application counts the lines of its own source file that contain "val".
*/
import org.apache.spark.SparkConf
import org.apache.spark.SparkContext

object SampleApp {
  def main(args: Array[String]): Unit = {
    // Absolute path to this source file; adjust it to your own home directory
    val txtFile = "/home/ksaha/spark_sbt_eclipse_cassandra/SampleApp/src/main/scala/SampleApp.scala"
    val conf = new SparkConf().setAppName("Sample Application")
    val sc = new SparkContext(conf)
    // Read the file as an RDD with a minimum of 2 partitions and cache it
    val txtFileLines = sc.textFile(txtFile, 2).cache()
    val numVals = txtFileLines.filter(line => line.contains("val")).count()
    println("Lines with val: %s".format(numVals))
    sc.stop()
  }
}
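To sanity-check the number the Spark job prints, the same count can be computed on a single machine with plain Scala. This is a minimal sketch that does not use Spark at all; `LocalValCount` and `countLinesContaining` are hypothetical names, and the file path is taken from the first command-line argument instead of being hard-coded.

```scala
import scala.io.Source

// Plain-Scala equivalent of the filter/count in SampleApp, with no Spark involved.
object LocalValCount {
  // Counts the lines that contain `needle`, mirroring
  // txtFileLines.filter(line => line.contains("val")).count()
  // (returns Long, like RDD.count()).
  def countLinesContaining(lines: Iterator[String], needle: String): Long =
    lines.count(_.contains(needle)).toLong

  def main(args: Array[String]): Unit = {
    // Path of the file to scan, e.g. src/main/scala/SampleApp.scala
    val path = args(0)
    val source = Source.fromFile(path)
    try {
      val n = countLinesContaining(source.getLines(), "val")
      println("Lines with val: %s".format(n))
    } finally source.close()
  }
}
```

Compiled with `scalac LocalValCount.scala` and run from the SampleApp directory as `scala LocalValCount src/main/scala/SampleApp.scala`, it should print the same number as the Spark job.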
//////////////////////////////////////////////////////////////////////////////////////////////////
Go back up to the SampleApp directory and create a configuration file sample.sbt containing the following:
cd ../../..
sudo gedit sample.sbt
name := "Sample Project"
version := "1.0"
scalaVersion := "2.11.8"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.1"
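The jar path passed to spark-submit later is derived from name, version and scalaVersion in this file. The sketch below reproduces that naming, assuming sbt's default `normalizedName` lowercases the project name and replaces runs of non-word characters with `-`; `ArtifactName` and its methods are hypothetical helpers, not part of sbt.

```scala
// Hypothetical helper mirroring how sbt (0.13) names the jar built by `sbt package`.
object ArtifactName {
  // Assumption: sbt's default normalizedName lowercases the name and
  // replaces each run of non-word characters with a single '-'.
  def normalizedName(name: String): String =
    name.toLowerCase(java.util.Locale.ENGLISH).replaceAll("""\W+""", "-")

  // For final Scala 2.x.y releases the binary version is "2.x".
  def scalaBinaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // target/scala-<binary>/<normalizedName>_<binary>-<version>.jar
  def jarPath(name: String, version: String, scalaVersion: String): String = {
    val bin = scalaBinaryVersion(scalaVersion)
    s"target/scala-$bin/${normalizedName(name)}_$bin-$version.jar"
  }

  def main(args: Array[String]): Unit =
    println(jarPath("Sample Project", "1.0", "2.11.8"))
}
```

Running `main` prints `target/scala-2.11/sample-project_2.11-1.0.jar`, which is the path used with spark-submit below; changing name, version or scalaVersion in sample.sbt changes that path accordingly.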
//////////////////////////////////////////////////////////////////
The resulting directory structure should be as shown below:
SampleApp$ find .
.
./sample.sbt
./src
./src/main
./src/main/scala
./src/main/scala/SampleApp.scala
/////////////////////////////////////////////////
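From the SampleApp directory, build the jar and run it with spark-submit:
sbt package
spark-submit --class "SampleApp" --master local[2] target/scala-2.11/sample-project_2.11-1.0.jar
Then generate the Eclipse project files (.project, .classpath) with the sbteclipse plugin:
sbt eclipse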
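/// Open Eclipse
Import -> General -> Existing Projects into Workspace -> browse and select SampleApp -> Finish
When the application is launched from Eclipse there is no spark-submit to supply --master, so the master URL has to be set in code; otherwise SparkContext fails with "A master URL must be set in your configuration". Change the SparkConf line in SampleApp.scala to:
val conf = new SparkConf().setAppName("Sample Application").setMaster("local[2]")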
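The source files were created with sudo gedit, so they belong to root and Eclipse cannot save the change. Give them to your user before saving:
cd /home/ksaha/spark_sbt_eclipse_cassandra/SampleApp/src/main/scala
ls -l
sudo chown "$USER" *
ls -l
(the files should now be owned by your user and writable from Eclipse)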
/// Running Code //////////////////////////////////////
Run -> Run Configurations... -> Scala Application (create a new configuration)
Set Main Class to "SampleApp"
Set Name to "Sample_Application"
Run the Sample_Application configuration
/////////////////////////////////////////////////////////////////////////////////////////////////////
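// Troubleshooting: if a job running on the Mesos cluster keeps logging
// WARN TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure
// that workers are registered and have sufficient resources
// a stale framework may still be holding the cluster's resources. Tear it down through the Mesos master:
// write the framework's ID (shown in the Mesos master web UI) into /tmp/post.txt, with nothing else in the file.
sudo vim /tmp/post.txt
frameworkId=33ea2954-5fd5-494e-b4ad-8f1cb77fde51-0008
curl -d@/tmp/post.txt -X POST http://10.10.40.138:5050/master/teardown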
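The teardown request can also be sent from Scala, for example from a cleanup script. This is a minimal sketch using only the JDK's `java.net.HttpURLConnection`; `MesosTeardown` and its methods are hypothetical names, and the master address and framework ID are whatever you pass in. It posts the same form-encoded body (`frameworkId=<id>`) that the curl command sends.

```scala
import java.net.{HttpURLConnection, URL, URLEncoder}
import java.nio.charset.StandardCharsets

// Hypothetical helper mirroring:
//   curl -d@/tmp/post.txt -X POST http://<master>:5050/master/teardown
object MesosTeardown {
  // Form-encoded body, same as the contents of /tmp/post.txt
  def requestBody(frameworkId: String): String =
    "frameworkId=" + URLEncoder.encode(frameworkId, "UTF-8")

  // masterHostPort is e.g. "10.10.40.138:5050"
  def teardownUrl(masterHostPort: String): String =
    s"http://$masterHostPort/master/teardown"

  // Sends the POST and returns the HTTP status code from the Mesos master.
  def teardown(masterHostPort: String, frameworkId: String): Int = {
    val conn = new URL(teardownUrl(masterHostPort)).openConnection().asInstanceOf[HttpURLConnection]
    try {
      conn.setRequestMethod("POST")
      conn.setDoOutput(true)
      conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded")
      val out = conn.getOutputStream
      try out.write(requestBody(frameworkId).getBytes(StandardCharsets.UTF_8))
      finally out.close()
      conn.getResponseCode
    } finally conn.disconnect()
  }

  def main(args: Array[String]): Unit = {
    // Usage: MesosTeardown <master-host:port> <framework-id>
    val status = teardown(args(0), args(1))
    println(s"Mesos master responded with HTTP $status")
  }
}
```

After `scalac MesosTeardown.scala`, it can be run as `scala MesosTeardown 10.10.40.138:5050 <framework-id>`.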
// Java 8 installation (CentOS/RHEL/Fedora)
http://tecadmin.net/install-java-8-on-centos-rhel-and-fedora/