jungleBadger/process.md

## process.md

      
Preparing to run on Cluster:

Variables:


These variables are used across the guide.


$HOME = Home root folder
$SPACE_NAME = Your squad/unique_id to create a new branch project space
$TMP_FILE_NAME = Auto generated .tmp file to be used on deploy
$SPARK_MASTER_POD_NAME = Pod name that is running the Spark Master instance
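
As a sketch, the placeholders above can be kept in shell variables so the later commands can be pasted as written. The values below are made-up examples, not names from the project, and the last two are only known after steps 2 and 3. The loop simply reports any variable that is still empty.

```shell
# Hypothetical example values; replace each with your own.
SPACE_NAME="myspace"
TMP_FILE_NAME="tmp.AbC123"               # file name without the .tmp extension (step 2)
SPARK_MASTER_POD_NAME="spark-master-0"   # taken from `kubectl get pods` (step 3)

# Collect the names of variables that are still empty.
missing=""
for name in SPACE_NAME TMP_FILE_NAME SPARK_MASTER_POD_NAME; do
  eval "value=\${$name:-}"
  [ -n "$value" ] || missing="$missing $name"
done

if [ -n "$missing" ]; then
  echo "Missing variables:$missing" >&2
else
  echo "All variables set"
fi
```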

Step by step:


This process is intended to build the epm project and run it on the Spark master pod of the cluster.

1. Generate a deploy .jar file following these steps:


Go to the project's folder (cd /vagrant/scala/com/ibm/epm/magichat)
Enter your space folder (cd $SPACE_NAME)
Run the bazel build command (bazel build //scala/com/ibm/epm/magichat/${SPACE_NAME}:${SPACE_NAME}_deploy.jar)

This will generate a .exp file at $HOME/.cedp/creds.exp
and a .jar file at $HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/$SPACE_NAME/${SPACE_NAME}_deploy.jar
 ** Be careful to add the _deploy suffix after $SPACE_NAME in the build command
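
If $SPACE_NAME is held in a real shell variable, the suffix needs braces: the shell reads `$SPACE_NAME_deploy` as a different variable named `SPACE_NAME_deploy`. A minimal sketch, assuming a made-up space name `myspace`, that only prints the build command and the expected jar location:

```shell
SPACE_NAME="myspace"   # hypothetical example value

# Braces keep the _deploy suffix out of the variable name.
TARGET="//scala/com/ibm/epm/magichat/${SPACE_NAME}:${SPACE_NAME}_deploy.jar"
JAR_PATH="$HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/${SPACE_NAME}/${SPACE_NAME}_deploy.jar"

# Printed rather than executed; remove the echo to run the build.
echo bazel build "$TARGET"
echo "Expected jar: $JAR_PATH"
```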


2. Generate tmp credentials


Go to the machine's root folder (cd)
Go to the tools folder (cd /vagrant/tools)
Execute the cedp_credentials.sh file with the cache param (./cedp_credentials.sh cache)

This will generate a .tmp file with a random name under /tmp folder


Go back to root folder (cd)
Go to tmp folder (cd tmp)
Locate the .tmp file (ls *.tmp)
Copy this file name ($TMP_FILE_NAME)
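
Instead of copying the name by hand, it can be captured in a variable. The sketch below assumes the newest .tmp file in the folder is the one just generated. It runs against a scratch folder with a made-up file so it is safe to try; point TMP_DIR at your real tmp folder (and drop the `touch` and `rm` lines) to use it.

```shell
# Scratch folder with a made-up file, standing in for the real tmp folder.
TMP_DIR="$(mktemp -d)"
touch "$TMP_DIR/tmp.AbC123.tmp"

# List .tmp files newest first and keep only the first one.
newest="$(ls -t "$TMP_DIR"/*.tmp | head -n 1)"

# Strip the folder and the .tmp extension to get $TMP_FILE_NAME.
TMP_FILE_NAME="$(basename "$newest" .tmp)"
echo "TMP_FILE_NAME=$TMP_FILE_NAME"

rm -r "$TMP_DIR"
```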


3. Locate the Spark master pod


Go back to machine's root folder (cd)
Run the kubectl command to list available pods (kubectl get pods)
Copy the spark-master pod name ($SPARK_MASTER_POD_NAME)
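
The pod name can also be picked out of the listing automatically. This is a sketch that assumes the pod name contains the text `spark-master`; the sample listing and pod names are made up so the snippet runs without a cluster.

```shell
# Print the name (first column) of the first pod whose name contains "spark-master".
find_spark_master() {
  awk '$1 ~ /spark-master/ { print $1; exit }'
}

# Made-up sample of `kubectl get pods` output.
sample_output='NAME                            READY   STATUS    RESTARTS   AGE
spark-master-5d8f7c9b6-x2k4q    1/1     Running   0          3d
spark-worker-7c6b5d4f8-9hj2l    1/1     Running   0          3d'

SPARK_MASTER_POD_NAME="$(printf '%s\n' "$sample_output" | find_spark_master)"
echo "SPARK_MASTER_POD_NAME=$SPARK_MASTER_POD_NAME"
```

Against a real cluster the same function would be fed from `kubectl get pods | find_spark_master`.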


4. Copy files from the vagrant machine to the Kubernetes Spark master pod


Go to root folder (cd)
Copy the generated .tmp file to the Spark master pod (kubectl cp $HOME/tmp/$TMP_FILE_NAME.tmp $SPARK_MASTER_POD_NAME:/spark/Spark/$TMP_FILE_NAME.tmp)
Copy the generated .exp file to the Spark master pod (kubectl cp $HOME/vagrant/.cedp/creds.exp $SPARK_MASTER_POD_NAME:/root/.cedp/creds.exp)
Copy the generated .jar file to the Spark master pod (kubectl cp $HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/$SPACE_NAME/${SPACE_NAME}_deploy.jar $SPARK_MASTER_POD_NAME:/spark/Spark/jars/${SPACE_NAME}_deploy.jar)
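
The three copies can be rehearsed before touching the cluster. The sketch below uses a small hypothetical `run` helper (not part of kubectl) that prints each command while DRY_RUN=1 and executes it otherwise. The variable values are made-up examples; the paths are the ones listed in this step.

```shell
# Hypothetical example values; replace with your own.
SPACE_NAME="myspace"
TMP_FILE_NAME="tmp.AbC123"
SPARK_MASTER_POD_NAME="spark-master-0"

# With DRY_RUN=1 the command is only printed; set DRY_RUN=0 to execute it.
DRY_RUN=1
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

JAR="${SPACE_NAME}_deploy.jar"
run kubectl cp "$HOME/tmp/${TMP_FILE_NAME}.tmp" "${SPARK_MASTER_POD_NAME}:/spark/Spark/${TMP_FILE_NAME}.tmp"
run kubectl cp "$HOME/vagrant/.cedp/creds.exp" "${SPARK_MASTER_POD_NAME}:/root/.cedp/creds.exp"
run kubectl cp "$HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/${SPACE_NAME}/${JAR}" "${SPARK_MASTER_POD_NAME}:/spark/Spark/jars/${JAR}"
```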


5. Enter the Spark cluster


Enter the Spark master pod terminal (kubectl exec -it $SPARK_MASTER_POD_NAME -- /bin/bash)


6. Run the spark-submit command to run the job on the cluster


Run the spark-submit command
(spark-submit --conf spark.driver.extraClassPath=/spark/Spark/jars/${SPACE_NAME}_deploy.jar /spark/Spark/jars/${SPACE_NAME}_deploy.jar --cf=/spark/Spark/secrets/$TMP_FILE_NAME.tmp)
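
The shell inside the pod is a fresh session, so variables set on the vagrant machine are not available there and need to be defined again. A sketch with made-up example values that assembles the command from this step and prints it for review:

```shell
# Hypothetical example values; replace with your own.
SPACE_NAME="myspace"
TMP_FILE_NAME="tmp.AbC123"

# Absolute paths inside the pod, as used in this step.
JAR="/spark/Spark/jars/${SPACE_NAME}_deploy.jar"
CREDS="/spark/Spark/secrets/${TMP_FILE_NAME}.tmp"

# Printed rather than executed; remove the echo to submit the job.
echo spark-submit --conf "spark.driver.extraClassPath=${JAR}" "$JAR" "--cf=${CREDS}"
```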


Troubleshooting

Error loading spark submit command


Symptom: WARNING:root:could not open file '/etc/apt/sources.list.d/apt_dockerproject_org_repo.list'
Solution: You are running the spark-submit command from within the vagrant/local machine. Run it from inside the Spark master pod instead (see step 5).


Symptom: Error: Cannot load main class from JAR file:/dcerag_deploy.jar
Solution: Use the absolute path to the .jar file in the spark-submit command
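
To catch the second symptom before submitting, the jar path can be checked first. This is a sketch with a hypothetical helper named `is_absolute`; the jar names are made-up examples.

```shell
# Succeed only when the given path starts with "/".
is_absolute() {
  case "$1" in
    /*) return 0 ;;
    *)  return 1 ;;
  esac
}

for jar in "myspace_deploy.jar" "/spark/Spark/jars/myspace_deploy.jar"; do
  if is_absolute "$jar"; then
    echo "ok: $jar"
  else
    echo "not absolute: $jar"
  fi
done
```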