@jungleBadger
Last active May 7, 2018 20:04
CEDP/EPM Cluster run process

Preparing to run on Cluster:

Variables:

These variables are used across the guide:

  • $HOME = Home root folder
  • $SPACE_NAME = Your squad/unique_id to create a new branch project space
  • $TMP_FILE_NAME = Auto generated .tmp file to be used on deploy
  • $SPARK_MASTER_POD_NAME = Pod name that is running the Spark Master instance

Step by step:

This process is intended to build the epm project and run it on the cluster's Spark master pod.

1. Generate a deploy .jar file following these steps:
  • Go to the project's folder (cd /vagrant/scala/com/ibm/epm/magichat)
  • Enter your space folder (cd $SPACE_NAME)
  • Run bazel build command (bazel build //scala/com/ibm/epm/magichat/$SPACE_NAME:$SPACE_NAME_deploy.jar)
    • This will generate a .exp file under $HOME/.cedp/creds.exp
    • and a .jar file under $HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/$SPACE_NAME/$SPACE_NAME_deploy.jar
    • Be careful to add the _deploy suffix after $SPACE_NAME in the command
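If $SPACE_NAME is set as a real shell variable, note that the shell reads `$SPACE_NAME_deploy.jar` as a variable named `SPACE_NAME_deploy` followed by `.jar`, which is usually empty. A minimal sketch of composing the build target with braces, assuming a hypothetical space name `mysquad` and a hypothetical helper `bazel_deploy_target` (neither is part of the project):

```shell
# Print the Bazel label for a space name, using the label layout from step 1.
# The braces in ${1} keep "_deploy.jar" as literal text after the variable.
bazel_deploy_target() {
    printf '//scala/com/ibm/epm/magichat/%s:%s_deploy.jar\n' "${1}" "${1}"
}

# Print the build command instead of running it, so it can be reviewed first.
# "mysquad" is a placeholder used only when SPACE_NAME is not set.
echo "bazel build $(bazel_deploy_target "${SPACE_NAME:-mysquad}")"
```

Copy the printed line into the terminal once the target looks right.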
2. Generate tmp credentials
  • Go to the machine's root folder (cd)
  • Go to tools folder (cd /vagrant/tools)
  • Execute the cedp_credentials.sh file with the cache param (./cedp_credentials.sh cache)
    • This will generate a .tmp file with a random name under /tmp folder
  • Go back to root folder (cd)
  • Go to tmp folder (cd tmp)
  • Locate the .tmp file (ls *.tmp)
  • Copy this file name ($TMP_FILE_NAME)
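Locating the file by hand can be replaced by picking the most recently modified .tmp file. This is only a sketch: it assumes the credentials file is the newest .tmp file in the folder, and `newest_tmp_file` is a hypothetical helper name. The folder is a parameter because the steps above mention both /tmp and the tmp folder under the root folder.

```shell
# Print the path of the most recently modified .tmp file in a directory.
# Assumes file names contain no newlines. Prints nothing if no .tmp file exists.
newest_tmp_file() {
    dir="${1:-$HOME/tmp}"
    # ls -t sorts by modification time, newest first; head keeps the first entry.
    ls -t "$dir"/*.tmp 2>/dev/null | head -n 1
}
```

For example, `TMP_FILE_NAME="$(basename "$(newest_tmp_file "$HOME/tmp")" .tmp)"` would store the name without the .tmp extension, which matches how $TMP_FILE_NAME is used in the later steps.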
3. Locate the Spark master pod
  • Go back to the machine's root folder (cd)
  • Run kubectl command to list available pods (kubectl get pods)
  • Copy the spark-master pod name ($SPARK_MASTER_POD_NAME)
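Copying the pod name by hand can be scripted by filtering the `kubectl get pods` listing. A minimal sketch, assuming the pod name contains the text `spark-master` and that the listing has the usual header row; `spark_master_pod` is a hypothetical helper name:

```shell
# Read `kubectl get pods` output on stdin and print the name (first column)
# of the first pod whose name contains "spark-master". The header row is skipped.
spark_master_pod() {
    awk 'NR > 1 && $1 ~ /spark-master/ { print $1; exit }'
}

# Intended use (requires access to the cluster, so it is left commented out):
# SPARK_MASTER_POD_NAME="$(kubectl get pods | spark_master_pod)"
```

If more than one pod matches, only the first is printed, so check the full listing when in doubt.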
4. Copy files from the vagrant machine to the kubernetes Spark master pod
  • Go to root folder (cd)
  • Copy the generated .tmp file to Spark master pod: (kubectl cp $HOME/tmp/$TMP_FILE_NAME.tmp $SPARK_MASTER_POD_NAME:/spark/Spark/$TMP_FILE_NAME.tmp)
  • Copy the generated .exp file to Spark master pod: (kubectl cp $HOME/vagrant/.cedp/creds.exp $SPARK_MASTER_POD_NAME:/root/.cedp/creds.exp)
  • Copy the generated .jar file to Spark master pod: (kubectl cp $HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/$SPACE_NAME/$SPACE_NAME_deploy.jar $SPARK_MASTER_POD_NAME:/spark/Spark/jars/$SPACE_NAME_deploy.jar)
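The three copies can be generated from the variables so that every placeholder is expanded the same way. The sketch below only prints the commands; `print_copy_commands` is a hypothetical helper, and the absolute destination /spark/Spark/jars for the .jar is an assumption chosen to match the path used by spark-submit in step 6. Note that step 6 reads the .tmp file from /spark/Spark/secrets/, so the .tmp destination may need to be adjusted to match.

```shell
# Print the three `kubectl cp` commands from step 4 without running them.
# Arguments: space name, tmp file name (without .tmp), Spark master pod name.
print_copy_commands() {
    space="$1"
    tmp_name="$2"
    pod="$3"
    # Braces keep "_deploy.jar" from being read as part of the variable name.
    jar="${space}_deploy.jar"
    echo "kubectl cp $HOME/tmp/${tmp_name}.tmp ${pod}:/spark/Spark/${tmp_name}.tmp"
    echo "kubectl cp $HOME/vagrant/.cedp/creds.exp ${pod}:/root/.cedp/creds.exp"
    echo "kubectl cp $HOME/vagrant/bazel-bin/scala/com/ibm/epm/magichat/${space}/${jar} ${pod}:/spark/Spark/jars/${jar}"
}
```

Calling `print_copy_commands "$SPACE_NAME" "$TMP_FILE_NAME" "$SPARK_MASTER_POD_NAME"` prints one command per line for review before running them.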
5. Enter the Spark master pod
  • Open a terminal in the Spark master pod (kubectl exec -it $SPARK_MASTER_POD_NAME -- /bin/bash)
6. Run the spark-submit command on the cluster
  • Run spark submit command (spark-submit --conf spark.driver.extraClassPath=/spark/Spark/jars/$SPACE_NAME_deploy.jar /spark/Spark/jars/$SPACE_NAME_deploy.jar --cf=/spark/Spark/secrets/$TMP_FILE_NAME.tmp)
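Because the .jar path appears twice in the command, it can help to build it once and reuse it. A minimal sketch that prints the command from step 6 with absolute paths; `print_spark_submit` is a hypothetical helper, and the paths are the ones given in the step above:

```shell
# Print the spark-submit command from step 6 without running it.
# Arguments: space name, tmp file name (without .tmp).
print_spark_submit() {
    # Absolute path to the deploy jar inside the Spark master pod.
    jar="/spark/Spark/jars/${1}_deploy.jar"
    echo "spark-submit --conf spark.driver.extraClassPath=${jar} ${jar} --cf=/spark/Spark/secrets/${2}.tmp"
}
```

Run the printed command inside the Spark master pod, not on the vagrant machine.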

Troubleshooting

Error loading spark submit command

Symptom: WARNING:root:could not open file '/etc/apt/sources.list.d/apt_dockerproject_org_repo.list'

Solution: You are running the spark-submit command from within the vagrant/local machine. Enter the Spark master pod first (step 5) and run it there.

Symptom: Error: Cannot load main class from JAR file:/dcerag_deploy.jar

Solution: Pass the absolute path to the .jar file
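A small guard can catch a relative .jar path before spark-submit is called. This is a sketch only; `is_absolute_path` is a hypothetical helper and the check is purely textual (it does not verify that the file exists):

```shell
# Succeed (exit status 0) if the argument starts with "/", fail otherwise.
is_absolute_path() {
    case "$1" in
        /*) return 0 ;;
        *)  return 1 ;;
    esac
}
```

For example, `is_absolute_path "$JAR" || echo "use an absolute path for $JAR"` prints a reminder when the hypothetical variable `JAR` holds a relative path.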
