@holyglenn
Last active June 28, 2016 20:20
Petuum Automated Installation Script

Petuum Quick Start

We now provide a script that sets up the Bösen and Strads systems on a single machine with a single command. After setup, you can run two demo applications to verify that both systems work. If you need a larger deployment or prefer a more detailed, hands-on walkthrough, please refer to this full installation guide. Also check out Poseidon, Petuum's multi-GPU distributed deep learning framework.

Before you start, run the following commands to prepare the necessary environment. If you do not have sudo privileges, please ask your administrator for help. Once these packages are installed, you can run Petuum with or without sudo.

sudo apt-get -y update && sudo apt-get -y install g++ make autoconf git \
  libtool uuid-dev openssh-server cmake libopenmpi-dev openmpi-bin libssl-dev \
  libnuma-dev python-dev python-numpy python-scipy python-yaml protobuf-compiler \
  subversion libxml2-dev libxslt-dev zlibc zlib1g zlib1g-dev libbz2-1.0 \
  libbz2-dev libgoogle-glog-dev libzmq3-dev libyaml-cpp-dev

If you have sudo privileges, run the following command to install the rest of Petuum's dependencies.

sudo apt-get -y install libgoogle-glog-dev libzmq3-dev libyaml-cpp-dev \
  libgoogle-perftools-dev libsnappy-dev libsparsehash-dev
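
After the packages are installed, a quick check that the main build tools are on the PATH can save a failed compile later. The following is a minimal sketch, not part of petuum.py: the function name `missing_tools` and the list of tools are illustrative assumptions (only a few command-line tools shipped by the packages above are checked), and `shutil.which` requires Python 3.3 or later.

```python
import shutil

# A few command-line tools installed by the packages above.
# This list is an illustrative assumption, not something petuum.py defines.
REQUIRED_TOOLS = ["g++", "make", "autoconf", "git", "cmake", "mpirun", "protoc"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools: %s" % ", ".join(missing))
    else:
        print("All checked tools were found on PATH.")
```

This only checks executables; development libraries such as libzmq3-dev are not covered by it.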

Then run the setup command, which takes approximately 10 minutes to set up Petuum on a 2-core machine.

python petuum.py setup

The script enables passwordless ssh connections to localhost using the default id_rsa.pub key, generating a key pair first if none exists. It then downloads and compiles Petuum's source code and its customized dependencies.
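
To confirm that passwordless ssh to localhost works after setup, something like the sketch below may help. It is not part of petuum.py; the function names are hypothetical. It relies on the OpenSSH client option `BatchMode=yes`, which makes ssh fail rather than prompt for a password, and it may also report failure if the host key of localhost has not yet been accepted into known_hosts.

```python
import subprocess


def ssh_check_command(host="localhost"):
    # BatchMode=yes makes ssh fail instead of prompting for a password;
    # ConnectTimeout keeps the check from hanging on an unreachable host.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def passwordless_ssh_works(host="localhost"):
    """Return True if `ssh host true` succeeds without any prompt."""
    try:
        return subprocess.call(ssh_check_command(host)) == 0
    except OSError:
        # The ssh client is not installed or not on PATH.
        return False


if __name__ == "__main__":
    print("passwordless ssh to localhost:", passwordless_ssh_works())
```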

After compilation, to run the Multi-class Logistic Regression demo (in the Bösen system), run

python petuum.py run_mlr

The app launches locally and trains a multi-class logistic regression model on a subset of the Covertype dataset. You should see output like the following. The numbers will differ slightly from run to run because multi-threaded execution is nondeterministic.

40 400 0.253846 0.61287 520 0.180000 50 7.43618
I0701 00:35:00.550900  9086 mlr_engine.cpp:298] Final eval: 40 400 train-0-1: 0.253846 train-entropy: 0.61287 num-train-used: 520 test-0-1: 0.180000 num-test-used: 50 time: 7.43618
I0701 00:35:00.551867  9086 mlr_engine.cpp:425] Loss up to 40 (exclusive) is saved to /home/ubuntu/petuum/app/mlr/out.loss in 0.000955387
I0701 00:35:00.552652  9086 mlr_sgd_solver.cpp:160] Saved weight to /home/ubuntu/petuum/app/mlr/out.weight
I0701 00:35:00.553907  9031 mlr_main.cpp:150] MLR finished and shut down!
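
To compare runs, the labelled values on the "Final eval" line can be pulled out with a small parser. The sketch below is only an illustration based on the sample output above: the function name `parse_final_eval` is hypothetical, and the two unlabelled numbers that follow "Final eval:" are ignored because the log does not name them.

```python
import re

# Matches "name: number" pairs such as "test-0-1: 0.180000".
PAIR = re.compile(r"([A-Za-z0-9-]+):\s+([0-9.]+)")


def parse_final_eval(line):
    """Return the labelled metrics of a 'Final eval' log line as floats.

    Returns None if the line is not a 'Final eval' line.
    """
    marker = "Final eval:"
    if marker not in line:
        return None
    # Only look at the text after the marker, so the timestamp and the
    # source-file location at the start of the line are never matched.
    metrics_text = line.split(marker, 1)[1]
    return {name: float(value) for name, value in PAIR.findall(metrics_text)}


if __name__ == "__main__":
    sample = ("I0701 00:35:00.550900  9086 mlr_engine.cpp:298] Final eval: 40 400 "
              "train-0-1: 0.253846 train-entropy: 0.61287 num-train-used: 520 "
              "test-0-1: 0.180000 num-test-used: 50 time: 7.43618")
    for name, value in sorted(parse_final_eval(sample).items()):
        print(name, value)
```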

To run the MedLDA supervised topic model (in the STRADS system), run

python petuum.py run_lda

The app launches 3 workers locally and trains on the 20 Newsgroups dataset. You will see output like the following. Once all workers have reported "Ready to exit program", you may press Ctrl-C to terminate the program.

......
Rank (2) Ready to exit program from main function in ldall.cpp
I1222 20:38:31.271615  2687 trainer.cpp:464] (rank:0) Dict written into /tmp/dump_dict
I1222 20:38:31.271632  2687 trainer.cpp:465] (rank:0) Total num of words: 53485
I1222 20:38:46.930896  2687 trainer.cpp:487] (rank:0) Model written into /tmp/dump_model
Rank (0) Ready to exit program from main function in ldall.cpp
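
If the output is captured to a file, checking that every worker has reported readiness can be automated. The following is a rough sketch under two assumptions that are not stated in the source: that the message format matches the sample lines above, and that ranks are numbered from 0 to the number of workers minus 1. The function names are hypothetical.

```python
import re

# Matches lines such as "Rank (2) Ready to exit program from main function ...".
READY = re.compile(r"Rank \((\d+)\) Ready to exit program")


def ranks_ready(lines):
    """Return the set of worker ranks that have reported readiness."""
    ready = set()
    for line in lines:
        match = READY.search(line)
        if match:
            ready.add(int(match.group(1)))
    return ready


def all_workers_ready(lines, num_workers=3):
    # Assumes ranks are numbered 0 .. num_workers - 1.
    return ranks_ready(lines) == set(range(num_workers))
```

For example, `all_workers_ready(open("lda.log"))` would return True once ranks 0, 1, and 2 have all appeared, where `lda.log` stands for whatever file the output was redirected to.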

Use the following command to display the top 10 words in each of the topics that were just generated.

python petuum.py display_topics
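
The selection behind this command is a top-k lookup with `argsort`: sort the word weights of a topic, keep the last k indices, and reverse them so the largest weight comes first. Here is a minimal sketch of that technique on toy data; the vocabulary and weights are made up for illustration, and `top_words` is a hypothetical helper rather than a function in petuum.py.

```python
import numpy as np


def top_words(weights, vocabulary, k=10):
    """Return the k words with the largest weights, largest first."""
    # argsort sorts ascending, so the last k indices hold the k largest
    # weights; reversing puts the largest one first.
    order = np.argsort(weights)[-k:][::-1]
    return [vocabulary[i] for i in order]


if __name__ == "__main__":
    # Toy data for illustration only.
    vocabulary = ["apple", "bike", "cat", "dog"]
    weights = np.array([0.1, 0.7, 0.05, 0.15])
    print(top_words(weights, vocabulary, k=2))  # ['bike', 'dog']
```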

If you don't have sudo, run the setup command with the --no-sudo argument. In addition to the steps performed by the sudo setup, the script compiles and installs Petuum's dependencies in its local folder. This setup process takes about 20 minutes.

python petuum.py setup --no-sudo

Then you can run Petuum's demo applications as stated above.

from __future__ import print_function
import argparse
import os
from os.path import join

import numpy as np

default_dir = join(os.environ['HOME'], "petuum_test")


def perform(cmd):
    # Echo the shell command, then run it.
    print(cmd)
    os.system(cmd)


def download_petuum(args):
    # Download the Petuum source code.
    perform("mkdir -p %s" % args.install_path)
    perform("cd %s; git clone -b stable https://github.com/petuum/bosen.git"
            % args.install_path)
    perform("cd %s; git clone https://github.com/petuum/strads.git"
            % args.install_path)
    perform("cd %s; git clone https://github.com/petuum/third_party.git"
            % join(args.install_path, "bosen"))


def compile_petuum(args):
    # Compile STRADS, the Bosen third-party libraries, and Bosen itself.
    if args.sudo:
        # System packages are installed, so a minimal third-party build suffices.
        perform("cd %s; make path && make third_party_minimal -j4 "
                "&& make st_lib -j4" % join(args.install_path, "strads"))
        perform("cd %s; make" % join(args.install_path, "bosen/third_party"))
    else:
        # Without sudo, build all dependencies locally.
        perform("cd %s; make" % join(args.install_path, "strads"))
        perform("cd %s; make third_party_all"
                % join(args.install_path, "bosen/third_party"))
    perform("cd %s; cp defns.mk.template defns.mk; make -j4"
            % join(args.install_path, "bosen"))


def self_passwordless_ssh():
    # Authorize the default public key for ssh to localhost,
    # generating a key pair first if none exists.
    default_key = join(os.environ['HOME'], ".ssh/id_rsa.pub")
    if not os.path.isfile(default_key):
        perform("ssh-keygen")
    perform("cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys")
    perform("chmod 0600 ~/.ssh/authorized_keys")


def setup_petuum(args):
    self_passwordless_ssh()
    download_petuum(args)
    compile_petuum(args)


def run_mlr(args):
    # Run the Multi-Class Logistic Regression application.
    mlr_path = join(args.install_path, "bosen/app/mlr")
    cmd = "make -j4 && cp script/launch.py.template script/launch.py && " \
          "chmod +x script/launch.py && ./script/launch.py && " \
          "echo You have just run Multi-Class Logistic Regression!"
    perform("cd %s; %s" % (mlr_path, cmd))


def run_lda(args):
    # Run the MedLDA application.
    lda_path = join(args.install_path, "strads/apps/medlda_release")
    cmd = "make -j4 && python split.py 20news.train 3 && " \
          "python split.py 20news.test 3 && python single.py && " \
          "echo You have just run Supervised Topic Model!"
    perform("cd %s; %s" % (lda_path, cmd))


def display_topics():
    # Display the top 10 words in each topic.
    with open('/tmp/dump_dict', 'r') as dictfile:
        words = [word.rstrip() for word in dictfile]
    with open('/tmp/dump_model', 'r') as modelfile:
        for model in modelfile:
            # np.float was removed from recent NumPy releases;
            # the builtin float is equivalent here.
            wordvec = np.array(model.split()).astype(float)
            # Indices of the 10 largest weights, largest first.
            index = wordvec.argsort()[-10:][::-1]
            print([words[i] for i in index])


if __name__ == "__main__":
    print("Default Directory for Setup: %s." % default_dir)
    parser = argparse.ArgumentParser()
    subparsers = parser.add_subparsers(title='subcommands', dest='subparser')
    setup_parser = subparsers.add_parser(
        'setup', help="Download & Compile Petuum in a local folder.")
    setup_parser.add_argument('--install_path', default=default_dir,
        help="local folder to put Petuum. %s by default." % default_dir)
    setup_parser.add_argument('--sudo', dest='sudo', action='store_true',
        help="by default, assume you would have sudo")
    setup_parser.add_argument('--no-sudo', dest='sudo', action='store_false',
        help="append this if you don't have sudo")
    setup_parser.set_defaults(sudo=True)
    mlr_parser = subparsers.add_parser(
        'run_mlr',
        help="Run example Petuum App: Multi-Class Logistic Regression.")
    mlr_parser.add_argument('--install_path', default=default_dir,
        help="local folder to put Petuum. %s by default." % default_dir)
    lda_parser = subparsers.add_parser(
        'run_lda',
        help="Run example Petuum App: MedLDA, a supervised topic model.")
    lda_parser.add_argument('--install_path', default=default_dir,
        help="local folder to put Petuum. %s by default." % default_dir)
    topic_parser = subparsers.add_parser(
        'display_topics',
        help="Show MedLDA Topic Result: Top 10 Words of Each Topic.")
    args = parser.parse_args()
    if args.subparser == 'setup':
        setup_petuum(args)
        # Report the path actually used, which may differ from the default.
        print("Petuum is set up at %s." % args.install_path)
    elif args.subparser == 'run_mlr':
        run_mlr(args)
    elif args.subparser == 'run_lda':
        run_lda(args)
    elif args.subparser == 'display_topics':
        display_topics()
    else:
        # No subcommand was given.
        parser.print_help()