@jmrr
Last active February 13, 2021 09:51
Installing prediction.io commands on a CentOS Linux machine

Install dependencies

yum install -y \
  bzip2 \
  git \
  java-1.8.0-openjdk \
  java-1.8.0-openjdk-devel \
  python-setuptools python-devel python-numpy \
  tar \
  unzip \
  && \
  yum clean all

# Python packages are installed with easy_install (from python-setuptools), not yum
easy_install predictionio mysql-connector-python

Set environmental variables

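To find the JDK path for JAVA_HOME, run which javac and follow the symlinks with ls -l (or resolve them in one step with readlink -f). JAVA_HOME is the directory that contains the resolved bin/javac.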

export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.101-3.b13.el7_2.x86_64
export PIO_HOME=/opt/predictionio
export PATH=$PATH:$PIO_HOME/bin
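
The hard-coded JDK path above changes with every OpenJDK package update. As a minimal sketch, the symlink lookup can be scripted instead. The function name java_home_from_javac is hypothetical, and the sketch assumes GNU readlink, which CentOS ships in coreutils:

```shell
# Print the JDK root for a given javac binary (defaults to the one on PATH).
# java_home_from_javac is a hypothetical helper name, not part of PredictionIO.
java_home_from_javac() {
  local javac_path="${1:-$(command -v javac)}"
  # Fail early if javac cannot be found at all
  [ -n "$javac_path" ] || return 1
  # readlink -f follows the whole symlink chain; the JDK root is the parent of bin/
  dirname "$(dirname "$(readlink -f "$javac_path")")"
}

# Only export JAVA_HOME when a JDK is actually installed
if command -v javac >/dev/null 2>&1; then
  export JAVA_HOME="$(java_home_from_javac)"
fi
```

Putting this in a shell profile keeps JAVA_HOME pointing at whichever JDK the javac symlink currently resolves to.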

Download to /opt

cd /opt
git clone https://github.com/apache/incubator-predictionio.git predictionio
cd predictionio

NOTE: While PredictionIO transitions to the Apache Software Foundation, the URLs in the pio documentation might be unreliable. You can use the ActionML fork in the meantime:

git clone https://github.com/actionml/PredictionIO.git predictionio

Make sure you are on the master branch or use the tag for the desired release version.

Download vendors

mkdir -p vendors
cd vendors
curl https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz | tar xvz && mv elasticsearch-1.4.4 elasticsearch
curl http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz | tar xvz && mv spark-1.5.1-bin-hadoop2.6 spark
curl http://archive.apache.org/dist/hbase/1.1.2/hbase-1.1.2-bin.tar.gz | tar xvz && mv hbase-1.1.2 hbase
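
The download-and-rename steps above depend on knowing the exact directory name inside each archive. One possible alternative, sketched here with the hypothetical helpers unpack_vendor and fetch_vendor, is to let tar drop the versioned top-level directory so the target name is always the one you choose:

```shell
# Unpack a .tar.gz into DEST, discarding the archive's top-level directory.
# unpack_vendor and fetch_vendor are hypothetical helper names.
unpack_vendor() {
  local archive="$1" dest="$2"
  mkdir -p "$dest"
  tar -xzf "$archive" -C "$dest" --strip-components=1
}

# Download URL to a temporary file, then unpack it into DEST.
fetch_vendor() {
  local url="$1" dest="$2" tmp_archive status
  tmp_archive="$(mktemp)"
  # -f fails on HTTP errors, -L follows redirects, -S still shows errors
  if curl -fSL "$url" -o "$tmp_archive" && unpack_vendor "$tmp_archive" "$dest"; then
    status=0
  else
    status=1
  fi
  rm -f "$tmp_archive"
  return "$status"
}

# Example, run from $PIO_HOME/vendors:
# fetch_vendor http://archive.apache.org/dist/hbase/1.1.2/hbase-1.1.2-bin.tar.gz hbase
```

Downloading to a file first means a failed or truncated download stops before anything is unpacked, which the curl | tar pipeline does not guarantee.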

Modify the configuration

Run cd $PIO_HOME && vim conf/pio-env.sh and add the following:

#!/usr/bin/env bash

# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
SPARK_HOME=$PIO_HOME/vendors/spark

# Filesystem paths that PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp

# PredictionIO Storage Configuration

# Storage Repositories
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH

PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE

PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS


# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
#PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
#PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch

# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models

# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase
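
A wrong vendor path in this file tends to surface only later, when a service fails to start. As a rough sanity check, the sketch below verifies that each configured directory exists. The function name check_vendor_dirs and the variable PIO_ENV_FILE are hypothetical. Set PIO_ENV_FILE to the path of the environment file edited above.

```shell
# Report every argument that is not an existing directory.
# Returns 0 when all directories exist, 1 otherwise.
# check_vendor_dirs is a hypothetical helper name.
check_vendor_dirs() {
  local missing=0 dir
  for dir in "$@"; do
    if [ ! -d "$dir" ]; then
      echo "missing directory: $dir" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example: load the configuration, then check the three vendor homes it defines.
# . "$PIO_ENV_FILE" && check_vendor_dirs \
#   "$SPARK_HOME" \
#   "$PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME" \
#   "$PIO_STORAGE_SOURCES_HBASE_HOME"
```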

Create systemd services for vendors and pio event server and deploy

Retrieve the name.service templates from the infrastructure repo and place them in /usr/lib/systemd/system/, one for each of:

  • elasticsearch.service
  • hbase.service
  • pioeventserver.service
  • pio.service
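
The templates themselves are not included here. If you cannot get them, the sketch below shows what a minimal unit might look like. It is an assumption, not the author's template. The helper name render_unit is hypothetical, and the ExecStart path assumes PIO_HOME=/opt/predictionio as set earlier.

```shell
# Print a minimal systemd unit for a foreground process.
# render_unit is a hypothetical helper; the unit layout is an assumption,
# not the template from the infrastructure repo.
render_unit() {
  local description="$1" exec_start="$2"
  cat <<EOF
[Unit]
Description=$description
After=network.target

[Service]
Type=simple
ExecStart=$exec_start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

# Example (as root):
# render_unit "PredictionIO event server" "/opt/predictionio/bin/pio eventserver" \
#   > /usr/lib/systemd/system/pioeventserver.service
# systemctl daemon-reload
```

systemd does not read your shell profile, so a real unit would probably also need an Environment= line for variables such as JAVA_HOME.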

Start the services

Start each unit with systemctl start <name>, check its state with systemctl status <name>, and inspect the logs with journalctl -xe.
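
A small loop can start the four units in sequence and stop at the first failure. The order used in the example (storage backends first, then the event server, then the engine) is an assumption based on what each service depends on. The function name start_services and the SYSTEMCTL override are hypothetical.

```shell
# Start each named unit in order; stop at the first one that fails.
# SYSTEMCTL can be overridden (for example for a dry run); it defaults to systemctl.
start_services() {
  local svc
  for svc in "$@"; do
    if ! "${SYSTEMCTL:-systemctl}" start "$svc"; then
      echo "failed to start $svc" >&2
      return 1
    fi
  done
}

# Example (as root), using the unit names listed above:
# start_services elasticsearch hbase pioeventserver pio
```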

Check status

# Event server
curl -XGET PIO_ADDRESS:7070

# PIO Engine server
curl -XGET PIO_ADDRESS:8000
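
The servers can take a while to accept connections after their units start, so a single curl may fail even when everything is fine. The sketch below polls a URL until it answers. The function name wait_for_http and its default retry values are assumptions, and PIO_ADDRESS is the same placeholder as above.

```shell
# Poll URL until it returns a successful HTTP response or the attempts run out.
# Usage: wait_for_http URL [ATTEMPTS] [DELAY_SECONDS]
# wait_for_http is a hypothetical helper name.
wait_for_http() {
  local url="$1" attempts="${2:-10}" delay="${3:-3}" i=1
  while [ "$i" -le "$attempts" ]; do
    # -f turns HTTP errors into a non-zero exit; -s/-S hide progress but keep errors
    if curl -fsS -o /dev/null "$url"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "no response from $url after $attempts attempts" >&2
  return 1
}

# Example:
# wait_for_http "http://PIO_ADDRESS:7070" && echo "event server is up"
# wait_for_http "http://PIO_ADDRESS:8000" && echo "engine server is up"
```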
@sonali1192

Where will I find the infrastructure repo?
Please help me with that.
