Skip to content

Instantly share code, notes, and snippets.

@fones
Last active March 24, 2017 12:41
Show Gist options
  • Save fones/934406cd24127cc9543b58029f84c551 to your computer and use it in GitHub Desktop.
Save fones/934406cd24127cc9543b58029f84c551 to your computer and use it in GitHub Desktop.
Run Prediction.IO on AWS

Start

Go to home dir /home/ec2-user/

Install & Configure Java

sudo yum install java-1.8.0-openjdk
sudo yum install java-1.8.0-openjdk-devel
sudo /usr/sbin/alternatives --config java

Download PIO

Mirror: https://www.apache.org/dyn/closer.cgi/incubator/predictionio/0.10.0-incubating/apache-predictionio-0.10.0-incubating.tar.gz

wget http://ftp.piotrkosoft.net/pub/mirrors/ftp.apache.org/incubator/predictionio/0.10.0-incubating/apache-predictionio-0.10.0-incubating.tar.gz

Build

tar zxvf apache-predictionio-0.10.0-incubating.tar.gz
cd apache-predictionio-0.10.0-incubating
./make-distribution.sh

Unpack

tar zxvf PredictionIO-0.10.0-incubating.tar.gz
mkdir PredictionIO-0.10.0-incubating/vendors

Download Spark

wget http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz
tar zxvfC spark-1.5.1-bin-hadoop2.6.tgz PredictionIO-0.10.0-incubating/vendors

Download Elastic Search

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.4.4.tar.gz
tar zxvfC elasticsearch-1.4.4.tar.gz PredictionIO-0.10.0-incubating/vendors

edit configuration

vi PredictionIO-0.10.0-incubating/vendors/elasticsearch-1.4.4/config/elasticsearch.yml

set network.host: 127.0.0.1

Download HBase

wget http://archive.apache.org/dist/hbase/hbase-1.0.0/hbase-1.0.0-bin.tar.gz
tar zxvfC hbase-1.0.0-bin.tar.gz PredictionIO-0.10.0-incubating/vendors
mkdir PredictionIO-0.10.0-incubating/vendors/vendors/hbase-1.0.0/data

and configure by editing PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/conf/hbase-site.xml. Check other file in this gist.

Configure

Export or add to .bashrc

export JAVA_OPTS="-Xmx4g"
export PIO_HOME=/home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating
export PATH=$PATH:/home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/bin

Edit PredictionIO-0.10.0-incubating/conf/pio-env.sh. Check other file in this gist.

Run

PredictionIO-0.10.0-incubating/bin/pio-start-all
PredictionIO-0.10.0-incubating/bin/pio status

you should get

[INFO] [Console$] Inspecting PredictionIO...
[INFO] [Console$] PredictionIO 0.10.0-incubating is installed at /home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating
[INFO] [Console$] Inspecting Apache Spark...
[INFO] [Console$] Apache Spark is installed at /home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/spark-1.5.1-bin-hadoop2.6
[INFO] [Console$] Apache Spark 1.5.1 detected (meets minimum requirement of 1.3.0)
[INFO] [Console$] Inspecting storage backend connections...
[INFO] [Storage$] Verifying Meta Data Backend (Source: ELASTICSEARCH)...
[INFO] [Storage$] Verifying Model Data Backend (Source: LOCALFS)...
[INFO] [Storage$] Verifying Event Data Backend (Source: HBASE)...
[INFO] [Storage$] Test writing to Event Store (App Id 0)...
[INFO] [HBLEvents] The namespace pio_event doesn't exist yet. Creating now...
[INFO] [HBLEvents] The table pio_event:events_0 doesn't exist yet. Creating now...
[INFO] [HBLEvents] Removing table pio_event:events_0...
[INFO] [Console$] (sleeping 5 seconds for all messages to show up...)
[INFO] [Console$] Your system is all ready to go.

Visit http://<YOUR_MACHINE>:7070 you should get

{"status":"alive"}
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
/**
*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
-->
<configuration>
<property>
<name>hbase.rootdir</name>
<value>/home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/data</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/ec2-user/apache-predictionio-0.10.0-incubating/PredictionIO-0.10.0-incubating/vendors/hbase-1.0.0/zookeeper</value>
</property>
</configuration>
#!/usr/bin/env bash
#
# Copy this file as pio-env.sh and edit it for your site's configuration.
#
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# PredictionIO Main Configuration
#
# This section controls core behavior of PredictionIO. It is very likely that
# you need to change these to fit your site.
# SPARK_HOME: Apache Spark is a hard dependency and must be configured.
SPARK_HOME=$PIO_HOME/vendors/spark-1.5.1-bin-hadoop2.6
POSTGRES_JDBC_DRIVER=$PIO_HOME/lib/postgresql-9.4-1204.jdbc41.jar
MYSQL_JDBC_DRIVER=$PIO_HOME/lib/mysql-connector-java-5.1.37.jar
# ES_CONF_DIR: You must configure this if you have advanced configuration for
# your Elasticsearch setup.
# ES_CONF_DIR=/opt/elasticsearch
# HADOOP_CONF_DIR: You must configure this if you intend to run PredictionIO
# with Hadoop 2.
# HADOOP_CONF_DIR=/opt/hadoop
# HBASE_CONF_DIR: You must configure this if you intend to run PredictionIO
# with HBase on a remote cluster.
# HBASE_CONF_DIR=$PIO_HOME/vendors/hbase-1.0.0/conf
# Filesystem paths where PredictionIO uses as block storage.
PIO_FS_BASEDIR=$HOME/.pio_store
PIO_FS_ENGINESDIR=$PIO_FS_BASEDIR/engines
PIO_FS_TMPDIR=$PIO_FS_BASEDIR/tmp
# PredictionIO Storage Configuration
#
# This section controls programs that make use of PredictionIO's built-in
# storage facilities. Default values are shown below.
#
# For more information on storage configuration please refer to
# http://predictionio.incubator.apache.org/system/anotherdatastore/
# Storage Repositories
# Default is to use PostgreSQL
PIO_STORAGE_REPOSITORIES_METADATA_NAME=pio_meta
PIO_STORAGE_REPOSITORIES_METADATA_SOURCE=ELASTICSEARCH
PIO_STORAGE_REPOSITORIES_EVENTDATA_NAME=pio_event
PIO_STORAGE_REPOSITORIES_EVENTDATA_SOURCE=HBASE
PIO_STORAGE_REPOSITORIES_MODELDATA_NAME=pio_model
PIO_STORAGE_REPOSITORIES_MODELDATA_SOURCE=LOCALFS
# Storage Data Sources
# PostgreSQL Default Settings
# Please change "pio" to your database name in PIO_STORAGE_SOURCES_PGSQL_URL
# Please change PIO_STORAGE_SOURCES_PGSQL_USERNAME and
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD accordingly
# PIO_STORAGE_SOURCES_PGSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_PGSQL_URL=jdbc:postgresql://localhost/pio
# PIO_STORAGE_SOURCES_PGSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_PGSQL_PASSWORD=pio
# MySQL Example
# PIO_STORAGE_SOURCES_MYSQL_TYPE=jdbc
# PIO_STORAGE_SOURCES_MYSQL_URL=jdbc:mysql://localhost/pio
# PIO_STORAGE_SOURCES_MYSQL_USERNAME=pio
# PIO_STORAGE_SOURCES_MYSQL_PASSWORD=pio
# Elasticsearch Example
PIO_STORAGE_SOURCES_ELASTICSEARCH_TYPE=elasticsearch
# PIO_STORAGE_SOURCES_ELASTICSEARCH_CLUSTERNAME=<elasticsearch_cluster_name>
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOSTS=localhost
PIO_STORAGE_SOURCES_ELASTICSEARCH_PORTS=9300
PIO_STORAGE_SOURCES_ELASTICSEARCH_HOME=$PIO_HOME/vendors/elasticsearch-1.4.4
# Local File System Example
PIO_STORAGE_SOURCES_LOCALFS_TYPE=localfs
PIO_STORAGE_SOURCES_LOCALFS_PATH=$PIO_FS_BASEDIR/models
# HBase Example
PIO_STORAGE_SOURCES_HBASE_TYPE=hbase
PIO_STORAGE_SOURCES_HBASE_HOME=$PIO_HOME/vendors/hbase-1.0.0
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment