@Ben-Epstein
Created June 3, 2020 22:51
{"cells":[{"metadata":{},"cell_type":"markdown","source":"<style>\n#s {\n}\nh1, h2, h3, h4, h5, h6, table, button, a, p, blockquote {\nfont-family:Geneva;\n}\n\n.log {\ntransition: all .2s ease-in-out;\n}\n\n.log:hover {\ntransform: scale(1.05);\n}\n</style>\n<div id='s' style='width:90%'>\n<center><img class='log' src='https://splicemachine.com/wp-content/uploads/splice-logo-1.png' width='20%' style='z-index:5'></center>\n<center><h1 class='log' style='font-size:40px; color:black;'>Welcome to Splice Machine MLManager</h1></center>\n<center><h2 class = 'log' style='font-size:25px; color:grey;'>The data platform for intelligent applications</h2></center>\n<center><img class='log' src='https://splice-demo.s3.amazonaws.com/splice-machine-data-science-process-h2o.png' width='40%' style='z-index:5'></center>\n</div>\n \n "},{"metadata":{},"cell_type":"markdown","source":"# In this notebook, we're going to take a look at using MLManager with [H2O](https://www.h2o.ai/) + [Spark](https://spark.apache.org/)\n<h2 style='font-size:25px; font-weight:bold'>What is <a href=http://docs.h2o.ai/sparkling-water/2.1/latest-stable/doc/pysparkling.html>PySparkling Water?</a> What is <a href=https://splicemachine.com/product/ml-manager/>MLManager?</a></h2>\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>PySparkling</span></b> <br><br>PySparkling Water is an awesome H2O extension that allows you to run H2O 
clusters on top of existing Spark clusters. With Splice Machine, this integration is taken care of for you, so it's simple to start modeling with your new favorite library.</i></br><footer>Splice Machine</footer></blockquote><br>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>MLManager (+MLFlow)</span></b><br><br>As a data scientist constantly creating new models and testing new features, you need to track and manage your different ML runs effectively. MLManager + MLFlow lets you track entire <code>experiments</code> as well as the parameters and metrics of each individual <code>run</code>. The way you organize your flow is unique to you, and the intuitive Python API allows you to organize your development process and run with it.<br>\n <center><img class='log' src='https://s3.amazonaws.com/splice-demo/mlflow+ui.png' width='40%' style='z-index:5'></center></p></blockquote>"},{"metadata":{},"cell_type":"markdown","source":"# Let's get started\n## In this notebook, we will see how to use Spark, H2O and MLManager to predict the sentiment of Amazon reviews, track everything in the [MLFlow UI](/mlflow), and deploy our models to production\nThis is an adaptation of the original [H2O Demo](http://docs.h2o.ai/h2o-tutorials/latest-stable/h2o-world-2017/nlp/index.html)"},{"metadata":{},"cell_type":"markdown","source":"## Important imports and setup\n* Create our Spark Session\n* Create our Native Spark Data Source\n* Create our PySparkling Water cluster\n* Import our MLManager functionality"},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"pip install --upgrade git+https://www.github.com/splicemachine/pysplice@DBAAS-3990","execution_count":3,"outputs":[{"name":"stdout","output_type":"stream","text":"Collecting git+https://www.github.com/splicemachine/pysplice@DBAAS-3990\n Cloning https://www.github.com/splicemachine/pysplice (to revision DBAAS-3990) to /tmp/pip-req-build-fbn95kw6\n Running command git clone -q 
https://www.github.com/splicemachine/pysplice /tmp/pip-req-build-fbn95kw6\n Running command git checkout -b DBAAS-3990 --track origin/DBAAS-3990\n Switched to a new branch 'DBAAS-3990'\n Branch 'DBAAS-3990' set up to track remote branch 'DBAAS-3990' from 'origin'.\nRequirement already satisfied, skipping upgrade: py4j==0.10.8.1 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (0.10.8.1)\nCollecting pytest==5.1.3\n Downloading pytest-5.1.3-py3-none-any.whl (224 kB)\n\u001b[K |████████████████████████████████| 224 kB 9.3 MB/s eta 0:00:01\n\u001b[?25hRequirement already satisfied, skipping upgrade: mlflow==1.6.0 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (1.6.0)\nRequirement already satisfied, skipping upgrade: mleap==0.15.0 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (0.15.0)\nRequirement already satisfied, skipping upgrade: graphviz==0.13 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (0.13)\nRequirement already satisfied, skipping upgrade: requests in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (2.23.0)\nRequirement already satisfied, skipping upgrade: gorilla==0.3.0 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (0.3.0)\nRequirement already satisfied, skipping upgrade: tqdm==4.43.0 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (4.43.0)\nRequirement already satisfied, skipping upgrade: pyspark-dist-explore==0.1.8 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (0.1.8)\nRequirement already satisfied, skipping upgrade: numpy==1.18.2 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (1.18.2)\nRequirement already satisfied, skipping upgrade: pandas==1.0.3 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) (1.0.3)\nRequirement already satisfied, skipping upgrade: scipy==1.4.1 in /opt/conda/lib/python3.7/site-packages (from splicemachine==2.0.0) 
(1.4.1)\nRequirement already satisfied, skipping upgrade: wcwidth in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (0.2.3)\nRequirement already satisfied, skipping upgrade: py>=1.5.0 in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (1.8.1)\nRequirement already satisfied, skipping upgrade: more-itertools>=4.0.0 in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (8.3.0)\nRequirement already satisfied, skipping upgrade: attrs>=17.4.0 in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (19.3.0)\nRequirement already satisfied, skipping upgrade: packaging in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (20.4)\nRequirement already satisfied, skipping upgrade: atomicwrites>=1.0 in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (1.4.0)\nRequirement already satisfied, skipping upgrade: pluggy<1.0,>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (0.13.1)\nRequirement already satisfied, skipping upgrade: importlib-metadata>=0.12; python_version < \"3.8\" in /opt/conda/lib/python3.7/site-packages (from pytest==5.1.3->splicemachine==2.0.0) (1.6.0)\nRequirement already satisfied, skipping upgrade: sqlalchemy in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.3.8)\nRequirement already satisfied, skipping upgrade: python-dateutil in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (2.8.1)\nRequirement already satisfied, skipping upgrade: click>=7.0 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (7.1.2)\nRequirement already satisfied, skipping upgrade: gitpython>=2.1.0 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (3.1.3)\nRequirement already satisfied, skipping upgrade: sqlparse in 
/opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (0.3.1)\nRequirement already satisfied, skipping upgrade: cloudpickle in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.2.2)\nRequirement already satisfied, skipping upgrade: alembic in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.4.2)\nRequirement already satisfied, skipping upgrade: querystring-parser in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.2.4)\nRequirement already satisfied, skipping upgrade: Flask in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.1.2)\nRequirement already satisfied, skipping upgrade: protobuf>=3.6.0 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (3.9.1)\nRequirement already satisfied, skipping upgrade: databricks-cli>=0.8.7 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (0.11.0)\nRequirement already satisfied, skipping upgrade: entrypoints in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (0.3)\nRequirement already satisfied, skipping upgrade: six>=1.10.0 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (1.15.0)\nRequirement already satisfied, skipping upgrade: pyyaml in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (5.3.1)\nRequirement already satisfied, skipping upgrade: docker>=4.0.0 in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (4.2.1)\nRequirement already satisfied, skipping upgrade: prometheus-flask-exporter in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (0.13.0)\nRequirement already satisfied, skipping upgrade: simplejson in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (3.17.0)\nRequirement already 
satisfied, skipping upgrade: gunicorn; platform_system != \"Windows\" in /opt/conda/lib/python3.7/site-packages (from mlflow==1.6.0->splicemachine==2.0.0) (20.0.4)\nRequirement already satisfied, skipping upgrade: scikit-learn>=0.18.dev0 in /opt/conda/lib/python3.7/site-packages (from mleap==0.15.0->splicemachine==2.0.0) (0.21.3)\nRequirement already satisfied, skipping upgrade: chardet<4,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests->splicemachine==2.0.0) (3.0.4)\nRequirement already satisfied, skipping upgrade: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->splicemachine==2.0.0) (2.9)\nRequirement already satisfied, skipping upgrade: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->splicemachine==2.0.0) (2020.4.5.1)\nRequirement already satisfied, skipping upgrade: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->splicemachine==2.0.0) (1.25.9)\nRequirement already satisfied, skipping upgrade: matplotlib in /opt/conda/lib/python3.7/site-packages (from pyspark-dist-explore==0.1.8->splicemachine==2.0.0) (3.1.1)\nRequirement already satisfied, skipping upgrade: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas==1.0.3->splicemachine==2.0.0) (2020.1)\nRequirement already satisfied, skipping upgrade: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->pytest==5.1.3->splicemachine==2.0.0) (2.4.7)\nRequirement already satisfied, skipping upgrade: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=0.12; python_version < \"3.8\"->pytest==5.1.3->splicemachine==2.0.0) (3.1.0)\nRequirement already satisfied, skipping upgrade: gitdb<5,>=4.0.1 in /opt/conda/lib/python3.7/site-packages (from gitpython>=2.1.0->mlflow==1.6.0->splicemachine==2.0.0) (4.0.5)\nRequirement already satisfied, skipping upgrade: python-editor>=0.3 in /opt/conda/lib/python3.7/site-packages (from 
alembic->mlflow==1.6.0->splicemachine==2.0.0) (1.0.4)\nRequirement already satisfied, skipping upgrade: Mako in /opt/conda/lib/python3.7/site-packages (from alembic->mlflow==1.6.0->splicemachine==2.0.0) (1.1.0)\nRequirement already satisfied, skipping upgrade: Jinja2>=2.10.1 in /opt/conda/lib/python3.7/site-packages (from Flask->mlflow==1.6.0->splicemachine==2.0.0) (2.11.2)\nRequirement already satisfied, skipping upgrade: Werkzeug>=0.15 in /opt/conda/lib/python3.7/site-packages (from Flask->mlflow==1.6.0->splicemachine==2.0.0) (1.0.1)\nRequirement already satisfied, skipping upgrade: itsdangerous>=0.24 in /opt/conda/lib/python3.7/site-packages (from Flask->mlflow==1.6.0->splicemachine==2.0.0) (1.1.0)\nRequirement already satisfied, skipping upgrade: setuptools in /opt/conda/lib/python3.7/site-packages (from protobuf>=3.6.0->mlflow==1.6.0->splicemachine==2.0.0) (47.1.1.post20200529)\n"},{"name":"stdout","output_type":"stream","text":"Requirement already satisfied, skipping upgrade: tabulate>=0.7.7 in /opt/conda/lib/python3.7/site-packages (from databricks-cli>=0.8.7->mlflow==1.6.0->splicemachine==2.0.0) (0.8.7)\nRequirement already satisfied, skipping upgrade: websocket-client>=0.32.0 in /opt/conda/lib/python3.7/site-packages (from docker>=4.0.0->mlflow==1.6.0->splicemachine==2.0.0) (0.57.0)\nRequirement already satisfied, skipping upgrade: prometheus-client in /opt/conda/lib/python3.7/site-packages (from prometheus-flask-exporter->mlflow==1.6.0->splicemachine==2.0.0) (0.8.0)\nRequirement already satisfied, skipping upgrade: joblib>=0.11 in /opt/conda/lib/python3.7/site-packages (from scikit-learn>=0.18.dev0->mleap==0.15.0->splicemachine==2.0.0) (0.15.1)\nRequirement already satisfied, skipping upgrade: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->pyspark-dist-explore==0.1.8->splicemachine==2.0.0) (0.10.0)\nRequirement already satisfied, skipping upgrade: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from 
matplotlib->pyspark-dist-explore==0.1.8->splicemachine==2.0.0) (1.2.0)\nRequirement already satisfied, skipping upgrade: smmap<4,>=3.0.1 in /opt/conda/lib/python3.7/site-packages (from gitdb<5,>=4.0.1->gitpython>=2.1.0->mlflow==1.6.0->splicemachine==2.0.0) (3.0.4)\nRequirement already satisfied, skipping upgrade: MarkupSafe>=0.9.2 in /opt/conda/lib/python3.7/site-packages (from Mako->alembic->mlflow==1.6.0->splicemachine==2.0.0) (1.1.1)\nBuilding wheels for collected packages: splicemachine\n Building wheel for splicemachine (setup.py) ... \u001b[?25ldone\n\u001b[?25h Created wheel for splicemachine: filename=splicemachine-2.0.0-py3-none-any.whl size=54439 sha256=e7604f8e3bc139361bd06316074976078f66e39aa4b0f4949235929a21d1a00e\n Stored in directory: /tmp/pip-ephem-wheel-cache-4vcdsiuk/wheels/0c/58/ec/d4dd5d3e14310dcd82c132198196b6973a4139393dee4ddd79\nSuccessfully built splicemachine\nInstalling collected packages: pytest, splicemachine\n Attempting uninstall: pytest\n Found existing installation: pytest 5.4.3\n Uninstalling pytest-5.4.3:\n Successfully uninstalled pytest-5.4.3\n Attempting uninstall: splicemachine\n Found existing installation: splicemachine 2.0.0\n Uninstalling splicemachine-2.0.0:\n Successfully uninstalled splicemachine-2.0.0\nSuccessfully installed pytest-5.1.3 splicemachine-2.0.0\nNote: you may need to restart the kernel to use updated packages.\n"}]},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"from pyspark.sql import SparkSession\nfrom splicemachine.spark import PySpliceContext\nfrom splicemachine.mlflow_support import *\nfrom pysparkling import *\nimport h2o\n\n# Spark Session\nspark = SparkSession.builder.config('spark.driver.memoryOverhead',1000).config('spark.driver.memory','2g').getOrCreate()\n#spark.scheduler.minRegisteredResourcesRatio=1\n# Native Spark Data Source\nsplice = PySpliceContext(spark)\n# Register Splice so we can access database functions\nmlflow.register_splice_context(splice)\n# Create H2O 
Cluster\nconf = H2OConf().setInternalClusterMode()\nhc = H2OContext.getOrCreate(conf)","execution_count":1,"outputs":[{"name":"stdout","output_type":"stream","text":"Connecting to H2O server at http://10.128.24.161:54321 ... successful.\n"},{"data":{"text/html":"<div style=\"overflow:auto\"><table style=\"width:50%\"><tr><td>H2O cluster uptime:</td>\n<td>09 secs</td></tr>\n<tr><td>H2O cluster timezone:</td>\n<td>UTC</td></tr>\n<tr><td>H2O data parsing timezone:</td>\n<td>UTC</td></tr>\n<tr><td>H2O cluster version:</td>\n<td>3.28.1.2</td></tr>\n<tr><td>H2O cluster version age:</td>\n<td>2 months and 17 days </td></tr>\n<tr><td>H2O cluster name:</td>\n<td>sparkling-water-jovyan_spark-application-1591213257609</td></tr>\n<tr><td>H2O cluster total nodes:</td>\n<td>1</td></tr>\n<tr><td>H2O cluster free memory:</td>\n<td>3.833 Gb</td></tr>\n<tr><td>H2O cluster total cores:</td>\n<td>16</td></tr>\n<tr><td>H2O cluster allowed cores:</td>\n<td>5</td></tr>\n<tr><td>H2O cluster status:</td>\n<td>locked, healthy</td></tr>\n<tr><td>H2O connection url:</td>\n<td>http://10.128.24.161:54321</td></tr>\n<tr><td>H2O connection proxy:</td>\n<td>None</td></tr>\n<tr><td>H2O internal security:</td>\n<td>False</td></tr>\n<tr><td>H2O API Extensions:</td>\n<td>XGBoost, Algos, Amazon S3, Sparkling Water REST API Extensions, AutoML, Core V3, TargetEncoder, Core V4</td></tr>\n<tr><td>Python version:</td>\n<td>3.7.6 final</td></tr></table></div>","text/plain":"-------------------------- -------------------------------------------------------------------------------------------------------\nH2O cluster uptime: 09 secs\nH2O cluster timezone: UTC\nH2O data parsing timezone: UTC\nH2O cluster version: 3.28.1.2\nH2O cluster version age: 2 months and 17 days\nH2O cluster name: sparkling-water-jovyan_spark-application-1591213257609\nH2O cluster total nodes: 1\nH2O cluster free memory: 3.833 Gb\nH2O cluster total cores: 16\nH2O cluster allowed cores: 5\nH2O cluster status: locked, healthy\nH2O 
connection url: http://10.128.24.161:54321\nH2O connection proxy:\nH2O internal security: False\nH2O API Extensions: XGBoost, Algos, Amazon S3, Sparkling Water REST API Extensions, AutoML, Core V3, TargetEncoder, Core V4\nPython version: 3.7.6 final\n-------------------------- -------------------------------------------------------------------------------------------------------"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nSparkling Water Context:\n * Sparkling Water Version: 3.28.1.2-1-2.4\n * H2O name: sparkling-water-jovyan_spark-application-1591213257609\n * cluster size: 1\n * list of used nodes:\n (executorId, host, port)\n ------------------------\n (1,10.128.26.72,54321)\n ------------------------\n\n Open H2O Flow in browser: http://10.128.24.161:54321 (CMD + click in Mac OSX)\n\n \n"}]},{"metadata":{},"cell_type":"markdown","source":"# Great! Now let's import our data\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>Importing Data</span></b> <br><br>There are a few easy ways to get data into Splice Machine, and we'll demonstrate 2 of them here. 
You can use the built-in <code>%%sql</code> magic to import data directly from external sources, such as S3, or you can use H2O to directly read the data from S3, create a table from that dataframe, and insert the data directly using the <code>PySpliceContext</code> you created in the cell above. </i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{},"cell_type":"markdown","source":"## Option 1: Direct Import from SQL\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>SQL Import</span></b> <br><br>This method is simple: create your table, point it to an S3 location, and run the import command.</i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\nDROP TABLE IF EXISTS AMAZON_REVIEWS;\nCREATE TABLE AMAZON_REVIEWS(\n PRODUCTID VARCHAR(250),\n USERID VARCHAR(250),\n SUMMARY VARCHAR(500),\n SCORE INT,\n HELPFULNESSDENOMINATOR BIGINT,\n ID INT,\n PROFILENAME VARCHAR(500),\n HELPFULNESSNUMERATOR BIGINT,\n REVIEW_TIME BIGINT,\n REVIEW VARCHAR(15000),\n PRIMARY KEY(ID)\n);\n\n\n-- Import the data\ncall SYSCS_UTIL.IMPORT_DATA (\n null,\n 'AMAZON_REVIEWS',\n null,\n 's3a://splice-demo/AmazonReviews.csv',\n ',',\n null,\n null,\n null,\n null,\n -1,\n 's3a://splice-demo/bad',\n null, \n 
null);","execution_count":87,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"ac47520b-4fe1-40ee-89ca-60d7bcce515d","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"815f20e0-d3f9-4685-969b-06c0aa5220ed","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"f72ac661-b8cd-4649-bee8-7b5d82ade17e","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\nselect top 10 * from AMAZON_REVIEWS","execution_count":81,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"d6e9ab38-cf68-4a11-9ed7-e055ba959df5","version_major":2}},"metadata":{}}]},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"# Get data from table into Spark Dataframe\ndf2 = splice.df('select * from REPLACE_ME_DBSCHEMA.amazon_reviews')\nhdf = hc.asH2OFrame(df2)\nhdf.describe()","execution_count":5,"outputs":[{"name":"stdout","output_type":"stream","text":"Rows:99999\nCols:10\n\n\n"},{"data":{"text/html":"<table>\n<thead>\n<tr><th> </th><th>PRODUCTID </th><th>USERID </th><th>SUMMARY </th><th>SCORE </th><th>HELPFULNESSDENOMINATOR </th><th>ID </th><th>PROFILENAME </th><th>HELPFULNESSNUMERATOR </th><th>REVIEW_TIME </th><th>REVIEW </th></tr>\n</thead>\n<tbody>\n<tr><td>type </td><td>string </td><td>string </td><td>string </td><td>int </td><td>int </td><td>int </td><td>string </td><td>int </td><td>int </td><td>string </td></tr>\n<tr><td>mins </td><td>NaN </td><td>NaN </td><td>NaN </td><td>1.0 </td><td>0.0 </td><td>3.0 </td><td>NaN </td><td>0.0 </td><td>940809600.0 </td><td>NaN </td></tr>\n<tr><td>mean </td><td>NaN 
</td><td>NaN </td><td>NaN </td><td>4.1861018610186 </td><td>2.2364723647236353 </td><td>284618.29231292347</td><td>NaN </td><td>1.7454674546745361 </td><td>1296171870.326685</td><td>NaN </td></tr>\n<tr><td>maxs </td><td>NaN </td><td>NaN </td><td>NaN </td><td>5.0 </td><td>878.0 </td><td>568436.0 </td><td>NaN </td><td>866.0 </td><td>1351209600.0 </td><td>NaN </td></tr>\n<tr><td>sigma </td><td>NaN </td><td>NaN </td><td>NaN </td><td>1.309542187819563</td><td>8.805400733729053 </td><td>164159.3591659324 </td><td>NaN </td><td>8.171450255812838 </td><td>48107386.40835964</td><td>NaN </td></tr>\n<tr><td>zeros </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>47593 </td><td>0 </td><td>0 </td><td>53553 </td><td>0 </td><td>0 </td></tr>\n<tr><td>missing</td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td><td>0 </td></tr>\n<tr><td>0 </td><td>B000TO7U64 </td><td>A3CM2BMS75UZ4U</td><td>walden pancake syrup review </td><td>5.0 </td><td>0.0 </td><td>275757.0 </td><td>Ronnie Stewart </td><td>0.0 </td><td>1249171200.0 </td><td>for a sugar free syrup i found this to be very good. let some frindes at work try it they liked it to. </td></tr>\n<tr><td>1 </td><td>B002ANCCK6 </td><td>APM0IV2TBRW1A </td><td>Pet Food </td><td>5.0 </td><td>1.0 </td><td>5717.0 </td><td>PetFood </td><td>1.0 </td><td>1306454400.0 </td><td>My finicky calico just LOVES this food. One of the few wet foods she will actually eat. It is loved by our other three cats as well. Will definitely be buying this again. </td></tr>\n<tr><td>2 </td><td>B0014ATRV8 </td><td>A1WZY9GOEE7IAZ</td><td>Very Good! </td><td>5.0 </td><td>0.0 </td><td>530678.0 </td><td>alidep </td><td>0.0 </td><td>1309219200.0 </td><td>The tea taste is very good, I&#x27;ve taken it yesterday for the first time. I&#x27;m taking it because of my pregnancy, many friends told me about its precious toning for the uterus. 
</td></tr>\n<tr><td>3 </td><td>B00141QYSQ </td><td>A1YS02UZZGRDCT</td><td>Do Not Buy </td><td>1.0 </td><td>2.0 </td><td>41471.0 </td><td>Evan Eberhardt </td><td>2.0 </td><td>1348358400.0 </td><td>These are made in China (do not buy ANY pet food from China). Dogswell has been using propylene glycol to soften their treats (what are they thinkng?). Do not purchase or support this company in any way until they clean up their act. And for whatever reason Amazon doesn&#x27;t allow returns of this item, so I had to toss mine out. Bad business all around on this one. </td></tr>\n<tr><td>4 </td><td>B002LVA88U </td><td>A1OGO9DYD7D6PN</td><td>Tasty Little Jelly Beans </td><td>4.0 </td><td>3.0 </td><td>299328.0 </td><td>Erin </td><td>3.0 </td><td>1318723200.0 </td><td>I used these Jelly Beans to help me train for my first marathon. They taste good and seem to work although I did not seem to get as much of a boost as some of my friends say they have. I would only eat 2-4 at a time so maybe I need to eat a few more to get the full energy boost.&lt;br /&gt;&lt;br /&gt;I agree with some of the other reviewers in that, although the little bag is resealable, it is still sometimes hard to open while you are jogging/running/walking.&lt;br /&gt;&lt;br /&gt;Nice product. </td></tr>\n<tr><td>5 </td><td>B001E5DYU8 </td><td>AS3T5DITRJFLS </td><td>Some of the best espresso out there! </td><td>5.0 </td><td>2.0 </td><td>172934.0 </td><td>James Petree &quot;Paladin&quot;</td><td>2.0 </td><td>1186617600.0 </td><td>I bought this to come with my Capresso 114 machine, since I hadn&#x27;t used pods before, and I was very pleased! The aroma is wonderful, these pods make excellent crema and the flavor is outstanding! I&#x27;m just glad I bought 2 tins because one is already empty. </td></tr>\n<tr><td>6 </td><td>B000E63L6U </td><td>A2SCW8P8HFJFKU</td><td>this is not the old Celestial Seasonings Red Zinger! 
</td><td>1.0 </td><td>11.0 </td><td>460241.0 </td><td>Ann Cardinal </td><td>11.0 </td><td>1237680000.0 </td><td>I was so excited to find my old favorite tea on amazon. I have been looking for it in supermarkets for months, but thought it had been discontinued. However I wish I had listened to the earlier review. This is not the same formula as it was, the peppermint overpowers all the other flavors in this tea. It used to have a citrusy flavor, tangy and lovely, now all you can smell or taste is peppermint and I don&#x27;t like peppermint tea. </td></tr>\n<tr><td>7 </td><td>B001EO69SS </td><td>A19Q6EHPUDOMBE</td><td>Nuitrition not included. </td><td>1.0 </td><td>2.0 </td><td>335387.0 </td><td>Voltaire </td><td>2.0 </td><td>1321747200.0 </td><td>I purchased this thinking &quot;the Italians are good with food, this should be delicious.&quot; If you know anything about food, it looses its nutrition the more it&#x27;s cooked. This product comes pre-cooked, par-boiled. Black rice is a rich, nutty, food that is so filling it&#x27;s like a whole meal. Foodies love it because it balance out the usual plate of protein, vegetables, and starch with flavorful starch. It can be eaten alone, with milk as a breakfast, with beans in the forest or as a centerpiece of exotic meal.&lt;br /&gt;However, the reason we love this stuff is because it&#x27;s high in vitamins. Not this stuff. Par-boiling then drying it out destroys much of the vitamins and flavor.&lt;br /&gt;&lt;br /&gt;Stay far, far away if you want vitamins and flavor.</td></tr>\n<tr><td>8 </td><td>B002QWHJOU </td><td>ARWGHMIEH0IRP </td><td>My puppies love them and it helps keep their teeth clean!</td><td>5.0 </td><td>0.0 </td><td>329647.0 </td><td>EddieT </td><td>0.0 </td><td>1329350400.0 </td><td>Got these large Greenies originally when we noticed our 60+lb Boxer mix puppies&#x27; teeth were getting a little dirty and brushing didn&#x27;t too much. 
We give them each one Greenie in the morning when we leave for work, they love them. The boy gobbles it down, but his sister savors every bit. They also keep the buildup on their teeth down quite a bit, they&#x27;re not going to make them shine, but they&#x27;ll keep them pretty clean! </td></tr>\n<tr><td>9 </td><td>B0007A0AQM </td><td>A29OX3XJ0QDI02</td><td>Doesn&#x27;t like </td><td>1.0 </td><td>6.0 </td><td>143590.0 </td><td>Mocha272 </td><td>4.0 </td><td>1316044800.0 </td><td>My dog refuses to eat it and it smells pretty bad. I can&#x27;t recommend it as treat when my dog refuses to touch it. </td></tr>\n</tbody>\n</table>"},"metadata":{},"output_type":"display_data"}]},{"metadata":{},"cell_type":"markdown","source":"## Option 2: Import from H2O\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>H2O Import</span></b> <br><br>This method is also straightforward, and may be preferable to Data Scientists: Import your data using H2O, and then use the <code>PySpliceContext</code> to create the table from the dataframe and insert the data directly. 
You'll notice that this method doesn't directly involve any SQL.</i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"data_path = \"https://splice-demo.s3.amazonaws.com/AmazonReviews.csv\"\n# Load data into H2O\nreviews = h2o.import_file(data_path)\nreviews.head()","execution_count":6,"outputs":[{"name":"stdout","output_type":"stream","text":"Parse progress: |█████████████████████████████████████████████████████████| 100%\n"},{"data":{"text/html":"<table>\n<thead>\n<tr><th>ProductId </th><th>UserId </th><th>Summary </th><th style=\"text-align: right;\"> Score</th><th style=\"text-align: right;\"> HelpfulnessDenominator</th><th style=\"text-align: right;\"> Id</th><th>ProfileName </th><th style=\"text-align: right;\"> HelpfulnessNumerator</th><th style=\"text-align: right;\"> Time</th><th>Text </th></tr>\n</thead>\n<tbody>\n<tr><td>B00141QYSQ </td><td>A1YS02UZZGRDCT</td><td>Do Not Buy </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 41471</td><td>Evan Eberhardt </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">1.34836e+09</td><td>These are made in China (do not buy ANY pet food from China). Dogswell has been using propylene glycol to soften their treats (what are they thinkng?). Do not purchase or support this company in any way until they clean up their act. And for whatever reason Amazon doesn&#x27;t allow returns of this item, so I had to toss mine out. Bad business all around on this one. </td></tr>\n<tr><td>B0089SPEO2 </td><td>A3JOYNYL458QHP</td><td>Less lemon and less zing </td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 28582</td><td>coleridge </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32391e+09</td><td>Everything is ok, except it just isn&#x27;t as good as it is in the bags. 
Just considerably more bland -- less lemon and less zing. Boring. </td></tr>\n<tr><td>B001PMCDK2 </td><td>A14TTMM0Z03Y2W</td><td>my cat goes crazy for these! </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">389965</td><td>Lindsay S. Bradford </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.3106e+09 </td><td>Best cat treat ever. There isn&#x27;t anything comparable to the love my cat has for these treats, he snubs away any other kind now.&lt;br /&gt;I know he likes to manipulate me with his cattiness but these treats are my way of manipulating him to come sit on my lap and have some chill time. :) </td></tr>\n<tr><td>B002Q8JOSI </td><td>A17UQD2RSSQH5X</td><td>My dogs tell me these treats are YUMMY</td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">212536</td><td>in the dark </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.31613e+09</td><td>My two Corgis were thoroughly spoiled by my late husband (I spent a year and a half dieting them down a combined total of 25 pounds!)&lt;br /&gt;&lt;br /&gt;They are accustomed to the finest of fare, and they absolutely love the Wellness brand of treats. </td></tr>\n<tr><td>B00176G870 </td><td>A2F2MZW8EOGH5J</td><td>Yummy to the tummy </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">115971</td><td>daemoncycler &quot;When you arrive at a fork in th...</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.33479e+09</td><td>We used to have drive down to the specialty pet food store for this product. So glad we discovered Amazon. As far as I can tell it is no more expensive &amp; in some cases less - Prime membership is awesome. Loving Pets treats are some of the best according to my dog. They do not develop that nasty smell like some dog treats do. 
</td></tr>\n<tr><td>B001CHFUGY </td><td>A2M8VROSDPU4JT</td><td>Very good coffee </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">434484</td><td>Officefan &quot;Officefankt&quot; </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.27725e+09</td><td>I really liked this coffee, it was just as good as everyone claimed it was. Strong, bold and flavorful! I would recommend! </td></tr>\n<tr><td>B0041CIR62 </td><td>A16I6WJUEBJ1C3</td><td>okay but not as healthy as it appears </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">138997</td><td>doctorsirena &quot;doctorsirena&quot; </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.34369e+09</td><td>I am always looking for healthier, whole grain versions of foods I enjoy. Unfortunately, these Peacock brand noodles are yet another food masquerading as healthy. The product title in big letters on the front says &quot;Brown Rice Vermicelli&quot;, making the consumer think &quot;this is made with brown rice, so it should be a healthy choice&quot;. But the first indication that it is not is when looking at the fiber content on the nutrition facts - only 0.6g per 2oz serving. Then onto the ingredients list to see why so low... contains brown rice, sago starch and water. The sago starch comes from palms and must not have much (if any) fiber.&lt;br /&gt;&lt;br /&gt;The Annie Chun&#x27;s Maifun Brown Rice Noodles (sold on Amazon and in my local healthy grocer) has become one of my staples and is my frame of reference when comparing to the Peacock brand. The Annie Chun&#x27;s product is made with 100% whole grain, with ingredients brown rice flour and water. 
Per 2oz serving, it has 4g fiber and pretty much the same calories and other nutrients as the Peacock brand.&lt;br /&gt;&lt;br /&gt;If you do try this Peacock brand noodles and have not used rice noodles before, you will need to seek guidance elsewhere on preparation. As others have pointed out, the Peacock package gives almost no directions on how to prepare the product, aside from a brief mention in the recipes (in the header text it does say that they are &quot;easy-to-cook&quot; but does not say how). It also contains a very strange recipe for rice noodles: Aglio Olio style - this is an Italian recipe for noodles with olive oil/garlic/sprinkled with grated cheese that I think would not be very tasty. The second recipe appears to be for a soup with veggie strips. Neither recipe gives amounts or much direction. In comparison, the Annie Chun&#x27;s package gives clear, specific directions on rice noodle preparation and two recipes.&lt;br /&gt;&lt;br /&gt;I use rice noodles = maifun = rice sticks = sometimes called vermicelli for making the Vietnamese salad &quot;bun tofu&quot;, to serve with stir-fried veggies or in lettuce rolls. They can also be used in spring rolls/egg rolls. When cooking with thin rice noodles, be careful not to oversoak/overcook/overmix or they tend to disintegrate. Asian rice noodle vermicelli (maifun) are not the same as Italian vermicelli and are not readily interchangeable. If making an Italian recipe, the best results would be expected from Italian pasta and not maifun.&lt;br /&gt;&lt;br /&gt;A few final notes... Both Peacock and Annie Chun&#x27;s brown rice noodles are gluten free. The Peacock is made in Singapore and the Annie Chun&#x27;s in Thailand. The Peacock noodles do taste fine (kind of bland), but so do the Annie Chun&#x27;s. At this time, they are both approximately the same price. 
Peacock come in an plastic bag with some noodle crushage upon shipping; Annie Chun&#x27;s are perfect upon removal from their cellophane bag in a box. Overall, I highly recommend the Annie Chun&#x27;s Maifun as a healthier option over the Peacock brand. On a related note, the Annie Chun&#x27;s soba and brown rice pad thai noodles are also excellent.&lt;br /&gt;&lt;br /&gt;Rating for this product: 2.5 stars rounded down to 2 stars.</td></tr>\n<tr><td>B001R3BQFW </td><td>AM50E42AFUVNL </td><td>Taste great. </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">126555</td><td>T. Higley &quot;Tina&quot; </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32356e+09</td><td>I have tried many different drink mix, this is the best tasting by far. It does not have the after taste of the sweetener and I really like it, it is pretty strong, so I use a big water bottle (20 oz) for one tube, it still a little stronger than I like, but it is just my taste. </td></tr>\n<tr><td>B005HGAV8I </td><td>A2I5KDNOESGJ1H</td><td>variety galore </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">438837</td><td>TJ </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.33402e+09</td><td>This is my favorite item to order for my Keurig. There are so many flavors, my finicky palate never gets bored! The only downside is there are probably 5-6 decaf varieties. I don&#x27;t drink decaf (I REQUIRE copious amounts of caffeine), so they sit on the shelf... 
</td></tr>\n<tr><td>B000GFYRHQ </td><td>A3A7YUR6FS6ZCI</td><td>Bigelow Earl Grey Green Tea </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">245379</td><td>Tea Lover </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.17841e+09</td><td>Tastes like Earl Grey, but it&#x27;s green tea so it&#x27;s healthier. </td></tr>\n</tbody>\n</table>"},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":""},"execution_count":6,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## H2O offers great functions to convert H2OFrames into Pandas and Spark DataFrames"},{"metadata":{"trusted":false},"cell_type":"code","source":"# Spark DataFrame\ndf = hc.asSparkFrame(reviews, copyMetadata=False)\ndf.limit(100).show()\ndel df._h2o_frame\nprint(type(df))\n# Pandas DataFrame\npdf = reviews.head().as_data_frame()\ndisplay(pdf)\nprint(type(pdf))","execution_count":7,"outputs":[{"name":"stdout","output_type":"stream","text":"+----------+--------------+--------------------+-----+----------------------+------+--------------------+--------------------+----------+--------------------+\n| ProductId| UserId| Summary|Score|HelpfulnessDenominator| Id| ProfileName|HelpfulnessNumerator| Time| Text|\n+----------+--------------+--------------------+-----+----------------------+------+--------------------+--------------------+----------+--------------------+\n|B00141QYSQ|A1YS02UZZGRDCT| Do Not Buy| 1| 2| 41471| Evan Eberhardt| 2|1348358400|These are made in...|\n|B0089SPEO2|A3JOYNYL458QHP|Less lemon and le...| 3| 0| 28582| coleridge| 0|1323907200|Everything is ok,...|\n|B001PMCDK2|A14TTMM0Z03Y2W|my cat goes crazy...| 5| 0|389965| Lindsay S. 
Bradford| 0|1310601600|Best cat treat ev...|\n|B002Q8JOSI|A17UQD2RSSQH5X|My dogs tell me t...| 5| 1|212536| in the dark| 1|1316131200|My two Corgis wer...|\n|B00176G870|A2F2MZW8EOGH5J| Yummy to the tummy| 5| 0|115971|daemoncycler \"Whe...| 0|1334793600|We used to have d...|\n|B001CHFUGY|A2M8VROSDPU4JT| Very good coffee| 5| 1|434484|Officefan \"Office...| 1|1277251200|I really liked th...|\n|B0041CIR62|A16I6WJUEBJ1C3|okay but not as h...| 2| 1|138997|doctorsirena \"doc...| 1|1343692800|I am always looki...|\n|B001R3BQFW| AM50E42AFUVNL| Taste great.| 5| 0|126555| T. Higley \"Tina\"| 0|1323561600|I have tried many...|\n|B005HGAV8I|A2I5KDNOESGJ1H| variety galore| 5| 1|438837| TJ| 1|1334016000|This is my favori...|\n|B000GFYRHQ|A3A7YUR6FS6ZCI|Bigelow Earl Grey...| 5| 0|245379| Tea Lover| 0|1178409600|Tastes like Earl ...|\n|B001KUUNHE|A35R32TA60XD57|Best simple cat food| 5| 0| 64358| M. Torma| 0|1338940800|My cat that can e...|\n|B000RHXLDO|A1P8ZVG6MJX9O1|Organic is Better...| 5| 5| 75530| Mark Emdee| 4|1277164800|UPDATE: I now no ...|\n|B00013UQOG|A2AJNKK1S0W8F0| Lots of Roses!| 4| 1|312517| James R. Kelley| 1|1331337600|Much more roses t...|\n|B003E6COLK|A1S4RN19B0G9TQ|The best Gummy Be...| 5| 0|444328|S. Moon \"Moons Girl\"| 0|1313971200|I recommend these...|\n|B009PFJUF2|A16HJRHRHNSUZ6|Looks very pretty...| 5| 1|286942| Danielle Tietz| 1|1350432000|My neighborhood r...|\n|B000VK08OC|A15DZG6P6YB869|Fresh and delicious!| 5| 3|292099| P. Banks| 3|1313107200|We purchased this...|\n|B005K4Q4KG|A13L66J35SMYE5| Not good| 1| 2|243065| Elizabeth Ramsoram| 1|1330646400|This does not tas...|\n|B000LKZD4W|A2MQ5M9IJAXX6C|Great high-protei...| 5| 0|152409| Carol Mathis| 0|1238025600|This is a great p...|\n|B00213ERI0|A20GEEXSF3DULO|Gluten Free yummi...| 5| 0| 11177| Vicki \"Noah's Mom\"| 0|1274832000|to all of you glu...|\n|B0000D9MXL|A16OEO4JDUIP48|A fantastic Cheddar!| 5| 2|213880|D. 
Kennedy \"zibed...| 2|1234569600|The epitome of an...|\n+----------+--------------+--------------------+-----+----------------------+------+--------------------+--------------------+----------+--------------------+\nonly showing top 20 rows\n\n<class 'pyspark.sql.dataframe.DataFrame'>\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>ProductId</th>\n <th>UserId</th>\n <th>Summary</th>\n <th>Score</th>\n <th>HelpfulnessDenominator</th>\n <th>Id</th>\n <th>ProfileName</th>\n <th>HelpfulnessNumerator</th>\n <th>Time</th>\n <th>Text</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>B00141QYSQ</td>\n <td>A1YS02UZZGRDCT</td>\n <td>Do Not Buy</td>\n <td>1</td>\n <td>2</td>\n <td>41471</td>\n <td>Evan Eberhardt</td>\n <td>2</td>\n <td>1348358400</td>\n <td>These are made in China (do not buy ANY pet fo...</td>\n </tr>\n <tr>\n <th>1</th>\n <td>B0089SPEO2</td>\n <td>A3JOYNYL458QHP</td>\n <td>Less lemon and less zing</td>\n <td>3</td>\n <td>0</td>\n <td>28582</td>\n <td>coleridge</td>\n <td>0</td>\n <td>1323907200</td>\n <td>Everything is ok, except it just isn't as good...</td>\n </tr>\n <tr>\n <th>2</th>\n <td>B001PMCDK2</td>\n <td>A14TTMM0Z03Y2W</td>\n <td>my cat goes crazy for these!</td>\n <td>5</td>\n <td>0</td>\n <td>389965</td>\n <td>Lindsay S. Bradford</td>\n <td>0</td>\n <td>1310601600</td>\n <td>Best cat treat ever. 
There isn't anything comp...</td>\n </tr>\n <tr>\n <th>3</th>\n <td>B002Q8JOSI</td>\n <td>A17UQD2RSSQH5X</td>\n <td>My dogs tell me these treats are YUMMY</td>\n <td>5</td>\n <td>1</td>\n <td>212536</td>\n <td>in the dark</td>\n <td>1</td>\n <td>1316131200</td>\n <td>My two Corgis were thoroughly spoiled by my la...</td>\n </tr>\n <tr>\n <th>4</th>\n <td>B00176G870</td>\n <td>A2F2MZW8EOGH5J</td>\n <td>Yummy to the tummy</td>\n <td>5</td>\n <td>0</td>\n <td>115971</td>\n <td>daemoncycler \"When you arrive at a fork in th...</td>\n <td>0</td>\n <td>1334793600</td>\n <td>We used to have drive down to the specialty pe...</td>\n </tr>\n <tr>\n <th>5</th>\n <td>B001CHFUGY</td>\n <td>A2M8VROSDPU4JT</td>\n <td>Very good coffee</td>\n <td>5</td>\n <td>1</td>\n <td>434484</td>\n <td>Officefan \"Officefankt\"</td>\n <td>1</td>\n <td>1277251200</td>\n <td>I really liked this coffee, it was just as goo...</td>\n </tr>\n <tr>\n <th>6</th>\n <td>B0041CIR62</td>\n <td>A16I6WJUEBJ1C3</td>\n <td>okay but not as healthy as it appears</td>\n <td>2</td>\n <td>1</td>\n <td>138997</td>\n <td>doctorsirena \"doctorsirena\"</td>\n <td>1</td>\n <td>1343692800</td>\n <td>I am always looking for healthier, whole grain...</td>\n </tr>\n <tr>\n <th>7</th>\n <td>B001R3BQFW</td>\n <td>AM50E42AFUVNL</td>\n <td>Taste great.</td>\n <td>5</td>\n <td>0</td>\n <td>126555</td>\n <td>T. 
Higley \"Tina\"</td>\n <td>0</td>\n <td>1323561600</td>\n <td>I have tried many different drink mix, this is...</td>\n </tr>\n <tr>\n <th>8</th>\n <td>B005HGAV8I</td>\n <td>A2I5KDNOESGJ1H</td>\n <td>variety galore</td>\n <td>5</td>\n <td>1</td>\n <td>438837</td>\n <td>TJ</td>\n <td>1</td>\n <td>1334016000</td>\n <td>This is my favorite item to order for my Keuri...</td>\n </tr>\n <tr>\n <th>9</th>\n <td>B000GFYRHQ</td>\n <td>A3A7YUR6FS6ZCI</td>\n <td>Bigelow Earl Grey Green Tea</td>\n <td>5</td>\n <td>0</td>\n <td>245379</td>\n <td>Tea Lover</td>\n <td>0</td>\n <td>1178409600</td>\n <td>Tastes like Earl Grey, but it's green tea so i...</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" ProductId UserId Summary Score \\\n0 B00141QYSQ A1YS02UZZGRDCT Do Not Buy 1 \n1 B0089SPEO2 A3JOYNYL458QHP Less lemon and less zing 3 \n2 B001PMCDK2 A14TTMM0Z03Y2W my cat goes crazy for these! 5 \n3 B002Q8JOSI A17UQD2RSSQH5X My dogs tell me these treats are YUMMY 5 \n4 B00176G870 A2F2MZW8EOGH5J Yummy to the tummy 5 \n5 B001CHFUGY A2M8VROSDPU4JT Very good coffee 5 \n6 B0041CIR62 A16I6WJUEBJ1C3 okay but not as healthy as it appears 2 \n7 B001R3BQFW AM50E42AFUVNL Taste great. 5 \n8 B005HGAV8I A2I5KDNOESGJ1H variety galore 5 \n9 B000GFYRHQ A3A7YUR6FS6ZCI Bigelow Earl Grey Green Tea 5 \n\n HelpfulnessDenominator Id \\\n0 2 41471 \n1 0 28582 \n2 0 389965 \n3 1 212536 \n4 0 115971 \n5 1 434484 \n6 1 138997 \n7 0 126555 \n8 1 438837 \n9 0 245379 \n\n ProfileName HelpfulnessNumerator \\\n0 Evan Eberhardt 2 \n1 coleridge 0 \n2 Lindsay S. Bradford 0 \n3 in the dark 1 \n4 daemoncycler \"When you arrive at a fork in th... 0 \n5 Officefan \"Officefankt\" 1 \n6 doctorsirena \"doctorsirena\" 1 \n7 T. Higley \"Tina\" 0 \n8 TJ 1 \n9 Tea Lover 0 \n\n Time Text \n0 1348358400 These are made in China (do not buy ANY pet fo... \n1 1323907200 Everything is ok, except it just isn't as good... \n2 1310601600 Best cat treat ever. There isn't anything comp... 
\n3 1316131200 My two Corgis were thoroughly spoiled by my la... \n4 1334793600 We used to have drive down to the specialty pe... \n5 1277251200 I really liked this coffee, it was just as goo... \n6 1343692800 I am always looking for healthier, whole grain... \n7 1323561600 I have tried many different drink mix, this is... \n8 1334016000 This is my favorite item to order for my Keuri... \n9 1178409600 Tastes like Earl Grey, but it's green tea so i... "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"<class 'pandas.core.frame.DataFrame'>\n"}]},{"metadata":{},"cell_type":"markdown","source":"## Nice!\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>Create Table and Insert Data</span></b> <br><br>Now that we have our Spark DataFrame, we can create a table and insert data using <code>splice.createTable</code> and <code>splice.insert</code>.<br><b>Note: </b>If your code hangs on the <code>insert</code>, your cluster may be out of memory. Try configuring your Spark or H2O cluster with more memory. 
Read about that <a href=https://docs.h2o.ai/sparkling-water/2.1/latest-stable/doc/configuration/configuration_properties.html>here</a> and <a href=https://spark.apache.org/docs/latest/configuration.html#available-properties>here</a></footer></blockquote><br>\n"},{"metadata":{"trusted":false},"cell_type":"code","source":"help(splice.createTable)\nprint('----------------------------------------------------------------------------------------------------------------')\nhelp(splice.insert)","execution_count":8,"outputs":[{"name":"stdout","output_type":"stream","text":"Help on method createTable in module splicemachine.spark.context:\n\ncreateTable(dataframe, schema_table_name, primary_keys=None, create_table_options=None, to_upper=False, drop_table=False) method of splicemachine.spark.context.PySpliceContext instance\n Creates a schema.table from a dataframe\n :param dataframe: The Spark DataFrame to base the table off\n :param schema_table_name: str The schema.table to create\n :param primary_keys: List[str] the primary keys. Default None\n :param create_table_options: str The additional table-level SQL options default None\n :param to_upper: bool If the dataframe columns should be converted to uppercase before table creation\n If False, the table will be created with lower case columns. Default False\n :param drop_table: bool whether to drop the table if it exists. Default False. 
If False and the table exists,\n the function will throw an exception.\n\n----------------------------------------------------------------------------------------------------------------\nHelp on method insert in module splicemachine.spark.context:\n\ninsert(dataframe, schema_table_name, to_upper=False) method of splicemachine.spark.context.PySpliceContext instance\n Insert a dataframe into a table (schema.table).\n \n :param dataframe: (DF) The dataframe you would like to insert\n :param schema_table_name: (string) The table in which you would like to insert the DF\n :param to_upper: bool If the dataframe columns should be converted to uppercase before table creation\n If False, the table will be created with lower case columns. Default False\n\n"}]},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"# Create the table\nschema = 'splice'\ndf = df.withColumnRenamed('Time', 'Review_Time')\ndf = df.withColumnRenamed('Text', 'Review')\nsplice.createTable(df, f'{schema}.AMAZON_REVIEWS_H2O', to_upper=True, drop_table=True)","execution_count":9,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Now we can insert our dataframe"},{"metadata":{"trusted":false},"cell_type":"code","source":"print('Inserting... ', end='')\nsplice.insert(df, f'{schema}.AMAZON_REVIEWS_H2O',to_upper=True) # Use to_upper to give the SQL table uppercase columns\nprint('Done.')","execution_count":11,"outputs":[{"name":"stdout","output_type":"stream","text":"Inserting... Done.\n"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"%%sql\nselect top 10 varchar(Summary) Summary, Score, HelpfulnessDenominator, Id from AMAZON_REVIEWS_H2O;","execution_count":12,"outputs":[{"data":{"application/vnd.jupyter.widget-view+json":{"model_id":"85df984e-6b47-4232-894a-6e64552e15ba","version_major":2,"version_minor":0},"method":"display_data"},"metadata":{},"output_type":"display_data"}]},{"metadata":{},"cell_type":"markdown","source":"# Awesome! 
Let's get modeling\n<style>\nblockquote{\n font-size: 15px;\n background: #f9f9f9;\n border-left: 10px solid #ccc;\n margin: .5em 10px;\n padding: 30em, 10px;\n quotes: \"\\201C\"\"\\201D\"\"\\2018\"\"\\2019\";\n padding: 10px 20px;\n line-height: 1.4;\n}\n\nblockquote:before {\n content: open-quote;\n display: inline;\n height: 0;\n line-height: 0;\n left: -10px;\n position: relative;\n top: 30px;\n bottom:30px;\n color: #ccc;\n font-size: 3em;\n display:none;\n\n}\n\np{\n margin: 0;\n}\n\nfooter{\n margin:0;\n text-align: right;\n font-size: 1em;\n font-style: italic;\n}\n</style>\n<blockquote><p class='quotation'><b><br><span style='font-size:25px'>Modeling</span></b> <br><br>We're going to try three different ways to approach this problem, and track it all with MLManager. \n <ol>\n <li>No Text Model: We will try to predict whether a review is positive without using the review text, only the remaining non-text columns</li>\n <li>Using the reviews: We will use Word2Vec to create vectors from the text of the reviews. We will then train a model on that word embedding feature-vector</li>\n <li>Using the review summaries: We will use Word2Vec to create vectors from the text of the review summaries</li>\n </ol>\n Which do you think will perform the best?\n </i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{},"cell_type":"markdown","source":"## First Attempt\nLet's create a simple model using the non-review columns\n<br>\n<blockquote>\n First, let's start our mlflow experiment! We can start a run and log important parameters, tags, and metrics as they come<br>\nNext, we can turn this into a binary-classification problem by converting the score into a positive or negative review label. 
We will say that 4- and 5-star reviews are positive, but you can change this and try other things!\n</br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":false},"cell_type":"code","source":"# Set our mlflow experiment\nmlflow.set_experiment('Sentiment Analysis')\n# Label reviews scoring 4 or 5 as positive (1) and the rest as negative (0)\nreviews[\"PositiveReview\"] = (reviews[\"Score\"] >= 4).ifelse(\"1\", \"0\")\nreviews[\"PositiveReview\"].table()\n","execution_count":13,"outputs":[{"name":"stdout","output_type":"stream","text":"INFO: 'Sentiment Analysis' does not exist. Creating a new experiment\n"},{"data":{"text/html":"<table>\n<thead>\n<tr><th style=\"text-align: right;\"> PositiveReview</th><th style=\"text-align: right;\"> Count</th></tr>\n</thead>\n<tbody>\n<tr><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 21791</td></tr>\n<tr><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 78209</td></tr>\n</tbody>\n</table>"},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":""},"execution_count":13,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## Let's see our Data Correlation"},{"metadata":{"trusted":false},"cell_type":"code","source":"import pandas as pd\nimport matplotlib.pyplot as plt\nimport numpy as np\nimport seaborn as sns\nfrom pyspark.sql.types import FloatType, IntegerType\n\n%matplotlib inline\n\npdf = reviews[['ProductId', 'UserId', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Time','PositiveReview']].as_data_frame()\ncorr = pdf.corr()\n\nticks = [i for i in range(len(corr.columns))]\n\n# Set up the matplotlib figure\nf, ax = plt.subplots(figsize=(11, 9))\n\n# Color Scheme\ncmap = \"coolwarm\"\n\n# Draw the heatmap\nsns.heatmap(corr, cmap=cmap, vmax=.3, center=0,\n square=False, linewidths=.5, cbar_kws={\"shrink\": .5})\n\nplt.xticks(ticks, corr.columns)\nplt.yticks(ticks, 
corr.columns)\nplt.title('Sentiment Data correlation heatmap')\nplt.show()\n","execution_count":14,"outputs":[{"name":"stderr","output_type":"stream","text":"PILLOW_VERSION is deprecated and will be removed in a future release. Use __version__ instead.\npandas.util.testing is deprecated. Use the functions in the public API at pandas.testing instead.\n"},{"data":{"image/png":"\n","text/plain":"<Figure size 792x648 with 2 Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"}]},{"metadata":{},"cell_type":"markdown","source":"## We can see some of our features have decent correlation (remember that we aren't using the reviews yet). Let's try a basic model\n### First, let's log some important information in our <code>run</code>\n<blockquote>H2O has a <a href=https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html>lot</a> of algorithms; here we'll use a Gradient Boosting Estimator<br>We'll log some things like our feature vector, label, train/test/validation split, training time, and even the model and notebook themselves directly to <a href='/mlflow'>mlflow</a></blockquote>"},{"metadata":{"trusted":false},"cell_type":"code","source":"%%time\nfrom h2o.estimators import H2OGradientBoostingEstimator\nfrom splicemachine.mlflow_support.utilities import get_user\n\nRATIOS = [0.7,0.15]\n\n# Start our run to keep track of important information\nmlflow.start_run(run_name='simple_run')\n\npredictors = ['ProductId', 'UserId', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Time']\nresponse = 'PositiveReview'\n\n# lp is short for log_param\n# lm is short for log_metric\nmlflow.lp('predictors', predictors)\nmlflow.lp('label', response)\nmlflow.lp('source data table', f'{get_user()}.AMAZON_REVIEWS')\n\n# Train/test/validation split: roughly 70% / 15% / the remaining 15%\ntrain,test,valid = reviews.split_frame(ratios=RATIOS)\n# Log our ratios\nmlflow.lp('ratios',RATIOS)\n\ngbm_baseline = H2OGradientBoostingEstimator(stopping_metric = \"AUC\", stopping_tolerance = 0.001,\n stopping_rounds = 5, 
score_tree_interval = 10,\n model_id = \"gbm_baseline.hex\"\n )\n\nmlflow.lp('model_type', gbm_baseline.__class__)\n\n# Code block to time training\nwith mlflow.timer('train_time'):\n gbm_baseline.train(x = predictors, y = response, \n training_frame = train, validation_frame = test\n )\n# Log the model params to mlflow\nmlflow.log_params(gbm_baseline.get_params())\n# Log the model to MLFlow\nmlflow.log_model(gbm_baseline, 'baseline_model')\n# Log the training notebook to MLFlow\nmlflow.log_artifact('MLManager H2O Demo.ipynb', 'training_notebook')\ngbm_baseline","execution_count":15,"outputs":[{"name":"stdout","output_type":"stream","text":"Starting Code Block train_time... gbm Model Build progress: |███████████████████████████████████████████████| 100%\nDone.\nCode Block train_time:\nRan in 3.391 secs\nRan in 0.057 mins\nSaving artifact of size: 1579.965 KB to Splice Machine DB\nSaving artifact of size: 777.102 KB to Splice Machine DB\nCPU times: user 577 ms, sys: 74.8 ms, total: 651 ms\nWall time: 9.24 s\nModel Details\n=============\nH2OGradientBoostingEstimator : Gradient Boosting Machine\nModel Key: gbm_baseline.hex\n\n\nModel Summary: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>number_of_trees</th>\n <th>number_of_internal_trees</th>\n <th>model_size_in_bytes</th>\n <th>min_depth</th>\n <th>max_depth</th>\n <th>mean_depth</th>\n <th>min_leaves</th>\n <th>max_leaves</th>\n <th>mean_leaves</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>50.0</td>\n <td>50.0</td>\n <td>24560.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>20.0</td>\n <td>32.0</td>\n <td>29.62</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 
number_of_trees number_of_internal_trees model_size_in_bytes \\\n0 50.0 50.0 24560.0 \n\n min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n0 5.0 5.0 5.0 20.0 32.0 29.62 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nModelMetricsBinomial: gbm\n** Reported on train data. **\n\nMSE: 0.1374345189782943\nRMSE: 0.370721619248587\nLogLoss: 0.43951689132292565\nMean Per-Class Error: 0.31278195917903695\nAUC: 0.7540414210595864\nAUCPR: 0.9015791977529423\nGini: 0.5080828421191728\n\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.5357817933564483: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>4288.0</td>\n <td>10939.0</td>\n <td>0.7184</td>\n <td>(10939.0/15227.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>1586.0</td>\n <td>53052.0</td>\n <td>0.029</td>\n <td>(1586.0/54638.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>5874.0</td>\n <td>63991.0</td>\n <td>0.1793</td>\n <td>(12525.0/69865.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 4288.0 10939.0 0.7184 (10939.0/15227.0)\n1 1 1586.0 53052.0 0.029 (1586.0/54638.0)\n2 Total 5874.0 63991.0 0.1793 (12525.0/69865.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nMaximum Metrics: Maximum metrics at their respective thresholds\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n 
text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>metric</th>\n <th>threshold</th>\n <th>value</th>\n <th>idx</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>max f1</td>\n <td>0.535782</td>\n <td>0.894419</td>\n <td>262.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>max f2</td>\n <td>0.227611</td>\n <td>0.949461</td>\n <td>357.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>max f0point5</td>\n <td>0.743701</td>\n <td>0.864134</td>\n <td>173.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>max accuracy</td>\n <td>0.587676</td>\n <td>0.821957</td>\n <td>240.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>max precision</td>\n <td>0.950400</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>max recall</td>\n <td>0.079913</td>\n <td>1.000000</td>\n <td>398.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>max specificity</td>\n <td>0.950400</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>max absolute_mcc</td>\n <td>0.627212</td>\n <td>0.391565</td>\n <td>222.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>max min_per_class_accuracy</td>\n <td>0.815919</td>\n <td>0.680108</td>\n <td>115.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>max mean_per_class_accuracy</td>\n <td>0.784375</td>\n <td>0.687218</td>\n <td>146.0</td>\n </tr>\n <tr>\n <th>10</th>\n <td>max tns</td>\n <td>0.950400</td>\n <td>15227.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>max fns</td>\n <td>0.950400</td>\n <td>54632.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>12</th>\n <td>max fps</td>\n <td>0.067846</td>\n <td>15227.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>13</th>\n <td>max tps</td>\n <td>0.079913</td>\n <td>54638.000000</td>\n <td>398.0</td>\n </tr>\n <tr>\n <th>14</th>\n <td>max tnr</td>\n <td>0.950400</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>15</th>\n <td>max fnr</td>\n <td>0.950400</td>\n <td>0.999890</td>\n <td>0.0</td>\n </tr>\n <tr>\n 
<th>16</th>\n <td>max fpr</td>\n <td>0.067846</td>\n <td>1.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>17</th>\n <td>max tpr</td>\n <td>0.079913</td>\n <td>1.000000</td>\n <td>398.0</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" metric threshold value idx\n0 max f1 0.535782 0.894419 262.0\n1 max f2 0.227611 0.949461 357.0\n2 max f0point5 0.743701 0.864134 173.0\n3 max accuracy 0.587676 0.821957 240.0\n4 max precision 0.950400 1.000000 0.0\n5 max recall 0.079913 1.000000 398.0\n6 max specificity 0.950400 1.000000 0.0\n7 max absolute_mcc 0.627212 0.391565 222.0\n8 max min_per_class_accuracy 0.815919 0.680108 115.0\n9 max mean_per_class_accuracy 0.784375 0.687218 146.0\n10 max tns 0.950400 15227.000000 0.0\n11 max fns 0.950400 54632.000000 0.0\n12 max fps 0.067846 15227.000000 399.0\n13 max tps 0.079913 54638.000000 398.0\n14 max tnr 0.950400 1.000000 0.0\n15 max fnr 0.950400 0.999890 0.0\n16 max fpr 0.067846 1.000000 399.0\n17 max tpr 0.079913 1.000000 398.0"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nGains/Lift Table: Avg response rate: 78.21 %, avg score: 78.19 %\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>group</th>\n <th>cumulative_data_fraction</th>\n <th>lower_threshold</th>\n <th>lift</th>\n <th>cumulative_lift</th>\n <th>response_rate</th>\n <th>score</th>\n <th>cumulative_response_rate</th>\n <th>cumulative_score</th>\n <th>capture_rate</th>\n <th>cumulative_capture_rate</th>\n <th>gain</th>\n <th>cumulative_gain</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>1</td>\n <td>0.010005</td>\n <td>0.920836</td>\n <td>1.249420</td>\n <td>1.249420</td>\n 
<td>0.977110</td>\n <td>0.928429</td>\n <td>0.977110</td>\n <td>0.928429</td>\n <td>0.012500</td>\n <td>0.012500</td>\n <td>24.941984</td>\n <td>24.941984</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2</td>\n <td>0.020010</td>\n <td>0.914269</td>\n <td>1.242103</td>\n <td>1.245761</td>\n <td>0.971388</td>\n <td>0.917213</td>\n <td>0.974249</td>\n <td>0.922821</td>\n <td>0.012427</td>\n <td>0.024928</td>\n <td>24.210259</td>\n <td>24.576122</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>3</td>\n <td>0.030001</td>\n <td>0.911259</td>\n <td>1.232891</td>\n <td>1.241475</td>\n <td>0.964183</td>\n <td>0.912706</td>\n <td>0.970897</td>\n <td>0.919453</td>\n <td>0.012317</td>\n <td>0.037245</td>\n <td>23.289051</td>\n <td>24.147508</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>4</td>\n <td>0.041065</td>\n <td>0.908798</td>\n <td>1.214175</td>\n <td>1.234120</td>\n <td>0.949547</td>\n <td>0.909789</td>\n <td>0.965145</td>\n <td>0.916849</td>\n <td>0.013434</td>\n <td>0.050679</td>\n <td>21.417542</td>\n <td>23.411968</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>5</td>\n <td>0.050025</td>\n <td>0.906579</td>\n <td>1.196983</td>\n <td>1.227468</td>\n <td>0.936102</td>\n <td>0.907585</td>\n <td>0.959943</td>\n <td>0.915190</td>\n <td>0.010725</td>\n <td>0.061404</td>\n <td>19.698347</td>\n <td>22.746810</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>6</td>\n <td>0.100222</td>\n <td>0.899416</td>\n <td>1.189724</td>\n <td>1.208564</td>\n <td>0.930425</td>\n <td>0.903090</td>\n <td>0.945159</td>\n <td>0.909129</td>\n <td>0.059720</td>\n <td>0.121124</td>\n <td>18.972388</td>\n <td>20.856364</td>\n </tr>\n <tr>\n <th>6</th>\n <td></td>\n <td>7</td>\n <td>0.150390</td>\n <td>0.890206</td>\n <td>1.172162</td>\n <td>1.196420</td>\n <td>0.916690</td>\n <td>0.895343</td>\n <td>0.935662</td>\n <td>0.904530</td>\n <td>0.058805</td>\n <td>0.179930</td>\n <td>17.216182</td>\n <td>19.642047</td>\n </tr>\n <tr>\n <th>7</th>\n <td></td>\n <td>8</td>\n 
<td>0.200029</td>\n <td>0.877669</td>\n <td>1.168813</td>\n <td>1.189569</td>\n <td>0.914072</td>\n <td>0.883597</td>\n <td>0.930304</td>\n <td>0.899336</td>\n <td>0.058018</td>\n <td>0.237948</td>\n <td>16.881303</td>\n <td>18.956947</td>\n </tr>\n <tr>\n <th>8</th>\n <td></td>\n <td>9</td>\n <td>0.301939</td>\n <td>0.855812</td>\n <td>1.136273</td>\n <td>1.171581</td>\n <td>0.888624</td>\n <td>0.865949</td>\n <td>0.916236</td>\n <td>0.888067</td>\n <td>0.115799</td>\n <td>0.353746</td>\n <td>13.627306</td>\n <td>17.158083</td>\n </tr>\n <tr>\n <th>9</th>\n <td></td>\n <td>10</td>\n <td>0.401245</td>\n <td>0.840171</td>\n <td>1.112817</td>\n <td>1.157037</td>\n <td>0.870280</td>\n <td>0.846429</td>\n <td>0.904862</td>\n <td>0.877762</td>\n <td>0.110509</td>\n <td>0.464256</td>\n <td>11.281682</td>\n <td>15.703709</td>\n </tr>\n <tr>\n <th>10</th>\n <td></td>\n <td>11</td>\n <td>0.500265</td>\n <td>0.830372</td>\n <td>1.084427</td>\n <td>1.142665</td>\n <td>0.848077</td>\n <td>0.835059</td>\n <td>0.893623</td>\n <td>0.869309</td>\n <td>0.107379</td>\n <td>0.571635</td>\n <td>8.442719</td>\n <td>14.266510</td>\n </tr>\n <tr>\n <th>11</th>\n <td></td>\n <td>12</td>\n <td>0.601789</td>\n <td>0.815788</td>\n <td>1.072815</td>\n <td>1.130881</td>\n <td>0.838996</td>\n <td>0.823077</td>\n <td>0.884407</td>\n <td>0.861510</td>\n <td>0.108917</td>\n <td>0.680552</td>\n <td>7.281506</td>\n <td>13.088111</td>\n </tr>\n <tr>\n <th>12</th>\n <td></td>\n <td>13</td>\n <td>0.699993</td>\n <td>0.795914</td>\n <td>1.019820</td>\n <td>1.115300</td>\n <td>0.797551</td>\n <td>0.807826</td>\n <td>0.872222</td>\n <td>0.853978</td>\n <td>0.100150</td>\n <td>0.780702</td>\n <td>1.982003</td>\n <td>11.530008</td>\n </tr>\n <tr>\n <th>13</th>\n <td></td>\n <td>14</td>\n <td>0.800000</td>\n <td>0.761463</td>\n <td>0.985873</td>\n <td>1.099121</td>\n <td>0.771003</td>\n <td>0.781952</td>\n <td>0.859568</td>\n <td>0.844974</td>\n <td>0.098594</td>\n <td>0.879296</td>\n <td>-1.412671</td>\n 
<td>9.912058</td>\n </tr>\n <tr>\n <th>14</th>\n <td></td>\n <td>15</td>\n <td>0.899993</td>\n <td>0.586119</td>\n <td>0.822746</td>\n <td>1.068414</td>\n <td>0.643430</td>\n <td>0.676802</td>\n <td>0.835555</td>\n <td>0.826290</td>\n <td>0.082269</td>\n <td>0.961565</td>\n <td>-17.725361</td>\n <td>6.841429</td>\n </tr>\n <tr>\n <th>15</th>\n <td></td>\n <td>16</td>\n <td>1.000000</td>\n <td>0.060736</td>\n <td>0.384320</td>\n <td>1.000000</td>\n <td>0.300558</td>\n <td>0.382875</td>\n <td>0.782051</td>\n <td>0.781945</td>\n <td>0.038435</td>\n <td>1.000000</td>\n <td>-61.567961</td>\n <td>0.000000</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" group cumulative_data_fraction lower_threshold lift \\\n0 1 0.010005 0.920836 1.249420 \n1 2 0.020010 0.914269 1.242103 \n2 3 0.030001 0.911259 1.232891 \n3 4 0.041065 0.908798 1.214175 \n4 5 0.050025 0.906579 1.196983 \n5 6 0.100222 0.899416 1.189724 \n6 7 0.150390 0.890206 1.172162 \n7 8 0.200029 0.877669 1.168813 \n8 9 0.301939 0.855812 1.136273 \n9 10 0.401245 0.840171 1.112817 \n10 11 0.500265 0.830372 1.084427 \n11 12 0.601789 0.815788 1.072815 \n12 13 0.699993 0.795914 1.019820 \n13 14 0.800000 0.761463 0.985873 \n14 15 0.899993 0.586119 0.822746 \n15 16 1.000000 0.060736 0.384320 \n\n cumulative_lift response_rate score cumulative_response_rate \\\n0 1.249420 0.977110 0.928429 0.977110 \n1 1.245761 0.971388 0.917213 0.974249 \n2 1.241475 0.964183 0.912706 0.970897 \n3 1.234120 0.949547 0.909789 0.965145 \n4 1.227468 0.936102 0.907585 0.959943 \n5 1.208564 0.930425 0.903090 0.945159 \n6 1.196420 0.916690 0.895343 0.935662 \n7 1.189569 0.914072 0.883597 0.930304 \n8 1.171581 0.888624 0.865949 0.916236 \n9 1.157037 0.870280 0.846429 0.904862 \n10 1.142665 0.848077 0.835059 0.893623 \n11 1.130881 0.838996 0.823077 0.884407 \n12 1.115300 0.797551 0.807826 0.872222 \n13 1.099121 0.771003 0.781952 0.859568 \n14 1.068414 0.643430 0.676802 0.835555 \n15 1.000000 0.300558 0.382875 0.782051 \n\n cumulative_score 
capture_rate cumulative_capture_rate gain \\\n0 0.928429 0.012500 0.012500 24.941984 \n1 0.922821 0.012427 0.024928 24.210259 \n2 0.919453 0.012317 0.037245 23.289051 \n3 0.916849 0.013434 0.050679 21.417542 \n4 0.915190 0.010725 0.061404 19.698347 \n5 0.909129 0.059720 0.121124 18.972388 \n6 0.904530 0.058805 0.179930 17.216182 \n7 0.899336 0.058018 0.237948 16.881303 \n8 0.888067 0.115799 0.353746 13.627306 \n9 0.877762 0.110509 0.464256 11.281682 \n10 0.869309 0.107379 0.571635 8.442719 \n11 0.861510 0.108917 0.680552 7.281506 \n12 0.853978 0.100150 0.780702 1.982003 \n13 0.844974 0.098594 0.879296 -1.412671 \n14 0.826290 0.082269 0.961565 -17.725361 \n15 0.781945 0.038435 1.000000 -61.567961 \n\n cumulative_gain \n0 24.941984 \n1 24.576122 \n2 24.147508 \n3 23.411968 \n4 22.746810 \n5 20.856364 \n6 19.642047 \n7 18.956947 \n8 17.158083 \n9 15.703709 \n10 14.266510 \n11 13.088111 \n12 11.530008 \n13 9.912058 \n14 6.841429 \n15 0.000000 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nModelMetricsBinomial: gbm\n** Reported on validation data. 
**\n\nMSE: 0.14223001006483346\nRMSE: 0.3771339418095824\nLogLoss: 0.4525533321024546\nMean Per-Class Error: 0.32591627240012655\nAUC: 0.7316894311747509\nAUCPR: 0.8839714249918257\nGini: 0.4633788623495019\n\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.45593713188449025: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>677.0</td>\n <td>2672.0</td>\n <td>0.7979</td>\n <td>(2672.0/3349.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>203.0</td>\n <td>11587.0</td>\n <td>0.0172</td>\n <td>(203.0/11790.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>880.0</td>\n <td>14259.0</td>\n <td>0.1899</td>\n <td>(2875.0/15139.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 677.0 2672.0 0.7979 (2672.0/3349.0)\n1 1 203.0 11587.0 0.0172 (203.0/11790.0)\n2 Total 880.0 14259.0 0.1899 (2875.0/15139.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nMaximum Metrics: Maximum metrics at their respective thresholds\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>metric</th>\n <th>threshold</th>\n <th>value</th>\n <th>idx</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>max f1</td>\n <td>0.455937</td>\n <td>0.889631</td>\n <td>293.0</td>\n </tr>\n <tr>\n <th>1</th>\n 
<td>max f2</td>\n <td>0.325802</td>\n <td>0.948460</td>\n <td>334.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>max f0point5</td>\n <td>0.735794</td>\n <td>0.859811</td>\n <td>181.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>max accuracy</td>\n <td>0.587274</td>\n <td>0.813990</td>\n <td>246.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>max precision</td>\n <td>0.934944</td>\n <td>0.966667</td>\n <td>4.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>max recall</td>\n <td>0.094890</td>\n <td>1.000000</td>\n <td>395.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>max specificity</td>\n <td>0.950525</td>\n <td>0.999701</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>max absolute_mcc</td>\n <td>0.674930</td>\n <td>0.375372</td>\n <td>205.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>max min_per_class_accuracy</td>\n <td>0.818016</td>\n <td>0.661578</td>\n <td>117.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>max mean_per_class_accuracy</td>\n <td>0.783111</td>\n <td>0.674084</td>\n <td>151.0</td>\n </tr>\n <tr>\n <th>10</th>\n <td>max tns</td>\n <td>0.950525</td>\n <td>3348.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>max fns</td>\n <td>0.950525</td>\n <td>11789.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>12</th>\n <td>max fps</td>\n <td>0.056394</td>\n <td>3349.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>13</th>\n <td>max tps</td>\n <td>0.094890</td>\n <td>11790.000000</td>\n <td>395.0</td>\n </tr>\n <tr>\n <th>14</th>\n <td>max tnr</td>\n <td>0.950525</td>\n <td>0.999701</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>15</th>\n <td>max fnr</td>\n <td>0.950525</td>\n <td>0.999915</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>16</th>\n <td>max fpr</td>\n <td>0.056394</td>\n <td>1.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>17</th>\n <td>max tpr</td>\n <td>0.094890</td>\n <td>1.000000</td>\n <td>395.0</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" metric threshold value idx\n0 max f1 0.455937 0.889631 293.0\n1 max f2 0.325802 0.948460 334.0\n2 max f0point5 0.735794 
0.859811 181.0\n3 max accuracy 0.587274 0.813990 246.0\n4 max precision 0.934944 0.966667 4.0\n5 max recall 0.094890 1.000000 395.0\n6 max specificity 0.950525 0.999701 0.0\n7 max absolute_mcc 0.674930 0.375372 205.0\n8 max min_per_class_accuracy 0.818016 0.661578 117.0\n9 max mean_per_class_accuracy 0.783111 0.674084 151.0\n10 max tns 0.950525 3348.000000 0.0\n11 max fns 0.950525 11789.000000 0.0\n12 max fps 0.056394 3349.000000 399.0\n13 max tps 0.094890 11790.000000 395.0\n14 max tnr 0.950525 0.999701 0.0\n15 max fnr 0.950525 0.999915 0.0\n16 max fpr 0.056394 1.000000 399.0\n17 max tpr 0.094890 1.000000 395.0"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nGains/Lift Table: Avg response rate: 77.88 %, avg score: 78.37 %\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>group</th>\n <th>cumulative_data_fraction</th>\n <th>lower_threshold</th>\n <th>lift</th>\n <th>cumulative_lift</th>\n <th>response_rate</th>\n <th>score</th>\n <th>cumulative_response_rate</th>\n <th>cumulative_score</th>\n <th>capture_rate</th>\n <th>cumulative_capture_rate</th>\n <th>gain</th>\n <th>cumulative_gain</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>1</td>\n <td>0.010040</td>\n <td>0.920851</td>\n <td>1.174234</td>\n <td>1.174234</td>\n <td>0.914474</td>\n <td>0.928367</td>\n <td>0.914474</td>\n <td>0.928367</td>\n <td>0.011790</td>\n <td>0.011790</td>\n <td>17.423385</td>\n <td>17.423385</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2</td>\n <td>0.020015</td>\n <td>0.913837</td>\n <td>1.207521</td>\n <td>1.190823</td>\n <td>0.940397</td>\n <td>0.917088</td>\n <td>0.927393</td>\n <td>0.922746</td>\n 
<td>0.012044</td>\n <td>0.023834</td>\n <td>20.752125</td>\n <td>19.082262</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>3</td>\n <td>0.030055</td>\n <td>0.910840</td>\n <td>1.174234</td>\n <td>1.185281</td>\n <td>0.914474</td>\n <td>0.912427</td>\n <td>0.923077</td>\n <td>0.919299</td>\n <td>0.011790</td>\n <td>0.035623</td>\n <td>17.423385</td>\n <td>18.528088</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>4</td>\n <td>0.040756</td>\n <td>0.908798</td>\n <td>1.181013</td>\n <td>1.184160</td>\n <td>0.919753</td>\n <td>0.909681</td>\n <td>0.922204</td>\n <td>0.916774</td>\n <td>0.012638</td>\n <td>0.048261</td>\n <td>18.101289</td>\n <td>18.416027</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>5</td>\n <td>0.050334</td>\n <td>0.906383</td>\n <td>1.142366</td>\n <td>1.176207</td>\n <td>0.889655</td>\n <td>0.907578</td>\n <td>0.916010</td>\n <td>0.915024</td>\n <td>0.010941</td>\n <td>0.059203</td>\n <td>14.236553</td>\n <td>17.620720</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>6</td>\n <td>0.100007</td>\n <td>0.900004</td>\n <td>1.188433</td>\n <td>1.182280</td>\n <td>0.925532</td>\n <td>0.903123</td>\n <td>0.920740</td>\n <td>0.909113</td>\n <td>0.059033</td>\n <td>0.118236</td>\n <td>18.843322</td>\n <td>18.227984</td>\n </tr>\n <tr>\n <th>6</th>\n <td></td>\n <td>7</td>\n <td>0.150010</td>\n <td>0.890719</td>\n <td>1.151748</td>\n <td>1.172102</td>\n <td>0.896962</td>\n <td>0.895875</td>\n <td>0.912814</td>\n <td>0.904700</td>\n <td>0.057591</td>\n <td>0.175827</td>\n <td>15.174750</td>\n <td>17.210239</td>\n </tr>\n <tr>\n <th>7</th>\n <td></td>\n <td>8</td>\n <td>0.200013</td>\n <td>0.878214</td>\n <td>1.136481</td>\n <td>1.163197</td>\n <td>0.885073</td>\n <td>0.884291</td>\n <td>0.905878</td>\n <td>0.899598</td>\n <td>0.056828</td>\n <td>0.232655</td>\n <td>13.648133</td>\n <td>16.319713</td>\n </tr>\n <tr>\n <th>8</th>\n <td></td>\n <td>9</td>\n <td>0.302860</td>\n <td>0.855812</td>\n <td>1.143031</td>\n <td>1.156349</td>\n 
<td>0.890173</td>\n <td>0.866197</td>\n <td>0.900545</td>\n <td>0.888255</td>\n <td>0.117557</td>\n <td>0.350212</td>\n <td>14.303098</td>\n <td>15.634899</td>\n </tr>\n <tr>\n <th>9</th>\n <td></td>\n <td>10</td>\n <td>0.400555</td>\n <td>0.840171</td>\n <td>1.103471</td>\n <td>1.143452</td>\n <td>0.859364</td>\n <td>0.846380</td>\n <td>0.890501</td>\n <td>0.878042</td>\n <td>0.107803</td>\n <td>0.458015</td>\n <td>10.347058</td>\n <td>14.345203</td>\n </tr>\n <tr>\n <th>10</th>\n <td></td>\n <td>11</td>\n <td>0.500033</td>\n <td>0.830563</td>\n <td>1.082835</td>\n <td>1.131393</td>\n <td>0.843293</td>\n <td>0.835017</td>\n <td>0.881110</td>\n <td>0.869482</td>\n <td>0.107718</td>\n <td>0.565734</td>\n <td>8.283462</td>\n <td>13.139261</td>\n </tr>\n <tr>\n <th>11</th>\n <td></td>\n <td>12</td>\n <td>0.600634</td>\n <td>0.816120</td>\n <td>1.066532</td>\n <td>1.120529</td>\n <td>0.830598</td>\n <td>0.823761</td>\n <td>0.872649</td>\n <td>0.861825</td>\n <td>0.107294</td>\n <td>0.673028</td>\n <td>6.653228</td>\n <td>12.052906</td>\n </tr>\n <tr>\n <th>12</th>\n <td></td>\n <td>13</td>\n <td>0.699980</td>\n <td>0.799444</td>\n <td>1.033049</td>\n <td>1.108113</td>\n <td>0.804521</td>\n <td>0.809260</td>\n <td>0.862980</td>\n <td>0.854364</td>\n <td>0.102629</td>\n <td>0.775657</td>\n <td>3.304899</td>\n <td>10.811328</td>\n </tr>\n <tr>\n <th>13</th>\n <td></td>\n <td>14</td>\n <td>0.799987</td>\n <td>0.765323</td>\n <td>0.999086</td>\n <td>1.094484</td>\n <td>0.778071</td>\n <td>0.784247</td>\n <td>0.852366</td>\n <td>0.845599</td>\n <td>0.099915</td>\n <td>0.875573</td>\n <td>-0.091417</td>\n <td>9.448372</td>\n </tr>\n <tr>\n <th>14</th>\n <td></td>\n <td>15</td>\n <td>0.899993</td>\n <td>0.588424</td>\n <td>0.825221</td>\n <td>1.064563</td>\n <td>0.642668</td>\n <td>0.681231</td>\n <td>0.829064</td>\n <td>0.827334</td>\n <td>0.082528</td>\n <td>0.958100</td>\n <td>-17.477885</td>\n <td>6.456346</td>\n </tr>\n <tr>\n <th>15</th>\n <td></td>\n <td>16</td>\n 
<td>1.000000</td>\n <td>0.056394</td>\n <td>0.418971</td>\n <td>1.000000</td>\n <td>0.326288</td>\n <td>0.390840</td>\n <td>0.778783</td>\n <td>0.783682</td>\n <td>0.041900</td>\n <td>1.000000</td>\n <td>-58.102852</td>\n <td>0.000000</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" group cumulative_data_fraction lower_threshold lift \\\n0 1 0.010040 0.920851 1.174234 \n1 2 0.020015 0.913837 1.207521 \n2 3 0.030055 0.910840 1.174234 \n3 4 0.040756 0.908798 1.181013 \n4 5 0.050334 0.906383 1.142366 \n5 6 0.100007 0.900004 1.188433 \n6 7 0.150010 0.890719 1.151748 \n7 8 0.200013 0.878214 1.136481 \n8 9 0.302860 0.855812 1.143031 \n9 10 0.400555 0.840171 1.103471 \n10 11 0.500033 0.830563 1.082835 \n11 12 0.600634 0.816120 1.066532 \n12 13 0.699980 0.799444 1.033049 \n13 14 0.799987 0.765323 0.999086 \n14 15 0.899993 0.588424 0.825221 \n15 16 1.000000 0.056394 0.418971 \n\n cumulative_lift response_rate score cumulative_response_rate \\\n0 1.174234 0.914474 0.928367 0.914474 \n1 1.190823 0.940397 0.917088 0.927393 \n2 1.185281 0.914474 0.912427 0.923077 \n3 1.184160 0.919753 0.909681 0.922204 \n4 1.176207 0.889655 0.907578 0.916010 \n5 1.182280 0.925532 0.903123 0.920740 \n6 1.172102 0.896962 0.895875 0.912814 \n7 1.163197 0.885073 0.884291 0.905878 \n8 1.156349 0.890173 0.866197 0.900545 \n9 1.143452 0.859364 0.846380 0.890501 \n10 1.131393 0.843293 0.835017 0.881110 \n11 1.120529 0.830598 0.823761 0.872649 \n12 1.108113 0.804521 0.809260 0.862980 \n13 1.094484 0.778071 0.784247 0.852366 \n14 1.064563 0.642668 0.681231 0.829064 \n15 1.000000 0.326288 0.390840 0.778783 \n\n cumulative_score capture_rate cumulative_capture_rate gain \\\n0 0.928367 0.011790 0.011790 17.423385 \n1 0.922746 0.012044 0.023834 20.752125 \n2 0.919299 0.011790 0.035623 17.423385 \n3 0.916774 0.012638 0.048261 18.101289 \n4 0.915024 0.010941 0.059203 14.236553 \n5 0.909113 0.059033 0.118236 18.843322 \n6 0.904700 0.057591 0.175827 15.174750 \n7 0.899598 0.056828 0.232655 13.648133 \n8 
0.888255 0.117557 0.350212 14.303098 \n9 0.878042 0.107803 0.458015 10.347058 \n10 0.869482 0.107718 0.565734 8.283462 \n11 0.861825 0.107294 0.673028 6.653228 \n12 0.854364 0.102629 0.775657 3.304899 \n13 0.845599 0.099915 0.875573 -0.091417 \n14 0.827334 0.082528 0.958100 -17.477885 \n15 0.783682 0.041900 1.000000 -58.102852 \n\n cumulative_gain \n0 17.423385 \n1 19.082262 \n2 18.528088 \n3 18.416027 \n4 17.620720 \n5 18.227984 \n6 17.210239 \n7 16.319713 \n8 15.634899 \n9 14.345203 \n10 13.139261 \n11 12.052906 \n12 10.811328 \n13 9.448372 \n14 6.456346 \n15 0.000000 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nScoring History: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>timestamp</th>\n <th>duration</th>\n <th>number_of_trees</th>\n <th>training_rmse</th>\n <th>training_logloss</th>\n <th>training_auc</th>\n <th>training_pr_auc</th>\n <th>training_lift</th>\n <th>training_classification_error</th>\n <th>validation_rmse</th>\n <th>validation_logloss</th>\n <th>validation_auc</th>\n <th>validation_pr_auc</th>\n <th>validation_lift</th>\n <th>validation_classification_error</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>2020-06-03 19:45:43</td>\n <td>0.079 sec</td>\n <td>0.0</td>\n <td>0.412852</td>\n <td>0.524300</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.217949</td>\n <td>0.415079</td>\n <td>0.528475</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.221217</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2020-06-03 19:45:44</td>\n <td>1.041 sec</td>\n <td>10.0</td>\n <td>0.385338</td>\n <td>0.468347</td>\n <td>0.722202</td>\n 
<td>0.886199</td>\n <td>1.227541</td>\n <td>0.191756</td>\n <td>0.388970</td>\n <td>0.475106</td>\n <td>0.714339</td>\n <td>0.877976</td>\n <td>1.225307</td>\n <td>0.197966</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>2020-06-03 19:45:45</td>\n <td>1.574 sec</td>\n <td>20.0</td>\n <td>0.377800</td>\n <td>0.453883</td>\n <td>0.731858</td>\n <td>0.890685</td>\n <td>1.241394</td>\n <td>0.184985</td>\n <td>0.381771</td>\n <td>0.461558</td>\n <td>0.721626</td>\n <td>0.879695</td>\n <td>1.200674</td>\n <td>0.193672</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>2020-06-03 19:45:45</td>\n <td>1.862 sec</td>\n <td>30.0</td>\n <td>0.374872</td>\n <td>0.448219</td>\n <td>0.736972</td>\n <td>0.892789</td>\n <td>1.242155</td>\n <td>0.183525</td>\n <td>0.379027</td>\n <td>0.456469</td>\n <td>0.725341</td>\n <td>0.880711</td>\n <td>1.168075</td>\n <td>0.190964</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>2020-06-03 19:45:45</td>\n <td>2.150 sec</td>\n <td>40.0</td>\n <td>0.372867</td>\n <td>0.444067</td>\n <td>0.743907</td>\n <td>0.896736</td>\n <td>1.249503</td>\n <td>0.182438</td>\n <td>0.377871</td>\n <td>0.454143</td>\n <td>0.728786</td>\n <td>0.882622</td>\n <td>1.174234</td>\n <td>0.190237</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>2020-06-03 19:45:46</td>\n <td>2.319 sec</td>\n <td>50.0</td>\n <td>0.370722</td>\n <td>0.439517</td>\n <td>0.754041</td>\n <td>0.901579</td>\n <td>1.249420</td>\n <td>0.179274</td>\n <td>0.377134</td>\n <td>0.452553</td>\n <td>0.731689</td>\n <td>0.883971</td>\n <td>1.174234</td>\n <td>0.189907</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" timestamp duration number_of_trees training_rmse \\\n0 2020-06-03 19:45:43 0.079 sec 0.0 0.412852 \n1 2020-06-03 19:45:44 1.041 sec 10.0 0.385338 \n2 2020-06-03 19:45:45 1.574 sec 20.0 0.377800 \n3 2020-06-03 19:45:45 1.862 sec 30.0 0.374872 \n4 2020-06-03 19:45:45 2.150 sec 40.0 0.372867 \n5 2020-06-03 19:45:46 2.319 sec 50.0 0.370722 \n\n training_logloss training_auc 
training_pr_auc training_lift \\\n0 0.524300 0.500000 0.000000 1.000000 \n1 0.468347 0.722202 0.886199 1.227541 \n2 0.453883 0.731858 0.890685 1.241394 \n3 0.448219 0.736972 0.892789 1.242155 \n4 0.444067 0.743907 0.896736 1.249503 \n5 0.439517 0.754041 0.901579 1.249420 \n\n training_classification_error validation_rmse validation_logloss \\\n0 0.217949 0.415079 0.528475 \n1 0.191756 0.388970 0.475106 \n2 0.184985 0.381771 0.461558 \n3 0.183525 0.379027 0.456469 \n4 0.182438 0.377871 0.454143 \n5 0.179274 0.377134 0.452553 \n\n validation_auc validation_pr_auc validation_lift \\\n0 0.500000 0.000000 1.000000 \n1 0.714339 0.877976 1.225307 \n2 0.721626 0.879695 1.200674 \n3 0.725341 0.880711 1.168075 \n4 0.728786 0.882622 1.174234 \n5 0.731689 0.883971 1.174234 \n\n validation_classification_error \n0 0.221217 \n1 0.197966 \n2 0.193672 \n3 0.190964 \n4 0.190237 \n5 0.189907 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nVariable Importances: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>variable</th>\n <th>relative_importance</th>\n <th>scaled_importance</th>\n <th>percentage</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>HelpfulnessNumerator</td>\n <td>4752.182129</td>\n <td>1.000000</td>\n <td>0.398990</td>\n </tr>\n <tr>\n <th>1</th>\n <td>HelpfulnessDenominator</td>\n <td>4481.750977</td>\n <td>0.943093</td>\n <td>0.376284</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Time</td>\n <td>1361.051514</td>\n <td>0.286406</td>\n <td>0.114273</td>\n </tr>\n <tr>\n <th>3</th>\n <td>ProductId</td>\n <td>844.423889</td>\n <td>0.177692</td>\n <td>0.070897</td>\n </tr>\n <tr>\n <th>4</th>\n <td>UserId</td>\n 
<td>471.133972</td>\n <td>0.099141</td>\n <td>0.039556</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" variable relative_importance scaled_importance percentage\n0 HelpfulnessNumerator 4752.182129 1.000000 0.398990\n1 HelpfulnessDenominator 4481.750977 0.943093 0.376284\n2 Time 1361.051514 0.286406 0.114273\n3 ProductId 844.423889 0.177692 0.070897\n4 UserId 471.133972 0.099141 0.039556"},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":""},"execution_count":15,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## You can see above that H2O gives you loads of details about your model. Let's inspect it a bit more and log some results to MLFlow"},{"metadata":{"trusted":false},"cell_type":"code","source":"%matplotlib inline \n\n# Print and Log Model params\nparams = dict(zip(gbm_baseline.summary().col_header[1:],\n gbm_baseline.summary().cell_values[0][1:]))\nprint(gbm_baseline.summary())\nmlflow.log_params(params)\n","execution_count":16,"outputs":[{"name":"stdout","output_type":"stream","text":"\nModel Summary: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>number_of_trees</th>\n <th>number_of_internal_trees</th>\n <th>model_size_in_bytes</th>\n <th>min_depth</th>\n <th>max_depth</th>\n <th>mean_depth</th>\n <th>min_leaves</th>\n <th>max_leaves</th>\n <th>mean_leaves</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>50.0</td>\n <td>50.0</td>\n <td>24560.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>20.0</td>\n <td>32.0</td>\n <td>29.62</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" number_of_trees number_of_internal_trees model_size_in_bytes \\\n0 
50.0 50.0 24560.0 \n\n min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n0 5.0 5.0 5.0 20.0 32.0 29.62 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"from IPython.core.display import HTML\n#Plot and Log Scoring history\ngbm_baseline.plot()\nprint(\"AUC on Validation Data: \" + str(round(gbm_baseline.auc(valid = True), 3)))\n# Log training and validation metrics over time\nfor step, row in gbm_baseline.scoring_history().iterrows():\n row_dict = row.to_dict()\n for r in row_dict:\n if 'train' in r or 'valid' in r:\n mlflow.log_metric(r, row_dict[r],step=step)\n\ncur_run = mlflow.current_run_id()\ncur_exp = mlflow.current_exp_id()\nlink = f'/mlflow/#/metric/training_auc?runs=[\"{cur_run}\"]&experiment={cur_exp}&plot_metric_keys=[\\\"training_logloss\\\",\\\"validation_logloss\\\",\\\"training_rmse\\\",\\\"validation_rmse\\\"]'\nHTML(f'<font size=\"+2\">See your metrics plot <a href={link}>here</a></font>') \n\n","execution_count":17,"outputs":[{"data":{"image/png":"\n","text/plain":"<Figure size 432x288 with 1 Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"AUC on Validation Data: 0.732\n"},{"data":{"text/html":"<font size=\"+2\">See your metrics plot <a href=/mlflow/#/metric/training_auc?runs=[\"a92e93f393df\"]&experiment=1&plot_metric_keys=[\"training_logloss\",\"validation_logloss\",\"training_rmse\",\"validation_rmse\"]>here</a></font>","text/plain":"<IPython.core.display.HTML object>"},"execution_count":17,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"# Print and Log Confusion Matrix\nprint(gbm_baseline.confusion_matrix(valid = True))\nmlflow.lm('fpr', gbm_baseline.fpr(valid=True)[0][0])\nmlflow.lm('tpr', gbm_baseline.tpr(valid=True)[0][0])\nmlflow.lm('fnr', 
gbm_baseline.fnr(valid=True)[0][0])\nmlflow.lm('tnr', gbm_baseline.tnr(valid=True)[0][0])\nmlflow.lm('F0point5', gbm_baseline.F0point5(valid=True)[0][1])\nmlflow.lm('F1', gbm_baseline.F1(valid=True)[0][1])\nmlflow.lm('F2', gbm_baseline.F2(valid=True)[0][1])\nmlflow.lm('auc', gbm_baseline.auc(valid = True))\nmlflow.lp('threshold', gbm_baseline.F1(valid=True)[0][0]) # First element is the threshold","execution_count":18,"outputs":[{"name":"stdout","output_type":"stream","text":"\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.45593713188449025: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>677.0</td>\n <td>2672.0</td>\n <td>0.7979</td>\n <td>(2672.0/3349.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>203.0</td>\n <td>11587.0</td>\n <td>0.0172</td>\n <td>(203.0/11790.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>880.0</td>\n <td>14259.0</td>\n <td>0.1899</td>\n <td>(2875.0/15139.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 677.0 2672.0 0.7979 (2672.0/3349.0)\n1 1 203.0 11587.0 0.0172 (203.0/11790.0)\n2 Total 880.0 14259.0 0.1899 (2875.0/15139.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"# Plot and Log Variable Importance\ngbm_baseline.varimp_plot()\nfor var in gbm_baseline.varimp():\n mlflow.lm(f'varimp_{var[0]}',var[-1])","execution_count":19,"outputs":[{"data":{"image/png":"\n","text/plain":"<Figure size 1008x720 with 1 
Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"# Partial Dependence Plot\npdp_helpfulness = gbm_baseline.partial_plot(train, cols = [\"HelpfulnessNumerator\"])","execution_count":20,"outputs":[{"name":"stdout","output_type":"stream","text":"PartialDependencePlot progress: |█████████████████████████████████████████| 100%\n"},{"data":{"image/png":"\n","text/plain":"<Figure size 504x720 with 1 Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"mlflow.end_run()","execution_count":21,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"# There is room for improvement \n## Let's now Tokenize words in the Review\n### But first, let's start our new run"},{"metadata":{"trusted":false},"cell_type":"code","source":"import pandas as pd\n# Start new mlflow run\nmlflow.start_run(run_name='review_tokenizer')\nmlflow.lp('source data table', f'{get_user()}.AMAZON_REVIEWS')\n# Get common stop words from H2O\ndata_path = \"https://splice-demo.s3.amazonaws.com/stop_words.csv\"\nSTOP_WORDS = pd.read_csv(data_path, header=0)\nSTOP_WORDS = list(STOP_WORDS['STOP_WORD'])\nprint(STOP_WORDS)","execution_count":22,"outputs":[{"name":"stdout","output_type":"stream","text":"['all', 'just', 'being', 'over', 'both', 'through', 'yourselves', 'its', 'before', 'o', 'hadn', 'herself', 'll', 'had', 'should', 'to', 'only', 'won', 'under', 'ours', 'has', 'do', 'them', 'his', 'very', 'they', 'not', 'during', 'now', 'him', 'nor', 'd', 'did', 'didn', 'this', 'she', 'each', 'further', 'where', 'few', 'because', 'doing', 'some', 'hasn', 'are', 'our', 'ourselves', 'out', 'what', 'for', 'while', 're', 'does', 'above', 'between', 'mustn', 't', 'be', 'we', 'who', 'were', 'here', 'shouldn', 'hers', 'by', 'on', 'about', 'couldn', 'of', 'against', 's', 'isn', 'or', 'own', 'into', 'yourself', 'down', 'mightn', 'wasn', 'your', 
'from', 'her', 'their', 'aren', 'there', 'been', 'whom', 'too', 'wouldn', 'themselves', 'weren', 'was', 'until', 'more', 'himself', 'that', 'but', 'don', 'with', 'than', 'those', 'he', 'me', 'myself', 'ma', 'these', 'up', 'will', 'below', 'ain', 'can', 'theirs', 'my', 'and', 've', 'then', 'is', 'am', 'it', 'doesn', 'an', 'as', 'itself', 'at', 'have', 'in', 'any', 'if', 'again', 'no', 'when', 'same', 'how', 'other', 'which', 'you', 'shan', 'needn', 'haven', 'after', 'most', 'such', 'why', 'a', 'off', 'i', 'm', 'yours', 'so', 'y', 'the', 'having', 'once']\n"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"# Inspect our reviews before tokenization\nreviews['Text']","execution_count":23,"outputs":[{"data":{"text/html":"<table>\n<thead>\n<tr><th>Text </th></tr>\n</thead>\n<tbody>\n<tr><td>These are made in China (do not buy ANY pet food from China). Dogswell has been using propylene glycol to soften their treats (what are they thinkng?). Do not purchase or support this company in any way until they clean up their act. And for whatever reason Amazon doesn&#x27;t allow returns of this item, so I had to toss mine out. Bad business all around on this one. </td></tr>\n<tr><td>Everything is ok, except it just isn&#x27;t as good as it is in the bags. Just considerably more bland -- less lemon and less zing. Boring. </td></tr>\n<tr><td>Best cat treat ever. There isn&#x27;t anything comparable to the love my cat has for these treats, he snubs away any other kind now.&lt;br /&gt;I know he likes to manipulate me with his cattiness but these treats are my way of manipulating him to come sit on my lap and have some chill time. :) </td></tr>\n<tr><td>My two Corgis were thoroughly spoiled by my late husband (I spent a year and a half dieting them down a combined total of 25 pounds!)&lt;br /&gt;&lt;br /&gt;They are accustomed to the finest of fare, and they absolutely love the Wellness brand of treats. 
</td></tr>\n<tr><td>We used to have drive down to the specialty pet food store for this product. So glad we discovered Amazon. As far as I can tell it is no more expensive &amp; in some cases less - Prime membership is awesome. Loving Pets treats are some of the best according to my dog. They do not develop that nasty smell like some dog treats do. </td></tr>\n<tr><td>I really liked this coffee, it was just as good as everyone claimed it was. Strong, bold and flavorful! I would recommend! </td></tr>\n<tr><td>I am always looking for healthier, whole grain versions of foods I enjoy. Unfortunately, these Peacock brand noodles are yet another food masquerading as healthy. The product title in big letters on the front says &quot;Brown Rice Vermicelli&quot;, making the consumer think &quot;this is made with brown rice, so it should be a healthy choice&quot;. But the first indication that it is not is when looking at the fiber content on the nutrition facts - only 0.6g per 2oz serving. Then onto the ingredients list to see why so low... contains brown rice, sago starch and water. The sago starch comes from palms and must not have much (if any) fiber.&lt;br /&gt;&lt;br /&gt;The Annie Chun&#x27;s Maifun Brown Rice Noodles (sold on Amazon and in my local healthy grocer) has become one of my staples and is my frame of reference when comparing to the Peacock brand. The Annie Chun&#x27;s product is made with 100% whole grain, with ingredients brown rice flour and water. Per 2oz serving, it has 4g fiber and pretty much the same calories and other nutrients as the Peacock brand.&lt;br /&gt;&lt;br /&gt;If you do try this Peacock brand noodles and have not used rice noodles before, you will need to seek guidance elsewhere on preparation. As others have pointed out, the Peacock package gives almost no directions on how to prepare the product, aside from a brief mention in the recipes (in the header text it does say that they are &quot;easy-to-cook&quot; but does not say how). 
It also contains a very strange recipe for rice noodles: Aglio Olio style - this is an Italian recipe for noodles with olive oil/garlic/sprinkled with grated cheese that I think would not be very tasty. The second recipe appears to be for a soup with veggie strips. Neither recipe gives amounts or much direction. In comparison, the Annie Chun&#x27;s package gives clear, specific directions on rice noodle preparation and two recipes.&lt;br /&gt;&lt;br /&gt;I use rice noodles = maifun = rice sticks = sometimes called vermicelli for making the Vietnamese salad &quot;bun tofu&quot;, to serve with stir-fried veggies or in lettuce rolls. They can also be used in spring rolls/egg rolls. When cooking with thin rice noodles, be careful not to oversoak/overcook/overmix or they tend to disintegrate. Asian rice noodle vermicelli (maifun) are not the same as Italian vermicelli and are not readily interchangeable. If making an Italian recipe, the best results would be expected from Italian pasta and not maifun.&lt;br /&gt;&lt;br /&gt;A few final notes... Both Peacock and Annie Chun&#x27;s brown rice noodles are gluten free. The Peacock is made in Singapore and the Annie Chun&#x27;s in Thailand. The Peacock noodles do taste fine (kind of bland), but so do the Annie Chun&#x27;s. At this time, they are both approximately the same price. Peacock come in an plastic bag with some noodle crushage upon shipping; Annie Chun&#x27;s are perfect upon removal from their cellophane bag in a box. Overall, I highly recommend the Annie Chun&#x27;s Maifun as a healthier option over the Peacock brand. On a related note, the Annie Chun&#x27;s soba and brown rice pad thai noodles are also excellent.&lt;br /&gt;&lt;br /&gt;Rating for this product: 2.5 stars rounded down to 2 stars.</td></tr>\n<tr><td>I have tried many different drink mix, this is the best tasting by far. 
It does not have the after taste of the sweetener and I really like it, it is pretty strong, so I use a big water bottle (20 oz) for one tube, it still a little stronger than I like, but it is just my taste. </td></tr>\n<tr><td>This is my favorite item to order for my Keurig. There are so many flavors, my finicky palate never gets bored! The only downside is there are probably 5-6 decaf varieties. I don&#x27;t drink decaf (I REQUIRE copious amounts of caffeine), so they sit on the shelf... </td></tr>\n<tr><td>Tastes like Earl Grey, but it&#x27;s green tea so it&#x27;s healthier. </td></tr>\n</tbody>\n</table>"},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":""},"execution_count":23,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## Now we can train our Doc2Vec model\n<blockquote>We are going to use the popular Gensim doc2vec implementation for scikit-learn. We use scikit-learn here because of its implementation of <code>Pipelines</code>, which allow us to apply custom transformations to the data before training or running our model. This gives us a great deal of flexibility. We will use doc2vec, which is an extension of word2vec to full documents (sentences). We can also time how long it takes to train the vectorizer, and log the vector size so we can change it and see how performance changes. 
Then we can look at some similar reviews to see how well the model did.<br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":false},"cell_type":"code","source":"import re\n\ndef tokenize(reviews):\n    # reviews is a one-element list wrapping the collection of review strings\n    stops = set(STOP_WORDS)  # build the lookup set once, not once per review\n    review_tokens = []\n    for review in reviews[0]:\n        # Replace non-letters with spaces\n        review = re.sub(\"[^a-zA-Z]\", \" \", review)\n        words = review.lower().split()\n        # Drop stop words\n        review_tokens.append([w for w in words if w not in stops])\n    return review_tokens","execution_count":24,"outputs":[]},{"metadata":{},"cell_type":"markdown","source":"## Train Doc2Vec Model\n<blockquote>This will take a few minutes to run, as the model needs to learn a vector for every review.<br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"scrolled":true,"trusted":false},"cell_type":"code","source":"from sklearn.feature_extraction.text import TfidfTransformer, CountVectorizer\nfrom sklearn.pipeline import FeatureUnion, Pipeline as skPipe\nfrom sklearn.base import BaseEstimator, TransformerMixin\nfrom sklearn.preprocessing import FunctionTransformer\nfrom gensim.sklearn_api import W2VTransformer, D2VTransformer\nimport pandas as pd\n\npdf = reviews['Text'].as_data_frame().astype('string')\n\nword2vec_model = skPipe(verbose=True,\n    steps = [\n        ('preprocessor', FunctionTransformer(tokenize, validate=False)),\n        ('word2vec', D2VTransformer()),\n        ('postprocessor', FunctionTransformer(lambda X: X.astype('double'), validate=True))\n    ])\nwith mlflow.timer('word2vec_train_time'):\n    # Tokenize the reviews, build the vocabulary, and train Doc2Vec\n    word2vec_model.fit([pdf.dropna()['Text']])","execution_count":25,"outputs":[{"name":"stderr","output_type":"stream","text":"The usage of `cmp` is deprecated and will be removed on or after 2021-06-01. 
Please use `eq` and `order` instead.\n`scipy.sparse.sparsetools` is deprecated!\nscipy.sparse.sparsetools is a private module for scipy.sparse, and should not be used.\n"},{"name":"stdout","output_type":"stream","text":"Starting Code Block word2vec_train_time... [Pipeline] ...... (step 1 of 3) Processing preprocessor, total= 4.4s\n[Pipeline] .......... (step 2 of 3) Processing word2vec, total= 2.3min\n"},{"name":"stderr","output_type":"stream","text":"The default validate=True will be replaced by validate=False in 0.22.\n"},{"name":"stdout","output_type":"stream","text":"[Pipeline] ..... (step 3 of 3) Processing postprocessor, total= 0.0s\nDone.\nCode Block word2vec_train_time:\nRan in 140.889 secs\nRan in 2.348 mins\n"}]},{"metadata":{},"cell_type":"markdown","source":"## Now we can use the Doc2Vec Model to see the most similar sentences of an input"},{"metadata":{"trusted":false},"cell_type":"code","source":"from copy import deepcopy\n# Copy the trained gensim model out of the pipeline's second step\ndoc2vec_model = deepcopy(word2vec_model.steps[1][1].gensim_model)\n\n# Tokenize our input and infer its vector\ninp = \"Tastes like Earl Grey\"\ntokens = tokenize([[inp]])\nnew_vector = doc2vec_model.infer_vector(tokens[0])\n# Find the training reviews whose vectors are closest to the input vector\nsims = doc2vec_model.docvecs.most_similar([new_vector])\n# Print the (review index, similarity) pairs\nprint(f'Vector of input: {sims}\\n')\n# Get the most similar review\nindex = sims[0][0]\noutput = pdf.dropna()['Text'].iloc[index]\n\nprint(f'Input: {inp}\\nMost similar Output: {output}')","execution_count":27,"outputs":[{"name":"stdout","output_type":"stream","text":"Vector of input: [(93226, 0.8579040765762329), (1883, 0.8531466722488403), (23673, 0.8494805693626404), (87876, 0.848807156085968), (72010, 0.8470171689987183), (21202, 0.8461479544639587), (42383, 0.846132755279541), (26856, 0.8412899971008301), (28805, 0.8412365913391113), (26706, 0.8408089280128479)]\n\nInput: Tastes like Earl Grey\nMost similar Output: You have to taste this, it taste like you made it from scratch.\n"}]},{"metadata":{},"cell_type":"markdown","source":"## Now we 
can save this model and end our run\n<blockquote>We want to save this vectorizer as an <b>independent</b> <code>run</code> because we may want to build more than one model on top of these vectors. Rather than recomputing identical vectors for each model, we can reuse the saved outputs for <b>more than one model</b>. This is the idea behind a <i>feature store</i>, where features derived from one dataset are shared across multiple models. This is crucial to creating efficient ML workflow systems.</blockquote>"},{"metadata":{"trusted":false},"cell_type":"code","source":"mlflow.log_model(word2vec_model, 'word2vec_model')\nmlflow.lp('vector_size',100)\nmlflow.end_run()","execution_count":28,"outputs":[{"name":"stdout","output_type":"stream","text":"Saving artifact of size: 56999.04 KB to Splice Machine DB\n"}]},{"metadata":{},"cell_type":"markdown","source":"## Let's vectorize our reviews\n<blockquote>Now that we have a trained Doc2Vec model, we will compute a vector for each review using the <code>transform</code> function. 
This will give us one aggregated word embedding for each review.</blockquote>"},{"metadata":{"trusted":false},"cell_type":"code","source":"# Calculate a vector for each review\nreview_vecs = h2o.H2OFrame(word2vec_model.transform([pdf.fillna(\"\")['Text']]))\n# Add the review vectors to the original dataframe\n# Add aggregated word embeddings \next_reviews = reviews.cbind(review_vecs)\next_reviews.head()","execution_count":29,"outputs":[{"name":"stderr","output_type":"stream","text":"The default validate=True will be replaced by validate=False in 0.22.\n"},{"name":"stdout","output_type":"stream","text":"Parse progress: |█████████████████████████████████████████████████████████| 100%\n"},{"data":{"text/html":"<table>\n<thead>\n<tr><th>ProductId </th><th>UserId </th><th>Summary </th><th style=\"text-align: right;\"> Score</th><th style=\"text-align: right;\"> HelpfulnessDenominator</th><th style=\"text-align: right;\"> Id</th><th>ProfileName </th><th style=\"text-align: right;\"> HelpfulnessNumerator</th><th style=\"text-align: right;\"> Time</th><th>Text </th><th style=\"text-align: right;\"> PositiveReview</th><th style=\"text-align: right;\"> C1</th><th style=\"text-align: right;\"> C2</th><th style=\"text-align: right;\"> C3</th><th style=\"text-align: right;\"> C4</th><th style=\"text-align: right;\"> C5</th><th style=\"text-align: right;\"> C6</th><th style=\"text-align: right;\"> C7</th><th style=\"text-align: right;\"> C8</th><th style=\"text-align: right;\"> C9</th><th style=\"text-align: right;\"> C10</th><th style=\"text-align: right;\"> C11</th><th style=\"text-align: right;\"> C12</th><th style=\"text-align: right;\"> C13</th><th style=\"text-align: right;\"> C14</th><th style=\"text-align: right;\"> C15</th><th style=\"text-align: right;\"> C16</th><th style=\"text-align: right;\"> C17</th><th style=\"text-align: right;\"> C18</th><th style=\"text-align: right;\"> C19</th><th style=\"text-align: right;\"> C20</th><th style=\"text-align: right;\"> 
C21</th><th style=\"text-align: right;\"> C22</th><th style=\"text-align: right;\"> C23</th><th style=\"text-align: right;\"> C24</th><th style=\"text-align: right;\"> C25</th><th style=\"text-align: right;\"> C26</th><th style=\"text-align: right;\"> C27</th><th style=\"text-align: right;\"> C28</th><th style=\"text-align: right;\"> C29</th><th style=\"text-align: right;\"> C30</th><th style=\"text-align: right;\"> C31</th><th style=\"text-align: right;\"> C32</th><th style=\"text-align: right;\"> C33</th><th style=\"text-align: right;\"> C34</th><th style=\"text-align: right;\"> C35</th><th style=\"text-align: right;\"> C36</th><th style=\"text-align: right;\"> C37</th><th style=\"text-align: right;\"> C38</th><th style=\"text-align: right;\"> C39</th><th style=\"text-align: right;\"> C40</th><th style=\"text-align: right;\"> C41</th><th style=\"text-align: right;\"> C42</th><th style=\"text-align: right;\"> C43</th><th style=\"text-align: right;\"> C44</th><th style=\"text-align: right;\"> C45</th><th style=\"text-align: right;\"> C46</th><th style=\"text-align: right;\"> C47</th><th style=\"text-align: right;\"> C48</th><th style=\"text-align: right;\"> C49</th><th style=\"text-align: right;\"> C50</th><th style=\"text-align: right;\"> C51</th><th style=\"text-align: right;\"> C52</th><th style=\"text-align: right;\"> C53</th><th style=\"text-align: right;\"> C54</th><th style=\"text-align: right;\"> C55</th><th style=\"text-align: right;\"> C56</th><th style=\"text-align: right;\"> C57</th><th style=\"text-align: right;\"> C58</th><th style=\"text-align: right;\"> C59</th><th style=\"text-align: right;\"> C60</th><th style=\"text-align: right;\"> C61</th><th style=\"text-align: right;\"> C62</th><th style=\"text-align: right;\"> C63</th><th style=\"text-align: right;\"> C64</th><th style=\"text-align: right;\"> C65</th><th style=\"text-align: right;\"> C66</th><th style=\"text-align: right;\"> C67</th><th style=\"text-align: right;\"> C68</th><th 
style=\"text-align: right;\"> C69</th><th style=\"text-align: right;\"> C70</th><th style=\"text-align: right;\"> C71</th><th style=\"text-align: right;\"> C72</th><th style=\"text-align: right;\"> C73</th><th style=\"text-align: right;\"> C74</th><th style=\"text-align: right;\"> C75</th><th style=\"text-align: right;\"> C76</th><th style=\"text-align: right;\"> C77</th><th style=\"text-align: right;\"> C78</th><th style=\"text-align: right;\"> C79</th><th style=\"text-align: right;\"> C80</th><th style=\"text-align: right;\"> C81</th><th style=\"text-align: right;\"> C82</th><th style=\"text-align: right;\"> C83</th><th style=\"text-align: right;\"> C84</th><th style=\"text-align: right;\"> C85</th><th style=\"text-align: right;\"> C86</th><th style=\"text-align: right;\"> C87</th><th style=\"text-align: right;\"> C88</th><th style=\"text-align: right;\"> C89</th><th style=\"text-align: right;\"> C90</th><th style=\"text-align: right;\"> C91</th><th style=\"text-align: right;\"> C92</th><th style=\"text-align: right;\"> C93</th><th style=\"text-align: right;\"> C94</th><th style=\"text-align: right;\"> C95</th><th style=\"text-align: right;\"> C96</th><th style=\"text-align: right;\"> C97</th><th style=\"text-align: right;\"> C98</th><th style=\"text-align: right;\"> C99</th><th style=\"text-align: right;\"> C100</th></tr>\n</thead>\n<tbody>\n<tr><td>B00141QYSQ </td><td>A1YS02UZZGRDCT</td><td>Do Not Buy </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 41471</td><td>Evan Eberhardt </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">1.34836e+09</td><td>These are made in China (do not buy ANY pet food from China). Dogswell has been using propylene glycol to soften their treats (what are they thinkng?). Do not purchase or support this company in any way until they clean up their act. 
And for whatever reason Amazon doesn&#x27;t allow returns of this item, so I had to toss mine out. Bad business all around on this one. </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 0.0311221 </td><td style=\"text-align: right;\"> 0.0719161 </td><td style=\"text-align: right;\">-0.1281 </td><td style=\"text-align: right;\"> 0.0273494 </td><td style=\"text-align: right;\">-0.0642016 </td><td style=\"text-align: right;\">0.0994345 </td><td style=\"text-align: right;\"> 0.0135567 </td><td style=\"text-align: right;\">-0.0737052 </td><td style=\"text-align: right;\"> 0.122077 </td><td style=\"text-align: right;\"> 0.0350201 </td><td style=\"text-align: right;\"> 0.0756314 </td><td style=\"text-align: right;\"> 0.0923928</td><td style=\"text-align: right;\">-0.132129 </td><td style=\"text-align: right;\"> 0.0115625 </td><td style=\"text-align: right;\"> 0.0306404 </td><td style=\"text-align: right;\">-0.0373009 </td><td style=\"text-align: right;\">-0.0684029 </td><td style=\"text-align: right;\"> 0.0680291 </td><td style=\"text-align: right;\">-0.023424 </td><td style=\"text-align: right;\">-0.0835139 </td><td style=\"text-align: right;\"> 0.00894095</td><td style=\"text-align: right;\"> 0.0749535</td><td style=\"text-align: right;\"> 0.129895 </td><td style=\"text-align: right;\"> 0.132389 </td><td style=\"text-align: right;\"> 0.0092737 </td><td style=\"text-align: right;\"> 0.108408 </td><td style=\"text-align: right;\"> 0.0476859 </td><td style=\"text-align: right;\">-0.0485954 </td><td style=\"text-align: right;\">-0.121283 </td><td style=\"text-align: right;\"> 0.0735992 </td><td style=\"text-align: right;\">-0.0445242 </td><td style=\"text-align: right;\">-0.0080538 </td><td style=\"text-align: right;\">-0.095095 </td><td style=\"text-align: right;\">-0.0605023 </td><td style=\"text-align: right;\">-0.00707685</td><td style=\"text-align: right;\"> 0.0376109 </td><td style=\"text-align: right;\"> 0.0138715 </td><td 
style=\"text-align: right;\"> 0.145876 </td><td style=\"text-align: right;\">-0.0366801 </td><td style=\"text-align: right;\"> 0.0694044</td><td style=\"text-align: right;\">-0.0392646 </td><td style=\"text-align: right;\">-0.0860089</td><td style=\"text-align: right;\"> 0.0646098</td><td style=\"text-align: right;\">-0.0416264 </td><td style=\"text-align: right;\"> 0.0500176 </td><td style=\"text-align: right;\">-0.0586521</td><td style=\"text-align: right;\"> 0.0536527 </td><td style=\"text-align: right;\"> 0.0441543 </td><td style=\"text-align: right;\">-0.0336521 </td><td style=\"text-align: right;\">-0.021085 </td><td style=\"text-align: right;\"> 0.0222311</td><td style=\"text-align: right;\"> 0.119254 </td><td style=\"text-align: right;\">-0.0199307 </td><td style=\"text-align: right;\">-0.0894037 </td><td style=\"text-align: right;\">-0.0437724 </td><td style=\"text-align: right;\"> 0.0452685 </td><td style=\"text-align: right;\"> 0.101266 </td><td style=\"text-align: right;\">-0.0958144</td><td style=\"text-align: right;\">-0.0453501 </td><td style=\"text-align: right;\">-0.127768 </td><td style=\"text-align: right;\"> 0.0535161</td><td style=\"text-align: right;\">-0.0619117 </td><td style=\"text-align: right;\">-0.0233732 </td><td style=\"text-align: right;\">-0.089195 </td><td style=\"text-align: right;\">0.112413 </td><td style=\"text-align: right;\">-0.0247127 </td><td style=\"text-align: right;\"> 0.0762887 </td><td style=\"text-align: right;\">-0.101294 </td><td style=\"text-align: right;\"> 0.0329748 </td><td style=\"text-align: right;\"> 0.0639789 </td><td style=\"text-align: right;\">0.118965 </td><td style=\"text-align: right;\">-0.0389742 </td><td style=\"text-align: right;\">-0.0120773 </td><td style=\"text-align: right;\">-0.0522605 </td><td style=\"text-align: right;\">-0.0395143 </td><td style=\"text-align: right;\">-0.107343 </td><td style=\"text-align: right;\"> 0.000138754</td><td style=\"text-align: right;\">-0.202054 </td><td 
style=\"text-align: right;\">-0.0322408 </td><td style=\"text-align: right;\"> 0.0204981</td><td style=\"text-align: right;\"> 0.0339285 </td><td style=\"text-align: right;\">-0.0551402</td><td style=\"text-align: right;\"> 0.0426714 </td><td style=\"text-align: right;\">-0.0427338 </td><td style=\"text-align: right;\"> 0.107102 </td><td style=\"text-align: right;\"> 0.0563949 </td><td style=\"text-align: right;\">-0.041101 </td><td style=\"text-align: right;\">-0.0848916</td><td style=\"text-align: right;\"> 0.0479946 </td><td style=\"text-align: right;\">-0.0431166 </td><td style=\"text-align: right;\">-0.00818688</td><td style=\"text-align: right;\">-0.00537665</td><td style=\"text-align: right;\"> 0.0358918</td><td style=\"text-align: right;\">-0.0373974 </td><td style=\"text-align: right;\">-0.102815 </td><td style=\"text-align: right;\"> 0.0306642</td><td style=\"text-align: right;\"> 0.0515081</td><td style=\"text-align: right;\"> 0.120599 </td><td style=\"text-align: right;\"> 0.0901687 </td><td style=\"text-align: right;\"> 0.115301 </td></tr>\n<tr><td>B0089SPEO2 </td><td>A3JOYNYL458QHP</td><td>Less lemon and less zing </td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 28582</td><td>coleridge </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32391e+09</td><td>Everything is ok, except it just isn&#x27;t as good as it is in the bags. Just considerably more bland -- less lemon and less zing. Boring. 
</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">-0.0354048 </td><td style=\"text-align: right;\"> 0.0247745 </td><td style=\"text-align: right;\">-0.0545446 </td><td style=\"text-align: right;\"> 0.0766969 </td><td style=\"text-align: right;\">-0.0542439 </td><td style=\"text-align: right;\">0.00712698</td><td style=\"text-align: right;\"> 0.0679792 </td><td style=\"text-align: right;\"> 0.00729536 </td><td style=\"text-align: right;\"> 0.0194643</td><td style=\"text-align: right;\">-0.000942943</td><td style=\"text-align: right;\"> 0.0149142 </td><td style=\"text-align: right;\"> 0.0336201</td><td style=\"text-align: right;\">-0.0262167</td><td style=\"text-align: right;\"> 0.00263732 </td><td style=\"text-align: right;\">-0.00473881</td><td style=\"text-align: right;\"> 0.0025277 </td><td style=\"text-align: right;\"> 0.0319848 </td><td style=\"text-align: right;\"> 0.0724836 </td><td style=\"text-align: right;\">-0.0159643</td><td style=\"text-align: right;\">-0.0369099 </td><td style=\"text-align: right;\"> 0.0349859 </td><td style=\"text-align: right;\">-0.0440495</td><td style=\"text-align: right;\">-0.0209405 </td><td style=\"text-align: right;\"> 0.00411849</td><td style=\"text-align: right;\"> 0.00909809</td><td style=\"text-align: right;\">-0.0124248 </td><td style=\"text-align: right;\"> 0.0167648 </td><td style=\"text-align: right;\">-0.021823 </td><td style=\"text-align: right;\"> 0.00381749</td><td style=\"text-align: right;\">-0.0172089 </td><td style=\"text-align: right;\">-0.00290564</td><td style=\"text-align: right;\">-0.0122623 </td><td style=\"text-align: right;\"> 0.00216576</td><td style=\"text-align: right;\"> 0.0540102 </td><td style=\"text-align: right;\">-0.0128816 </td><td style=\"text-align: right;\">-0.0172007 </td><td style=\"text-align: right;\"> 0.00682313</td><td style=\"text-align: right;\"> 0.0245506</td><td style=\"text-align: right;\">-0.0577924 </td><td style=\"text-align: 
right;\">-0.0198054</td><td style=\"text-align: right;\">-0.0630254 </td><td style=\"text-align: right;\">-0.0439116</td><td style=\"text-align: right;\"> 0.0178368</td><td style=\"text-align: right;\"> 0.0637651 </td><td style=\"text-align: right;\">-0.0437091 </td><td style=\"text-align: right;\">-0.0649058</td><td style=\"text-align: right;\">-0.0402683 </td><td style=\"text-align: right;\">-0.0244872 </td><td style=\"text-align: right;\"> 0.083092 </td><td style=\"text-align: right;\"> 0.0160841 </td><td style=\"text-align: right;\">-0.0193159</td><td style=\"text-align: right;\">-0.0250807</td><td style=\"text-align: right;\"> 0.00872545</td><td style=\"text-align: right;\">-0.00508996</td><td style=\"text-align: right;\"> 0.0469757 </td><td style=\"text-align: right;\">-0.0459333 </td><td style=\"text-align: right;\">-0.0293787</td><td style=\"text-align: right;\">-0.0118928</td><td style=\"text-align: right;\">-0.0353911 </td><td style=\"text-align: right;\"> 0.00960084</td><td style=\"text-align: right;\"> 0.0197541</td><td style=\"text-align: right;\">-0.00492643</td><td style=\"text-align: right;\">-0.0258128 </td><td style=\"text-align: right;\"> 0.0134544 </td><td style=\"text-align: right;\">0.0435747 </td><td style=\"text-align: right;\"> 0.0105644 </td><td style=\"text-align: right;\"> 0.0530692 </td><td style=\"text-align: right;\">-0.000385565</td><td style=\"text-align: right;\">-0.0257902 </td><td style=\"text-align: right;\"> 0.0121527 </td><td style=\"text-align: right;\">0.00295611</td><td style=\"text-align: right;\">-0.0238126 </td><td style=\"text-align: right;\"> 0.00596643</td><td style=\"text-align: right;\">-0.0372999 </td><td style=\"text-align: right;\"> 0.0326834 </td><td style=\"text-align: right;\">-0.0418227 </td><td style=\"text-align: right;\">-0.0677815 </td><td style=\"text-align: right;\"> 0.00903367</td><td style=\"text-align: right;\">-0.0430189 </td><td style=\"text-align: right;\">-0.0240547</td><td style=\"text-align: 
right;\">-0.00625893</td><td style=\"text-align: right;\">-0.0332857</td><td style=\"text-align: right;\">-0.00795872</td><td style=\"text-align: right;\">-0.00244623</td><td style=\"text-align: right;\">-0.0130634</td><td style=\"text-align: right;\"> 0.0290811 </td><td style=\"text-align: right;\">-0.00679918</td><td style=\"text-align: right;\"> 0.0219198</td><td style=\"text-align: right;\">-0.0891907 </td><td style=\"text-align: right;\">-0.0157458 </td><td style=\"text-align: right;\"> 0.0173031 </td><td style=\"text-align: right;\"> 0.0519746 </td><td style=\"text-align: right;\">-0.0118705</td><td style=\"text-align: right;\">-0.0143847 </td><td style=\"text-align: right;\"> 0.09397 </td><td style=\"text-align: right;\"> 0.0110028</td><td style=\"text-align: right;\"> 0.0328517</td><td style=\"text-align: right;\"> 0.0368655</td><td style=\"text-align: right;\"> 0.0645739 </td><td style=\"text-align: right;\"> 0.047565 </td></tr>\n<tr><td>B001PMCDK2 </td><td>A14TTMM0Z03Y2W</td><td>my cat goes crazy for these! </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">389965</td><td>Lindsay S. Bradford </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.3106e+09 </td><td>Best cat treat ever. There isn&#x27;t anything comparable to the love my cat has for these treats, he snubs away any other kind now.&lt;br /&gt;I know he likes to manipulate me with his cattiness but these treats are my way of manipulating him to come sit on my lap and have some chill time. 
:) </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 0.0722773 </td><td style=\"text-align: right;\"> 0.0639396 </td><td style=\"text-align: right;\">-0.0335094 </td><td style=\"text-align: right;\"> 0.0664469 </td><td style=\"text-align: right;\">-0.0652916 </td><td style=\"text-align: right;\">0.0689586 </td><td style=\"text-align: right;\"> 0.00527908</td><td style=\"text-align: right;\"> 0.0413024 </td><td style=\"text-align: right;\"> 0.0791953</td><td style=\"text-align: right;\"> 0.0910244 </td><td style=\"text-align: right;\"> 0.0555998 </td><td style=\"text-align: right;\"> 0.136567 </td><td style=\"text-align: right;\">-0.0604036</td><td style=\"text-align: right;\"> 0.00816555 </td><td style=\"text-align: right;\">-0.0229452 </td><td style=\"text-align: right;\">-0.0184804 </td><td style=\"text-align: right;\">-0.156753 </td><td style=\"text-align: right;\"> 0.0496772 </td><td style=\"text-align: right;\">-0.0695818</td><td style=\"text-align: right;\"> 0.0520854 </td><td style=\"text-align: right;\"> 0.0916832 </td><td style=\"text-align: right;\"> 0.0356442</td><td style=\"text-align: right;\">-0.0426297 </td><td style=\"text-align: right;\"> 0.180441 </td><td style=\"text-align: right;\"> 0.0691552 </td><td style=\"text-align: right;\"> 0.0166221 </td><td style=\"text-align: right;\">-0.0345218 </td><td style=\"text-align: right;\">-0.0499622 </td><td style=\"text-align: right;\"> 0.00111568</td><td style=\"text-align: right;\"> 0.0759182 </td><td style=\"text-align: right;\">-0.023927 </td><td style=\"text-align: right;\">-0.07914 </td><td style=\"text-align: right;\">-0.0675944 </td><td style=\"text-align: right;\">-0.0155206 </td><td style=\"text-align: right;\"> 0.0650796 </td><td style=\"text-align: right;\"> 0.0760519 </td><td style=\"text-align: right;\"> 0.0215851 </td><td style=\"text-align: right;\"> 0.0226041</td><td style=\"text-align: right;\">-0.0835985 </td><td style=\"text-align: right;\"> 0.0163129</td><td 
style=\"text-align: right;\"> 0.0167953 </td><td style=\"text-align: right;\">-0.0885574</td><td style=\"text-align: right;\">-0.0367466</td><td style=\"text-align: right;\">-0.0578619 </td><td style=\"text-align: right;\"> 0.0156162 </td><td style=\"text-align: right;\">-0.0248519</td><td style=\"text-align: right;\"> 0.0831028 </td><td style=\"text-align: right;\"> 0.00309607</td><td style=\"text-align: right;\">-0.00960189</td><td style=\"text-align: right;\"> 0.051891 </td><td style=\"text-align: right;\">-0.0101117</td><td style=\"text-align: right;\"> 0.0763692</td><td style=\"text-align: right;\"> 0.0693945 </td><td style=\"text-align: right;\">-0.0432502 </td><td style=\"text-align: right;\">-0.0587365 </td><td style=\"text-align: right;\"> 0.0761859 </td><td style=\"text-align: right;\">-0.0310571</td><td style=\"text-align: right;\">-0.108232 </td><td style=\"text-align: right;\">-0.105073 </td><td style=\"text-align: right;\">-0.0249751 </td><td style=\"text-align: right;\"> 0.0113496</td><td style=\"text-align: right;\">-0.0585231 </td><td style=\"text-align: right;\">-0.0157676 </td><td style=\"text-align: right;\">-0.0404807 </td><td style=\"text-align: right;\">0.0492526 </td><td style=\"text-align: right;\">-0.138522 </td><td style=\"text-align: right;\"> 0.00527175</td><td style=\"text-align: right;\"> 0.0012283 </td><td style=\"text-align: right;\"> 0.0671212 </td><td style=\"text-align: right;\"> 0.00836717 </td><td style=\"text-align: right;\">0.143647 </td><td style=\"text-align: right;\">-0.010005 </td><td style=\"text-align: right;\">-0.0288803 </td><td style=\"text-align: right;\">-0.0533496 </td><td style=\"text-align: right;\"> 0.0189405 </td><td style=\"text-align: right;\">-0.05666 </td><td style=\"text-align: right;\">-0.128763 </td><td style=\"text-align: right;\">-0.06002 </td><td style=\"text-align: right;\">-0.0298549 </td><td style=\"text-align: right;\"> 0.0422105</td><td style=\"text-align: right;\">-0.0321987 </td><td 
style=\"text-align: right;\">-0.107224 </td><td style=\"text-align: right;\"> 0.0706086 </td><td style=\"text-align: right;\"> 0.0965514 </td><td style=\"text-align: right;\"> 0.0849076</td><td style=\"text-align: right;\"> 0.0487253 </td><td style=\"text-align: right;\">-0.0208525 </td><td style=\"text-align: right;\">-0.0516878</td><td style=\"text-align: right;\"> 0.0413375 </td><td style=\"text-align: right;\">-0.119434 </td><td style=\"text-align: right;\">-0.0490209 </td><td style=\"text-align: right;\">-0.0426325 </td><td style=\"text-align: right;\">-0.141985 </td><td style=\"text-align: right;\">-0.0731703 </td><td style=\"text-align: right;\"> 0.00255836</td><td style=\"text-align: right;\">-0.0352337</td><td style=\"text-align: right;\"> 0.0615106</td><td style=\"text-align: right;\"> 0.130422 </td><td style=\"text-align: right;\"> 0.132418 </td><td style=\"text-align: right;\"> 0.0860906</td></tr>\n<tr><td>B002Q8JOSI </td><td>A17UQD2RSSQH5X</td><td>My dogs tell me these treats are YUMMY</td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">212536</td><td>in the dark </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.31613e+09</td><td>My two Corgis were thoroughly spoiled by my late husband (I spent a year and a half dieting them down a combined total of 25 pounds!)&lt;br /&gt;&lt;br /&gt;They are accustomed to the finest of fare, and they absolutely love the Wellness brand of treats. 
</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 0.00664075</td><td style=\"text-align: right;\"> 0.0294306 </td><td style=\"text-align: right;\">-0.00529865 </td><td style=\"text-align: right;\"> 0.0971298 </td><td style=\"text-align: right;\">-0.0554535 </td><td style=\"text-align: right;\">0.041721 </td><td style=\"text-align: right;\"> 0.0711783 </td><td style=\"text-align: right;\"> 0.000611028</td><td style=\"text-align: right;\"> 0.0224197</td><td style=\"text-align: right;\"> 0.0377034 </td><td style=\"text-align: right;\">-0.0358093 </td><td style=\"text-align: right;\"> 0.033314 </td><td style=\"text-align: right;\">-0.0671549</td><td style=\"text-align: right;\">-0.033238 </td><td style=\"text-align: right;\"> 0.0135984 </td><td style=\"text-align: right;\"> 0.0541221 </td><td style=\"text-align: right;\">-0.0676616 </td><td style=\"text-align: right;\"> 0.00970966</td><td style=\"text-align: right;\"> 0.0540937</td><td style=\"text-align: right;\"> 0.00182744</td><td style=\"text-align: right;\"> 0.0213856 </td><td style=\"text-align: right;\"> 0.0461527</td><td style=\"text-align: right;\">-0.00813494</td><td style=\"text-align: right;\"> 0.0593336 </td><td style=\"text-align: right;\">-0.0452856 </td><td style=\"text-align: right;\">-0.0446647 </td><td style=\"text-align: right;\">-0.00545645 </td><td style=\"text-align: right;\">-0.0844449 </td><td style=\"text-align: right;\">-0.012087 </td><td style=\"text-align: right;\"> 0.01034 </td><td style=\"text-align: right;\"> 0.0414165 </td><td style=\"text-align: right;\"> 0.000379321</td><td style=\"text-align: right;\">-0.0102823 </td><td style=\"text-align: right;\">-0.048552 </td><td style=\"text-align: right;\">-0.0478385 </td><td style=\"text-align: right;\"> 0.0280377 </td><td style=\"text-align: right;\"> 0.0373936 </td><td style=\"text-align: right;\"> 0.0682869</td><td style=\"text-align: right;\"> 0.00163835 </td><td style=\"text-align: right;\">-0.0111299</td><td 
style=\"text-align: right;\">-0.0219148 </td><td style=\"text-align: right;\">-0.0924591</td><td style=\"text-align: right;\"> 0.0605706</td><td style=\"text-align: right;\">-0.0393773 </td><td style=\"text-align: right;\"> 0.00369886</td><td style=\"text-align: right;\">-0.0124289</td><td style=\"text-align: right;\">-0.0573726 </td><td style=\"text-align: right;\">-0.0531469 </td><td style=\"text-align: right;\"> 0.0102651 </td><td style=\"text-align: right;\"> 0.0505138 </td><td style=\"text-align: right;\">-0.0210899</td><td style=\"text-align: right;\"> 0.0418745</td><td style=\"text-align: right;\">-0.00207173</td><td style=\"text-align: right;\">-0.0217494 </td><td style=\"text-align: right;\">-0.0760661 </td><td style=\"text-align: right;\">-0.00915289</td><td style=\"text-align: right;\"> 0.0539563</td><td style=\"text-align: right;\">-0.0754855</td><td style=\"text-align: right;\">-0.0416121 </td><td style=\"text-align: right;\">-0.0607258 </td><td style=\"text-align: right;\">-0.0297508</td><td style=\"text-align: right;\">-0.0212294 </td><td style=\"text-align: right;\">-0.0355835 </td><td style=\"text-align: right;\"> 0.0276151 </td><td style=\"text-align: right;\">0.0650853 </td><td style=\"text-align: right;\">-0.0233705 </td><td style=\"text-align: right;\">-0.0188454 </td><td style=\"text-align: right;\"> 0.0202355 </td><td style=\"text-align: right;\">-0.0565332 </td><td style=\"text-align: right;\">-0.00200412 </td><td style=\"text-align: right;\">0.0732944 </td><td style=\"text-align: right;\">-0.0131517 </td><td style=\"text-align: right;\">-0.0254612 </td><td style=\"text-align: right;\">-0.059395 </td><td style=\"text-align: right;\"> 0.0546296 </td><td style=\"text-align: right;\"> 0.00453196</td><td style=\"text-align: right;\">-0.0866346 </td><td style=\"text-align: right;\">-0.0316054 </td><td style=\"text-align: right;\"> 0.00321747</td><td style=\"text-align: right;\"> 0.10967 </td><td style=\"text-align: right;\">-0.0123553 </td><td 
style=\"text-align: right;\">-0.0371342</td><td style=\"text-align: right;\"> 0.0453067 </td><td style=\"text-align: right;\">-0.0184933 </td><td style=\"text-align: right;\"> 0.0944639</td><td style=\"text-align: right;\">-0.0483836 </td><td style=\"text-align: right;\">-0.0253078 </td><td style=\"text-align: right;\"> 0.0512179</td><td style=\"text-align: right;\">-0.0190948 </td><td style=\"text-align: right;\">-0.0654936 </td><td style=\"text-align: right;\"> 0.0123237 </td><td style=\"text-align: right;\">-0.0115953 </td><td style=\"text-align: right;\">-0.0616764</td><td style=\"text-align: right;\">-0.0494044 </td><td style=\"text-align: right;\"> 0.0845929 </td><td style=\"text-align: right;\">-0.0297328</td><td style=\"text-align: right;\">-0.0123154</td><td style=\"text-align: right;\"> 0.0593596</td><td style=\"text-align: right;\">-0.0221123 </td><td style=\"text-align: right;\"> 0.0738102</td></tr>\n<tr><td>B00176G870 </td><td>A2F2MZW8EOGH5J</td><td>Yummy to the tummy </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">115971</td><td>daemoncycler &quot;When you arrive at a fork in th...</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.33479e+09</td><td>We used to have drive down to the specialty pet food store for this product. So glad we discovered Amazon. As far as I can tell it is no more expensive &amp; in some cases less - Prime membership is awesome. Loving Pets treats are some of the best according to my dog. They do not develop that nasty smell like some dog treats do. 
</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">-0.0569507 </td><td style=\"text-align: right;\"> 0.100785 </td><td style=\"text-align: right;\">-0.139421 </td><td style=\"text-align: right;\"> 0.0916591 </td><td style=\"text-align: right;\">-0.0669327 </td><td style=\"text-align: right;\">0.139871 </td><td style=\"text-align: right;\">-0.053762 </td><td style=\"text-align: right;\">-0.0516505 </td><td style=\"text-align: right;\"> 0.118875 </td><td style=\"text-align: right;\"> 0.0320348 </td><td style=\"text-align: right;\"> 0.072005 </td><td style=\"text-align: right;\"> 0.183638 </td><td style=\"text-align: right;\">-0.0507139</td><td style=\"text-align: right;\"> 0.0662295 </td><td style=\"text-align: right;\">-0.0175711 </td><td style=\"text-align: right;\">-0.0563984 </td><td style=\"text-align: right;\">-0.0718676 </td><td style=\"text-align: right;\"> 0.0765699 </td><td style=\"text-align: right;\"> 0.0315671</td><td style=\"text-align: right;\">-0.0192526 </td><td style=\"text-align: right;\"> 0.0754111 </td><td style=\"text-align: right;\"> 0.0841884</td><td style=\"text-align: right;\"> 0.0744908 </td><td style=\"text-align: right;\"> 0.0465411 </td><td style=\"text-align: right;\">-0.0424741 </td><td style=\"text-align: right;\"> 0.0661015 </td><td style=\"text-align: right;\"> 0.0406489 </td><td style=\"text-align: right;\">-0.18337 </td><td style=\"text-align: right;\">-0.0585701 </td><td style=\"text-align: right;\"> 0.0298698 </td><td style=\"text-align: right;\">-0.0111099 </td><td style=\"text-align: right;\"> 0.0202008 </td><td style=\"text-align: right;\">-0.0343549 </td><td style=\"text-align: right;\">-0.0418548 </td><td style=\"text-align: right;\">-0.0324968 </td><td style=\"text-align: right;\"> 0.00647369</td><td style=\"text-align: right;\">-0.0611532 </td><td style=\"text-align: right;\"> 0.0551729</td><td style=\"text-align: right;\">-0.00687854 </td><td style=\"text-align: right;\">-0.0248682</td><td 
style=\"text-align: right;\">-0.0540005 </td><td style=\"text-align: right;\">-0.198181 </td><td style=\"text-align: right;\"> 0.0737756</td><td style=\"text-align: right;\">-0.0429527 </td><td style=\"text-align: right;\"> 0.107321 </td><td style=\"text-align: right;\">-0.0898639</td><td style=\"text-align: right;\"> 0.0765675 </td><td style=\"text-align: right;\">-0.0155799 </td><td style=\"text-align: right;\"> 0.0183806 </td><td style=\"text-align: right;\"> 0.127784 </td><td style=\"text-align: right;\">-0.0959386</td><td style=\"text-align: right;\"> 0.116064 </td><td style=\"text-align: right;\">-0.0179106 </td><td style=\"text-align: right;\"> 0.0426698 </td><td style=\"text-align: right;\">-0.033683 </td><td style=\"text-align: right;\"> 0.0445723 </td><td style=\"text-align: right;\"> 0.008436 </td><td style=\"text-align: right;\">-0.149899 </td><td style=\"text-align: right;\"> 0.0567 </td><td style=\"text-align: right;\">-0.061969 </td><td style=\"text-align: right;\"> 0.106395 </td><td style=\"text-align: right;\">-0.0474444 </td><td style=\"text-align: right;\">-0.0159554 </td><td style=\"text-align: right;\">-0.0205738 </td><td style=\"text-align: right;\">0.0895157 </td><td style=\"text-align: right;\">-0.061288 </td><td style=\"text-align: right;\"> 0.0101447 </td><td style=\"text-align: right;\">-0.0686496 </td><td style=\"text-align: right;\"> 0.112418 </td><td style=\"text-align: right;\">-0.0104366 </td><td style=\"text-align: right;\">0.0889506 </td><td style=\"text-align: right;\">-0.0832996 </td><td style=\"text-align: right;\">-0.0569302 </td><td style=\"text-align: right;\">-0.0880074 </td><td style=\"text-align: right;\"> 0.0605617 </td><td style=\"text-align: right;\">-0.0295004 </td><td style=\"text-align: right;\">-0.0154352 </td><td style=\"text-align: right;\">-0.157382 </td><td style=\"text-align: right;\">-0.0701636 </td><td style=\"text-align: right;\"> 0.0672711</td><td style=\"text-align: right;\"> 0.0324169 </td><td 
style=\"text-align: right;\">-0.06646 </td><td style=\"text-align: right;\">-0.0100355 </td><td style=\"text-align: right;\">-0.0226492 </td><td style=\"text-align: right;\"> 0.159023 </td><td style=\"text-align: right;\"> 0.0748435 </td><td style=\"text-align: right;\">-0.061932 </td><td style=\"text-align: right;\">-0.0430855</td><td style=\"text-align: right;\"> 0.00506526</td><td style=\"text-align: right;\">-0.0945003 </td><td style=\"text-align: right;\">-0.00753332</td><td style=\"text-align: right;\"> 0.0870276 </td><td style=\"text-align: right;\">-0.0298816</td><td style=\"text-align: right;\">-0.00772067</td><td style=\"text-align: right;\">-0.0255571 </td><td style=\"text-align: right;\">-0.0150707</td><td style=\"text-align: right;\"> 0.0882468</td><td style=\"text-align: right;\"> 0.07531 </td><td style=\"text-align: right;\"> 0.044203 </td><td style=\"text-align: right;\"> 0.144223 </td></tr>\n<tr><td>B001CHFUGY </td><td>A2M8VROSDPU4JT</td><td>Very good coffee </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">434484</td><td>Officefan &quot;Officefankt&quot; </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.27725e+09</td><td>I really liked this coffee, it was just as good as everyone claimed it was. Strong, bold and flavorful! I would recommend! 
</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">-0.0640295 </td><td style=\"text-align: right;\"> 0.0737147 </td><td style=\"text-align: right;\">-0.000756723</td><td style=\"text-align: right;\"> 0.0261864 </td><td style=\"text-align: right;\">-0.038828 </td><td style=\"text-align: right;\">0.0207082 </td><td style=\"text-align: right;\">-0.0415848 </td><td style=\"text-align: right;\">-0.0312121 </td><td style=\"text-align: right;\"> 0.0150354</td><td style=\"text-align: right;\"> 0.0221745 </td><td style=\"text-align: right;\">-0.0195091 </td><td style=\"text-align: right;\"> 0.0577201</td><td style=\"text-align: right;\">-0.0569465</td><td style=\"text-align: right;\">-0.00376638 </td><td style=\"text-align: right;\"> 0.00802344</td><td style=\"text-align: right;\"> 0.0536572 </td><td style=\"text-align: right;\"> 0.0169469 </td><td style=\"text-align: right;\"> 0.0172603 </td><td style=\"text-align: right;\">-0.0174735</td><td style=\"text-align: right;\"> 0.0103346 </td><td style=\"text-align: right;\"> 0.0331177 </td><td style=\"text-align: right;\"> 0.0563277</td><td style=\"text-align: right;\"> 0.00481703</td><td style=\"text-align: right;\">-0.0410948 </td><td style=\"text-align: right;\"> 0.00314931</td><td style=\"text-align: right;\"> 0.0392393 </td><td style=\"text-align: right;\"> 0.0101468 </td><td style=\"text-align: right;\">-0.0415722 </td><td style=\"text-align: right;\">-0.0644963 </td><td style=\"text-align: right;\"> 0.0480652 </td><td style=\"text-align: right;\"> 0.0112925 </td><td style=\"text-align: right;\">-0.0561226 </td><td style=\"text-align: right;\">-0.0406242 </td><td style=\"text-align: right;\">-0.00624946</td><td style=\"text-align: right;\">-0.0116796 </td><td style=\"text-align: right;\">-0.040125 </td><td style=\"text-align: right;\">-0.0292939 </td><td style=\"text-align: right;\"> 0.0120191</td><td style=\"text-align: right;\"> 0.000726164</td><td style=\"text-align: right;\"> 
0.0390184</td><td style=\"text-align: right;\">-0.0648858 </td><td style=\"text-align: right;\">-0.0544326</td><td style=\"text-align: right;\"> 0.0364144</td><td style=\"text-align: right;\">-0.0354804 </td><td style=\"text-align: right;\"> 0.0287168 </td><td style=\"text-align: right;\">-0.0408558</td><td style=\"text-align: right;\"> 0.00537148</td><td style=\"text-align: right;\">-0.0166096 </td><td style=\"text-align: right;\">-0.0148217 </td><td style=\"text-align: right;\"> 0.00864069</td><td style=\"text-align: right;\">-0.0132793</td><td style=\"text-align: right;\"> 0.0107762</td><td style=\"text-align: right;\"> 0.0475812 </td><td style=\"text-align: right;\">-0.0110306 </td><td style=\"text-align: right;\">-0.0419987 </td><td style=\"text-align: right;\"> 0.00737521</td><td style=\"text-align: right;\"> 0.0581882</td><td style=\"text-align: right;\"> 0.0135915</td><td style=\"text-align: right;\">-0.0274563 </td><td style=\"text-align: right;\">-0.0064732 </td><td style=\"text-align: right;\"> 0.0601388</td><td style=\"text-align: right;\">-0.0345679 </td><td style=\"text-align: right;\">-0.0332279 </td><td style=\"text-align: right;\">-0.0221325 </td><td style=\"text-align: right;\">0.0363207 </td><td style=\"text-align: right;\"> 0.00927576</td><td style=\"text-align: right;\">-0.00137429</td><td style=\"text-align: right;\">-0.057251 </td><td style=\"text-align: right;\"> 0.00328567</td><td style=\"text-align: right;\">-0.00798443 </td><td style=\"text-align: right;\">0.0287494 </td><td style=\"text-align: right;\">-0.0203596 </td><td style=\"text-align: right;\">-0.0159451 </td><td style=\"text-align: right;\">-0.0333854 </td><td style=\"text-align: right;\"> 0.00618047</td><td style=\"text-align: right;\">-0.0343008 </td><td style=\"text-align: right;\"> 0.00322222 </td><td style=\"text-align: right;\">-0.079853 </td><td style=\"text-align: right;\">-0.0570092 </td><td style=\"text-align: right;\">-0.0158511</td><td style=\"text-align: right;\"> 
0.0513264 </td><td style=\"text-align: right;\">-0.0356655</td><td style=\"text-align: right;\"> 0.059126 </td><td style=\"text-align: right;\">-0.00116156</td><td style=\"text-align: right;\"> 0.075408 </td><td style=\"text-align: right;\"> 0.00785019 </td><td style=\"text-align: right;\">-0.0360608 </td><td style=\"text-align: right;\">-0.0549585</td><td style=\"text-align: right;\"> 0.0156058 </td><td style=\"text-align: right;\"> 0.00302057</td><td style=\"text-align: right;\"> 0.0368497 </td><td style=\"text-align: right;\"> 0.0135622 </td><td style=\"text-align: right;\">-0.0201859</td><td style=\"text-align: right;\">-0.00640041</td><td style=\"text-align: right;\"> 0.018656 </td><td style=\"text-align: right;\">-0.0105721</td><td style=\"text-align: right;\"> 0.0240232</td><td style=\"text-align: right;\"> 0.0520951</td><td style=\"text-align: right;\"> 0.0568436 </td><td style=\"text-align: right;\"> 0.0579182</td></tr>\n<tr><td>B0041CIR62 </td><td>A16I6WJUEBJ1C3</td><td>okay but not as healthy as it appears </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">138997</td><td>doctorsirena &quot;doctorsirena&quot; </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.34369e+09</td><td>I am always looking for healthier, whole grain versions of foods I enjoy. Unfortunately, these Peacock brand noodles are yet another food masquerading as healthy. The product title in big letters on the front says &quot;Brown Rice Vermicelli&quot;, making the consumer think &quot;this is made with brown rice, so it should be a healthy choice&quot;. But the first indication that it is not is when looking at the fiber content on the nutrition facts - only 0.6g per 2oz serving. Then onto the ingredients list to see why so low... contains brown rice, sago starch and water. 
The sago starch comes from palms and must not have much (if any) fiber.&lt;br /&gt;&lt;br /&gt;The Annie Chun&#x27;s Maifun Brown Rice Noodles (sold on Amazon and in my local healthy grocer) has become one of my staples and is my frame of reference when comparing to the Peacock brand. The Annie Chun&#x27;s product is made with 100% whole grain, with ingredients brown rice flour and water. Per 2oz serving, it has 4g fiber and pretty much the same calories and other nutrients as the Peacock brand.&lt;br /&gt;&lt;br /&gt;If you do try this Peacock brand noodles and have not used rice noodles before, you will need to seek guidance elsewhere on preparation. As others have pointed out, the Peacock package gives almost no directions on how to prepare the product, aside from a brief mention in the recipes (in the header text it does say that they are &quot;easy-to-cook&quot; but does not say how). It also contains a very strange recipe for rice noodles: Aglio Olio style - this is an Italian recipe for noodles with olive oil/garlic/sprinkled with grated cheese that I think would not be very tasty. The second recipe appears to be for a soup with veggie strips. Neither recipe gives amounts or much direction. In comparison, the Annie Chun&#x27;s package gives clear, specific directions on rice noodle preparation and two recipes.&lt;br /&gt;&lt;br /&gt;I use rice noodles = maifun = rice sticks = sometimes called vermicelli for making the Vietnamese salad &quot;bun tofu&quot;, to serve with stir-fried veggies or in lettuce rolls. They can also be used in spring rolls/egg rolls. When cooking with thin rice noodles, be careful not to oversoak/overcook/overmix or they tend to disintegrate. Asian rice noodle vermicelli (maifun) are not the same as Italian vermicelli and are not readily interchangeable. If making an Italian recipe, the best results would be expected from Italian pasta and not maifun.&lt;br /&gt;&lt;br /&gt;A few final notes... 
Both Peacock and Annie Chun&#x27;s brown rice noodles are gluten free. The Peacock is made in Singapore and the Annie Chun&#x27;s in Thailand. The Peacock noodles do taste fine (kind of bland), but so do the Annie Chun&#x27;s. At this time, they are both approximately the same price. Peacock come in an plastic bag with some noodle crushage upon shipping; Annie Chun&#x27;s are perfect upon removal from their cellophane bag in a box. Overall, I highly recommend the Annie Chun&#x27;s Maifun as a healthier option over the Peacock brand. On a related note, the Annie Chun&#x27;s soba and brown rice pad thai noodles are also excellent.&lt;br /&gt;&lt;br /&gt;Rating for this product: 2.5 stars rounded down to 2 stars.</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">-0.452577 </td><td style=\"text-align: right;\">-0.113423 </td><td style=\"text-align: right;\"> 0.138709 </td><td style=\"text-align: right;\"> 0.182277 </td><td style=\"text-align: right;\"> 0.215801 </td><td style=\"text-align: right;\">0.375206 </td><td style=\"text-align: right;\"> 0.136874 </td><td style=\"text-align: right;\">-0.419679 </td><td style=\"text-align: right;\">-0.0637614</td><td style=\"text-align: right;\"> 0.0869655 </td><td style=\"text-align: right;\"> 0.170927 </td><td style=\"text-align: right;\"> 0.0831557</td><td style=\"text-align: right;\"> 0.117467 </td><td style=\"text-align: right;\"> 0.263678 </td><td style=\"text-align: right;\"> 0.0261424 </td><td style=\"text-align: right;\">-0.014187 </td><td style=\"text-align: right;\"> 2.95397e-05</td><td style=\"text-align: right;\"> 0.010737 </td><td style=\"text-align: right;\"> 0.190601 </td><td style=\"text-align: right;\">-0.424746 </td><td style=\"text-align: right;\">-0.206713 </td><td style=\"text-align: right;\"> 0.303866 </td><td style=\"text-align: right;\"> 0.0229484 </td><td style=\"text-align: right;\"> 0.278378 </td><td style=\"text-align: right;\">-0.413625 </td><td style=\"text-align: 
right;\">-0.415573 </td><td style=\"text-align: right;\"> 0.0516297 </td><td style=\"text-align: right;\">-0.317631 </td><td style=\"text-align: right;\">-0.311793 </td><td style=\"text-align: right;\">-0.268484 </td><td style=\"text-align: right;\"> 0.133884 </td><td style=\"text-align: right;\">-0.152781 </td><td style=\"text-align: right;\">-0.308243 </td><td style=\"text-align: right;\"> 0.396307 </td><td style=\"text-align: right;\"> 0.046927 </td><td style=\"text-align: right;\"> 0.17968 </td><td style=\"text-align: right;\"> 0.272848 </td><td style=\"text-align: right;\"> 0.703343 </td><td style=\"text-align: right;\">-0.181181 </td><td style=\"text-align: right;\"> 0.303255 </td><td style=\"text-align: right;\"> 0.0247568 </td><td style=\"text-align: right;\">-0.167266 </td><td style=\"text-align: right;\"> 0.333582 </td><td style=\"text-align: right;\">-0.164405 </td><td style=\"text-align: right;\"> 0.273838 </td><td style=\"text-align: right;\">-0.860874 </td><td style=\"text-align: right;\"> 0.204537 </td><td style=\"text-align: right;\"> 0.17322 </td><td style=\"text-align: right;\"> 0.464163 </td><td style=\"text-align: right;\">-0.064438 </td><td style=\"text-align: right;\">-0.166222 </td><td style=\"text-align: right;\"> 0.455148 </td><td style=\"text-align: right;\"> 0.582417 </td><td style=\"text-align: right;\">-0.326859 </td><td style=\"text-align: right;\">-0.103194 </td><td style=\"text-align: right;\"> 0.292087 </td><td style=\"text-align: right;\"> 0.264221 </td><td style=\"text-align: right;\">-0.158491 </td><td style=\"text-align: right;\"> 0.79183 </td><td style=\"text-align: right;\"> 0.108343 </td><td style=\"text-align: right;\">-0.116043 </td><td style=\"text-align: right;\">-0.178905 </td><td style=\"text-align: right;\">-0.154207 </td><td style=\"text-align: right;\">-0.219633 </td><td style=\"text-align: right;\">0.186179 </td><td style=\"text-align: right;\">-0.0761969 </td><td style=\"text-align: right;\">-0.251955 </td><td 
style=\"text-align: right;\">-0.422602 </td><td style=\"text-align: right;\">-0.0589413 </td><td style=\"text-align: right;\"> 0.209728 </td><td style=\"text-align: right;\">0.217822 </td><td style=\"text-align: right;\"> 0.00609156</td><td style=\"text-align: right;\">-0.150387 </td><td style=\"text-align: right;\">-0.47817 </td><td style=\"text-align: right;\">-0.812136 </td><td style=\"text-align: right;\"> 0.429651 </td><td style=\"text-align: right;\"> 0.179498 </td><td style=\"text-align: right;\">-0.540474 </td><td style=\"text-align: right;\"> 0.149226 </td><td style=\"text-align: right;\">-0.257322 </td><td style=\"text-align: right;\">-0.338138 </td><td style=\"text-align: right;\"> 0.674082 </td><td style=\"text-align: right;\"> 0.234841 </td><td style=\"text-align: right;\">-0.0127675 </td><td style=\"text-align: right;\"> 0.0860772</td><td style=\"text-align: right;\"> 0.0177764 </td><td style=\"text-align: right;\"> 0.0524732 </td><td style=\"text-align: right;\">-0.440476 </td><td style=\"text-align: right;\"> 0.134106 </td><td style=\"text-align: right;\"> 0.0706629 </td><td style=\"text-align: right;\">-0.465079 </td><td style=\"text-align: right;\"> 0.0521292 </td><td style=\"text-align: right;\">-0.0426664</td><td style=\"text-align: right;\">-0.171736 </td><td style=\"text-align: right;\"> 0.291847 </td><td style=\"text-align: right;\">-0.646853 </td><td style=\"text-align: right;\"> 0.181061 </td><td style=\"text-align: right;\"> 0.257701 </td><td style=\"text-align: right;\"> 0.165145 </td><td style=\"text-align: right;\"> 0.42037 </td></tr>\n<tr><td>B001R3BQFW </td><td>AM50E42AFUVNL </td><td>Taste great. </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">126555</td><td>T. Higley &quot;Tina&quot; </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32356e+09</td><td>I have tried many different drink mix, this is the best tasting by far. 
It does not have the after taste of the sweetener and I really like it, it is pretty strong, so I use a big water bottle (20 oz) for one tube, it still a little stronger than I like, but it is just my taste. </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">-0.0448968 </td><td style=\"text-align: right;\"> 0.0637996 </td><td style=\"text-align: right;\">-0.0264709 </td><td style=\"text-align: right;\"> 0.0365484 </td><td style=\"text-align: right;\">-0.0477997 </td><td style=\"text-align: right;\">0.1581 </td><td style=\"text-align: right;\"> 0.00405392</td><td style=\"text-align: right;\"> 0.0221248 </td><td style=\"text-align: right;\"> 0.126317 </td><td style=\"text-align: right;\"> 0.0537661 </td><td style=\"text-align: right;\"> 0.00186567</td><td style=\"text-align: right;\"> 0.191703 </td><td style=\"text-align: right;\">-0.0857201</td><td style=\"text-align: right;\">-0.0257879 </td><td style=\"text-align: right;\"> 0.0510087 </td><td style=\"text-align: right;\">-0.00382809</td><td style=\"text-align: right;\">-0.10507 </td><td style=\"text-align: right;\"> 0.0853127 </td><td style=\"text-align: right;\">-0.0472438</td><td style=\"text-align: right;\">-0.112961 </td><td style=\"text-align: right;\"> 0.0210565 </td><td style=\"text-align: right;\"> 0.0023231</td><td style=\"text-align: right;\"> 0.0381847 </td><td style=\"text-align: right;\"> 0.020215 </td><td style=\"text-align: right;\">-0.00684582</td><td style=\"text-align: right;\"> 0.0336772 </td><td style=\"text-align: right;\"> 0.000759529</td><td style=\"text-align: right;\">-0.152974 </td><td style=\"text-align: right;\">-0.143636 </td><td style=\"text-align: right;\"> 0.049318 </td><td style=\"text-align: right;\">-0.0337021 </td><td style=\"text-align: right;\"> 0.0425444 </td><td style=\"text-align: right;\">-0.049597 </td><td style=\"text-align: right;\"> 0.0165418 </td><td style=\"text-align: right;\">-0.00176342</td><td style=\"text-align: right;\">-0.0176381 
</td><td style=\"text-align: right;\">-0.0761584 </td><td style=\"text-align: right;\"> 0.0703652</td><td style=\"text-align: right;\">-0.0896073 </td><td style=\"text-align: right;\"> 0.0946777</td><td style=\"text-align: right;\">-0.0769748 </td><td style=\"text-align: right;\">-0.10356 </td><td style=\"text-align: right;\"> 0.0486011</td><td style=\"text-align: right;\"> 0.0529407 </td><td style=\"text-align: right;\"> 0.0283809 </td><td style=\"text-align: right;\">-0.100215 </td><td style=\"text-align: right;\"> 0.0704391 </td><td style=\"text-align: right;\">-0.0395672 </td><td style=\"text-align: right;\"> 0.0186939 </td><td style=\"text-align: right;\"> 0.0291437 </td><td style=\"text-align: right;\">-0.0784622</td><td style=\"text-align: right;\"> 0.08415 </td><td style=\"text-align: right;\"> 0.0509581 </td><td style=\"text-align: right;\">-0.010384 </td><td style=\"text-align: right;\">-0.0655889 </td><td style=\"text-align: right;\"> 0.0313172 </td><td style=\"text-align: right;\"> 0.161041 </td><td style=\"text-align: right;\">-0.0693216</td><td style=\"text-align: right;\"> 0.0230501 </td><td style=\"text-align: right;\">-0.0481256 </td><td style=\"text-align: right;\"> 0.0522207</td><td style=\"text-align: right;\">-0.0185826 </td><td style=\"text-align: right;\">-0.073891 </td><td style=\"text-align: right;\">-0.0442866 </td><td style=\"text-align: right;\">0.0857146 </td><td style=\"text-align: right;\">-0.0156655 </td><td style=\"text-align: right;\"> 0.0475037 </td><td style=\"text-align: right;\">-0.119135 </td><td style=\"text-align: right;\"> 0.0641011 </td><td style=\"text-align: right;\"> 0.0828759 </td><td style=\"text-align: right;\">0.099265 </td><td style=\"text-align: right;\">-0.0904073 </td><td style=\"text-align: right;\">-0.0209445 </td><td style=\"text-align: right;\">-0.0531817 </td><td style=\"text-align: right;\"> 0.086743 </td><td style=\"text-align: right;\">-0.0347895 </td><td style=\"text-align: right;\"> 0.0701843 </td><td 
style=\"text-align: right;\">-0.178539 </td><td style=\"text-align: right;\">-0.0872093 </td><td style=\"text-align: right;\"> 0.0371891</td><td style=\"text-align: right;\"> 0.0652032 </td><td style=\"text-align: right;\">-0.0987059</td><td style=\"text-align: right;\"> 0.0230445 </td><td style=\"text-align: right;\">-0.0699361 </td><td style=\"text-align: right;\"> 0.139951 </td><td style=\"text-align: right;\">-0.0344978 </td><td style=\"text-align: right;\">-0.0641258 </td><td style=\"text-align: right;\">-0.0340536</td><td style=\"text-align: right;\">-0.0221485 </td><td style=\"text-align: right;\">-0.0848722 </td><td style=\"text-align: right;\">-0.0422028 </td><td style=\"text-align: right;\"> 0.020315 </td><td style=\"text-align: right;\"> 0.0234233</td><td style=\"text-align: right;\"> 0.00761684</td><td style=\"text-align: right;\">-0.0300293 </td><td style=\"text-align: right;\">-0.0190704</td><td style=\"text-align: right;\"> 0.0275005</td><td style=\"text-align: right;\"> 0.0642959</td><td style=\"text-align: right;\"> 0.0682783 </td><td style=\"text-align: right;\"> 0.138162 </td></tr>\n<tr><td>B005HGAV8I </td><td>A2I5KDNOESGJ1H</td><td>variety galore </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">438837</td><td>TJ </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.33402e+09</td><td>This is my favorite item to order for my Keurig. There are so many flavors, my finicky palate never gets bored! The only downside is there are probably 5-6 decaf varieties. I don&#x27;t drink decaf (I REQUIRE copious amounts of caffeine), so they sit on the shelf... 
</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 0.0180591 </td><td style=\"text-align: right;\"> 0.0699772 </td><td style=\"text-align: right;\"> 0.0587516 </td><td style=\"text-align: right;\">-0.0706088 </td><td style=\"text-align: right;\"> 0.00662619</td><td style=\"text-align: right;\">0.114756 </td><td style=\"text-align: right;\">-0.144086 </td><td style=\"text-align: right;\"> 0.0275614 </td><td style=\"text-align: right;\">-0.0864321</td><td style=\"text-align: right;\"> 0.0979501 </td><td style=\"text-align: right;\"> 0.0339819 </td><td style=\"text-align: right;\">-0.169916 </td><td style=\"text-align: right;\"> 0.032254 </td><td style=\"text-align: right;\">-0.000541935</td><td style=\"text-align: right;\"> 0.0469471 </td><td style=\"text-align: right;\"> 0.0967113 </td><td style=\"text-align: right;\">-0.0881062 </td><td style=\"text-align: right;\">-0.0478772 </td><td style=\"text-align: right;\">-0.0772477</td><td style=\"text-align: right;\"> 0.0541128 </td><td style=\"text-align: right;\"> 0.0453607 </td><td style=\"text-align: right;\">-0.0475929</td><td style=\"text-align: right;\">-0.062531 </td><td style=\"text-align: right;\"> 0.0431321 </td><td style=\"text-align: right;\"> 0.046509 </td><td style=\"text-align: right;\">-0.00622096 </td><td style=\"text-align: right;\"> 0.0683421 </td><td style=\"text-align: right;\"> 0.00586513</td><td style=\"text-align: right;\"> 0.0312982 </td><td style=\"text-align: right;\"> 0.0244473 </td><td style=\"text-align: right;\">-0.0195214 </td><td style=\"text-align: right;\">-0.0243877 </td><td style=\"text-align: right;\"> 0.00131361</td><td style=\"text-align: right;\"> 0.0688599 </td><td style=\"text-align: right;\"> 0.0185295 </td><td style=\"text-align: right;\"> 0.00569342</td><td style=\"text-align: right;\"> 0.0909279 </td><td style=\"text-align: right;\">-0.0444268</td><td style=\"text-align: right;\"> 0.00835721 </td><td style=\"text-align: right;\">-0.0653897</td><td 
style=\"text-align: right;\">-0.059535 </td><td style=\"text-align: right;\"> 0.010251 </td><td style=\"text-align: right;\"> 0.0192812</td><td style=\"text-align: right;\"> 0.00349173</td><td style=\"text-align: right;\">-0.0662491 </td><td style=\"text-align: right;\"> 0.0736086</td><td style=\"text-align: right;\"> 0.0397233 </td><td style=\"text-align: right;\">-0.0803349 </td><td style=\"text-align: right;\">-0.0402739 </td><td style=\"text-align: right;\">-0.00139385</td><td style=\"text-align: right;\">-0.0287986</td><td style=\"text-align: right;\"> 0.0184184</td><td style=\"text-align: right;\"> 0.111406 </td><td style=\"text-align: right;\">-0.00877129</td><td style=\"text-align: right;\"> 0.0332851 </td><td style=\"text-align: right;\">-0.031644 </td><td style=\"text-align: right;\">-0.0445515</td><td style=\"text-align: right;\"> 0.0795765</td><td style=\"text-align: right;\"> 0.00433454</td><td style=\"text-align: right;\">-0.0518782 </td><td style=\"text-align: right;\"> 0.0171259</td><td style=\"text-align: right;\">-0.0580711 </td><td style=\"text-align: right;\">-0.0864474 </td><td style=\"text-align: right;\"> 0.0490716 </td><td style=\"text-align: right;\">0.00418072</td><td style=\"text-align: right;\">-0.113097 </td><td style=\"text-align: right;\">-0.0315454 </td><td style=\"text-align: right;\"> 0.0728978 </td><td style=\"text-align: right;\">-0.106284 </td><td style=\"text-align: right;\"> 0.0650408 </td><td style=\"text-align: right;\">0.0351158 </td><td style=\"text-align: right;\"> 0.00593883</td><td style=\"text-align: right;\">-0.0179361 </td><td style=\"text-align: right;\"> 0.00257075</td><td style=\"text-align: right;\"> 0.0165245 </td><td style=\"text-align: right;\">-0.00649849</td><td style=\"text-align: right;\">-0.0536741 </td><td style=\"text-align: right;\">-0.0414687 </td><td style=\"text-align: right;\">-0.0469103 </td><td style=\"text-align: right;\">-0.0308045</td><td style=\"text-align: right;\">-0.0177595 </td><td 
style=\"text-align: right;\"> 0.0449958</td><td style=\"text-align: right;\"> 0.120506 </td><td style=\"text-align: right;\">-0.00437376</td><td style=\"text-align: right;\"> 0.0102097</td><td style=\"text-align: right;\">-0.0482851 </td><td style=\"text-align: right;\"> 0.00225975</td><td style=\"text-align: right;\">-0.054261 </td><td style=\"text-align: right;\">-0.0365799 </td><td style=\"text-align: right;\">-0.0697037 </td><td style=\"text-align: right;\">-0.095618 </td><td style=\"text-align: right;\">-0.0294475 </td><td style=\"text-align: right;\"> 0.0174716</td><td style=\"text-align: right;\"> 0.06957 </td><td style=\"text-align: right;\"> 0.101367 </td><td style=\"text-align: right;\"> 0.0650128</td><td style=\"text-align: right;\"> 0.0111191</td><td style=\"text-align: right;\">-0.023346 </td><td style=\"text-align: right;\">-0.0656063 </td><td style=\"text-align: right;\">-0.042266 </td></tr>\n<tr><td>B000GFYRHQ </td><td>A3A7YUR6FS6ZCI</td><td>Bigelow Earl Grey Green Tea </td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">245379</td><td>Tea Lover </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.17841e+09</td><td>Tastes like Earl Grey, but it&#x27;s green tea so it&#x27;s healthier. 
</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">-0.0148666 </td><td style=\"text-align: right;\"> 0.00547086</td><td style=\"text-align: right;\">-0.0598337 </td><td style=\"text-align: right;\"> 0.00896689</td><td style=\"text-align: right;\">-0.018512 </td><td style=\"text-align: right;\">0.00104907</td><td style=\"text-align: right;\"> 0.0193834 </td><td style=\"text-align: right;\">-0.0303619 </td><td style=\"text-align: right;\"> 0.0139678</td><td style=\"text-align: right;\">-0.00858391 </td><td style=\"text-align: right;\"> 0.0359025 </td><td style=\"text-align: right;\"> 0.0562054</td><td style=\"text-align: right;\">-0.0571819</td><td style=\"text-align: right;\">-0.0157056 </td><td style=\"text-align: right;\">-0.0115961 </td><td style=\"text-align: right;\">-0.0183816 </td><td style=\"text-align: right;\">-0.0173937 </td><td style=\"text-align: right;\"> 0.0157471 </td><td style=\"text-align: right;\">-0.0117092</td><td style=\"text-align: right;\">-0.0329408 </td><td style=\"text-align: right;\">-0.0213905 </td><td style=\"text-align: right;\"> 0.0359655</td><td style=\"text-align: right;\">-0.00932292</td><td style=\"text-align: right;\"> 0.00242552</td><td style=\"text-align: right;\">-0.0265486 </td><td style=\"text-align: right;\"> 0.000809336</td><td style=\"text-align: right;\"> 0.0293861 </td><td style=\"text-align: right;\">-0.0132304 </td><td style=\"text-align: right;\">-0.0448134 </td><td style=\"text-align: right;\"> 0.00535186</td><td style=\"text-align: right;\"> 0.0132919 </td><td style=\"text-align: right;\"> 0.0232046 </td><td style=\"text-align: right;\">-0.0398275 </td><td style=\"text-align: right;\"> 0.0242108 </td><td style=\"text-align: right;\">-0.00336366</td><td style=\"text-align: right;\"> 0.00886357</td><td style=\"text-align: right;\"> 0.00396398</td><td style=\"text-align: right;\"> 0.0441849</td><td style=\"text-align: right;\"> 0.0122714 </td><td style=\"text-align: right;\"> 
0.0319533</td><td style=\"text-align: right;\">-0.00348067</td><td style=\"text-align: right;\">-0.0294214</td><td style=\"text-align: right;\"> 0.0153113</td><td style=\"text-align: right;\">-0.0184586 </td><td style=\"text-align: right;\"> 0.0173887 </td><td style=\"text-align: right;\">-0.0453563</td><td style=\"text-align: right;\"> 0.0025473 </td><td style=\"text-align: right;\"> 0.0130461 </td><td style=\"text-align: right;\"> 0.0404173 </td><td style=\"text-align: right;\"> 0.00737931</td><td style=\"text-align: right;\">-0.0420292</td><td style=\"text-align: right;\"> 0.0284435</td><td style=\"text-align: right;\"> 0.0341711 </td><td style=\"text-align: right;\"> 0.0170331 </td><td style=\"text-align: right;\">-0.00459612</td><td style=\"text-align: right;\"> 0.0128041 </td><td style=\"text-align: right;\"> 0.0366783</td><td style=\"text-align: right;\">-0.0333503</td><td style=\"text-align: right;\">-0.00117786</td><td style=\"text-align: right;\"> 0.00162055</td><td style=\"text-align: right;\"> 0.0079412</td><td style=\"text-align: right;\"> 0.0146446 </td><td style=\"text-align: right;\"> 0.00372818</td><td style=\"text-align: right;\"> 0.00244897</td><td style=\"text-align: right;\">0.0106038 </td><td style=\"text-align: right;\"> 0.0119089 </td><td style=\"text-align: right;\"> 0.0500549 </td><td style=\"text-align: right;\">-0.0508029 </td><td style=\"text-align: right;\"> 0.0211623 </td><td style=\"text-align: right;\">-0.000432948</td><td style=\"text-align: right;\">0.0534101 </td><td style=\"text-align: right;\">-0.0262044 </td><td style=\"text-align: right;\">-0.0137686 </td><td style=\"text-align: right;\"> 0.0017875 </td><td style=\"text-align: right;\">-0.00189653</td><td style=\"text-align: right;\"> 0.00516932</td><td style=\"text-align: right;\"> 0.0155711 </td><td style=\"text-align: right;\">-0.0218159 </td><td style=\"text-align: right;\">-0.0490738 </td><td style=\"text-align: right;\"> 0.0248318</td><td style=\"text-align: right;\"> 
0.00470242</td><td style=\"text-align: right;\">-0.0251454</td><td style=\"text-align: right;\"> 0.0269307 </td><td style=\"text-align: right;\">-0.0166039 </td><td style=\"text-align: right;\"> 0.0529522</td><td style=\"text-align: right;\"> 0.000367695</td><td style=\"text-align: right;\">-0.0106417 </td><td style=\"text-align: right;\"> 0.0047698</td><td style=\"text-align: right;\"> 0.00729426</td><td style=\"text-align: right;\"> 0.0195029 </td><td style=\"text-align: right;\">-0.0165866 </td><td style=\"text-align: right;\"> 0.0460096 </td><td style=\"text-align: right;\">-0.0206906</td><td style=\"text-align: right;\"> 0.00637032</td><td style=\"text-align: right;\">-0.00561091</td><td style=\"text-align: right;\">-0.0369974</td><td style=\"text-align: right;\"> 0.0197267</td><td style=\"text-align: right;\"> 0.0345398</td><td style=\"text-align: right;\"> 0.00685155</td><td style=\"text-align: right;\"> 0.0188883</td></tr>\n</tbody>\n</table>"},"metadata":{},"output_type":"display_data"},{"data":{"text/plain":""},"execution_count":29,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## Model 2: GBM with Review vectors\n<blockquote>\n Now we can train a GBM like before, but this time include the review vectors as predictors. This should hopefully improve the model's performance! 
We'll log everything to MLflow so we can compare the results.\n </i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":false},"cell_type":"code","source":"from h2o.estimators import H2OGradientBoostingEstimator\nmlflow.end_run()\nmlflow.start_run(run_name='GBM with word vectors')\nRATIOS = [0.7,0.15]\n# Split into roughly 70% / 15% / 15% frames (the last frame gets the remainder)\next_train,ext_test,ext_valid = ext_reviews.split_frame(ratios=RATIOS)\n# Log our ratios\nmlflow.lp('ratios',RATIOS)\n# Log what word vectors we're using\nmlflow.lp('word vectors', 'reviews')\n\nnon_token_predictors = ['ProductId', 'UserId', 'HelpfulnessNumerator', 'HelpfulnessDenominator', 'Time']\npredictors = non_token_predictors + review_vecs.names\nresponse = 'PositiveReview'\n\nmlflow.lp('label', response)\n# There are a lot of predictors here (C1-C100 + features) so let's log a shortened summary\nmlflow.lp('predictors', non_token_predictors + [f'C1-C{len(review_vecs.columns)}'])\n\ngbm_embeddings = H2OGradientBoostingEstimator(stopping_metric = \"AUC\", stopping_tolerance = 0.001,\n stopping_rounds = 5, score_tree_interval = 10,\n model_id = \"gbm_embeddings.hex\"\n )\n# ext_test is passed as the validation frame; ext_valid is not used in this cell\nwith mlflow.timer('train_time'):\n gbm_embeddings.train(x = predictors, y = response, \n training_frame = ext_train, validation_frame = ext_test\n )\n\n# Log the model params to MLflow\nmlflow.log_params(gbm_embeddings.get_params())\n# Log the model to MLflow\nmlflow.log_model(gbm_embeddings, 'vectorized_model')\n# Log the training notebook to MLflow\nmlflow.log_artifact('MLManager H2O Demo.ipynb', 'training_notebook')\ngbm_embeddings","execution_count":30,"outputs":[{"name":"stdout","output_type":"stream","text":"Starting Code Block train_time... 
gbm Model Build progress: |███████████████████████████████████████████████| 100%\nDone.\nCode Block train_time:\nRan in 5.513 secs\nRan in 0.092 mins\nSaving artifact of size: 1597.877 KB to Splice Machine DB\nSaving artifact of size: 666.985 KB to Splice Machine DB\nModel Details\n=============\nH2OGradientBoostingEstimator : Gradient Boosting Machine\nModel Key: gbm_embeddings.hex\n\n\nModel Summary: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>number_of_trees</th>\n <th>number_of_internal_trees</th>\n <th>model_size_in_bytes</th>\n <th>min_depth</th>\n <th>max_depth</th>\n <th>mean_depth</th>\n <th>min_leaves</th>\n <th>max_leaves</th>\n <th>mean_leaves</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>50.0</td>\n <td>50.0</td>\n <td>22916.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>29.0</td>\n <td>32.0</td>\n <td>31.82</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" number_of_trees number_of_internal_trees model_size_in_bytes \\\n0 50.0 50.0 22916.0 \n\n min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n0 5.0 5.0 5.0 29.0 32.0 31.82 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nModelMetricsBinomial: gbm\n** Reported on train data. 
**\n\nMSE: 0.11763099104636422\nRMSE: 0.34297374687629406\nLogLoss: 0.38139378118952527\nMean Per-Class Error: 0.21461955222976004\nAUC: 0.8683543868961463\nAUCPR: 0.9550113279093856\nGini: 0.7367087737922926\n\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.603683879583344: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>6784.0</td>\n <td>8432.0</td>\n <td>0.5542</td>\n <td>(8432.0/15216.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>2227.0</td>\n <td>52529.0</td>\n <td>0.0407</td>\n <td>(2227.0/54756.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>9011.0</td>\n <td>60961.0</td>\n <td>0.1523</td>\n <td>(10659.0/69972.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 6784.0 8432.0 0.5542 (8432.0/15216.0)\n1 1 2227.0 52529.0 0.0407 (2227.0/54756.0)\n2 Total 9011.0 60961.0 0.1523 (10659.0/69972.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nMaximum Metrics: Maximum metrics at their respective thresholds\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>metric</th>\n <th>threshold</th>\n <th>value</th>\n <th>idx</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>max f1</td>\n <td>0.603684</td>\n <td>0.907887</td>\n <td>245.0</td>\n </tr>\n <tr>\n 
<th>1</th>\n <td>max f2</td>\n <td>0.438034</td>\n <td>0.952923</td>\n <td>310.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>max f0point5</td>\n <td>0.742401</td>\n <td>0.899509</td>\n <td>174.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>max accuracy</td>\n <td>0.639008</td>\n <td>0.848225</td>\n <td>230.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>max precision</td>\n <td>0.970715</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>max recall</td>\n <td>0.121993</td>\n <td>1.000000</td>\n <td>392.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>max specificity</td>\n <td>0.970715</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>max absolute_mcc</td>\n <td>0.702468</td>\n <td>0.526670</td>\n <td>198.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>max min_per_class_accuracy</td>\n <td>0.777815</td>\n <td>0.784043</td>\n <td>152.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>max mean_per_class_accuracy</td>\n <td>0.783004</td>\n <td>0.785380</td>\n <td>149.0</td>\n </tr>\n <tr>\n <th>10</th>\n <td>max tns</td>\n <td>0.970715</td>\n <td>15216.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>max fns</td>\n <td>0.970715</td>\n <td>54744.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>12</th>\n <td>max fps</td>\n <td>0.041544</td>\n <td>15216.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>13</th>\n <td>max tps</td>\n <td>0.121993</td>\n <td>54756.000000</td>\n <td>392.0</td>\n </tr>\n <tr>\n <th>14</th>\n <td>max tnr</td>\n <td>0.970715</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>15</th>\n <td>max fnr</td>\n <td>0.970715</td>\n <td>0.999781</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>16</th>\n <td>max fpr</td>\n <td>0.041544</td>\n <td>1.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>17</th>\n <td>max tpr</td>\n <td>0.121993</td>\n <td>1.000000</td>\n <td>392.0</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" metric threshold value idx\n0 max f1 0.603684 0.907887 245.0\n1 max f2 0.438034 0.952923 310.0\n2 max 
f0point5 0.742401 0.899509 174.0\n3 max accuracy 0.639008 0.848225 230.0\n4 max precision 0.970715 1.000000 0.0\n5 max recall 0.121993 1.000000 392.0\n6 max specificity 0.970715 1.000000 0.0\n7 max absolute_mcc 0.702468 0.526670 198.0\n8 max min_per_class_accuracy 0.777815 0.784043 152.0\n9 max mean_per_class_accuracy 0.783004 0.785380 149.0\n10 max tns 0.970715 15216.000000 0.0\n11 max fns 0.970715 54744.000000 0.0\n12 max fps 0.041544 15216.000000 399.0\n13 max tps 0.121993 54756.000000 392.0\n14 max tnr 0.970715 1.000000 0.0\n15 max fnr 0.970715 0.999781 0.0\n16 max fpr 0.041544 1.000000 399.0\n17 max tpr 0.121993 1.000000 392.0"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nGains/Lift Table: Avg response rate: 78.25 %, avg score: 78.22 %\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>group</th>\n <th>cumulative_data_fraction</th>\n <th>lower_threshold</th>\n <th>lift</th>\n <th>cumulative_lift</th>\n <th>response_rate</th>\n <th>score</th>\n <th>cumulative_response_rate</th>\n <th>cumulative_score</th>\n <th>capture_rate</th>\n <th>cumulative_capture_rate</th>\n <th>gain</th>\n <th>cumulative_gain</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>1</td>\n <td>0.010004</td>\n <td>0.950928</td>\n <td>1.274236</td>\n <td>1.274236</td>\n <td>0.997143</td>\n <td>0.957110</td>\n <td>0.997143</td>\n <td>0.957110</td>\n <td>0.012747</td>\n <td>0.012747</td>\n <td>27.423625</td>\n <td>27.423625</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2</td>\n <td>0.020008</td>\n <td>0.945205</td>\n <td>1.274236</td>\n <td>1.274236</td>\n <td>0.997143</td>\n <td>0.947873</td>\n <td>0.997143</td>\n 
<td>0.952492</td>\n <td>0.012747</td>\n <td>0.025495</td>\n <td>27.423625</td>\n <td>27.423625</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>3</td>\n <td>0.030012</td>\n <td>0.940951</td>\n <td>1.272411</td>\n <td>1.273628</td>\n <td>0.995714</td>\n <td>0.943057</td>\n <td>0.996667</td>\n <td>0.949347</td>\n <td>0.012729</td>\n <td>0.038224</td>\n <td>27.241069</td>\n <td>27.362773</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>4</td>\n <td>0.040002</td>\n <td>0.937519</td>\n <td>1.268747</td>\n <td>1.272409</td>\n <td>0.992847</td>\n <td>0.939208</td>\n <td>0.995713</td>\n <td>0.946815</td>\n <td>0.012674</td>\n <td>0.050899</td>\n <td>26.874653</td>\n <td>27.240874</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>5</td>\n <td>0.050006</td>\n <td>0.934455</td>\n <td>1.274236</td>\n <td>1.272774</td>\n <td>0.997143</td>\n <td>0.935936</td>\n <td>0.995999</td>\n <td>0.944638</td>\n <td>0.012747</td>\n <td>0.063646</td>\n <td>27.423625</td>\n <td>27.277434</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>6</td>\n <td>0.100011</td>\n <td>0.921381</td>\n <td>1.266931</td>\n <td>1.269853</td>\n <td>0.991426</td>\n <td>0.927596</td>\n <td>0.993712</td>\n <td>0.936117</td>\n <td>0.063354</td>\n <td>0.127000</td>\n <td>26.693090</td>\n <td>26.985262</td>\n </tr>\n <tr>\n <th>6</th>\n <td></td>\n <td>7</td>\n <td>0.150003</td>\n <td>0.911046</td>\n <td>1.253046</td>\n <td>1.264251</td>\n <td>0.980560</td>\n <td>0.916133</td>\n <td>0.989329</td>\n <td>0.929457</td>\n <td>0.062642</td>\n <td>0.189641</td>\n <td>25.304563</td>\n <td>26.425136</td>\n </tr>\n <tr>\n <th>7</th>\n <td></td>\n <td>8</td>\n <td>0.200009</td>\n <td>0.901310</td>\n <td>1.243922</td>\n <td>1.259169</td>\n <td>0.973421</td>\n <td>0.906064</td>\n <td>0.985352</td>\n <td>0.923608</td>\n <td>0.062203</td>\n <td>0.251845</td>\n <td>24.392236</td>\n <td>25.916875</td>\n </tr>\n <tr>\n <th>8</th>\n <td></td>\n <td>9</td>\n <td>0.300006</td>\n <td>0.882085</td>\n <td>1.228211</td>\n 
<td>1.248850</td>\n <td>0.961126</td>\n <td>0.891748</td>\n <td>0.977277</td>\n <td>0.912989</td>\n <td>0.122818</td>\n <td>0.374662</td>\n <td>22.821101</td>\n <td>24.884999</td>\n </tr>\n <tr>\n <th>9</th>\n <td></td>\n <td>10</td>\n <td>0.400003</td>\n <td>0.860930</td>\n <td>1.206660</td>\n <td>1.238303</td>\n <td>0.944262</td>\n <td>0.871642</td>\n <td>0.969024</td>\n <td>0.902652</td>\n <td>0.120663</td>\n <td>0.495325</td>\n <td>20.666025</td>\n <td>23.830293</td>\n </tr>\n <tr>\n <th>10</th>\n <td></td>\n <td>11</td>\n <td>0.500000</td>\n <td>0.835298</td>\n <td>1.162280</td>\n <td>1.223099</td>\n <td>0.909533</td>\n <td>0.848680</td>\n <td>0.957126</td>\n <td>0.891858</td>\n <td>0.116225</td>\n <td>0.611549</td>\n <td>16.228028</td>\n <td>22.309884</td>\n </tr>\n <tr>\n <th>11</th>\n <td></td>\n <td>12</td>\n <td>0.599997</td>\n <td>0.803029</td>\n <td>1.101829</td>\n <td>1.202888</td>\n <td>0.862227</td>\n <td>0.819804</td>\n <td>0.941310</td>\n <td>0.879849</td>\n <td>0.110180</td>\n <td>0.721729</td>\n <td>10.182856</td>\n <td>20.288761</td>\n </tr>\n <tr>\n <th>12</th>\n <td></td>\n <td>13</td>\n <td>0.699994</td>\n <td>0.758217</td>\n <td>1.007590</td>\n <td>1.174988</td>\n <td>0.788481</td>\n <td>0.781751</td>\n <td>0.919477</td>\n <td>0.865836</td>\n <td>0.100756</td>\n <td>0.822485</td>\n <td>0.758962</td>\n <td>17.498846</td>\n </tr>\n <tr>\n <th>13</th>\n <td></td>\n <td>14</td>\n <td>0.799991</td>\n <td>0.687910</td>\n <td>0.881390</td>\n <td>1.138289</td>\n <td>0.689724</td>\n <td>0.726272</td>\n <td>0.890759</td>\n <td>0.848391</td>\n <td>0.088136</td>\n <td>0.910622</td>\n <td>-11.861021</td>\n <td>13.828928</td>\n </tr>\n <tr>\n <th>14</th>\n <td></td>\n <td>15</td>\n <td>0.899989</td>\n <td>0.555041</td>\n <td>0.637391</td>\n <td>1.082635</td>\n <td>0.498785</td>\n <td>0.629473</td>\n <td>0.847207</td>\n <td>0.824067</td>\n <td>0.063737</td>\n <td>0.974359</td>\n <td>-36.260871</td>\n <td>8.263484</td>\n </tr>\n <tr>\n <th>15</th>\n 
<td></td>\n <td>16</td>\n <td>1.000000</td>\n <td>0.041544</td>\n <td>0.256381</td>\n <td>1.000000</td>\n <td>0.200629</td>\n <td>0.405392</td>\n <td>0.782542</td>\n <td>0.782194</td>\n <td>0.025641</td>\n <td>1.000000</td>\n <td>-74.361906</td>\n <td>0.000000</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" group cumulative_data_fraction lower_threshold lift \\\n0 1 0.010004 0.950928 1.274236 \n1 2 0.020008 0.945205 1.274236 \n2 3 0.030012 0.940951 1.272411 \n3 4 0.040002 0.937519 1.268747 \n4 5 0.050006 0.934455 1.274236 \n5 6 0.100011 0.921381 1.266931 \n6 7 0.150003 0.911046 1.253046 \n7 8 0.200009 0.901310 1.243922 \n8 9 0.300006 0.882085 1.228211 \n9 10 0.400003 0.860930 1.206660 \n10 11 0.500000 0.835298 1.162280 \n11 12 0.599997 0.803029 1.101829 \n12 13 0.699994 0.758217 1.007590 \n13 14 0.799991 0.687910 0.881390 \n14 15 0.899989 0.555041 0.637391 \n15 16 1.000000 0.041544 0.256381 \n\n cumulative_lift response_rate score cumulative_response_rate \\\n0 1.274236 0.997143 0.957110 0.997143 \n1 1.274236 0.997143 0.947873 0.997143 \n2 1.273628 0.995714 0.943057 0.996667 \n3 1.272409 0.992847 0.939208 0.995713 \n4 1.272774 0.997143 0.935936 0.995999 \n5 1.269853 0.991426 0.927596 0.993712 \n6 1.264251 0.980560 0.916133 0.989329 \n7 1.259169 0.973421 0.906064 0.985352 \n8 1.248850 0.961126 0.891748 0.977277 \n9 1.238303 0.944262 0.871642 0.969024 \n10 1.223099 0.909533 0.848680 0.957126 \n11 1.202888 0.862227 0.819804 0.941310 \n12 1.174988 0.788481 0.781751 0.919477 \n13 1.138289 0.689724 0.726272 0.890759 \n14 1.082635 0.498785 0.629473 0.847207 \n15 1.000000 0.200629 0.405392 0.782542 \n\n cumulative_score capture_rate cumulative_capture_rate gain \\\n0 0.957110 0.012747 0.012747 27.423625 \n1 0.952492 0.012747 0.025495 27.423625 \n2 0.949347 0.012729 0.038224 27.241069 \n3 0.946815 0.012674 0.050899 26.874653 \n4 0.944638 0.012747 0.063646 27.423625 \n5 0.936117 0.063354 0.127000 26.693090 \n6 0.929457 0.062642 0.189641 25.304563 \n7 0.923608 
0.062203 0.251845 24.392236 \n8 0.912989 0.122818 0.374662 22.821101 \n9 0.902652 0.120663 0.495325 20.666025 \n10 0.891858 0.116225 0.611549 16.228028 \n11 0.879849 0.110180 0.721729 10.182856 \n12 0.865836 0.100756 0.822485 0.758962 \n13 0.848391 0.088136 0.910622 -11.861021 \n14 0.824067 0.063737 0.974359 -36.260871 \n15 0.782194 0.025641 1.000000 -74.361906 \n\n cumulative_gain \n0 27.423625 \n1 27.423625 \n2 27.362773 \n3 27.240874 \n4 27.277434 \n5 26.985262 \n6 26.425136 \n7 25.916875 \n8 24.884999 \n9 23.830293 \n10 22.309884 \n11 20.288761 \n12 17.498846 \n13 13.828928 \n14 8.263484 \n15 0.000000 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nModelMetricsBinomial: gbm\n** Reported on validation data. **\n\nMSE: 0.12646804143667775\nRMSE: 0.35562345456490596\nLogLoss: 0.40431468438767093\nMean Per-Class Error: 0.24322534304415488\nAUC: 0.8356918248557774\nAUCPR: 0.941458335659589\nGini: 0.6713836497115548\n\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.5467423242599629: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>1045.0</td>\n <td>2250.0</td>\n <td>0.6829</td>\n <td>(2250.0/3295.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>302.0</td>\n <td>11423.0</td>\n <td>0.0258</td>\n <td>(302.0/11725.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>1347.0</td>\n <td>13673.0</td>\n <td>0.1699</td>\n <td>(2552.0/15020.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 1045.0 2250.0 0.6829 (2250.0/3295.0)\n1 1 302.0 11423.0 0.0258 
(302.0/11725.0)\n2 Total 1347.0 13673.0 0.1699 (2552.0/15020.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nMaximum Metrics: Maximum metrics at their respective thresholds\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>metric</th>\n <th>threshold</th>\n <th>value</th>\n <th>idx</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>max f1</td>\n <td>0.546742</td>\n <td>0.899520</td>\n <td>274.0</td>\n </tr>\n <tr>\n <th>1</th>\n <td>max f2</td>\n <td>0.400301</td>\n <td>0.949257</td>\n <td>326.0</td>\n </tr>\n <tr>\n <th>2</th>\n <td>max f0point5</td>\n <td>0.741179</td>\n <td>0.884006</td>\n <td>177.0</td>\n </tr>\n <tr>\n <th>3</th>\n <td>max accuracy</td>\n <td>0.605872</td>\n <td>0.833489</td>\n <td>250.0</td>\n </tr>\n <tr>\n <th>4</th>\n <td>max precision</td>\n <td>0.966835</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>5</th>\n <td>max recall</td>\n <td>0.171955</td>\n <td>1.000000</td>\n <td>385.0</td>\n </tr>\n <tr>\n <th>6</th>\n <td>max specificity</td>\n <td>0.966835</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>7</th>\n <td>max absolute_mcc</td>\n <td>0.728986</td>\n <td>0.460782</td>\n <td>184.0</td>\n </tr>\n <tr>\n <th>8</th>\n <td>max min_per_class_accuracy</td>\n <td>0.783641</td>\n <td>0.753263</td>\n <td>149.0</td>\n </tr>\n <tr>\n <th>9</th>\n <td>max mean_per_class_accuracy</td>\n <td>0.788654</td>\n <td>0.756775</td>\n <td>145.0</td>\n </tr>\n <tr>\n <th>10</th>\n <td>max tns</td>\n <td>0.966835</td>\n <td>3295.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>11</th>\n <td>max fns</td>\n <td>0.966835</td>\n <td>11715.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n 
<th>12</th>\n <td>max fps</td>\n <td>0.064508</td>\n <td>3295.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>13</th>\n <td>max tps</td>\n <td>0.171955</td>\n <td>11725.000000</td>\n <td>385.0</td>\n </tr>\n <tr>\n <th>14</th>\n <td>max tnr</td>\n <td>0.966835</td>\n <td>1.000000</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>15</th>\n <td>max fnr</td>\n <td>0.966835</td>\n <td>0.999147</td>\n <td>0.0</td>\n </tr>\n <tr>\n <th>16</th>\n <td>max fpr</td>\n <td>0.064508</td>\n <td>1.000000</td>\n <td>399.0</td>\n </tr>\n <tr>\n <th>17</th>\n <td>max tpr</td>\n <td>0.171955</td>\n <td>1.000000</td>\n <td>385.0</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" metric threshold value idx\n0 max f1 0.546742 0.899520 274.0\n1 max f2 0.400301 0.949257 326.0\n2 max f0point5 0.741179 0.884006 177.0\n3 max accuracy 0.605872 0.833489 250.0\n4 max precision 0.966835 1.000000 0.0\n5 max recall 0.171955 1.000000 385.0\n6 max specificity 0.966835 1.000000 0.0\n7 max absolute_mcc 0.728986 0.460782 184.0\n8 max min_per_class_accuracy 0.783641 0.753263 149.0\n9 max mean_per_class_accuracy 0.788654 0.756775 145.0\n10 max tns 0.966835 3295.000000 0.0\n11 max fns 0.966835 11715.000000 0.0\n12 max fps 0.064508 3295.000000 399.0\n13 max tps 0.171955 11725.000000 385.0\n14 max tnr 0.966835 1.000000 0.0\n15 max fnr 0.966835 0.999147 0.0\n16 max fpr 0.064508 1.000000 399.0\n17 max tpr 0.171955 1.000000 385.0"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nGains/Lift Table: Avg response rate: 78.06 %, avg score: 78.21 %\n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>group</th>\n <th>cumulative_data_fraction</th>\n <th>lower_threshold</th>\n 
<th>lift</th>\n <th>cumulative_lift</th>\n <th>response_rate</th>\n <th>score</th>\n <th>cumulative_response_rate</th>\n <th>cumulative_score</th>\n <th>capture_rate</th>\n <th>cumulative_capture_rate</th>\n <th>gain</th>\n <th>cumulative_gain</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>1</td>\n <td>0.010053</td>\n <td>0.951514</td>\n <td>1.281023</td>\n <td>1.281023</td>\n <td>1.000000</td>\n <td>0.957157</td>\n <td>1.000000</td>\n <td>0.957157</td>\n <td>0.012878</td>\n <td>0.012878</td>\n <td>28.102345</td>\n <td>28.102345</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2</td>\n <td>0.020040</td>\n <td>0.944583</td>\n <td>1.272483</td>\n <td>1.276768</td>\n <td>0.993333</td>\n <td>0.947860</td>\n <td>0.996678</td>\n <td>0.952524</td>\n <td>0.012708</td>\n <td>0.025586</td>\n <td>27.248330</td>\n <td>27.676756</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>3</td>\n <td>0.030027</td>\n <td>0.939599</td>\n <td>1.281023</td>\n <td>1.278183</td>\n <td>1.000000</td>\n <td>0.942052</td>\n <td>0.997783</td>\n <td>0.949041</td>\n <td>0.012793</td>\n <td>0.038380</td>\n <td>28.102345</td>\n <td>27.818305</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>4</td>\n <td>0.040013</td>\n <td>0.936074</td>\n <td>1.272483</td>\n <td>1.276760</td>\n <td>0.993333</td>\n <td>0.937724</td>\n <td>0.996672</td>\n <td>0.946216</td>\n <td>0.012708</td>\n <td>0.051087</td>\n <td>27.248330</td>\n <td>27.676048</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>5</td>\n <td>0.050000</td>\n <td>0.933012</td>\n <td>1.263943</td>\n <td>1.274200</td>\n <td>0.986667</td>\n <td>0.934541</td>\n <td>0.994674</td>\n <td>0.943884</td>\n <td>0.012623</td>\n <td>0.063710</td>\n <td>26.394314</td>\n <td>27.420043</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>6</td>\n <td>0.100000</td>\n <td>0.920021</td>\n <td>1.255437</td>\n <td>1.264819</td>\n <td>0.980027</td>\n <td>0.926271</td>\n <td>0.987350</td>\n <td>0.935078</td>\n <td>0.062772</td>\n <td>0.126482</td>\n 
<td>25.543710</td>\n <td>26.481876</td>\n </tr>\n <tr>\n <th>6</th>\n <td></td>\n <td>7</td>\n <td>0.150000</td>\n <td>0.909208</td>\n <td>1.248614</td>\n <td>1.259417</td>\n <td>0.974700</td>\n <td>0.914476</td>\n <td>0.983134</td>\n <td>0.928211</td>\n <td>0.062431</td>\n <td>0.188913</td>\n <td>24.861407</td>\n <td>25.941720</td>\n </tr>\n <tr>\n <th>7</th>\n <td></td>\n <td>8</td>\n <td>0.200000</td>\n <td>0.899211</td>\n <td>1.229851</td>\n <td>1.252026</td>\n <td>0.960053</td>\n <td>0.904328</td>\n <td>0.977364</td>\n <td>0.922240</td>\n <td>0.061493</td>\n <td>0.250405</td>\n <td>22.985075</td>\n <td>25.202559</td>\n </tr>\n <tr>\n <th>8</th>\n <td></td>\n <td>9</td>\n <td>0.300000</td>\n <td>0.878844</td>\n <td>1.212793</td>\n <td>1.238948</td>\n <td>0.946738</td>\n <td>0.888916</td>\n <td>0.967155</td>\n <td>0.911132</td>\n <td>0.121279</td>\n <td>0.371684</td>\n <td>21.279318</td>\n <td>23.894812</td>\n </tr>\n <tr>\n <th>9</th>\n <td></td>\n <td>10</td>\n <td>0.400000</td>\n <td>0.857422</td>\n <td>1.171002</td>\n <td>1.221962</td>\n <td>0.914115</td>\n <td>0.868480</td>\n <td>0.953895</td>\n <td>0.900469</td>\n <td>0.117100</td>\n <td>0.488785</td>\n <td>17.100213</td>\n <td>22.196162</td>\n </tr>\n <tr>\n <th>10</th>\n <td></td>\n <td>11</td>\n <td>0.500000</td>\n <td>0.832167</td>\n <td>1.135181</td>\n <td>1.204606</td>\n <td>0.886152</td>\n <td>0.845024</td>\n <td>0.940346</td>\n <td>0.889380</td>\n <td>0.113518</td>\n <td>0.602303</td>\n <td>13.518124</td>\n <td>20.460554</td>\n </tr>\n <tr>\n <th>11</th>\n <td></td>\n <td>12</td>\n <td>0.600000</td>\n <td>0.800473</td>\n <td>1.084861</td>\n <td>1.184648</td>\n <td>0.846871</td>\n <td>0.817035</td>\n <td>0.924767</td>\n <td>0.877322</td>\n <td>0.108486</td>\n <td>0.710789</td>\n <td>8.486141</td>\n <td>18.464819</td>\n </tr>\n <tr>\n <th>12</th>\n <td></td>\n <td>13</td>\n <td>0.700000</td>\n <td>0.757131</td>\n <td>0.992751</td>\n <td>1.157234</td>\n <td>0.774967</td>\n <td>0.780013</td>\n 
<td>0.903367</td>\n <td>0.863421</td>\n <td>0.099275</td>\n <td>0.810064</td>\n <td>-0.724947</td>\n <td>15.723424</td>\n </tr>\n <tr>\n <th>13</th>\n <td></td>\n <td>14</td>\n <td>0.800000</td>\n <td>0.691098</td>\n <td>0.865672</td>\n <td>1.120789</td>\n <td>0.675766</td>\n <td>0.727475</td>\n <td>0.874917</td>\n <td>0.846428</td>\n <td>0.086567</td>\n <td>0.896631</td>\n <td>-13.432836</td>\n <td>12.078891</td>\n </tr>\n <tr>\n <th>14</th>\n <td></td>\n <td>15</td>\n <td>0.900000</td>\n <td>0.565262</td>\n <td>0.709595</td>\n <td>1.075101</td>\n <td>0.553928</td>\n <td>0.636082</td>\n <td>0.839251</td>\n <td>0.823056</td>\n <td>0.070959</td>\n <td>0.967591</td>\n <td>-29.040512</td>\n <td>7.510069</td>\n </tr>\n <tr>\n <th>15</th>\n <td></td>\n <td>16</td>\n <td>1.000000</td>\n <td>0.064508</td>\n <td>0.324094</td>\n <td>1.000000</td>\n <td>0.252996</td>\n <td>0.413434</td>\n <td>0.780626</td>\n <td>0.782094</td>\n <td>0.032409</td>\n <td>1.000000</td>\n <td>-67.590618</td>\n <td>0.000000</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" group cumulative_data_fraction lower_threshold lift \\\n0 1 0.010053 0.951514 1.281023 \n1 2 0.020040 0.944583 1.272483 \n2 3 0.030027 0.939599 1.281023 \n3 4 0.040013 0.936074 1.272483 \n4 5 0.050000 0.933012 1.263943 \n5 6 0.100000 0.920021 1.255437 \n6 7 0.150000 0.909208 1.248614 \n7 8 0.200000 0.899211 1.229851 \n8 9 0.300000 0.878844 1.212793 \n9 10 0.400000 0.857422 1.171002 \n10 11 0.500000 0.832167 1.135181 \n11 12 0.600000 0.800473 1.084861 \n12 13 0.700000 0.757131 0.992751 \n13 14 0.800000 0.691098 0.865672 \n14 15 0.900000 0.565262 0.709595 \n15 16 1.000000 0.064508 0.324094 \n\n cumulative_lift response_rate score cumulative_response_rate \\\n0 1.281023 1.000000 0.957157 1.000000 \n1 1.276768 0.993333 0.947860 0.996678 \n2 1.278183 1.000000 0.942052 0.997783 \n3 1.276760 0.993333 0.937724 0.996672 \n4 1.274200 0.986667 0.934541 0.994674 \n5 1.264819 0.980027 0.926271 0.987350 \n6 1.259417 0.974700 0.914476 
0.983134 \n7 1.252026 0.960053 0.904328 0.977364 \n8 1.238948 0.946738 0.888916 0.967155 \n9 1.221962 0.914115 0.868480 0.953895 \n10 1.204606 0.886152 0.845024 0.940346 \n11 1.184648 0.846871 0.817035 0.924767 \n12 1.157234 0.774967 0.780013 0.903367 \n13 1.120789 0.675766 0.727475 0.874917 \n14 1.075101 0.553928 0.636082 0.839251 \n15 1.000000 0.252996 0.413434 0.780626 \n\n cumulative_score capture_rate cumulative_capture_rate gain \\\n0 0.957157 0.012878 0.012878 28.102345 \n1 0.952524 0.012708 0.025586 27.248330 \n2 0.949041 0.012793 0.038380 28.102345 \n3 0.946216 0.012708 0.051087 27.248330 \n4 0.943884 0.012623 0.063710 26.394314 \n5 0.935078 0.062772 0.126482 25.543710 \n6 0.928211 0.062431 0.188913 24.861407 \n7 0.922240 0.061493 0.250405 22.985075 \n8 0.911132 0.121279 0.371684 21.279318 \n9 0.900469 0.117100 0.488785 17.100213 \n10 0.889380 0.113518 0.602303 13.518124 \n11 0.877322 0.108486 0.710789 8.486141 \n12 0.863421 0.099275 0.810064 -0.724947 \n13 0.846428 0.086567 0.896631 -13.432836 \n14 0.823056 0.070959 0.967591 -29.040512 \n15 0.782094 0.032409 1.000000 -67.590618 \n\n cumulative_gain \n0 28.102345 \n1 27.676756 \n2 27.818305 \n3 27.676048 \n4 27.420043 \n5 26.481876 \n6 25.941720 \n7 25.202559 \n8 23.894812 \n9 22.196162 \n10 20.460554 \n11 18.464819 \n12 15.723424 \n13 12.078891 \n14 7.510069 \n15 0.000000 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n\nScoring History: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>timestamp</th>\n <th>duration</th>\n <th>number_of_trees</th>\n <th>training_rmse</th>\n <th>training_logloss</th>\n <th>training_auc</th>\n <th>training_pr_auc</th>\n 
<th>training_lift</th>\n <th>training_classification_error</th>\n <th>validation_rmse</th>\n <th>validation_logloss</th>\n <th>validation_auc</th>\n <th>validation_pr_auc</th>\n <th>validation_lift</th>\n <th>validation_classification_error</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>2020-06-03 20:06:33</td>\n <td>0.131 sec</td>\n <td>0.0</td>\n <td>0.412517</td>\n <td>0.523672</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.217458</td>\n <td>0.413827</td>\n <td>0.526125</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.219374</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2020-06-03 20:06:34</td>\n <td>1.147 sec</td>\n <td>10.0</td>\n <td>0.382665</td>\n <td>0.460842</td>\n <td>0.778901</td>\n <td>0.916229</td>\n <td>1.255981</td>\n <td>0.185903</td>\n <td>0.385980</td>\n <td>0.466955</td>\n <td>0.768938</td>\n <td>0.911758</td>\n <td>1.272596</td>\n <td>0.191478</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>2020-06-03 20:06:35</td>\n <td>2.106 sec</td>\n <td>20.0</td>\n <td>0.368603</td>\n <td>0.432135</td>\n <td>0.815712</td>\n <td>0.933750</td>\n <td>1.268760</td>\n <td>0.174441</td>\n <td>0.374044</td>\n <td>0.442140</td>\n <td>0.799287</td>\n <td>0.926887</td>\n <td>1.281023</td>\n <td>0.181625</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>2020-06-03 20:06:36</td>\n <td>3.032 sec</td>\n <td>30.0</td>\n <td>0.358795</td>\n <td>0.412357</td>\n <td>0.838837</td>\n <td>0.943557</td>\n <td>1.268773</td>\n <td>0.167667</td>\n <td>0.367004</td>\n <td>0.427451</td>\n <td>0.814227</td>\n <td>0.932822</td>\n <td>1.281023</td>\n <td>0.176964</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>2020-06-03 20:06:36</td>\n <td>3.928 sec</td>\n <td>40.0</td>\n <td>0.350731</td>\n <td>0.396337</td>\n <td>0.854237</td>\n <td>0.949766</td>\n <td>1.274236</td>\n <td>0.159421</td>\n <td>0.361087</td>\n <td>0.415420</td>\n <td>0.824577</td>\n <td>0.936954</td>\n <td>1.281023</td>\n 
<td>0.174501</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>2020-06-03 20:06:37</td>\n <td>4.830 sec</td>\n <td>50.0</td>\n <td>0.342974</td>\n <td>0.381394</td>\n <td>0.868354</td>\n <td>0.955011</td>\n <td>1.274236</td>\n <td>0.152332</td>\n <td>0.355623</td>\n <td>0.404315</td>\n <td>0.835692</td>\n <td>0.941458</td>\n <td>1.281023</td>\n <td>0.169907</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" timestamp duration number_of_trees training_rmse \\\n0 2020-06-03 20:06:33 0.131 sec 0.0 0.412517 \n1 2020-06-03 20:06:34 1.147 sec 10.0 0.382665 \n2 2020-06-03 20:06:35 2.106 sec 20.0 0.368603 \n3 2020-06-03 20:06:36 3.032 sec 30.0 0.358795 \n4 2020-06-03 20:06:36 3.928 sec 40.0 0.350731 \n5 2020-06-03 20:06:37 4.830 sec 50.0 0.342974 \n\n training_logloss training_auc training_pr_auc training_lift \\\n0 0.523672 0.500000 0.000000 1.000000 \n1 0.460842 0.778901 0.916229 1.255981 \n2 0.432135 0.815712 0.933750 1.268760 \n3 0.412357 0.838837 0.943557 1.268773 \n4 0.396337 0.854237 0.949766 1.274236 \n5 0.381394 0.868354 0.955011 1.274236 \n\n training_classification_error validation_rmse validation_logloss \\\n0 0.217458 0.413827 0.526125 \n1 0.185903 0.385980 0.466955 \n2 0.174441 0.374044 0.442140 \n3 0.167667 0.367004 0.427451 \n4 0.159421 0.361087 0.415420 \n5 0.152332 0.355623 0.404315 \n\n validation_auc validation_pr_auc validation_lift \\\n0 0.500000 0.000000 1.000000 \n1 0.768938 0.911758 1.272596 \n2 0.799287 0.926887 1.281023 \n3 0.814227 0.932822 1.281023 \n4 0.824577 0.936954 1.281023 \n5 0.835692 0.941458 1.281023 \n\n validation_classification_error \n0 0.219374 \n1 0.191478 \n2 0.181625 \n3 0.176964 \n4 0.174501 \n5 0.169907 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nVariable Importances: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th 
{\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>variable</th>\n <th>relative_importance</th>\n <th>scaled_importance</th>\n <th>percentage</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>HelpfulnessDenominator</td>\n <td>3420.752686</td>\n <td>1.000000</td>\n <td>0.187125</td>\n </tr>\n <tr>\n <th>1</th>\n <td>HelpfulnessNumerator</td>\n <td>1690.508789</td>\n <td>0.494192</td>\n <td>0.092476</td>\n </tr>\n <tr>\n <th>2</th>\n <td>C99</td>\n <td>1359.424683</td>\n <td>0.397405</td>\n <td>0.074365</td>\n </tr>\n <tr>\n <th>3</th>\n <td>C62</td>\n <td>1213.377075</td>\n <td>0.354711</td>\n <td>0.066375</td>\n </tr>\n <tr>\n <th>4</th>\n <td>C4</td>\n <td>916.311829</td>\n <td>0.267868</td>\n <td>0.050125</td>\n </tr>\n <tr>\n <th>5</th>\n <td>Time</td>\n <td>797.362915</td>\n <td>0.233096</td>\n <td>0.043618</td>\n </tr>\n <tr>\n <th>6</th>\n <td>C43</td>\n <td>684.979614</td>\n <td>0.200242</td>\n <td>0.037470</td>\n </tr>\n <tr>\n <th>7</th>\n <td>C54</td>\n <td>549.793091</td>\n <td>0.160723</td>\n <td>0.030075</td>\n </tr>\n <tr>\n <th>8</th>\n <td>C16</td>\n <td>497.228485</td>\n <td>0.145356</td>\n <td>0.027200</td>\n </tr>\n <tr>\n <th>9</th>\n <td>C81</td>\n <td>478.059814</td>\n <td>0.139753</td>\n <td>0.026151</td>\n </tr>\n <tr>\n <th>10</th>\n <td>C2</td>\n <td>387.449371</td>\n <td>0.113264</td>\n <td>0.021195</td>\n </tr>\n <tr>\n <th>11</th>\n <td>C73</td>\n <td>348.745483</td>\n <td>0.101950</td>\n <td>0.019077</td>\n </tr>\n <tr>\n <th>12</th>\n <td>C9</td>\n <td>317.079102</td>\n <td>0.092693</td>\n <td>0.017345</td>\n </tr>\n <tr>\n <th>13</th>\n <td>C22</td>\n <td>304.667206</td>\n <td>0.089064</td>\n <td>0.016666</td>\n </tr>\n <tr>\n <th>14</th>\n <td>C100</td>\n <td>290.342468</td>\n <td>0.084877</td>\n <td>0.015883</td>\n </tr>\n <tr>\n <th>15</th>\n <td>C19</td>\n <td>262.545776</td>\n <td>0.076751</td>\n <td>0.014362</td>\n </tr>\n 
<tr>\n <th>16</th>\n <td>C93</td>\n <td>261.258850</td>\n <td>0.076375</td>\n <td>0.014292</td>\n </tr>\n <tr>\n <th>17</th>\n <td>C60</td>\n <td>259.402893</td>\n <td>0.075832</td>\n <td>0.014190</td>\n </tr>\n <tr>\n <th>18</th>\n <td>C86</td>\n <td>245.558838</td>\n <td>0.071785</td>\n <td>0.013433</td>\n </tr>\n <tr>\n <th>19</th>\n <td>C90</td>\n <td>237.920685</td>\n <td>0.069552</td>\n <td>0.013015</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" variable relative_importance scaled_importance percentage\n0 HelpfulnessDenominator 3420.752686 1.000000 0.187125\n1 HelpfulnessNumerator 1690.508789 0.494192 0.092476\n2 C99 1359.424683 0.397405 0.074365\n3 C62 1213.377075 0.354711 0.066375\n4 C4 916.311829 0.267868 0.050125\n5 Time 797.362915 0.233096 0.043618\n6 C43 684.979614 0.200242 0.037470\n7 C54 549.793091 0.160723 0.030075\n8 C16 497.228485 0.145356 0.027200\n9 C81 478.059814 0.139753 0.026151\n10 C2 387.449371 0.113264 0.021195\n11 C73 348.745483 0.101950 0.019077\n12 C9 317.079102 0.092693 0.017345\n13 C22 304.667206 0.089064 0.016666\n14 C100 290.342468 0.084877 0.015883\n15 C19 262.545776 0.076751 0.014362\n16 C93 261.258850 0.076375 0.014292\n17 C60 259.402893 0.075832 0.014190\n18 C86 245.558838 0.071785 0.013433\n19 C90 237.920685 0.069552 0.013015"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\nSee the whole table with table.as_data_frame()\n"},{"data":{"text/plain":""},"execution_count":30,"metadata":{},"output_type":"execute_result"}]},{"metadata":{},"cell_type":"markdown","source":"## Just like before, let's log all of our outcomes"},{"metadata":{"trusted":false,"scrolled":false},"cell_type":"code","source":"%matplotlib inline \nfrom IPython.core.display import HTML\n\n# Print and Log Model params\nparams = dict(zip(gbm_embeddings.summary().col_header[1:],\n gbm_embeddings.summary().cell_values[0][1:]))\nprint(gbm_embeddings.summary())\nmlflow.log_params(params)\n\n\n#Plot and Log Scoring 
history\ngbm_embeddings.plot()\nprint(\"AUC on Validation Data: \" + str(round(gbm_embeddings.auc(valid = True), 3)))\n# Log training and validation metrics over time\nfor step, row in gbm_embeddings.scoring_history().iterrows():\n    row_dict = row.to_dict()\n    for r in row_dict:\n        if 'train' in r or 'valid' in r:\n            mlflow.log_metric(r, row_dict[r], step=step)\n\n\n# Print and Log Confusion Matrix\n# Each metric call returns [[threshold, value]]: index 0 is the threshold, index 1 is the metric value\nprint(gbm_embeddings.confusion_matrix(valid = True))\nmlflow.lm('fpr', gbm_embeddings.fpr(valid=True)[0][1])\nmlflow.lm('tpr', gbm_embeddings.tpr(valid=True)[0][1])\nmlflow.lm('fnr', gbm_embeddings.fnr(valid=True)[0][1])\nmlflow.lm('tnr', gbm_embeddings.tnr(valid=True)[0][1])\nmlflow.lm('F0point5', gbm_embeddings.F0point5(valid=True)[0][1])\nmlflow.lm('F1', gbm_embeddings.F1(valid=True)[0][1])\nmlflow.lm('F2', gbm_embeddings.F2(valid=True)[0][1])\nmlflow.lm('auc', gbm_embeddings.auc(valid = True))\nmlflow.lp('threshold', gbm_embeddings.F1(valid=True)[0][0]) # First element is the threshold\n\n\n# Plot and Log Variable Importance\ngbm_embeddings.varimp_plot()\nfor var in gbm_embeddings.varimp():\n    mlflow.lm(f'varimp_{var[0]}', var[-1])\n\n\n# Partial Dependence Plot\npdp_helpfulness = gbm_embeddings.partial_plot(ext_train, cols = [\"HelpfulnessNumerator\"])","execution_count":31,"outputs":[{"name":"stdout","output_type":"stream","text":"\nModel Summary: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>number_of_trees</th>\n <th>number_of_internal_trees</th>\n <th>model_size_in_bytes</th>\n <th>min_depth</th>\n <th>max_depth</th>\n <th>mean_depth</th>\n <th>min_leaves</th>\n <th>max_leaves</th>\n <th>mean_leaves</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n 
<td></td>\n <td>50.0</td>\n <td>50.0</td>\n <td>22916.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>5.0</td>\n <td>29.0</td>\n <td>32.0</td>\n <td>31.82</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" number_of_trees number_of_internal_trees model_size_in_bytes \\\n0 50.0 50.0 22916.0 \n\n min_depth max_depth mean_depth min_leaves max_leaves mean_leaves \n0 5.0 5.0 5.0 29.0 32.0 31.82 "},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n"},{"data":{"image/png":"\n","text/plain":"<Figure size 432x288 with 1 Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"AUC on Validation Data: 0.836\n\nConfusion Matrix (Act/Pred) for max f1 @ threshold = 0.5467423242599629: \n"},{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>0</th>\n <th>1</th>\n <th>Error</th>\n <th>Rate</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>0</td>\n <td>1045.0</td>\n <td>2250.0</td>\n <td>0.6829</td>\n <td>(2250.0/3295.0)</td>\n </tr>\n <tr>\n <th>1</th>\n <td>1</td>\n <td>302.0</td>\n <td>11423.0</td>\n <td>0.0258</td>\n <td>(302.0/11725.0)</td>\n </tr>\n <tr>\n <th>2</th>\n <td>Total</td>\n <td>1347.0</td>\n <td>13673.0</td>\n <td>0.1699</td>\n <td>(2552.0/15020.0)</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" 0 1 Error Rate\n0 0 1045.0 2250.0 0.6829 (2250.0/3295.0)\n1 1 302.0 11423.0 0.0258 (302.0/11725.0)\n2 Total 1347.0 13673.0 0.1699 (2552.0/15020.0)"},"metadata":{},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"\n"},{"data":{"image/png":"\n","text/plain":"<Figure size 1008x720 with 1 
Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"},{"name":"stdout","output_type":"stream","text":"PartialDependencePlot progress: |█████████████████████████████████████████| 100%\n"},{"data":{"image/png":"\n","text/plain":"<Figure size 504x720 with 1 Axes>"},"metadata":{"needs_background":"light"},"output_type":"display_data"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"old_run = cur_run\ncur_run = mlflow.current_run_id()\ncur_exp = mlflow.current_exp_id()\nlink = f'/mlflow/#/metric/training_auc?runs=[\"{cur_run}\",\"{old_run}\"]&experiment={cur_exp}&plot_metric_keys=[\\\"training_logloss\\\",\\\"validation_logloss\\\",\\\"training_rmse\\\",\\\"validation_rmse\\\"]'\nHTML(f'<font size=\"+1\">Compare your 2 runs <a href={link}>here</a></font>') ","execution_count":32,"outputs":[{"data":{"text/html":"<font size=\"+1\">Compare your 2 runs <a href=/mlflow/#/metric/training_auc?runs=[\"acbb4ff7f942\",\"a92e93f393df\"]&experiment=1&plot_metric_keys=[\"training_logloss\",\"validation_logloss\",\"training_rmse\",\"validation_rmse\"]>here</a></font>","text/plain":"<IPython.core.display.HTML object>"},"execution_count":32,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"gbm_embeddings.scoring_history()","execution_count":33,"outputs":[{"data":{"text/html":"<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th></th>\n <th>timestamp</th>\n <th>duration</th>\n <th>number_of_trees</th>\n <th>training_rmse</th>\n <th>training_logloss</th>\n <th>training_auc</th>\n <th>training_pr_auc</th>\n <th>training_lift</th>\n <th>training_classification_error</th>\n <th>validation_rmse</th>\n <th>validation_logloss</th>\n 
<th>validation_auc</th>\n <th>validation_pr_auc</th>\n <th>validation_lift</th>\n <th>validation_classification_error</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td></td>\n <td>2020-06-03 20:06:33</td>\n <td>0.131 sec</td>\n <td>0.0</td>\n <td>0.412517</td>\n <td>0.523672</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.217458</td>\n <td>0.413827</td>\n <td>0.526125</td>\n <td>0.500000</td>\n <td>0.000000</td>\n <td>1.000000</td>\n <td>0.219374</td>\n </tr>\n <tr>\n <th>1</th>\n <td></td>\n <td>2020-06-03 20:06:34</td>\n <td>1.147 sec</td>\n <td>10.0</td>\n <td>0.382665</td>\n <td>0.460842</td>\n <td>0.778901</td>\n <td>0.916229</td>\n <td>1.255981</td>\n <td>0.185903</td>\n <td>0.385980</td>\n <td>0.466955</td>\n <td>0.768938</td>\n <td>0.911758</td>\n <td>1.272596</td>\n <td>0.191478</td>\n </tr>\n <tr>\n <th>2</th>\n <td></td>\n <td>2020-06-03 20:06:35</td>\n <td>2.106 sec</td>\n <td>20.0</td>\n <td>0.368603</td>\n <td>0.432135</td>\n <td>0.815712</td>\n <td>0.933750</td>\n <td>1.268760</td>\n <td>0.174441</td>\n <td>0.374044</td>\n <td>0.442140</td>\n <td>0.799287</td>\n <td>0.926887</td>\n <td>1.281023</td>\n <td>0.181625</td>\n </tr>\n <tr>\n <th>3</th>\n <td></td>\n <td>2020-06-03 20:06:36</td>\n <td>3.032 sec</td>\n <td>30.0</td>\n <td>0.358795</td>\n <td>0.412357</td>\n <td>0.838837</td>\n <td>0.943557</td>\n <td>1.268773</td>\n <td>0.167667</td>\n <td>0.367004</td>\n <td>0.427451</td>\n <td>0.814227</td>\n <td>0.932822</td>\n <td>1.281023</td>\n <td>0.176964</td>\n </tr>\n <tr>\n <th>4</th>\n <td></td>\n <td>2020-06-03 20:06:36</td>\n <td>3.928 sec</td>\n <td>40.0</td>\n <td>0.350731</td>\n <td>0.396337</td>\n <td>0.854237</td>\n <td>0.949766</td>\n <td>1.274236</td>\n <td>0.159421</td>\n <td>0.361087</td>\n <td>0.415420</td>\n <td>0.824577</td>\n <td>0.936954</td>\n <td>1.281023</td>\n <td>0.174501</td>\n </tr>\n <tr>\n <th>5</th>\n <td></td>\n <td>2020-06-03 20:06:37</td>\n <td>4.830 sec</td>\n <td>50.0</td>\n 
<td>0.342974</td>\n <td>0.381394</td>\n <td>0.868354</td>\n <td>0.955011</td>\n <td>1.274236</td>\n <td>0.152332</td>\n <td>0.355623</td>\n <td>0.404315</td>\n <td>0.835692</td>\n <td>0.941458</td>\n <td>1.281023</td>\n <td>0.169907</td>\n </tr>\n </tbody>\n</table>\n</div>","text/plain":" timestamp duration number_of_trees training_rmse \\\n0 2020-06-03 20:06:33 0.131 sec 0.0 0.412517 \n1 2020-06-03 20:06:34 1.147 sec 10.0 0.382665 \n2 2020-06-03 20:06:35 2.106 sec 20.0 0.368603 \n3 2020-06-03 20:06:36 3.032 sec 30.0 0.358795 \n4 2020-06-03 20:06:36 3.928 sec 40.0 0.350731 \n5 2020-06-03 20:06:37 4.830 sec 50.0 0.342974 \n\n training_logloss training_auc training_pr_auc training_lift \\\n0 0.523672 0.500000 0.000000 1.000000 \n1 0.460842 0.778901 0.916229 1.255981 \n2 0.432135 0.815712 0.933750 1.268760 \n3 0.412357 0.838837 0.943557 1.268773 \n4 0.396337 0.854237 0.949766 1.274236 \n5 0.381394 0.868354 0.955011 1.274236 \n\n training_classification_error validation_rmse validation_logloss \\\n0 0.217458 0.413827 0.526125 \n1 0.185903 0.385980 0.466955 \n2 0.174441 0.374044 0.442140 \n3 0.167667 0.367004 0.427451 \n4 0.159421 0.361087 0.415420 \n5 0.152332 0.355623 0.404315 \n\n validation_auc validation_pr_auc validation_lift \\\n0 0.500000 0.000000 1.000000 \n1 0.768938 0.911758 1.272596 \n2 0.799287 0.926887 1.281023 \n3 0.814227 0.932822 1.281023 \n4 0.824577 0.936954 1.281023 \n5 0.835692 0.941458 1.281023 \n\n validation_classification_error \n0 0.219374 \n1 0.191478 \n2 0.181625 \n3 0.176964 \n4 0.174501 \n5 0.169907 "},"execution_count":33,"metadata":{},"output_type":"execute_result"}]},{"metadata":{"trusted":false},"cell_type":"code","source":"mlflow.end_run()","execution_count":34,"outputs":[]},{"metadata":{"trusted":true},"cell_type":"code","source":"print(\"Baseline AUC: \" + str(round(gbm_baseline.auc(valid = True), 3)))\nprint(\"With Embeddings AUC: \" + str(round(gbm_embeddings.auc(valid = True), 3)))\nlink = 
f'/mlflow/#/metric/training_auc?runs=[\"{cur_run}\",\"{old_run}\"]&experiment={cur_exp}&plot_metric_keys=[\\\"training_auc\\\",\\\"validation_auc\\\"]'\nHTML(f'<font size=\"+1\">See a metrics comparison <a href={link}>here</a></font>')","execution_count":78,"outputs":[{"output_type":"stream","text":"Baseline AUC: 0.732\nWith Embeddings AUC: 0.836\n","name":"stdout"},{"output_type":"execute_result","execution_count":78,"data":{"text/plain":"<IPython.core.display.HTML object>","text/html":"<font size=\"+1\">See a metrics comparison <a href=/mlflow/#/metric/training_auc?runs=[\"acbb4ff7f942\",\"a92e93f393df\"]&experiment=1&plot_metric_keys=[\"training_auc\",\"validation_auc\"]>here</a></font>"},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"# That's a great improvement! So what's next?\n<blockquote>We added the customer reviews and built a better model: validation AUC rose from 0.732 to 0.836. We've logged everything to MLflow for detailed comparisons. Now what?<br>\n Let's deploy our models to production so we can use what we've built. First, we'll deploy our word vectorizer model, and then deploy our GBM model. 
Finally, we'll create a feed from the first model to the second, so we can see the final predictions!\n <br><footer>Splice Machine</footer></blockquote><br>\n<img src=https://splice-demo.s3.amazonaws.com/H2O+Demo+Model+Diagram.png>"},{"metadata":{},"cell_type":"markdown","source":"## Step 1: Deploy Word2Vec Model"},{"metadata":{"trusted":false},"cell_type":"code","source":"help(mlflow.deploy_db)","execution_count":165,"outputs":[{"name":"stdout","output_type":"stream","text":"Help on function _deploy_db in module splicemachine.mlflow_support.mlflow_support:\n\n_deploy_db(fittedModel, df, db_schema_name, db_table_name, primary_key, run_id: str = None, classes=None, sklearn_args={}, verbose=False, pred_threshold=None, replace=False) -> None\n Function to deploy a trained (currently Spark, Sklearn or H2O) model to the Database.\n This creates 2 tables: One with the features of the model, and one with the prediction and metadata.\n They are linked with a column called MOMENT_ID\n \n :param fittedModel: (ML pipeline or model) The fitted pipeline to deploy\n :param df: (Spark DF) The dataframe used to train the model\n NOTE: this dataframe should NOT be transformed by the model. The columns in this df are the ones\n that will be used to create the table.\n :param db_schema_name: (str) the schema name to deploy to. If None, the currently set schema will be used.\n :param db_table_name: (str) the table name to deploy to. 
If none, the run_id will be used for the table name(s)\n :param primary_key: (List[Tuple[str, str]]) List of column + SQL datatype to use for the primary/composite key\n :param run_id: (str) The active run_id\n :param classes: (List[str]) The classes (prediction labels) for the model being deployed.\n NOTE: If not supplied, the table will have default column names for each class\n :param sklearn_args: (dict{str: str}) Prediction options for sklearn models\n Available key value options:\n 'predict_call': 'predict', 'predict_proba', or 'transform'\n - Determines the function call for the model\n If blank, predict will be used\n (or transform if model doesn't have predict)\n 'predict_args': 'return_std' or 'return_cov' - For Bayesian and Gaussian models\n Only one can be specified\n If the model does not have the option specified, it will be ignored.\n :param verbose: (bool) Whether or not to print out the queries being created. Helpful for debugging\n :param pred_threshold: (double) A prediction threshold for *Keras* binary classification models\n If the model type isn't Keras, this parameter will be ignored\n NOTE: If the model type is Keras, the output layer has 1 node, and pred_threshold is None,\n you will NOT receive a class prediction, only the output of the final layer (like model.predict()).\n If you want a class prediction\n for your binary classification problem, you MUST pass in a threshold.\n :param replace: (bool) whether or not to replace a currently existing model. 
This param does not yet work\n \n \n This function creates the following:\n * Table (default called DATA_{run_id}) where run_id is the run_id of the mlflow run associated to that model.\n This will have a column for each feature in the feature vector as well as a MOMENT_ID as primary key\n * Table (default called DATA_{run_id}_PREDS) That will have the columns:\n USER which is the current user who made the request\n EVAL_TIME which is the CURRENT_TIMESTAMP\n MOMENT_ID same as the DATA table to link predictions to rows in the table\n PREDICTION. The prediction of the model. If the :classes: param is not filled in, this will be default values for classification models\n A column for each class of the predictor with the value being the probability/confidence of the model if applicable\n * A trigger that runs on (after) insertion to the data table that runs an INSERT into the prediction table,\n calling the PREDICT function, passing in the row of data as well as the schema of the dataset, and the run_id of the model to run\n * A trigger that runs on (after) insertion to the prediction table that calls an UPDATE to the row inserted,\n parsing the prediction probabilities and filling in proper column values\n\n"}]},{"metadata":{"trusted":true},"cell_type":"code","source":"# Get the run_id from the name. 
Note that multiple runs can have the same name, so this returns a list\nrun_id = mlflow.get_run_ids_by_name('review_tokenizer')[0]\n# Get the model from that run\nw2v_model = mlflow.load_model(run_id=run_id, name='word2vec_model')\ndeploy_df = hc.asSparkFrame(reviews[['Text']])\nschema = 'REPLACE_ME_DBSCHEMA'\nschema='splice'\nsplice._dropTableIfExists(f'{schema}.word_vec_model')\nsplice._dropTableIfExists(f'{schema}.word_vec_model_preds')\nmlflow.deploy_db(w2v_model, deploy_df, schema, 'word_vec_model', [('REVIEW_ID', 'INT')], run_id=run_id, classes=[f'C{i+1}' for i in range(100)], verbose=True)","execution_count":111,"outputs":[{"output_type":"stream","text":"Droping table splice.word_vec_model\nDroping table splice.word_vec_model_preds\nWARN: A model with this ID already exists in the table. We are NOT replacing it. We will use the currently existing model.\nTo replace, use a new run_id\nPrediction labels found. Using ['C1', 'C2', 'C3', 'C4', 'C5', 'C6', 'C7', 'C8', 'C9', 'C10', 'C11', 'C12', 'C13', 'C14', 'C15', 'C16', 'C17', 'C18', 'C19', 'C20', 'C21', 'C22', 'C23', 'C24', 'C25', 'C26', 'C27', 'C28', 'C29', 'C30', 'C31', 'C32', 'C33', 'C34', 'C35', 'C36', 'C37', 'C38', 'C39', 'C40', 'C41', 'C42', 'C43', 'C44', 'C45', 'C46', 'C47', 'C48', 'C49', 'C50', 'C51', 'C52', 'C53', 'C54', 'C55', 'C56', 'C57', 'C58', 'C59', 'C60', 'C61', 'C62', 'C63', 'C64', 'C65', 'C66', 'C67', 'C68', 'C69', 'C70', 'C71', 'C72', 'C73', 'C74', 'C75', 'C76', 'C77', 'C78', 'C79', 'C80', 'C81', 'C82', 'C83', 'C84', 'C85', 'C86', 'C87', 'C88', 'C89', 'C90', 'C91', 'C92', 'C93', 'C94', 'C95', 'C96', 'C97', 'C98', 'C99', 'C100'] as labels for predictions [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 
88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99] respectively\nDeploying model 53f5a36065ad to table splice.word_vec_model\nCreating data table ... \n CREATE TABLE splice.word_vec_model (\n\tText VARCHAR(5000),\tREVIEW_ID INT,\n\tPRIMARY KEY(REVIEW_ID)\n)\n\nDone.\nCreating prediction table ... \nCREATE TABLE splice.word_vec_model_PREDS (\n \tCUR_USER VARCHAR(50) DEFAULT CURRENT_USER,\n \tEVAL_TIME TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n \tRUN_ID VARCHAR(50) DEFAULT '53f5a36065ad',\n \tREVIEW_ID INT,\n\t\"C1\" DOUBLE,\n\t\"C2\" DOUBLE,\n\t\"C3\" DOUBLE,\n\t\"C4\" DOUBLE,\n\t\"C5\" DOUBLE,\n\t\"C6\" DOUBLE,\n\t\"C7\" DOUBLE,\n\t\"C8\" DOUBLE,\n\t\"C9\" DOUBLE,\n\t\"C10\" DOUBLE,\n\t\"C11\" DOUBLE,\n\t\"C12\" DOUBLE,\n\t\"C13\" DOUBLE,\n\t\"C14\" DOUBLE,\n\t\"C15\" DOUBLE,\n\t\"C16\" DOUBLE,\n\t\"C17\" DOUBLE,\n\t\"C18\" DOUBLE,\n\t\"C19\" DOUBLE,\n\t\"C20\" DOUBLE,\n\t\"C21\" DOUBLE,\n\t\"C22\" DOUBLE,\n\t\"C23\" DOUBLE,\n\t\"C24\" DOUBLE,\n\t\"C25\" DOUBLE,\n\t\"C26\" DOUBLE,\n\t\"C27\" DOUBLE,\n\t\"C28\" DOUBLE,\n\t\"C29\" DOUBLE,\n\t\"C30\" DOUBLE,\n\t\"C31\" DOUBLE,\n\t\"C32\" DOUBLE,\n\t\"C33\" DOUBLE,\n\t\"C34\" DOUBLE,\n\t\"C35\" DOUBLE,\n\t\"C36\" DOUBLE,\n\t\"C37\" DOUBLE,\n\t\"C38\" DOUBLE,\n\t\"C39\" DOUBLE,\n\t\"C40\" DOUBLE,\n\t\"C41\" DOUBLE,\n\t\"C42\" DOUBLE,\n\t\"C43\" DOUBLE,\n\t\"C44\" DOUBLE,\n\t\"C45\" DOUBLE,\n\t\"C46\" DOUBLE,\n\t\"C47\" DOUBLE,\n\t\"C48\" DOUBLE,\n\t\"C49\" DOUBLE,\n\t\"C50\" DOUBLE,\n\t\"C51\" DOUBLE,\n\t\"C52\" DOUBLE,\n\t\"C53\" DOUBLE,\n\t\"C54\" DOUBLE,\n\t\"C55\" DOUBLE,\n\t\"C56\" DOUBLE,\n\t\"C57\" DOUBLE,\n\t\"C58\" DOUBLE,\n\t\"C59\" DOUBLE,\n\t\"C60\" DOUBLE,\n\t\"C61\" DOUBLE,\n\t\"C62\" DOUBLE,\n\t\"C63\" DOUBLE,\n\t\"C64\" DOUBLE,\n\t\"C65\" DOUBLE,\n\t\"C66\" DOUBLE,\n\t\"C67\" DOUBLE,\n\t\"C68\" DOUBLE,\n\t\"C69\" DOUBLE,\n\t\"C70\" DOUBLE,\n\t\"C71\" DOUBLE,\n\t\"C72\" DOUBLE,\n\t\"C73\" DOUBLE,\n\t\"C74\" DOUBLE,\n\t\"C75\" DOUBLE,\n\t\"C76\" DOUBLE,\n\t\"C77\" DOUBLE,\n\t\"C78\" DOUBLE,\n\t\"C79\" 
DOUBLE,\n\t\"C80\" DOUBLE,\n\t\"C81\" DOUBLE,\n\t\"C82\" DOUBLE,\n\t\"C83\" DOUBLE,\n\t\"C84\" DOUBLE,\n\t\"C85\" DOUBLE,\n\t\"C86\" DOUBLE,\n\t\"C87\" DOUBLE,\n\t\"C88\" DOUBLE,\n\t\"C89\" DOUBLE,\n\t\"C90\" DOUBLE,\n\t\"C91\" DOUBLE,\n\t\"C92\" DOUBLE,\n\t\"C93\" DOUBLE,\n\t\"C94\" DOUBLE,\n\t\"C95\" DOUBLE,\n\t\"C96\" DOUBLE,\n\t\"C97\" DOUBLE,\n\t\"C98\" DOUBLE,\n\t\"C99\" DOUBLE,\n\t\"C100\" DOUBLE,\n\tPRIMARY KEY(REVIEW_ID)\n)\n\nDone.\nCreating model prediction trigger ... \nCREATE TRIGGER runModel_splice_word_vec_model_53f5a36065ad\n \tAFTER INSERT\n \tON splice.word_vec_model\n \tREFERENCING NEW AS NEWROW\n \tFOR EACH ROW\n \t\tINSERT INTO splice.word_vec_model_PREDS(REVIEW_ID,\"C1\",\"C2\",\"C3\",\"C4\",\"C5\",\"C6\",\"C7\",\"C8\",\"C9\",\"C10\",\"C11\",\"C12\",\"C13\",\"C14\",\"C15\",\"C16\",\"C17\",\"C18\",\"C19\",\"C20\",\"C21\",\"C22\",\"C23\",\"C24\",\"C25\",\"C26\",\"C27\",\"C28\",\"C29\",\"C30\",\"C31\",\"C32\",\"C33\",\"C34\",\"C35\",\"C36\",\"C37\",\"C38\",\"C39\",\"C40\",\"C41\",\"C42\",\"C43\",\"C44\",\"C45\",\"C46\",\"C47\",\"C48\",\"C49\",\"C50\",\"C51\",\"C52\",\"C53\",\"C54\",\"C55\",\"C56\",\"C57\",\"C58\",\"C59\",\"C60\",\"C61\",\"C62\",\"C63\",\"C64\",\"C65\",\"C66\",\"C67\",\"C68\",\"C69\",\"C70\",\"C71\",\"C72\",\"C73\",\"C74\",\"C75\",\"C76\",\"C77\",\"C78\",\"C79\",\"C80\",\"C81\",\"C82\",\"C83\",\"C84\",\"C85\",\"C86\",\"C87\",\"C88\",\"C89\",\"C90\",\"C91\",\"C92\",\"C93\",\"C94\",\"C95\",\"C96\",\"C97\",\"C98\",\"C99\",\"C100\") SELECT \tNEWROW.REVIEW_ID, 
b.\"C1\",b.\"C2\",b.\"C3\",b.\"C4\",b.\"C5\",b.\"C6\",b.\"C7\",b.\"C8\",b.\"C9\",b.\"C10\",b.\"C11\",b.\"C12\",b.\"C13\",b.\"C14\",b.\"C15\",b.\"C16\",b.\"C17\",b.\"C18\",b.\"C19\",b.\"C20\",b.\"C21\",b.\"C22\",b.\"C23\",b.\"C24\",b.\"C25\",b.\"C26\",b.\"C27\",b.\"C28\",b.\"C29\",b.\"C30\",b.\"C31\",b.\"C32\",b.\"C33\",b.\"C34\",b.\"C35\",b.\"C36\",b.\"C37\",b.\"C38\",b.\"C39\",b.\"C40\",b.\"C41\",b.\"C42\",b.\"C43\",b.\"C44\",b.\"C45\",b.\"C46\",b.\"C47\",b.\"C48\",b.\"C49\",b.\"C50\",b.\"C51\",b.\"C52\",b.\"C53\",b.\"C54\",b.\"C55\",b.\"C56\",b.\"C57\",b.\"C58\",b.\"C59\",b.\"C60\",b.\"C61\",b.\"C62\",b.\"C63\",b.\"C64\",b.\"C65\",b.\"C66\",b.\"C67\",b.\"C68\",b.\"C69\",b.\"C70\",b.\"C71\",b.\"C72\",b.\"C73\",b.\"C74\",b.\"C75\",b.\"C76\",b.\"C77\",b.\"C78\",b.\"C79\",b.\"C80\",b.\"C81\",b.\"C82\",b.\"C83\",b.\"C84\",b.\"C85\",b.\"C86\",b.\"C87\",b.\"C88\",b.\"C89\",b.\"C90\",b.\"C91\",b.\"C92\",b.\"C93\",b.\"C94\",b.\"C95\",b.\"C96\",b.\"C97\",b.\"C98\",b.\"C99\",b.\"C100\" FROM new com.splicemachine.mlrunner.MLRunner('key_value', '53f5a36065ad', TRIM(CAST(NEWROW.Text as CHAR(41))), 'Text VARCHAR(5000)', 'transform', 'None') as b (\"C1\" DOUBLE,\"C2\" DOUBLE,\"C3\" DOUBLE,\"C4\" DOUBLE,\"C5\" DOUBLE,\"C6\" DOUBLE,\"C7\" DOUBLE,\"C8\" DOUBLE,\"C9\" DOUBLE,\"C10\" DOUBLE,\"C11\" DOUBLE,\"C12\" DOUBLE,\"C13\" DOUBLE,\"C14\" DOUBLE,\"C15\" DOUBLE,\"C16\" DOUBLE,\"C17\" DOUBLE,\"C18\" DOUBLE,\"C19\" DOUBLE,\"C20\" DOUBLE,\"C21\" DOUBLE,\"C22\" DOUBLE,\"C23\" DOUBLE,\"C24\" DOUBLE,\"C25\" DOUBLE,\"C26\" DOUBLE,\"C27\" DOUBLE,\"C28\" DOUBLE,\"C29\" DOUBLE,\"C30\" DOUBLE,\"C31\" DOUBLE,\"C32\" DOUBLE,\"C33\" DOUBLE,\"C34\" DOUBLE,\"C35\" DOUBLE,\"C36\" DOUBLE,\"C37\" DOUBLE,\"C38\" DOUBLE,\"C39\" DOUBLE,\"C40\" DOUBLE,\"C41\" DOUBLE,\"C42\" DOUBLE,\"C43\" DOUBLE,\"C44\" DOUBLE,\"C45\" DOUBLE,\"C46\" DOUBLE,\"C47\" DOUBLE,\"C48\" DOUBLE,\"C49\" DOUBLE,\"C50\" DOUBLE,\"C51\" DOUBLE,\"C52\" DOUBLE,\"C53\" DOUBLE,\"C54\" DOUBLE,\"C55\" DOUBLE,\"C56\" DOUBLE,\"C57\" 
DOUBLE,\"C58\" DOUBLE,\"C59\" DOUBLE,\"C60\" DOUBLE,\"C61\" DOUBLE,\"C62\" DOUBLE,\"C63\" DOUBLE,\"C64\" DOUBLE,\"C65\" DOUBLE,\"C66\" DOUBLE,\"C67\" DOUBLE,\"C68\" DOUBLE,\"C69\" DOUBLE,\"C70\" DOUBLE,\"C71\" DOUBLE,\"C72\" DOUBLE,\"C73\" DOUBLE,\"C74\" DOUBLE,\"C75\" DOUBLE,\"C76\" DOUBLE,\"C77\" DOUBLE,\"C78\" DOUBLE,\"C79\" DOUBLE,\"C80\" DOUBLE,\"C81\" DOUBLE,\"C82\" DOUBLE,\"C83\" DOUBLE,\"C84\" DOUBLE,\"C85\" DOUBLE,\"C86\" DOUBLE,\"C87\" DOUBLE,\"C88\" DOUBLE,\"C89\" DOUBLE,\"C90\" DOUBLE,\"C91\" DOUBLE,\"C92\" DOUBLE,\"C93\" DOUBLE,\"C94\" DOUBLE,\"C95\" DOUBLE,\"C96\" DOUBLE,\"C97\" DOUBLE,\"C98\" DOUBLE,\"C99\" DOUBLE,\"C100\" DOUBLE)\n\nDone.\nModel Deployed.\n","name":"stdout"}]},{"metadata":{},"cell_type":"markdown","source":"## Sweet! Let's try it out"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\ninsert into word_vec_model values('It''s a great value!', 1);\ninsert into word_vec_model values('I hate this product', 2);\n\nselect * from word_vec_model_preds;","execution_count":109,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"649b6b28-7ce6-4d07-8cab-c1870840c105","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"d7d4225f-68f0-4a2d-9899-ee3484628dd1","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"b002f839-64c8-4fd4-a71a-2b04a8fdc39d","version_major":2}},"metadata":{}}]},{"metadata":{},"cell_type":"markdown","source":"## Amazing! Up Next: GBM\n<blockquote>\n Now when we insert rows into our <code>WORD_VEC_MODEL</code> table, our machine learning vectorizer will run automatically and make \"predictions\". 
Next, we need to deploy our GBM model in the same way.\n <br><footer>Splice Machine</footer></blockquote><br>\n\n"},{"metadata":{"trusted":true,"scrolled":true},"cell_type":"code","source":"# Get the run_id from the name. Note that multiple runs can have the same name, so this returns a list\nrun_id = mlflow.get_run_ids_by_name('GBM with word vectors')[0]\n# Get the model from that run\ngbm_vec_model = mlflow.load_model(run_id=run_id, name='vectorized_model')\ndeploy_df = hc.asSparkFrame(ext_reviews[predictors])\nschema = 'REPLACE_ME_DBSCHEMA'\nschema='splice'\nsplice._dropTableIfExists(f'{schema}.gbm_w2v_model')\nsplice._dropTableIfExists(f'{schema}.gbm_w2v_model_preds')\nmlflow.deploy_db(gbm_vec_model, deploy_df, schema, 'gbm_w2v_model', [('REVIEW_ID', 'INT')], classes=['negative', 'positive'], run_id=run_id, verbose=True)","execution_count":85,"outputs":[{"output_type":"stream","text":"Droping table splice.gbm_w2v_model\nDroping table splice.gbm_w2v_model_preds\nA model with this ID already exists in the table. We are NOT replacing it. We will use the currently existing model.\nTo replace, use a new run_id\nPrediction labels found. Using ['negative', 'positive'] as labels for predictions [0, 1] respectively\nDeploying model acbb4ff7f942 to table splice.gbm_w2v_model\nCreating data table ... 
\n CREATE TABLE splice.gbm_w2v_model (\n\tProductId VARCHAR(5000),\tUserId VARCHAR(5000),\tHelpfulnessNumerator SMALLINT,\tHelpfulnessDenominator SMALLINT,\tTime INTEGER,\tC1 DOUBLE,\tC2 DOUBLE,\tC3 DOUBLE,\tC4 DOUBLE,\tC5 DOUBLE,\tC6 DOUBLE,\tC7 DOUBLE,\tC8 DOUBLE,\tC9 DOUBLE,\tC10 DOUBLE,\tC11 DOUBLE,\tC12 DOUBLE,\tC13 DOUBLE,\tC14 DOUBLE,\tC15 DOUBLE,\tC16 DOUBLE,\tC17 DOUBLE,\tC18 DOUBLE,\tC19 DOUBLE,\tC20 DOUBLE,\tC21 DOUBLE,\tC22 DOUBLE,\tC23 DOUBLE,\tC24 DOUBLE,\tC25 DOUBLE,\tC26 DOUBLE,\tC27 DOUBLE,\tC28 DOUBLE,\tC29 DOUBLE,\tC30 DOUBLE,\tC31 DOUBLE,\tC32 DOUBLE,\tC33 DOUBLE,\tC34 DOUBLE,\tC35 DOUBLE,\tC36 DOUBLE,\tC37 DOUBLE,\tC38 DOUBLE,\tC39 DOUBLE,\tC40 DOUBLE,\tC41 DOUBLE,\tC42 DOUBLE,\tC43 DOUBLE,\tC44 DOUBLE,\tC45 DOUBLE,\tC46 DOUBLE,\tC47 DOUBLE,\tC48 DOUBLE,\tC49 DOUBLE,\tC50 DOUBLE,\tC51 DOUBLE,\tC52 DOUBLE,\tC53 DOUBLE,\tC54 DOUBLE,\tC55 DOUBLE,\tC56 DOUBLE,\tC57 DOUBLE,\tC58 DOUBLE,\tC59 DOUBLE,\tC60 DOUBLE,\tC61 DOUBLE,\tC62 DOUBLE,\tC63 DOUBLE,\tC64 DOUBLE,\tC65 DOUBLE,\tC66 DOUBLE,\tC67 DOUBLE,\tC68 DOUBLE,\tC69 DOUBLE,\tC70 DOUBLE,\tC71 DOUBLE,\tC72 DOUBLE,\tC73 DOUBLE,\tC74 DOUBLE,\tC75 DOUBLE,\tC76 DOUBLE,\tC77 DOUBLE,\tC78 DOUBLE,\tC79 DOUBLE,\tC80 DOUBLE,\tC81 DOUBLE,\tC82 DOUBLE,\tC83 DOUBLE,\tC84 DOUBLE,\tC85 DOUBLE,\tC86 DOUBLE,\tC87 DOUBLE,\tC88 DOUBLE,\tC89 DOUBLE,\tC90 DOUBLE,\tC91 DOUBLE,\tC92 DOUBLE,\tC93 DOUBLE,\tC94 DOUBLE,\tC95 DOUBLE,\tC96 DOUBLE,\tC97 DOUBLE,\tC98 DOUBLE,\tC99 DOUBLE,\tC100 DOUBLE,\tREVIEW_ID INT,\n\tPRIMARY KEY(REVIEW_ID)\n)\n\nDone.\nCreating prediction table ... \nCREATE TABLE splice.gbm_w2v_model_PREDS (\n \tCUR_USER VARCHAR(50) DEFAULT CURRENT_USER,\n \tEVAL_TIME TIMESTAMP DEFAULT CURRENT_TIMESTAMP,\n \tRUN_ID VARCHAR(50) DEFAULT 'acbb4ff7f942',\n \tREVIEW_ID INT,\n\tPREDICTION VARCHAR(5000),\n\t\"negative\" DOUBLE,\n\t\"positive\" DOUBLE,\n\tPRIMARY KEY(REVIEW_ID)\n)\n\nDone.\nCreating model prediction trigger ... 
\nCREATE TRIGGER runModel_splice_gbm_w2v_model_acbb4ff7f942\n \tAFTER INSERT\n \tON splice.gbm_w2v_model\n \tREFERENCING NEW AS NEWROW\n \tFOR EACH ROW\n \t\tINSERT INTO splice.gbm_w2v_model_PREDS(\tREVIEW_ID,PREDICTION) VALUES(\tNEWROW.REVIEW_ID,MLMANAGER.PREDICT_CLASSIFICATION('acbb4ff7f942',TRIM(CAST(NEWROW.ProductId as CHAR(41)))||','||TRIM(CAST(NEWROW.UserId as CHAR(41)))||','||TRIM(CAST(NEWROW.HelpfulnessNumerator as CHAR(41)))||','||TRIM(CAST(NEWROW.HelpfulnessDenominator as CHAR(41)))||','||TRIM(CAST(NEWROW.Time as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C1 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C2 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C3 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C4 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C5 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C6 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C7 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C8 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C9 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C10 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C11 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C12 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C13 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C14 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C15 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C16 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C17 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C18 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C19 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C20 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C21 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C22 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C23 as 
DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C24 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C25 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C26 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C27 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C28 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C29 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C30 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C31 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C32 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C33 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C34 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C35 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C36 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C37 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C38 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C39 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C40 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C41 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C42 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C43 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C44 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C45 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C46 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C47 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C48 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C49 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C50 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C51 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C52 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C53 as DECIMAL(38,10)) as 
CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C54 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C55 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C56 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C57 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C58 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C59 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C60 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C61 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C62 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C63 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C64 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C65 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C66 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C67 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C68 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C69 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C70 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C71 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C72 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C73 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C74 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C75 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C76 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C77 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C78 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C79 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C80 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C81 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C82 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C83 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C84 as 
DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C85 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C86 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C87 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C88 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C89 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C90 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C91 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C92 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C93 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C94 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C95 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C96 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C97 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C98 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C99 as DECIMAL(38,10)) as CHAR(41)))||','||TRIM(CAST(CAST(NEWROW.C100 as DECIMAL(38,10)) as CHAR(41))),\n","name":"stdout"},{"output_type":"stream","text":"'ProductId VARCHAR(5000),UserId VARCHAR(5000),HelpfulnessNumerator SMALLINT,HelpfulnessDenominator SMALLINT,Time INTEGER,C1 DOUBLE,C2 DOUBLE,C3 DOUBLE,C4 DOUBLE,C5 DOUBLE,C6 DOUBLE,C7 DOUBLE,C8 DOUBLE,C9 DOUBLE,C10 DOUBLE,C11 DOUBLE,C12 DOUBLE,C13 DOUBLE,C14 DOUBLE,C15 DOUBLE,C16 DOUBLE,C17 DOUBLE,C18 DOUBLE,C19 DOUBLE,C20 DOUBLE,C21 DOUBLE,C22 DOUBLE,C23 DOUBLE,C24 DOUBLE,C25 DOUBLE,C26 DOUBLE,C27 DOUBLE,C28 DOUBLE,C29 DOUBLE,C30 DOUBLE,C31 DOUBLE,C32 DOUBLE,C33 DOUBLE,C34 DOUBLE,C35 DOUBLE,C36 DOUBLE,C37 DOUBLE,C38 DOUBLE,C39 DOUBLE,C40 DOUBLE,C41 DOUBLE,C42 DOUBLE,C43 DOUBLE,C44 DOUBLE,C45 DOUBLE,C46 DOUBLE,C47 DOUBLE,C48 DOUBLE,C49 DOUBLE,C50 DOUBLE,C51 DOUBLE,C52 DOUBLE,C53 DOUBLE,C54 DOUBLE,C55 DOUBLE,C56 DOUBLE,C57 DOUBLE,C58 DOUBLE,C59 DOUBLE,C60 DOUBLE,C61 DOUBLE,C62 DOUBLE,C63 DOUBLE,C64 DOUBLE,C65 DOUBLE,C66 DOUBLE,C67 DOUBLE,C68 DOUBLE,C69 DOUBLE,C70 
DOUBLE,C71 DOUBLE,C72 DOUBLE,C73 DOUBLE,C74 DOUBLE,C75 DOUBLE,C76 DOUBLE,C77 DOUBLE,C78 DOUBLE,C79 DOUBLE,C80 DOUBLE,C81 DOUBLE,C82 DOUBLE,C83 DOUBLE,C84 DOUBLE,C85 DOUBLE,C86 DOUBLE,C87 DOUBLE,C88 DOUBLE,C89 DOUBLE,C90 DOUBLE,C91 DOUBLE,C92 DOUBLE,C93 DOUBLE,C94 DOUBLE,C95 DOUBLE,C96 DOUBLE,C97 DOUBLE,C98 DOUBLE,C99 DOUBLE,C100 DOUBLE'))\n\nDone.\nCreating parsing trigger ... \nCREATE TRIGGER PARSERESULT_splice_gbm_w2v_model_acbb4ff7f942\n \tAFTER INSERT\n \tON splice.gbm_w2v_model_PREDS\n \tREFERENCING NEW AS NEWROW\n \tFOR EACH ROW\n \t\tUPDATE splice.gbm_w2v_model_PREDS set \"negative\"=MLMANAGER.PARSEPROBS(NEWROW.prediction,0),\"positive\"=MLMANAGER.PARSEPROBS(NEWROW.prediction,1),PREDICTION=\n\t\tCASE\n\t\tWHEN MLMANAGER.GETPREDICTION(NEWROW.prediction)=0 then 'negative'\n\t\tWHEN MLMANAGER.GETPREDICTION(NEWROW.prediction)=1 then 'positive'\n\t\tEND WHERE REVIEW_ID=NEWROW.REVIEW_ID\n\nDone.\nModel Deployed.\n","name":"stdout"}]},{"metadata":{"trusted":false},"cell_type":"markdown","source":"## Almost Done!\n<blockquote>\n Now we have 2 models deployed in the database:\n <ul>\n <li> A model that takes a sentence and converts it into a vector</li>\n <li> A model that takes the vector + a few other features and makes a prediction about the review </li>\n </ul>\n Now, all we need to do is connect them with a pipeline. 
We'll connect the 2 tables together like the image above to create a full cycle ML Pipeline\n</i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\n\ndrop trigger WORD2VEC_PIPELINE;\ndrop trigger WORD2VEC_PIPELINE2;\n\nCREATE TRIGGER WORD2VEC_PIPELINE\nAFTER INSERT\nON AMAZON_REVIEWS\nREFERENCING NEW AS N\nFOR EACH ROW\nINSERT INTO WORD_VEC_MODEL values(N.REVIEW, N.ID);\n\n\nCREATE TRIGGER WORD2VEC_PIPELINE2\nAFTER INSERT\nON WORD_VEC_MODEL_PREDS\nREFERENCING NEW_TABLE AS NEWROW\nFOR EACH STATEMENT\nINSERT INTO GBM_W2V_MODEL (PRODUCTID, USERID, HELPFULNESSNUMERATOR, HELPFULNESSDENOMINATOR, TIME, REVIEW_ID, C1,C2,C3,C4,C5,C6,C7,C8,C9,C10,C11,C12,C13,C14,C15,C16,C17,C18,C19,C20,C21,C22,C23,C24,C25,C26,C27,C28,C29,C30,C31,C32,C33,C34,C35,C36,C37,C38,C39,C40,C41,C42,C43,C44,C45,C46,C47,C48,C49,C50,C51,C52,C53,C54,C55,C56,C57,C58,C59,C60,C61,C62,C63,C64,C65,C66,C67,C68,C69,C70,C71,C72,C73,C74,C75,C76,C77,C78,C79,C80,C81,C82,C83,C84,C85,C86,C87,C88,C89,C90,C91,C92,C93,C94,C95,C96,C97,C98,C99,C100)\n SELECT a.PRODUCTID, a.USERID, a.HELPFULNESSNUMERATOR, a.HELPFULNESSDENOMINATOR, a.REVIEW_TIME, a.ID, N.C1,N.C2,N.C3,N.C4,N.C5,N.C6,N.C7,N.C8,N.C9,N.C10,N.C11,N.C12,N.C13,N.C14,N.C15,N.C16,N.C17,N.C18,N.C19,N.C20,N.C21,N.C22,N.C23,N.C24,N.C25,N.C26,N.C27,N.C28,N.C29,N.C30,N.C31,N.C32,N.C33,N.C34,N.C35,N.C36,N.C37,N.C38,N.C39,N.C40,N.C41,N.C42,N.C43,N.C44,N.C45,N.C46,N.C47,N.C48,N.C49,N.C50,N.C51,N.C52,N.C53,N.C54,N.C55,N.C56,N.C57,N.C58,N.C59,N.C60,N.C61,N.C62,N.C63,N.C64,N.C65,N.C66,N.C67,N.C68,N.C69,N.C70,N.C71,N.C72,N.C73,N.C74,N.C75,N.C76,N.C77,N.C78,N.C79,N.C80,N.C81,N.C82,N.C83,N.C84,N.C85,N.C86,N.C87,N.C88,N.C89,N.C90,N.C91,N.C92,N.C93,N.C94,N.C95,N.C96,N.C97,N.C98,N.C99,N.C100\n FROM NEWROW N, AMAZON_REVIEWS a --splice-properties useSpark=false\n WHERE a.ID=N.REVIEW_ID;\n 
","execution_count":130,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"10d4b9c4-5f38-4279-8f35-61377bfeba96","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"24b49926-76e9-4fcc-9bf9-d2ff238319ec","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"c59e8f24-462b-4f7f-a53e-18b7321e1c67","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"31ebf383-0d94-43a0-a2dd-4d885d2c8251","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"reviews[reviews['PositiveReview']=='0']","execution_count":142,"outputs":[{"output_type":"display_data","data":{"text/html":"<table>\n<thead>\n<tr><th>ProductId </th><th>UserId </th><th>Summary </th><th style=\"text-align: right;\"> Score</th><th style=\"text-align: right;\"> HelpfulnessDenominator</th><th style=\"text-align: right;\"> Id</th><th>ProfileName </th><th style=\"text-align: right;\"> HelpfulnessNumerator</th><th style=\"text-align: right;\"> Time</th><th>Text </th><th style=\"text-align: right;\"> PositiveReview</th></tr>\n</thead>\n<tbody>\n<tr><td>B00141QYSQ </td><td>A1YS02UZZGRDCT</td><td>Do Not Buy </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 41471</td><td>Evan Eberhardt </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">1.34836e+09</td><td>These are made in China (do not buy ANY pet food from China). Dogswell has been using propylene glycol to soften their treats (what are they thinkng?). 
Do not purchase or support this company in any way until they clean up their act. And for whatever reason Amazon doesn&#x27;t allow returns of this item, so I had to toss mine out. Bad business all around on this one. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B0089SPEO2 </td><td>A3JOYNYL458QHP</td><td>Less lemon and less zing </td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 28582</td><td>coleridge </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32391e+09</td><td>Everything is ok, except it just isn&#x27;t as good as it is in the bags. Just considerably more bland -- less lemon and less zing. Boring. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B0041CIR62 </td><td>A16I6WJUEBJ1C3</td><td>okay but not as healthy as it appears</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">138997</td><td>doctorsirena &quot;doctorsirena&quot; </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.34369e+09</td><td>I am always looking for healthier, whole grain versions of foods I enjoy. Unfortunately, these Peacock brand noodles are yet another food masquerading as healthy. The product title in big letters on the front says &quot;Brown Rice Vermicelli&quot;, making the consumer think &quot;this is made with brown rice, so it should be a healthy choice&quot;. But the first indication that it is not is when looking at the fiber content on the nutrition facts - only 0.6g per 2oz serving. Then onto the ingredients list to see why so low... contains brown rice, sago starch and water. The sago starch comes from palms and must not have much (if any) fiber.&lt;br /&gt;&lt;br /&gt;The Annie Chun&#x27;s Maifun Brown Rice Noodles (sold on Amazon and in my local healthy grocer) has become one of my staples and is my frame of reference when comparing to the Peacock brand. 
The Annie Chun&#x27;s product is made with 100% whole grain, with ingredients brown rice flour and water. Per 2oz serving, it has 4g fiber and pretty much the same calories and other nutrients as the Peacock brand.&lt;br /&gt;&lt;br /&gt;If you do try this Peacock brand noodles and have not used rice noodles before, you will need to seek guidance elsewhere on preparation. As others have pointed out, the Peacock package gives almost no directions on how to prepare the product, aside from a brief mention in the recipes (in the header text it does say that they are &quot;easy-to-cook&quot; but does not say how). It also contains a very strange recipe for rice noodles: Aglio Olio style - this is an Italian recipe for noodles with olive oil/garlic/sprinkled with grated cheese that I think would not be very tasty. The second recipe appears to be for a soup with veggie strips. Neither recipe gives amounts or much direction. In comparison, the Annie Chun&#x27;s package gives clear, specific directions on rice noodle preparation and two recipes.&lt;br /&gt;&lt;br /&gt;I use rice noodles = maifun = rice sticks = sometimes called vermicelli for making the Vietnamese salad &quot;bun tofu&quot;, to serve with stir-fried veggies or in lettuce rolls. They can also be used in spring rolls/egg rolls. When cooking with thin rice noodles, be careful not to oversoak/overcook/overmix or they tend to disintegrate. Asian rice noodle vermicelli (maifun) are not the same as Italian vermicelli and are not readily interchangeable. If making an Italian recipe, the best results would be expected from Italian pasta and not maifun.&lt;br /&gt;&lt;br /&gt;A few final notes... Both Peacock and Annie Chun&#x27;s brown rice noodles are gluten free. The Peacock is made in Singapore and the Annie Chun&#x27;s in Thailand. The Peacock noodles do taste fine (kind of bland), but so do the Annie Chun&#x27;s. At this time, they are both approximately the same price. 
Peacock come in an plastic bag with some noodle crushage upon shipping; Annie Chun&#x27;s are perfect upon removal from their cellophane bag in a box. Overall, I highly recommend the Annie Chun&#x27;s Maifun as a healthier option over the Peacock brand. On a related note, the Annie Chun&#x27;s soba and brown rice pad thai noodles are also excellent.&lt;br /&gt;&lt;br /&gt;Rating for this product: 2.5 stars rounded down to 2 stars.</td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B005K4Q4KG </td><td>A13L66J35SMYE5</td><td>Not good </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">243065</td><td>Elizabeth Ramsoram </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.33065e+09</td><td>This does not taste like cocoa it is very weak no matter what cup size you use.I is just not good. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B00412W76S </td><td>A1ATV7O231DXIS</td><td>Not my cup of tea.. </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\"> 33918</td><td>JuneBug1783 </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.31233e+09</td><td>Supposedly increases milk production? Yeah, I didn&#x27;t see a difference after drinking this for a good 3+ weeks. Aside from that, it doesn&#x27;t smell or taste very good. Very herbal medicine-like. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B005BSR8GU </td><td>A3QLIJ6CHAA0BS</td><td>not seatles best... im pretty sure </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\">316099</td><td>D. E. Valazza </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.32771e+09</td><td>it seemed to have more caffeen than usual if that&#x27;s even possiable.. aNd has a weird smell that everyone else could smell, but I could not. 
I thought it had less of the wonderful foldgers coffee... like before you use it when you just opened the tub of the stuff. THats a big this I missed with this coffee... however I do like the decafinated hazel nut pretty well... but this is not the best coffee i&#x27;ve ever had... douncan donuts is better :-/ </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B001L4929Y </td><td>A27IP3T9ZKUMSS</td><td>Poor quality control </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">419889</td><td>W. Robert Kiser &quot;Happily living in Tidewater&quot;</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\">1.28347e+09</td><td>I&#x27;ve ordered these before and was quite pleased. This order was a disappointment. It would appear that the anchovy filets have broken down into a congealed mess of almost anchony paste. I will not order these again sight unseen. Although is product is about 20% more expensive in my local grocery, I will buy there where I can see exactly what I&#x27;ll be taking home. My advice to others is to do the same. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B002HT1F88 </td><td>AFDVE0BR190N5 </td><td>TO MUCH VINEGAR </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 93274</td><td>maryann rudder </td><td style=\"text-align: right;\"> 0</td><td style=\"text-align: right;\">1.3373e+09 </td><td>tHIS ORGANIC DRINK WAS NOT MY CUP OF TEA...THERE WAS A STRONG VINEGAR TASTE... TOO STRONG FOR ME...HOT OR COLD ..THE SMELL AND TASTE JUST WASN&#x27;T THERE...i DID NOT LIKE IT... 
</td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B008O3G2GG </td><td>A2IYX5W3A1N3VB</td><td>That&#x27;s what she said </td><td style=\"text-align: right;\"> 1</td><td style=\"text-align: right;\"> 5</td><td style=\"text-align: right;\">436726</td><td>Orion </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">1.32045e+09</td><td>This product is really dumb and pointless. My dog was simply not interested, and I was not interested in holding it in my hand while she licked it. When I ordered it, I thought I was getting something you could just give the dog, and she would entertain herself for a long time, by licking. It sounded like a good idea because my dog really likes to lick things. I give her something lick-able when I want her to go lay down and be quiet. This product is useless for that purpose. This is an interactive treat. You and your dog both have to interact with it. It gets old, pretty much immediately.&lt;br /&gt;&lt;br /&gt;The product itself is gross. It&#x27;s sticky and unappealing. The smell is nauseating. My dog would take a sniff, and sometimes a trial lick, and then just walk away. I&#x27;d be left standing there holding a phallic shaped bottle of mysterious smelly goo.&lt;br /&gt;&lt;br /&gt;Here are some things my dog likes better than Lickety Stiks: dirty diapers, dead pigeons, week-old garbage, and other dog&#x27;s butts.&lt;br /&gt;&lt;br /&gt;The only people who really enjoyed the product are the teenagers in my house, who think &quot;Lickety Stik&quot; is possibly the most hilarious product name they ever heard. The jokes lasted most of the afternoon. The packaging offers plenty of crude suggestions (&quot;give a lick&quot; tm they trademarked that??), as if my kids needed any encouragement. Even the shape of the bottle - a rigid cylinder with a ball on top - was fodder for hilarity. 
They thought it was just so funny.&lt;br /&gt;&lt;br /&gt;So this is a stupid product with a funny name and a suggestive shape, which solves no problems, has no appeal beyond comedy, and is completely worthless. </td><td style=\"text-align: right;\"> 0</td></tr>\n<tr><td>B001P76XAS </td><td>A14UH9I3P609X9</td><td>disappointed </td><td style=\"text-align: right;\"> 3</td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\"> 35521</td><td>Mike </td><td style=\"text-align: right;\"> 2</td><td style=\"text-align: right;\">1.31363e+09</td><td>I went to the store and noticed that Pop Secret changed the package size in the 6 Pack Homestyle from 3.5 to 3.2 ounces. Then they say you get more than Orville Redenbocker to try to justify the shrinking package. So, instead of getting 3.5 ounces as you did before, you&#x27;re now getting 3.2 ounces and it takes less time to cook. I don&#x27;t think this is right, I&#x27;m going to have to start shopping around even though I like this popcorn.&lt;br /&gt;&lt;a href=&quot;http://www.amazon.com/gp/product/B001P7531S&quot;&gt;Pop Secret Homestyle Popcorn, Microwavable Popcorn, 3-Count, 10.5-Ounce Box (Pack of 6)&lt;/a&gt; </td><td style=\"text-align: right;\"> 0</td></tr>\n</tbody>\n</table>"},"metadata":{}},{"output_type":"execute_result","execution_count":142,"data":{"text/plain":""},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\nselect top 10 * from amazon_reviews where score =1","execution_count":174,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"8d62e8df-e8a9-4df6-baea-65484a646ebd","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":true},"cell_type":"code","source":"%%sql\ninsert into amazon_reviews values ('B00141QYSQ', 'A1YS02UZZGRDCT', 'Do Not Buy', 2, 7, 9993329, 'Evan Eberhardt', 0, 1314489600, 'Nothing like what i expected');\ninsert into amazon_reviews values 
('B0009XLVGA', 'A1NHQNQ3TVXTZF', 'An awesome choice', 5, 2, 999350, 'Evan Eberhardt', 0, 1314433500, 'You have to buy this! Its great!');\n\nselect * from GBM_W2V_MODEL_preds","execution_count":196,"outputs":[{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"2d6f1971-a4e5-4a5d-abba-e263fc94642b","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"949371b5-0a53-4c12-97e1-bdc7f4002693","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"931dcfff-9f42-4301-806b-2c86954db4f2","version_major":2}},"metadata":{}},{"output_type":"display_data","data":{"method":"display_data","application/vnd.jupyter.widget-view+json":{"version_minor":0,"model_id":"8434442a-0a49-4197-beb5-a03b76f8555e","version_major":2}},"metadata":{}}]},{"metadata":{"trusted":false},"cell_type":"markdown","source":"# Incredible!\n## Let's recap\n<blockquote>\n We:\n <ul>\n <li> Imported data from external sources in both SQL and Python</li>\n <li> Created a simple model with decent accuracy using H2O </li>\n <li> Improved that model drastically using a word2vec pipeline with SKLearn </li>\n <li> Tracked, compared, and persisted all of our model and run information in MLflow </li>\n <li> Deployed the better model, along with the standalone word2vec pipeline, to tables directly in the database </li>\n <li> Chained those tables together with simple triggers </li>\n <li> Made predictions on new Amazon reviews </li>\n </ul>\n \n That's quite the accomplishment. 
Congratulations!\n</i></br><footer>Splice Machine</footer></blockquote><br>"},{"metadata":{"trusted":true},"cell_type":"code","source":"","execution_count":null,"outputs":[]}],"metadata":{"kernelspec":{"name":"python3","display_name":"Python 3","language":"python"},"language_info":{"name":"python","version":"3.7.6","mimetype":"text/x-python","codemirror_mode":{"name":"ipython","version":3},"pygments_lexer":"ipython3","nbconvert_exporter":"python","file_extension":".py"},"toc":{"nav_menu":{},"number_sections":false,"sideBar":true,"skip_h1_title":false,"base_numbering":1,"title_cell":"Table of Contents","title_sidebar":"Contents","toc_cell":false,"toc_position":{"height":"calc(100% - 180px)","width":"212px","left":"10px","top":"150px"},"toc_section_display":true,"toc_window_display":true}},"nbformat":4,"nbformat_minor":4}