{
"cells": [
{
"cell_type": "markdown",
"id": "16a4ed47",
"metadata": {},
"source": [
"This notebook was created using the `nvcr.io/nvidia/merlin/merlin-tensorflow:22.07` container."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "16b312a0",
"metadata": {},
"outputs": [],
"source": [
"%%capture\n",
"!cd /models && git pull && pip install ."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "cad665f2",
"metadata": {},
"outputs": [],
"source": [
"from merlin.core.utils import Distributed\n",
"from merlin.models.xgb import XGBoost\n",
"from merlin.schema.tags import Tags\n",
"import nvtabular as nvt\n",
"import cudf\n",
"import numpy as np"
]
},
{
"cell_type": "markdown",
"id": "938c3990",
"metadata": {},
"source": [
"# Create data"
]
},
{
"cell_type": "markdown",
"id": "e7ef4c25",
"metadata": {},
"source": [
"We could use one of the standard datasets. But it is much more fun and more instructive to create your own data.\n",
"\n",
"This way we will know exactly what is going an and what to expect.\n",
"\n",
"We will serve a very simple recommender system -- we will recommend items with a price below 6 and not recommend the more expensive ones."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "3a6790bc",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price\n",
"0 7\n",
"1 10\n",
"2 6\n",
"3 3\n",
"4 2"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = cudf.DataFrame(data={'price': np.random.randint(1,11, 100_000)})\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "9e1dcc9c",
"metadata": {},
"outputs": [],
"source": [
"dataset = nvt.Dataset(df)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "9248d73e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>target</th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>0</td>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" target price\n",
"0 0 7\n",
"1 0 10\n",
"2 0 6\n",
"3 1 3\n",
"4 1 2"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"target = (['price'] \n",
" >> nvt.ops.LambdaOp(lambda p: (p < 6).astype(np.int64))\n",
" >> nvt.ops.Rename(name='target')\n",
" >> nvt.ops.AddTags(tags=[Tags.TARGET, Tags.BINARY_CLASSIFICATION])\n",
" )\n",
"\n",
"wf = nvt.Workflow(['price'] + target)\n",
"dataset = wf.fit_transform(dataset)\n",
"\n",
"dataset.compute().head()"
]
},
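{
"cell_type": "markdown",
"id": "a7c21e90",
"metadata": {},
"source": [
"As a quick sanity check (a minimal sketch, not part of the original flow), we can verify that the computed `target` column really follows the labeling rule above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b8d32fa1",
"metadata": {},
"outputs": [],
"source": [
"# every row should satisfy: target == 1 exactly when price < 6\n",
"ddf = dataset.compute()\n",
"assert (ddf['target'] == (ddf['price'] < 6).astype(np.int64)).all()"
]
},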
{
"cell_type": "markdown",
"id": "f7195510",
"metadata": {},
"source": [
"# Train an XGBoost model"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "2935f4e6",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2022-08-26 08:16:39,150 - distributed.diskutils - INFO - Found stale lock file and directory '/workspace/dask-worker-space/worker-zy4ci3ii', purging\n",
"2022-08-26 08:16:39,150 - distributed.diskutils - INFO - Found stale lock file and directory '/workspace/dask-worker-space/worker-aefv2uyz', purging\n",
"2022-08-26 08:16:39,151 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[0]\ttrain-logloss:0.43750\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"[08:16:40] task [xgboost.dask]:tcp://127.0.0.1:42453 got new rank 0\n"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"[20]\ttrain-logloss:0.00084\n",
"[40]\ttrain-logloss:0.00002\n",
"[59]\ttrain-logloss:0.00002\n"
]
}
],
"source": [
"with Distributed():\n",
" model = XGBoost(schema=dataset.schema, objective='binary:logistic')\n",
" model.fit(\n",
" dataset,\n",
" num_boost_round=60,\n",
" verbose_eval=20\n",
")"
]
},
{
"cell_type": "markdown",
"id": "648ce4de",
"metadata": {},
"source": [
"# Did it train?"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "082d574b",
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/usr/local/lib/python3.8/dist-packages/distributed/node.py:180: UserWarning: Port 8787 is already in use.\n",
"Perhaps you already have a cluster running?\n",
"Hosting the HTTP server on port 37723 instead\n",
" warnings.warn(\n",
"2022-08-26 08:16:42,231 - distributed.preloading - INFO - Import preload module: dask_cuda.initialize\n"
]
}
],
"source": [
"test_dataset = nvt.Dataset(cudf.DataFrame(data={'price': np.arange(1,11)}))\n",
"\n",
"with Distributed():\n",
" test_preds = model.predict(test_dataset)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "e094c052",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>3</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>5</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>7</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>8</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>9</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" price\n",
"0 1\n",
"1 2\n",
"2 3\n",
"3 4\n",
"4 5\n",
"5 6\n",
"6 7\n",
"7 8\n",
"8 9\n",
"9 10"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_dataset.compute()"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "5ac46d25",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1., 1., 1., 1., 1., 0., 0., 0., 0., 0.], dtype=float32)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"test_preds.round()"
]
},
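{
"cell_type": "markdown",
"id": "c9e43fb2",
"metadata": {},
"source": [
"We can also check the predictions against the rule the data was generated with -- a small sketch, assuming the model learned the boundary exactly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d0f54fc3",
"metadata": {},
"outputs": [],
"source": [
"# prices 1..5 should be recommended (1), prices 6..10 should not (0)\n",
"expected = (np.arange(1, 11) < 6).astype(np.float32)\n",
"assert (test_preds.round() == expected).all()"
]
},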
{
"cell_type": "markdown",
"id": "09bb6029",
"metadata": {},
"source": [
"👍👍👍"
]
},
{
"cell_type": "markdown",
"id": "634a49ff",
"metadata": {},
"source": [
"# Instruct the inference server how to predict"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "5204e7c9",
"metadata": {},
"outputs": [],
"source": [
"from merlin.systems.dag.ops.fil import PredictForest\n",
"from merlin.systems.dag.ensemble import Ensemble\n",
"\n",
"inference_schema = dataset.schema.without('target') # we don't neeed the target information for inference\n",
"inference_ops = ['price'] >> PredictForest(model, inference_schema)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "3d001f24",
"metadata": {},
"outputs": [],
"source": [
"ensemble = Ensemble(inference_ops, inference_schema)\n",
"ensemble.export('inference_recipe');"
]
},
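{
"cell_type": "markdown",
"id": "e1a65fd4",
"metadata": {},
"source": [
"`Ensemble.export` writes a Triton model repository to disk, one subdirectory per model. As a quick look (a sketch, using plain `os.listdir`), we can list what was produced:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f2b76fe5",
"metadata": {},
"outputs": [],
"source": [
"# each entry is one Triton model; the server below will load all of them\n",
"import os\n",
"sorted(os.listdir('inference_recipe'))"
]
},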
{
"cell_type": "markdown",
"id": "766e6dcc",
"metadata": {},
"source": [
"# Start Triton Inference Server"
]
},
{
"cell_type": "markdown",
"id": "90382ba5",
"metadata": {},
"source": [
"The Triton Inference Server is an amazing piece of technology. Among my favorite features, it offers dynamic batching with latency guarantees \n",
"\n",
"You can read more about Triton [here](https://developer.nvidia.com/blog/fast-and-scalable-ai-model-deployment-with-nvidia-triton-inference-server/)."
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "e3bfb007",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<subprocess.Popen at 0x7f33b4cd6040>"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"I0826 08:17:02.873242 4303 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7f5c14000000' with size 268435456\n",
"I0826 08:17:02.873567 4303 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864\n",
"I0826 08:17:02.875904 4303 model_repository_manager.cc:1191] loading: 0_transformworkflow:1\n",
"I0826 08:17:02.976049 4303 model_repository_manager.cc:1191] loading: 1_fil:1\n",
"I0826 08:17:02.979709 4303 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: 0_transformworkflow (GPU device 0)\n",
"I0826 08:17:03.076186 4303 model_repository_manager.cc:1191] loading: 0_fil:1\n",
"I0826 08:17:03.176329 4303 model_repository_manager.cc:1191] loading: 0_predictforest:1\n",
"I0826 08:17:03.276459 4303 model_repository_manager.cc:1191] loading: 1_predictforest:1\n",
"I0826 08:17:04.248338 4303 model_repository_manager.cc:1345] successfully loaded '0_transformworkflow' version 1\n",
"I0826 08:17:04.255588 4303 initialize.hpp:43] TRITONBACKEND_Initialize: fil\n",
"I0826 08:17:04.255604 4303 backend.hpp:47] Triton TRITONBACKEND API version: 1.10\n",
"I0826 08:17:04.255611 4303 backend.hpp:52] 'fil' TRITONBACKEND API version: 1.10\n",
"I0826 08:17:04.255906 4303 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: 1_fil (version 1)\n",
"I0826 08:17:04.256406 4303 model_initialize.hpp:37] TRITONBACKEND_ModelInitialize: 0_fil (version 1)\n",
"I0826 08:17:04.258113 4303 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: 1_fil_0 (GPU device 0)\n",
"I0826 08:17:04.277080 4303 instance_initialize.hpp:46] TRITONBACKEND_ModelInstanceInitialize: 0_fil_0 (GPU device 0)\n",
"I0826 08:17:04.277199 4303 model_repository_manager.cc:1345] successfully loaded '1_fil' version 1\n",
"I0826 08:17:04.282969 4303 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: 0_predictforest (GPU device 0)\n",
"I0826 08:17:04.283013 4303 model_repository_manager.cc:1345] successfully loaded '0_fil' version 1\n",
"I0826 08:17:05.881247 4303 python_be.cc:1774] TRITONBACKEND_ModelInstanceInitialize: 1_predictforest (GPU device 0)\n",
"I0826 08:17:05.881321 4303 model_repository_manager.cc:1345] successfully loaded '0_predictforest' version 1\n",
"I0826 08:17:07.435247 4303 model_repository_manager.cc:1345] successfully loaded '1_predictforest' version 1\n",
"I0826 08:17:07.435399 4303 model_repository_manager.cc:1191] loading: ensemble_model:1\n",
"I0826 08:17:07.535668 4303 model_repository_manager.cc:1345] successfully loaded 'ensemble_model' version 1\n",
"I0826 08:17:07.535729 4303 server.cc:556] \n",
"+------------------+------+\n",
"| Repository Agent | Path |\n",
"+------------------+------+\n",
"+------------------+------+\n",
"\n",
"I0826 08:17:07.535771 4303 server.cc:583] \n",
"+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"| Backend | Path | Config |\n",
"+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"| python | /opt/tritonserver/backends/python/libtriton_python.so | {\"cmdline\":{\"auto-complete-config\":\"false\",\"min-compute-capability\":\"6.000000\",\"backend-directory\":\"/opt/tritonserver/backends\",\"default-max-batch-size\":\"4\"}} |\n",
"| fil | /opt/tritonserver/backends/fil/libtriton_fil.so | {\"cmdline\":{\"auto-complete-config\":\"false\",\"min-compute-capability\":\"6.000000\",\"backend-directory\":\"/opt/tritonserver/backends\",\"default-max-batch-size\":\"4\"}} |\n",
"+---------+-------------------------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"\n",
"I0826 08:17:07.535806 4303 server.cc:626] \n",
"+---------------------+---------+--------+\n",
"| Model | Version | Status |\n",
"+---------------------+---------+--------+\n",
"| 0_fil | 1 | READY |\n",
"| 0_predictforest | 1 | READY |\n",
"| 0_transformworkflow | 1 | READY |\n",
"| 1_fil | 1 | READY |\n",
"| 1_predictforest | 1 | READY |\n",
"| ensemble_model | 1 | READY |\n",
"+---------------------+---------+--------+\n",
"\n",
"I0826 08:17:07.560218 4303 metrics.cc:650] Collecting metrics for GPU 0: Quadro RTX 8000\n",
"I0826 08:17:07.560464 4303 tritonserver.cc:2159] \n",
"+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"| Option | Value |\n",
"+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"| server_id | triton |\n",
"| server_version | 2.23.0 |\n",
"| server_extensions | classification sequence model_repository model_repository(unload_dependents) schedule_policy model_configuration system_shared_memory cuda_shared_memory binary_tensor_data statistics trace |\n",
"| model_repository_path[0] | ensemble |\n",
"| model_control_mode | MODE_NONE |\n",
"| strict_model_config | 1 |\n",
"| rate_limit | OFF |\n",
"| pinned_memory_pool_byte_size | 268435456 |\n",
"| cuda_memory_pool_byte_size{0} | 67108864 |\n",
"| response_cache_byte_size | 0 |\n",
"| min_supported_compute_capability | 6.0 |\n",
"| strict_readiness | 1 |\n",
"| exit_timeout | 30 |\n",
"+----------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+\n",
"\n",
"I0826 08:17:07.561154 4303 grpc_server.cc:4587] Started GRPCInferenceService at 0.0.0.0:8001\n",
"I0826 08:17:07.561330 4303 http_server.cc:3303] Started HTTPService at 0.0.0.0:8000\n",
"I0826 08:17:07.602039 4303 http_server.cc:178] Started Metrics Service at 0.0.0.0:8002\n"
]
}
],
"source": [
"import subprocess\n",
"\n",
"subprocess.Popen(['tritonserver', '--model-repository=inference_recipe'])"
]
},
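{
"cell_type": "markdown",
"id": "a3c87ff6",
"metadata": {},
"source": [
"The server starts asynchronously, so before sending requests it is worth polling until it reports ready. A minimal sketch using the gRPC client (assuming the default gRPC port 8001):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b4d98aa7",
"metadata": {},
"outputs": [],
"source": [
"import time\n",
"import tritonclient.grpc as grpcclient\n",
"from tritonclient.utils import InferenceServerException\n",
"\n",
"# poll until the server reports ready; the call raises while the\n",
"# server is still starting up, hence the try/except\n",
"with grpcclient.InferenceServerClient('localhost:8001') as client:\n",
"    while True:\n",
"        try:\n",
"            if client.is_server_ready():\n",
"                break\n",
"        except InferenceServerException:\n",
"            pass\n",
"        time.sleep(1)"
]
},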
{
"cell_type": "markdown",
"id": "6f52b160",
"metadata": {},
"source": [
"# Predict"
]
},
{
"cell_type": "markdown",
"id": "3a108edc",
"metadata": {},
"source": [
"You can talk to the Triton Inference Server how you'd talk to a server -- using HTTP/REST or GRPC.\n",
"\n",
"Here we will issue a request to the server from inside our notebook."
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "0d4e5ce2",
"metadata": {},
"outputs": [],
"source": [
"from merlin.systems.triton import convert_df_to_triton_input\n",
"import tritonclient.grpc as grpcclient\n",
"\n",
"inputs = convert_df_to_triton_input(inference_schema.column_names, test_dataset.compute())\n",
"\n",
"outputs = [\n",
" grpcclient.InferRequestedOutput(col)\n",
" for col in inference_ops.output_columns.names\n",
"]\n",
"\n",
"# send request to tritonserver\n",
"with grpcclient.InferenceServerClient(\"localhost:8001\") as client:\n",
" response = client.infer(\"ensemble_model\", inputs, outputs=outputs)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "85f822fc",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([1., 1., 1., 1., 1., 0., 0., 0., 0., 0.], dtype=float32)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"response.as_numpy(outputs[0].name()).round()"
]
},
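{
"cell_type": "markdown",
"id": "c5e09bb8",
"metadata": {},
"source": [
"The served predictions should agree with the in-process ones from earlier -- a quick consistency check (a sketch; `atol` allows for small numerical differences between the FIL backend and local prediction):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "d6f10cc9",
"metadata": {},
"outputs": [],
"source": [
"# Triton's forest backend should produce (nearly) the same probabilities\n",
"np.testing.assert_allclose(\n",
"    response.as_numpy(outputs[0].name()),\n",
"    test_preds,\n",
"    atol=1e-4,\n",
")"
]
},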
{
"cell_type": "markdown",
"id": "4e2d4a24",
"metadata": {},
"source": [
"Success! 🚀\n",
"\n",
"If you would like to learn more about serving `XGBoost` models, please take a look at [this notebook](https://github.com/NVIDIA-Merlin/systems/blob/main/examples/Serving-An-XGboost-Model-With-Merlin-Systems.ipynb).\n",
"\n",
"And here is [an example of serving a DLRM model](https://github.com/NVIDIA-Merlin/systems/blob/main/examples/Serving-Ranking-Models-With-Merlin-Systems.ipynb) using exactly the same approach as we use here."
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}