{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import io\n",
"import os\n",
"import pathlib\n",
"import zipfile\n",
"\n",
"import category_encoders\n",
"import numpy as np\n",
"import pandas as pd\n",
"import requests\n",
"import sklearn.metrics\n",
"import sklearn.model_selection\n",
"import sklearn.neighbors\n",
"import sklearn.pipeline\n",
"import sklearn.preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"%autoreload 2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Intro\n",
"Welcome to the Fiddler notebook experience! This notebook will demonstrate how to effectively get started with the Fiddler platform by uploading your models and data. The notebook is organized into two sections:\n",
"1. Loading data and building a scikit-learn model\n",
"2. Uploading your data and model to the Fiddler platform\n",
"\n",
"Section 1 does not use any Fiddler code, so if you are familiar with Pandas and Scikit-Learn, you should feel comfortable skimming through and jumping into section 2."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Section 1: Loading data and building a model\n",
"\n",
"### Working with Data\n",
"Being an effective data scientist involves using the right tool for the job. When it comes to importing, cleaning, and and exploring your data in Jupyter, we don't want to interrupt your normal workflow, so we integrate our tools with the popular Pandas DataFrame object. Thus as long as your data can be dumped into a DataFrame object, there is nothing else you need to do to get it ready to upload to Fiddler."
]
},
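{
"cell_type": "markdown",
"metadata": {},
"source": [
"As an illustrative sketch (the toy records below are hypothetical and not part of the bikeshare walkthrough), any data pandas can ingest is already in the right shape for Fiddler:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# illustrative sketch: any source pandas can ingest -- a CSV, a database\n",
"# query, or plain Python records -- is ready for upload once it is a DataFrame\n",
"records = [\n",
"    {'hr': 5, 'temp': 0.56, 'cnt': 7},\n",
"    {'hr': 4, 'temp': 0.44, 'cnt': 5},\n",
"]\n",
"toy_df = pd.DataFrame.from_records(records)\n",
"toy_df.dtypes  # dtypes are inferred; cast with .astype() if needed"
]
},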
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Downloading the UCI bikeshare dataset"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>instant</th>\n",
" <th>3440</th>\n",
" <th>6543</th>\n",
" <th>15471</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>dteday</th>\n",
" <td>2011-05-28 00:00:00</td>\n",
" <td>2011-10-05 00:00:00</td>\n",
" <td>2012-10-11 00:00:00</td>\n",
" </tr>\n",
" <tr>\n",
" <th>season</th>\n",
" <td>2</td>\n",
" <td>4</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>yr</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mnth</th>\n",
" <td>5</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hr</th>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>holiday</th>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>weekday</th>\n",
" <td>6</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>workingday</th>\n",
" <td>False</td>\n",
" <td>True</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>weathersit</th>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>temp</th>\n",
" <td>0.56</td>\n",
" <td>0.44</td>\n",
" <td>0.44</td>\n",
" </tr>\n",
" <tr>\n",
" <th>atemp</th>\n",
" <td>0.5303</td>\n",
" <td>0.4394</td>\n",
" <td>0.4394</td>\n",
" </tr>\n",
" <tr>\n",
" <th>hum</th>\n",
" <td>0.88</td>\n",
" <td>0.88</td>\n",
" <td>0.51</td>\n",
" </tr>\n",
" <tr>\n",
" <th>windspeed</th>\n",
" <td>0.2239</td>\n",
" <td>0</td>\n",
" <td>0.1343</td>\n",
" </tr>\n",
" <tr>\n",
" <th>casual</th>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>81</td>\n",
" </tr>\n",
" <tr>\n",
" <th>registered</th>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>662</td>\n",
" </tr>\n",
" <tr>\n",
" <th>cnt</th>\n",
" <td>7</td>\n",
" <td>5</td>\n",
" <td>743</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"instant 3440 6543 15471\n",
"dteday 2011-05-28 00:00:00 2011-10-05 00:00:00 2012-10-11 00:00:00\n",
"season 2 4 4\n",
"yr 0 0 1\n",
"mnth 5 10 10\n",
"hr 5 4 19\n",
"holiday False False False\n",
"weekday 6 3 4\n",
"workingday False True True\n",
"weathersit 1 1 1\n",
"temp 0.56 0.44 0.44\n",
"atemp 0.5303 0.4394 0.4394\n",
"hum 0.88 0.88 0.51\n",
"windspeed 0.2239 0 0.1343\n",
"casual 4 1 81\n",
"registered 3 4 662\n",
"cnt 7 5 743"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Train set (bikeshare rentals in 2011) has 8645 rows, test set (bikeshare rentals in 2012) has 8734 rows\n"
]
}
],
"source": [
"zip_url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip'\n",
"z = zipfile.ZipFile(io.BytesIO(requests.get(zip_url).content))\n",
"\n",
"# here we pre-configure the datatypes for our dataframe\n",
"# so it doesn't require any datatype modification after import\n",
"bikeshare_dtypes = dict(season='category', holiday='bool',\n",
" workingday='bool', weathersit='category')\n",
"bikeshare_datetime_columns = ['dteday']\n",
"bikeshare_index_column = 'instant'\n",
"with z.open('hour.csv') as csv:\n",
" df = pd.read_csv(csv, \n",
" dtype=bikeshare_dtypes, \n",
" parse_dates=bikeshare_datetime_columns,\n",
" index_col=bikeshare_index_column)\n",
"\n",
"# split train/test by year\n",
"is_2011 = df['yr'] == 0\n",
"df_2011 = df[is_2011].reset_index(drop=True)\n",
"df_2012 = df[~is_2011].reset_index(drop=True)\n",
"\n",
"# peek at the data\n",
"display(df.sample(3, random_state=0).T)\n",
"\n",
"# print info about train-test split\n",
"print(f'Train set (bikeshare rentals in 2011) has {df_2011.shape[0]} rows,'\n",
" f' test set (bikeshare rentals in 2012) has {df_2012.shape[0]} rows')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Building a model\n",
"Just like with data work, we believe in the right tools for the job. We currently integrate tightly models supporing the `sklearn` API, including non-`sklearn` packages that support the `sklearn` API, like `xgboost` and `LightGBM`. Since encoding categorical variables can be a pain in `sklearn`, we also support the `category_encoders` package. Please note that if you introduce any custom classes or transformation functions into your modeling, it may become difficult to get your models running in Fiddler. We therefore recommend using the `Transformer` objects provided by `sklearn` (and the `category_encoders` package) and combining preprocessing and inference steps using the `sklearn` `Pipeline` API."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"# specify which columns are features and which are not\n",
"target = 'cnt'\n",
"not_used_as_features = ['dteday', 'yr', 'casual', 'registered']\n",
"non_feature_columns = [target] + not_used_as_features\n",
"feature_columns = list(set(df_2011.columns) - set(non_feature_columns))\n",
"\n",
"# split our data into features and targets\n",
"x_train = df_2011.drop(columns=non_feature_columns)\n",
"x_test = df_2012.drop(columns=non_feature_columns)\n",
"y_train = df_2011[target]\n",
"y_test = df_2012[target]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"# modeling approach: \n",
"# 1) onehot encode categorical variables\n",
"# 2) standard scale all variables\n",
"# 3) fit a k-Nearest-Neighbors model with k=10 and l1 distance as the distance metric\n",
"onehot = category_encoders.OneHotEncoder(cols=df.select_dtypes('category').columns.tolist())\n",
"standard_scaler = sklearn.preprocessing.StandardScaler()\n",
"knn = sklearn.neighbors.KNeighborsRegressor(\n",
" n_neighbors=10, \n",
" weights='distance', metric='l1',\n",
" n_jobs=-1)\n",
"model = sklearn.pipeline.make_pipeline(onehot, standard_scaler, knn)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"r2 scores: 1.00 Train | 0.38 Test\n"
]
}
],
"source": [
"# fit the model\n",
"model.fit(x_train, y_train)\n",
"\n",
"# score the model\n",
"train_r2 = sklearn.metrics.r2_score(y_train, model.predict(x_train))\n",
"test_r2 = sklearn.metrics.r2_score(y_test, model.predict(x_test))\n",
"print(f'r2 scores: {train_r2:.2f} Train | {test_r2:.2f} Test')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Section 2: Uploading to Fiddler\n",
"Up until now, we haven't done anything Fiddler-specific. Now we'll go ahead and change that. Let's begin by importing the Fiddler package."
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"import fiddler as fdl"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Before you start: set up your API connection\n",
"\n",
"### Launch onebox or authenticate with a remote server\n",
"Before you can start working with a Fiddler-integrated Jupyter environment, you should set up access to a running instance of Fiddler.\n",
"\n",
"#### Onebox\n",
"In onebox, this means running the `start.sh` script to launch onebox locally.\n",
"\n",
"#### Cloud\n",
"For the cloud version of our product, this means looking up your authentication token in the [Fiddler settings dashboard](https://app.fiddler.ai/settings/credentials)\n",
"\n",
"### Create a FiddlerApi object\n",
"\n",
"In order to get your data and models into the Fiddler Engine, you'll need to connect using the API. The `FiddlerApi` object to handles most of the nitty-gritty for you, so all you have to do is specify some details about the Fiddler system you're connecting to."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"# NOTE: typically the API url for your running instance of Fiddler will be \"https://api.fiddler.ai\" (or \"http://localhost:4100\" for onebox)\n",
"# however, use \"http://host.docker.internal:4100\" as our URL if Jupyter is running in a docker VM on the same macOS machine as onebox\n",
"url = 'http://host.docker.internal:4100'\n",
"\n",
"# see <Fiddler URL>/settings/credentials to find, create, or change this token\n",
"token = os.getenv('FIDDLER_API_TOKEN')\n",
"\n",
"# see <Fiddler URL>/settings/general to find this id (listed as \"Organization Name\")\n",
"org_id = 'onebox'\n",
"\n",
"fiddler_api = fdl.FiddlerApi(url=url, org_id=org_id, auth_token=token)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Dataset Upload\n",
"Now that we have our dataset in working order, let's upload it to the Fiddler platform. As mentioned above, our `Dataset` class directly integrates with Pandas to make this a snap. "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['imdb_rnn', 'iris', 'bank_churn', '20news', 'p2p_loans', 'winequality']"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fiddler_api.list_datasets()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"Heads up! We are inferring the details of your dataset from the dataframe(s) provided. Please take a second to check our work.\n",
"\n",
"If the following DatasetInfo is an incorrect representation of your data, you can construct a DatasetInfo with the DatasetInfo.from_dataframe() method and modify that object to reflect the correct details of your dataset.\n",
"\n",
"After constructing a corrected DatasetInfo, please re-upload your dataset with that DatasetInfo object explicitly passed via the `info` parameter of FiddlerApi.upload_dataset().\n",
"\n",
"You may need to delete the initially uploaded versionvia FiddlerApi.delete_dataset('bikeshare').\n",
"\n",
"Inferred DatasetInfo to check:\n",
" DatasetInfo:\n",
" display_name: bikeshare\n",
" files: []\n",
" columns:\n",
" column dtype count(possible_values)\n",
" 0 dteday STRING -\n",
" 1 season CATEGORY 4\n",
" 2 yr INTEGER -\n",
" 3 mnth INTEGER -\n",
" 4 hr INTEGER -\n",
" 5 holiday BOOLEAN -\n",
" 6 weekday INTEGER -\n",
" 7 workingday BOOLEAN -\n",
" 8 weathersit CATEGORY 4\n",
" 9 temp FLOAT -\n",
" 10 atemp FLOAT -\n",
" 11 hum FLOAT -\n",
" 12 windspeed FLOAT -\n",
" 13 casual INTEGER -\n",
" 14 registered INTEGER -\n",
" 15 cnt INTEGER -\n"
]
},
{
"data": {
"text/plain": [
"{'row_count': 17379,\n",
" 'col_count': 16,\n",
" 'log': ['Importing dataset bikeshare',\n",
" 'Found old data. Deleting it',\n",
" 'Creating table for bikeshare',\n",
" 'Importing data file: test.csv',\n",
" 'Importing data file: train.csv']}"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# now that we have a Dataset, we just need to pass\n",
"# it to the FiddlerApi to perform an upload\n",
"upload_result = fiddler_api.upload_dataset(\n",
" dataset={'train': df_2011, 'test': df_2012}, \n",
" dataset_id='bikeshare')\n",
"upload_result"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'Dataset deleted bikeshare'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fiddler_api.delete_dataset('bikeshare')"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['imdb_rnn', 'iris', 'bank_churn', '20news', 'p2p_loans', 'winequality']"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we see that the 'bikeshare' dataset now shows up in the list of all datasets\n",
"fiddler_api.list_datasets()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"We customized the DatasetInfo for this dataset with a custom display_name and more `weathersit` possible-values.\n",
"DatasetInfo:\n",
" display_name: Bikeshare Dataset\n",
" files: []\n",
" columns:\n",
" column dtype count(possible_values)\n",
" 0 dteday STRING -\n",
" 1 season CATEGORY 4\n",
" 2 yr INTEGER -\n",
" 3 mnth INTEGER -\n",
" 4 hr INTEGER -\n",
" 5 holiday BOOLEAN -\n",
" 6 weekday INTEGER -\n",
" 7 workingday BOOLEAN -\n",
" 8 weathersit CATEGORY 7\n",
" 9 temp FLOAT -\n",
" 10 atemp FLOAT -\n",
" 11 hum FLOAT -\n",
" 12 windspeed FLOAT -\n",
" 13 casual INTEGER -\n",
" 14 registered INTEGER -\n",
" 15 cnt INTEGER -\n"
]
},
{
"data": {
"text/plain": [
"{'row_count': 17379,\n",
" 'col_count': 16,\n",
" 'log': ['Importing dataset bikeshare',\n",
" 'Creating table for bikeshare',\n",
" 'Importing data file: test.csv',\n",
" 'Importing data file: train.csv']}"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Upload example with custom DatasetInfo\n",
"bikeshare_info = fdl.DatasetInfo.from_dataframe(df_2011, display_name='Bikeshare Dataset')\n",
"bikeshare_info['weathersit'].possible_values.extend([123, 456, 789])\n",
"print('We customized the DatasetInfo for this dataset '\n",
" 'with a custom display_name and more `weathersit` possible-values.')\n",
"print(bikeshare_info)\n",
"\n",
"# upload\n",
"upload_result = fiddler_api.upload_dataset(\n",
" dataset={'train': df_2011, 'test': df_2012},\n",
" dataset_id='bikeshare',\n",
" info=bikeshare_info\n",
")\n",
"upload_result"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['imdb_rnn',\n",
" 'iris',\n",
" 'bank_churn',\n",
" '20news',\n",
" 'p2p_loans',\n",
" 'winequality',\n",
" 'bikeshare']"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we see that the 'bikeshare' dataset now shows up in the list of all datasets\n",
"fiddler_api.list_datasets()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Accessing the data on Fiddler\n",
"We can also verify everything worked by looking at the web UI:\n",
"- http://localhost:4100/datasets\n",
"\n",
"(or if you used cloud instead of onebox)\n",
"- https://app.fiddler.ai/datasets\n",
"\n",
"### Model Upload\n",
"We currently support the upload of scikit-learn models directly through the `fiddler` package. While custom code is tricky to deploy to Fiddler, we support a number of additional packages beyond `sklearn` that enable the deployment of powerful black-box models. These include:\n",
"1. `xgboost` (as long as the scikit-learn API is used)\n",
"2. `lightgbm` (as long as the scikit-learn API is used)\n",
"3. `category_encoders`\n",
"\n",
"For best explainability results, we recommend organizing your modeling pipeline using the scikit-learn `Pipeline` API so that your feature transformations are integrated with your model. This is because pre-transforming your data can have a negative effect on explanation interpretability."
]
},
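{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, here is a minimal sketch of wrapping a gradient-boosted model in the same `Pipeline` pattern used for the kNN model above (this assumes the `lightgbm` package is installed; it is not used elsewhere in this notebook):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical sketch -- requires the `lightgbm` package, which this\n",
"# notebook does not otherwise use\n",
"import lightgbm\n",
"\n",
"gbm_model = sklearn.pipeline.make_pipeline(\n",
"    category_encoders.OneHotEncoder(cols=['season', 'weathersit']),\n",
"    lightgbm.LGBMRegressor(n_estimators=100))\n",
"gbm_model.fit(x_train, y_train)\n",
"# because the pipeline follows the sklearn API, the same\n",
"# upload_model_sklearn() flow shown below applies to it unchanged"
]
},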
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'project_name': 'bikeshare_forecasting'}"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# To organize our models, let's first create a project on Fiddler.\n",
"fiddler_api.create_project('bikeshare_forecasting')"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['imdb_rnn',\n",
" 'bank_churn',\n",
" 'newsgroup_text_topics',\n",
" 'lending',\n",
" 'bikeshare_forecasting',\n",
" 'iris_classification',\n",
" 'wine_quality']"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# we see that the 'bikeshare_forecasting' project now shows up in the list of all datasets\n",
"fiddler_api.list_projects()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### ModelInfo\n",
"For Fiddler to properly run and explain your model, you need to provide some information about model inputs and outputs that is not captured by the `sklearn` object itself. Luckily the `Dataset` we created above has a `DatasetInfo` component that can help us infer the `ModelInfo` of models trained on that dataset."
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"ModelInfo:\n",
" display_name: Bikeshare kNN\n",
" description: A kNN model trained for predict the `cnt` feature of the bikeshare dataset.\n",
" input_type: ModelInputType.TABULAR\n",
" model_task: ModelTask.REGRESSION\n",
" inputs and outputs:\n",
" column column_type dtype count(possible_values)\n",
" 0 season input CATEGORY 4\n",
" 1 mnth input INTEGER -\n",
" 2 hr input INTEGER -\n",
" 3 holiday input BOOLEAN -\n",
" 4 weekday input INTEGER -\n",
" 5 workingday input BOOLEAN -\n",
" 6 weathersit input CATEGORY 7\n",
" 7 temp input FLOAT -\n",
" 8 atemp input FLOAT -\n",
" 9 hum input FLOAT -\n",
" 10 windspeed input FLOAT -\n",
" 11 predicted_cnt output FLOAT -\n",
" misc:\n",
" {}"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model_info = fdl.ModelInfo.from_dataset_info(\n",
" dataset_info=fiddler_api.get_dataset_info('bikeshare'),\n",
" target=target, \n",
" features=feature_columns,\n",
" display_name='Bikeshare kNN',\n",
" description='A kNN model trained for predict the `cnt` feature of the bikeshare dataset.'\n",
")\n",
"model_info"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"You are uploading a scikit-learn model using the Fiddler API.\n",
"If this model uses any custom (non-sklearn) code, it will not run properly on the Fiddler Engine.\n",
"The Fiddler engine may not be able to detect this in advance.\n"
]
},
{
"data": {
"text/plain": [
"{'model': {'display name': 'Bikeshare kNN',\n",
" 'input-type': 'structured',\n",
" 'model-task': 'regression',\n",
" 'inputs': [{'column-name': 'season',\n",
" 'data-type': 'category',\n",
" 'possible-values': ['1', '2', '3', '4']},\n",
" {'column-name': 'mnth', 'data-type': 'int'},\n",
" {'column-name': 'hr', 'data-type': 'int'},\n",
" {'column-name': 'holiday', 'data-type': 'bool'},\n",
" {'column-name': 'weekday', 'data-type': 'int'},\n",
" {'column-name': 'workingday', 'data-type': 'bool'},\n",
" {'column-name': 'weathersit',\n",
" 'data-type': 'category',\n",
" 'possible-values': ['1', '2', '3', '4', '123', '456', '789']},\n",
" {'column-name': 'temp', 'data-type': 'float'},\n",
" {'column-name': 'atemp', 'data-type': 'float'},\n",
" {'column-name': 'hum', 'data-type': 'float'},\n",
" {'column-name': 'windspeed', 'data-type': 'float'}],\n",
" 'outputs': [{'column-name': 'predicted_cnt', 'data-type': 'float'}],\n",
" 'description': 'A kNN model trained for predict the `cnt` feature of the bikeshare dataset.',\n",
" 'datasets': ['bikeshare']}}"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"fiddler_api.upload_model_sklearn(\n",
" model=model,\n",
" info=model_info,\n",
" project_id='bikeshare_forecasting',\n",
" model_id='knn_model',\n",
" associated_dataset_ids=['bikeshare'])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can now look at explanations!\n",
"- http://localhost:4100/projects/bikeshare_forecasting/explain\n",
"\n",
"(or if you used cloud instead of onebox)\n",
"- https://app.fiddler.ai/projects/bikeshare_forecasting/explain"
]
},
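{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a final, hypothetical sketch, explanations can also be requested programmatically. The `run_explanation` method name and signature below are assumptions -- check `help(fiddler_api)` or your client's documentation for the exact API in your version:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# hypothetical sketch: the method name and signature are assumptions --\n",
"# consult help(fiddler_api) for the exact API in your client version\n",
"example_row = x_test.iloc[[0]]  # a single-row DataFrame to explain\n",
"fiddler_api.run_explanation(\n",
"    project_id='bikeshare_forecasting',\n",
"    model_id='knn_model',\n",
"    df=example_row,\n",
"    dataset_id='bikeshare')"
]
}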
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.7"
}
},
"nbformat": 4,
"nbformat_minor": 2
}