@ssayyah
Last active February 4, 2022 04:17
Notebook walking through PyCaret CPU vs GPU timing.
{
"cells": [
{
"cell_type": "markdown",
"id": "11cbe4ce-49c1-4f7b-a901-0cac6abedf65",
"metadata": {},
"source": [
"## PyCaret CPU vs. GPU Benchmarking\n",
"-------"
]
},
{
"cell_type": "markdown",
"id": "70b69e43-6437-45ff-b83f-a52b2390ba05",
"metadata": {},
"source": [
"## Import Libraries"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "28d49a60-53af-48aa-9053-e9edc357eca0",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"import pycaret\n",
"\n",
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"import time\n",
"\n",
"from pycaret.utils import version\n",
"version()"
]
},
{
"cell_type": "markdown",
"id": "dcaa5c97-2d16-4519-a40c-bfa16fa33d86",
"metadata": {},
"source": [
"## Timing"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b5a5b658-f070-40a3-af91-5cec455012b1",
"metadata": {},
"outputs": [],
"source": [
"import json\n",
"import time\n",
"\n",
"class Timer:\n",
" def __enter__(self, *args, **kwargs):\n",
" self.tick = time.time()\n",
" return self\n",
" \n",
" def __exit__(self, *args, **kwargs):\n",
" self.elapsed = time.time() - self.tick\n",
" \n",
"benchmark_list = []"
]
},
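{
"cell_type": "markdown",
"id": "timer-demo-note",
"metadata": {},
"source": [
"A quick sanity check of the `Timer` context manager (a throwaway sketch; the same pattern is reused in every timed cell below):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "timer-demo-cell",
"metadata": {},
"outputs": [],
"source": [
"# Throwaway check: elapsed wall-clock seconds end up on the Timer instance.\n",
"with Timer() as t:\n",
"    time.sleep(0.1)\n",
"print(round(t.elapsed, 2))"
]
},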
{
"cell_type": "markdown",
"id": "24f90bf1-87b6-4d83-982d-c95e1fabdfe2",
"metadata": {},
"source": [
"## Get Data\n",
"\n",
"The dataset we used can be found [here](https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3fe845ef-5279-4600-a1d9-fcc5284ccc70",
"metadata": {},
"outputs": [],
"source": [
"dataset = pd.read_csv('YearPredictionMSd.txt')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "3812c536-c9d6-47a5-a895-864ad7780806",
"metadata": {},
"outputs": [],
"source": [
"#fixing attribute labels\n",
"names = ['Year']\n",
"for x in range(1,13):\n",
" names.append('t_avg_' + str(x)) #these attributes are timbre averages\n",
"for x in range(1,79):\n",
" names.append('t_cov_' + str(x)) #these attributes are timbre covariances\n",
"dataset.columns = names\n",
"\n",
"dataset.head()"
]
},
{
"cell_type": "markdown",
"id": "b6098079-9bed-4d93-82e2-37ea3ee29d0c",
"metadata": {},
"source": [
"Withhold a sample of 600 records from the original dataset to be used for predictions (not to be confused with train/test split)."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "416dec2d-73d2-4bd2-8403-138861b3ac65",
"metadata": {},
"outputs": [],
"source": [
"#gpu data\n",
"df = dataset[:463716]\n",
"unseen_df = dataset[463716:515346]\n",
"unseen_df.reset_index(drop=True, inplace=True)\n",
"\n",
"print('Data for Modeling: ' + str(df.shape))\n",
"print('Unseen Data For Predictions: ' + str(unseen_df.shape))"
]
},
{
"cell_type": "markdown",
"id": "160a929a-79d3-4abf-a403-10c5e5b34093",
"metadata": {},
"source": [
"### Set up Environment in PyCaret\n",
"\n",
"To record CPU times, keep `use_gpu=False`, and to record GPU times, set it to `True`. Be sure to update the labels in the timing module at the end of each cell to match what's being recorded."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "874dda98-c59d-4178-b0b2-830c18cf96d7",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"from pycaret.regression import *\n",
"exp_reg = setup(data = df, target = 'Year', session_id = 123, normalize = True, use_gpu=False)"
]
},
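{
"cell_type": "markdown",
"id": "gpu-setup-note",
"metadata": {},
"source": [
"For the GPU timings, the same `setup` call is repeated with `use_gpu=True`. A minimal sketch, to be run in place of the cell above (it assumes a GPU-enabled PyCaret install, e.g. with cuML available):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gpu-setup-cell",
"metadata": {},
"outputs": [],
"source": [
"# GPU variant of the setup call: run in place of the CPU setup cell\n",
"# when recording GPU timings (requires GPU-enabled libraries such as cuML).\n",
"exp_reg = setup(data = df, target = 'Year', session_id = 123, normalize = True, use_gpu = True)"
]
},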
{
"cell_type": "markdown",
"id": "ad48bce8-1e93-4885-b1c3-3de3343806d6",
"metadata": {},
"source": [
"## Compare All Models\n",
"\n",
"Not all models can be run on GPU, so even when `use_gpu=True`, those that cannot be run on GPU will automatically be run on CPU. To compare the times of only those models that can be run on GPU, `exclude = ['ransac', 'huber', 'par', 'ada', 'omp', 'llar']`."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b058eb81-c64b-4f56-bd0f-525696c69eee",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" best_models = compare_models(exclude = ['ransac'], n_select = 3)\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"compare models\"\n",
"benchmark_payload[\"model\"] = \"all\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)"
]
},
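{
"cell_type": "markdown",
"id": "gpu-compare-note",
"metadata": {},
"source": [
"To time only the estimators that have GPU implementations, pass the exclusion list mentioned above. A sketch, to be run in place of the cell above when making a GPU-only comparison:"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "gpu-compare-cell",
"metadata": {},
"outputs": [],
"source": [
"# Restrict the comparison to GPU-capable estimators by excluding the\n",
"# CPU-only models listed above; otherwise identical to the cell above.\n",
"with Timer() as elapsed:\n",
"    best_models = compare_models(exclude = ['ransac', 'huber', 'par', 'ada', 'omp', 'llar'], n_select = 3)\n",
"\n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"compare models\"\n",
"benchmark_payload[\"model\"] = \"gpu-capable\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)"
]
},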
{
"cell_type": "markdown",
"id": "aa8b0595-e0d1-4fef-b8d2-9914346d7a8d",
"metadata": {},
"source": [
"## Create Models\n",
"\n",
"Here we can time the fitting of an individual model. Linear regression is used for example."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2bdfb4e9-e4b4-4308-b0e9-a1d4f22cd6ac",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" lr = create_model('lr', fold = 5)\n",
"\n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"create model\"\n",
"benchmark_payload[\"model\"] = \"lr\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "db932d22-751f-4262-be82-2ebee17ae821",
"metadata": {},
"source": [
"## Tune Models\n",
"\n",
"Here we can time the tuning of a model we've created."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9e266c5d-000e-43bb-9f02-7651a824fd81",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"with Timer() as elasped:\n",
" tuned_lr = tune_model(lr)\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"tune model\"\n",
"benchmark_payload[\"model\"] = \"lr\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "159e99ab-238e-4acb-9d6b-3b6e80a13a6b",
"metadata": {},
"source": [
"## Ensemble a Model"
]
},
{
"cell_type": "markdown",
"id": "3d3487bf-6a0f-4b06-b3ef-1a330811146c",
"metadata": {},
"source": [
"### Blending"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "479697f8-7491-4867-af25-125fbbf285d7",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" #train individual models to blend\n",
" xgboost = create_model('xgboost', verbose = False)\n",
" lr = create_model('lr', verbose = False)\n",
" knn = create_model('knn', verbose = False)\n",
" \n",
" #blend individual models\n",
" blender = blend_models(estimator_list = [xgboost, lr, knn])\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"ensemble - blending\"\n",
"benchmark_payload[\"model\"] = \"xgboost, lr, knn\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "a05c5ecc-7d27-4c71-aa20-255f86e222e0",
"metadata": {},
"source": [
"### Stacking"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "19aee3ba-1189-4b85-a659-be880aa7a4dc",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" stacker = stack_models(best_models)\n",
"\n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"ensemble - stacking\"\n",
"benchmark_payload[\"model\"] = \"best_models cpu\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)"
]
},
{
"cell_type": "markdown",
"id": "9eb6ddf0-07da-4cd1-b3f3-454e9b55fc2c",
"metadata": {},
"source": [
"## Plot Error"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f3e0f036-a38f-40b4-84ba-9034ce198a2e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"plot_model(blender, plot = 'error')"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "1aafc057-ae18-4cd9-abd7-de64e4a36f8a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"plot_model(stacker, plot = 'error')"
]
},
{
"cell_type": "markdown",
"id": "eb7c79c7-b853-417a-93e0-8ccae9aff5ae",
"metadata": {},
"source": [
"## Predict on Hold-Out Sample"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dfd27453-ed2d-40b8-a443-12dc156c7d34",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" predict_model(stacker);\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"predict model\"\n",
"benchmark_payload[\"model\"] = \"stacker\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "6514502c-f0c0-461a-b2d3-19c93946b6ce",
"metadata": {},
"source": [
"## Finalize Model"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "42a46352-b1c6-4460-90d5-834ba80d096a",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" final_stacker = finalize_model(stacker)\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"finalize model\"\n",
"benchmark_payload[\"model\"] = \"stacker\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "31b6615e-5ee2-45a7-87e0-8b74b501994f",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" predict_model(final_stacker);\n",
"\n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"predict model\"\n",
"benchmark_payload[\"model\"] = \"final stacker\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "e41dbbf2-fe43-486c-9fe3-7e69793edca4",
"metadata": {},
"source": [
"## Predict on Unseen Data"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a2d4e091-d280-4bdc-843c-d89aac772659",
"metadata": {},
"outputs": [],
"source": [
"with Timer() as elapsed:\n",
" unseen_predictions = predict_model(final_stacker, data=data_unseen)\n",
" unseen_predictions.head()\n",
" \n",
"benchmark_payload = {}\n",
"benchmark_payload[\"function\"] = \"predict on unseen\"\n",
"benchmark_payload[\"model\"] = \"final stacker\"\n",
"benchmark_payload[\"processor\"] = \"cpu\"\n",
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n",
"\n",
"benchmark_list.append(benchmark_payload)\n"
]
},
{
"cell_type": "markdown",
"id": "e07a37ad-e5e8-4a2f-bd98-761c2b52b7eb",
"metadata": {},
"source": [
"## Write Times to File"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "075af62f-e55b-488c-a675-924474307645",
"metadata": {},
"outputs": [],
"source": [
"outpath = \"pycaret_benchmarksCPU.json\"\n",
"\n",
"with open(outpath, \"a\") as fh:\n",
" fh.write(json.dumps(benchmark_list))\n",
" fh.write(\"\\n\")"
]
}
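,
{
"cell_type": "markdown",
"id": "read-back-note",
"metadata": {},
"source": [
"To inspect the recorded timings, the appended lines can be read back into a DataFrame. A minimal sketch, assuming a GPU run has also been appended to the same file (otherwise adjust the filename):"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "read-back-cell",
"metadata": {},
"outputs": [],
"source": [
"# Read every appended benchmark run back in and compare fit times by processor.\n",
"runs = []\n",
"with open(outpath) as fh:\n",
"    for line in fh:\n",
"        if line.strip():\n",
"            runs.extend(json.loads(line))\n",
"\n",
"times = pd.DataFrame(runs)\n",
"times.pivot_table(index = ['function', 'model'], columns = 'processor', values = 'fit_time')"
]
}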
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 5
}