Last active
February 4, 2022 04:17
-
-
Save ssayyah/da6ae6a02e5f1dfc214a89d4a64c4caa to your computer and use it in GitHub Desktop.
Notebook walking through PyCaret CPU vs GPU timing.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"id": "11cbe4ce-49c1-4f7b-a901-0cac6abedf65", | |
"metadata": {}, | |
"source": [ | |
"## PyCaret CPU vs. GPU Benchmarking\n", | |
"-------" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "70b69e43-6437-45ff-b83f-a52b2390ba05", | |
"metadata": {}, | |
"source": [ | |
"## Import Libraries" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "28d49a60-53af-48aa-9053-e9edc357eca0", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"import pycaret\n", | |
"\n", | |
"import pandas as pd\n", | |
"import numpy as np\n", | |
"\n", | |
"import time\n", | |
"\n", | |
"from pycaret.utils import version\n", | |
"version()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "dcaa5c97-2d16-4519-a40c-bfa16fa33d86", | |
"metadata": {}, | |
"source": [ | |
"## Timing" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b5a5b658-f070-40a3-af91-5cec455012b1", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"import json\n", | |
"import time\n", | |
"\n", | |
"class Timer:\n", | |
" def __enter__(self, *args, **kwargs):\n", | |
" self.tick = time.time()\n", | |
" return self\n", | |
" \n", | |
" def __exit__(self, *args, **kwargs):\n", | |
" self.elapsed = time.time() - self.tick\n", | |
" \n", | |
"benchmark_list = []" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "24f90bf1-87b6-4d83-982d-c95e1fabdfe2", | |
"metadata": {}, | |
"source": [ | |
"## Get Data\n", | |
"\n", | |
"The dataset we used can be found [here](https://archive.ics.uci.edu/ml/datasets/YearPredictionMSD)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "3fe845ef-5279-4600-a1d9-fcc5284ccc70", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"dataset = pd.read_csv('YearPredictionMSd.txt')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "3812c536-c9d6-47a5-a895-864ad7780806", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"#fixing attribute labels\n", | |
"names = ['Year']\n", | |
"for x in range(1,13):\n", | |
" names.append('t_avg_' + str(x)) #these attributes are timbre averages\n", | |
"for x in range(1,79):\n", | |
" names.append('t_cov_' + str(x)) #these attributes are timbre covariances\n", | |
"dataset.columns = names\n", | |
"\n", | |
"dataset.head()" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "b6098079-9bed-4d93-82e2-37ea3ee29d0c", | |
"metadata": {}, | |
"source": [ | |
"Withhold a sample of 600 records from the original dataset to be used for predictions (not to be confused with train/test split)." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "416dec2d-73d2-4bd2-8403-138861b3ac65", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"#gpu data\n", | |
"df = dataset[:463716]\n", | |
"unseen_df = dataset[463716:515346]\n", | |
"unseen_df.reset_index(drop=True, inplace=True)\n", | |
"\n", | |
"print('Data for Modeling: ' + str(df.shape))\n", | |
"print('Unseen Data For Predictions: ' + str(unseen_df.shape))" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "160a929a-79d3-4abf-a403-10c5e5b34093", | |
"metadata": {}, | |
"source": [ | |
"### Set up Environment in PyCaret\n", | |
"\n", | |
"To record CPU times, keep `use_gpu=False`, and to record GPU times, set it to `True`. Be sure to update the labels in the timing module at the end of each cell to match what's being recorded." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "874dda98-c59d-4178-b0b2-830c18cf96d7", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"from pycaret.regression import *\n", | |
"exp_reg = setup(data = df, target = 'Year', session_id = 123, normalize = True, use_gpu=False)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "ad48bce8-1e93-4885-b1c3-3de3343806d6", | |
"metadata": {}, | |
"source": [ | |
"## Compare All Models\n", | |
"\n", | |
"Not all models can be run on GPU, so even when `use_gpu=True`, those that cannot be run on GPU will automatically be run on CPU. To compare the times of only those models that can be run on GPU, `exclude = ['ransac', 'huber', 'par', 'ada', 'omp', 'llar']`." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "b058eb81-c64b-4f56-bd0f-525696c69eee", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" best_models = compare_models(exclude = ['ransac'], n_select = 3)\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"compare models\"\n", | |
"benchmark_payload[\"model\"] = \"all\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "aa8b0595-e0d1-4fef-b8d2-9914346d7a8d", | |
"metadata": {}, | |
"source": [ | |
"## Create Models\n", | |
"\n", | |
"Here we can time the fitting of an individual model. Linear regression is used for example." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "2bdfb4e9-e4b4-4308-b0e9-a1d4f22cd6ac", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" lr = create_model('lr', fold = 5)\n", | |
"\n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"create model\"\n", | |
"benchmark_payload[\"model\"] = \"lr\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "db932d22-751f-4262-be82-2ebee17ae821", | |
"metadata": {}, | |
"source": [ | |
"## Tune Models\n", | |
"\n", | |
"Here we can time the tuning of a model we've created." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "9e266c5d-000e-43bb-9f02-7651a824fd81", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elasped:\n", | |
" tuned_lr = tune_model(lr)\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"tune model\"\n", | |
"benchmark_payload[\"model\"] = \"lr\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "159e99ab-238e-4acb-9d6b-3b6e80a13a6b", | |
"metadata": {}, | |
"source": [ | |
"## Ensemble a Model" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "3d3487bf-6a0f-4b06-b3ef-1a330811146c", | |
"metadata": {}, | |
"source": [ | |
"### Blending" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "479697f8-7491-4867-af25-125fbbf285d7", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" #train individual models to blend\n", | |
" xgboost = create_model('xgboost', verbose = False)\n", | |
" lr = create_model('lr', verbose = False)\n", | |
" knn = create_model('knn', verbose = False)\n", | |
" \n", | |
" #blend individual models\n", | |
" blender = blend_models(estimator_list = [xgboost, lr, knn])\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"ensemble - blending\"\n", | |
"benchmark_payload[\"model\"] = \"xgboost, lr, knn\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "a05c5ecc-7d27-4c71-aa20-255f86e222e0", | |
"metadata": {}, | |
"source": [ | |
"### Stacking" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "19aee3ba-1189-4b85-a659-be880aa7a4dc", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" stacker = stack_models(best_models)\n", | |
"\n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"ensemble - stacking\"\n", | |
"benchmark_payload[\"model\"] = \"best_models cpu\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "9eb6ddf0-07da-4cd1-b3f3-454e9b55fc2c", | |
"metadata": {}, | |
"source": [ | |
"## Plot Error" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "f3e0f036-a38f-40b4-84ba-9034ce198a2e", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_model(blender, plot = 'error')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "1aafc057-ae18-4cd9-abd7-de64e4a36f8a", | |
"metadata": { | |
"tags": [] | |
}, | |
"outputs": [], | |
"source": [ | |
"plot_model(stacker, plot = 'error')" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "eb7c79c7-b853-417a-93e0-8ccae9aff5ae", | |
"metadata": {}, | |
"source": [ | |
"## Predict on Hold-Out Sample" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "dfd27453-ed2d-40b8-a443-12dc156c7d34", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" predict_model(stacker);\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"predict model\"\n", | |
"benchmark_payload[\"model\"] = \"stacker\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "6514502c-f0c0-461a-b2d3-19c93946b6ce", | |
"metadata": {}, | |
"source": [ | |
"## Finalize Model" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "42a46352-b1c6-4460-90d5-834ba80d096a", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" final_stacker = finalize_model(stacker)\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"finalize model\"\n", | |
"benchmark_payload[\"model\"] = \"stacker\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "31b6615e-5ee2-45a7-87e0-8b74b501994f", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" predict_model(final_stacker);\n", | |
"\n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"predict model\"\n", | |
"benchmark_payload[\"model\"] = \"final stacker\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e41dbbf2-fe43-486c-9fe3-7e69793edca4", | |
"metadata": {}, | |
"source": [ | |
"## Predict on Unseen Data" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "a2d4e091-d280-4bdc-843c-d89aac772659", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"with Timer() as elapsed:\n", | |
" unseen_predictions = predict_model(final_stacker, data=data_unseen)\n", | |
" unseen_predictions.head()\n", | |
" \n", | |
"benchmark_payload = {}\n", | |
"benchmark_payload[\"function\"] = \"predict on unseen\"\n", | |
"benchmark_payload[\"model\"] = \"final stacker\"\n", | |
"benchmark_payload[\"processor\"] = \"cpu\"\n", | |
"benchmark_payload[\"fit_time\"] = elapsed.elapsed\n", | |
"\n", | |
"benchmark_list.append(benchmark_payload)\n" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"id": "e07a37ad-e5e8-4a2f-bd98-761c2b52b7eb", | |
"metadata": {}, | |
"source": [ | |
"## Write Times to File" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"id": "075af62f-e55b-488c-a675-924474307645", | |
"metadata": {}, | |
"outputs": [], | |
"source": [ | |
"outpath = \"pycaret_benchmarksCPU.json\"\n", | |
"\n", | |
"with open(outpath, \"a\") as fh:\n", | |
" fh.write(json.dumps(benchmark_list))\n", | |
" fh.write(\"\\n\")" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 3", | |
"language": "python", | |
"name": "python3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.8.10" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 5 | |
} |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment