@serithemage
Last active January 26, 2023 00:55
PyCaret튜토리얼-회귀.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "PyCaret튜토리얼-회귀.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyP9iNHWw5WgIABeGMyL8ibG",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
},
"widgets": {
"application/vnd.jupyter.widget-state+json": {
"9f7b6a46a9684e49ac3c91efb60a0222": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_fdb4d90a52b141a6837c83eed07141f3",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 3,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 3,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_2fdb30e0dd4948fd83a51fe1339b6628"
}
},
"fdb4d90a52b141a6837c83eed07141f3": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"2fdb30e0dd4948fd83a51fe1339b6628": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"a0dad4a9859d450da319e2bfd8fcf133": {
"model_module": "@jupyter-widgets/controls",
"model_name": "TextModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "TextView",
"style": "IPY_MODEL_f83d4e068eaa4663a917556fcbf0839c",
"_dom_classes": [],
"description": "",
"_model_name": "TextModel",
"placeholder": "​",
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": "Following data types have been inferred automatically, if they are correct press enter to continue or type 'quit' otherwise.",
"_view_count": null,
"disabled": false,
"_view_module_version": "1.5.0",
"continuous_update": true,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_52398e08eaa845e3b941a3963d4e5926"
}
},
"f83d4e068eaa4663a917556fcbf0839c": {
"model_module": "@jupyter-widgets/controls",
"model_name": "DescriptionStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "DescriptionStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"_model_module": "@jupyter-widgets/controls"
}
},
"52398e08eaa845e3b941a3963d4e5926": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": "100%",
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"73596d1a1a904456b11da3fe4571731e": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_8181e471c2894b40a2aaf52e61a574b0",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 94,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 94,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_400d536c059a43279ddcb9f4bcb434d0"
}
},
"8181e471c2894b40a2aaf52e61a574b0": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"400d536c059a43279ddcb9f4bcb434d0": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"c96575810e9645879a74caf082ec51b4": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_01a6b7ac8c0446a58ad83864a77acf9a",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 4,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 4,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_0febddd05ca14cef901a0bd6548d3f0e"
}
},
"01a6b7ac8c0446a58ad83864a77acf9a": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"0febddd05ca14cef901a0bd6548d3f0e": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"7d6a997e72454d6e9010793540ba9ead": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_bed1185cf2694996ba4f382336054777",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 5,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 5,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_e57e4eb9160d4879be00834a1baca394"
}
},
"bed1185cf2694996ba4f382336054777": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"e57e4eb9160d4879be00834a1baca394": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"704fa978c6c04b49bc561e547da2b637": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_406a205a1c3f4e2e973777726d72da0f",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 5,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 5,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_cd2d0dd380854d108f3cca927fe05b49"
}
},
"406a205a1c3f4e2e973777726d72da0f": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"cd2d0dd380854d108f3cca927fe05b49": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"a0267b8ea717436ebc4bdf02760f8ab1": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_3e6d4b76c994492f9167a1b4ea0416f1",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 7,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 7,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_94e5d288a37f45309d7b12e820d7bb87"
}
},
"3e6d4b76c994492f9167a1b4ea0416f1": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"94e5d288a37f45309d7b12e820d7bb87": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
},
"9f84f9037515423383eaba950fd5dcfc": {
"model_module": "@jupyter-widgets/controls",
"model_name": "IntProgressModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "ProgressView",
"style": "IPY_MODEL_80611e8365a147dd92e4c2050fb9f86b",
"_dom_classes": [],
"description": "Processing: ",
"_model_name": "IntProgressModel",
"bar_style": "",
"max": 5,
"_view_module": "@jupyter-widgets/controls",
"_model_module_version": "1.5.0",
"value": 5,
"_view_count": null,
"_view_module_version": "1.5.0",
"orientation": "horizontal",
"min": 0,
"description_tooltip": null,
"_model_module": "@jupyter-widgets/controls",
"layout": "IPY_MODEL_7b21858d2e4947fc838f122705b19ae1"
}
},
"80611e8365a147dd92e4c2050fb9f86b": {
"model_module": "@jupyter-widgets/controls",
"model_name": "ProgressStyleModel",
"model_module_version": "1.5.0",
"state": {
"_view_name": "StyleView",
"_model_name": "ProgressStyleModel",
"description_width": "",
"_view_module": "@jupyter-widgets/base",
"_model_module_version": "1.5.0",
"_view_count": null,
"_view_module_version": "1.2.0",
"bar_color": null,
"_model_module": "@jupyter-widgets/controls"
}
},
"7b21858d2e4947fc838f122705b19ae1": {
"model_module": "@jupyter-widgets/base",
"model_name": "LayoutModel",
"model_module_version": "1.2.0",
"state": {
"_view_name": "LayoutView",
"grid_template_rows": null,
"right": null,
"justify_content": null,
"_view_module": "@jupyter-widgets/base",
"overflow": null,
"_model_module_version": "1.2.0",
"_view_count": null,
"flex_flow": null,
"width": null,
"min_width": null,
"border": null,
"align_items": null,
"bottom": null,
"_model_module": "@jupyter-widgets/base",
"top": null,
"grid_column": null,
"overflow_y": null,
"overflow_x": null,
"grid_auto_flow": null,
"grid_area": null,
"grid_template_columns": null,
"flex": null,
"_model_name": "LayoutModel",
"justify_items": null,
"grid_row": null,
"max_height": null,
"align_content": null,
"visibility": null,
"align_self": null,
"height": null,
"min_height": null,
"padding": null,
"grid_auto_rows": null,
"grid_gap": null,
"max_width": null,
"order": null,
"_view_module_version": "1.2.0",
"grid_template_areas": null,
"object_position": null,
"object_fit": null,
"grid_auto_columns": null,
"margin": null,
"display": null,
"left": null
}
}
}
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/serithemage/75fcb7cf439ba503a3b1d8911d1404a9/pycaret.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"source": [
"# PyCaret Tutorial\n",
"\n",
"## What is PyCaret?\n",
"PyCaret is essentially a Python wrapper that makes several machine learning libraries and frameworks, such as scikit-learn, XGBoost, Microsoft LightGBM, and spaCy, easier to use.\n",
"\n",
"PyCaret is well suited to:\n",
"- Experienced data scientists who want to be more productive\n",
"- Citizen data scientists who prefer low-code machine learning solutions\n",
"- Students learning data science\n",
"- Data scientists and consultants building proof-of-concept projects\n",
"\n",
"![PyCaret 2 features](https://i2.wp.com/pycaret.org/wp-content/uploads/2020/07/pycaret2-features.png?resize=1033%2C613&ssl=1)"
],
"metadata": {
"id": "CJeNY6nyrhng"
}
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "mKn8hdKiyTjV"
},
"outputs": [],
"source": [
"!pip install pycaret -q"
]
},
{
"cell_type": "code",
"source": [
"from pycaret.regression import *\n",
"from pycaret.datasets import get_data"
],
"metadata": {
"id": "hHpJ82b1yWHu"
},
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"source": [
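"PyCaret provides a variety of [datasets](https://pycaret.org/get-data/) that you can use to practice machine learning.\n",
"\n",
"Here we use a dataset for predicting diamond prices. A diamond is graded on attributes such as weight, color, and cut, and its price is set accordingly.\n",
"PyCaret's `diamond` dataset has 6,000 rows and 8 columns. For more background on diamond attributes, see the related (but different and larger) [Kaggle diamonds dataset](https://www.kaggle.com/shivam2503/diamonds)."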
],
"metadata": {
"id": "BZQ82Q7n3h0c"
}
},
{
"cell_type": "code",
"source": [
"dataset = get_data('diamond')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "llIQ5DrFyyTD",
"outputId": "9e06671d-5af8-4d22-ca49-8bf465e958ec"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Carat Weight</th>\n",
" <th>Cut</th>\n",
" <th>Color</th>\n",
" <th>Clarity</th>\n",
" <th>Polish</th>\n",
" <th>Symmetry</th>\n",
" <th>Report</th>\n",
" <th>Price</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.10</td>\n",
" <td>Ideal</td>\n",
" <td>H</td>\n",
" <td>SI1</td>\n",
" <td>VG</td>\n",
" <td>EX</td>\n",
" <td>GIA</td>\n",
" <td>5169</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>0.83</td>\n",
" <td>Ideal</td>\n",
" <td>H</td>\n",
" <td>VS1</td>\n",
" <td>ID</td>\n",
" <td>ID</td>\n",
" <td>AGSL</td>\n",
" <td>3470</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0.85</td>\n",
" <td>Ideal</td>\n",
" <td>H</td>\n",
" <td>SI1</td>\n",
" <td>EX</td>\n",
" <td>EX</td>\n",
" <td>GIA</td>\n",
" <td>3183</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>0.91</td>\n",
" <td>Ideal</td>\n",
" <td>E</td>\n",
" <td>SI1</td>\n",
" <td>VG</td>\n",
" <td>VG</td>\n",
" <td>GIA</td>\n",
" <td>4370</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.83</td>\n",
" <td>Ideal</td>\n",
" <td>G</td>\n",
" <td>SI1</td>\n",
" <td>EX</td>\n",
" <td>EX</td>\n",
" <td>GIA</td>\n",
" <td>3171</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Carat Weight Cut Color Clarity Polish Symmetry Report Price\n",
"0 1.10 Ideal H SI1 VG EX GIA 5169\n",
"1 0.83 Ideal H VS1 ID ID AGSL 3470\n",
"2 0.85 Ideal H SI1 EX EX GIA 3183\n",
"3 0.91 Ideal E SI1 VG VG GIA 4370\n",
"4 0.83 Ideal G SI1 EX EX GIA 3171"
]
},
"metadata": {}
}
]
},
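{
"cell_type": "markdown",
"source": [
"`get_data()` returns a pandas DataFrame and, by default, displays only its first five rows. Before modelling, it is worth taking a quick look at the size, the column types, and the target distribution. The following is a minimal sketch that uses standard pandas calls and assumes `dataset` is the DataFrame loaded above:\n",
"\n",
"```python\n",
"print(dataset.shape)                # (rows, columns); the setup() summary below reports (6000, 8)\n",
"print(dataset.dtypes)               # Carat Weight and Price are numeric; the other columns are text\n",
"print(dataset['Price'].describe())  # count, mean, std, min, quartiles and max of the target\n",
"```"
],
"metadata": {}
},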
{
"cell_type": "markdown",
"source": [
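"Now let's set up the data for training. `setup()` must be called before any other PyCaret modelling function. It initializes the experiment, infers the column types, splits the data into training and hold-out sets, and builds the preprocessing pipeline (for example, one-hot encoding the categorical columns)."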
],
"metadata": {
"id": "Qa1DIUQvLWOJ"
}
},
{
"cell_type": "code",
"source": [
"exp = setup(dataset, target='Price')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000,
"referenced_widgets": [
"9f7b6a46a9684e49ac3c91efb60a0222",
"fdb4d90a52b141a6837c83eed07141f3",
"2fdb30e0dd4948fd83a51fe1339b6628",
"a0dad4a9859d450da319e2bfd8fcf133",
"f83d4e068eaa4663a917556fcbf0839c",
"52398e08eaa845e3b941a3963d4e5926"
]
},
"id": "ovt1ai3Cy3VF",
"outputId": "012d7b58-eb61-41ba-ee34-db7e29b14d99"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Description</th>\n",
" <th>Value</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>session_id</td>\n",
" <td>4417</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Target</td>\n",
" <td>Price</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Original Data</td>\n",
" <td>(6000, 8)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Missing Values</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Numeric Features</td>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>Categorical Features</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>Ordinal Features</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>High Cardinality Features</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>High Cardinality Method</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>Transformed Train Set</td>\n",
" <td>(4199, 28)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>Transformed Test Set</td>\n",
" <td>(1801, 28)</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>Shuffle Train-Test</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>Stratify Train-Test</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>Fold Generator</td>\n",
" <td>KFold</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>Fold Number</td>\n",
" <td>10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>CPU Jobs</td>\n",
" <td>-1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>Use GPU</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>Log Experiment</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>Experiment Name</td>\n",
" <td>reg-default-name</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>USI</td>\n",
" <td>79cc</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>Imputation Type</td>\n",
" <td>simple</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>Iterative Imputation Iteration</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>Numeric Imputer</td>\n",
" <td>mean</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>Iterative Imputation Numeric Model</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>24</th>\n",
" <td>Categorical Imputer</td>\n",
" <td>constant</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25</th>\n",
" <td>Iterative Imputation Categorical Model</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>26</th>\n",
" <td>Unknown Categoricals Handling</td>\n",
" <td>least_frequent</td>\n",
" </tr>\n",
" <tr>\n",
" <th>27</th>\n",
" <td>Normalize</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>28</th>\n",
" <td>Normalize Method</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>29</th>\n",
" <td>Transformation</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>30</th>\n",
" <td>Transformation Method</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>31</th>\n",
" <td>PCA</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>32</th>\n",
" <td>PCA Method</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>33</th>\n",
" <td>PCA Components</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>34</th>\n",
" <td>Ignore Low Variance</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>35</th>\n",
" <td>Combine Rare Levels</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>36</th>\n",
" <td>Rare Level Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>37</th>\n",
" <td>Numeric Binning</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>38</th>\n",
" <td>Remove Outliers</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>39</th>\n",
" <td>Outliers Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>40</th>\n",
" <td>Remove Multicollinearity</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>41</th>\n",
" <td>Multicollinearity Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>42</th>\n",
" <td>Remove Perfect Collinearity</td>\n",
" <td>True</td>\n",
" </tr>\n",
" <tr>\n",
" <th>43</th>\n",
" <td>Clustering</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>44</th>\n",
" <td>Clustering Iteration</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>45</th>\n",
" <td>Polynomial Features</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>46</th>\n",
" <td>Polynomial Degree</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>47</th>\n",
" <td>Trignometry Features</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>48</th>\n",
" <td>Polynomial Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>49</th>\n",
" <td>Group Features</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50</th>\n",
" <td>Feature Selection</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>51</th>\n",
" <td>Feature Selection Method</td>\n",
" <td>classic</td>\n",
" </tr>\n",
" <tr>\n",
" <th>52</th>\n",
" <td>Features Selection Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>53</th>\n",
" <td>Feature Interaction</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>54</th>\n",
" <td>Feature Ratio</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>55</th>\n",
" <td>Interaction Threshold</td>\n",
" <td>None</td>\n",
" </tr>\n",
" <tr>\n",
" <th>56</th>\n",
" <td>Transform Target</td>\n",
" <td>False</td>\n",
" </tr>\n",
" <tr>\n",
" <th>57</th>\n",
" <td>Transform Target Method</td>\n",
" <td>box-cox</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Description Value\n",
"0 session_id 4417\n",
"1 Target Price\n",
"2 Original Data (6000, 8)\n",
"3 Missing Values False\n",
"4 Numeric Features 1\n",
"5 Categorical Features 6\n",
"6 Ordinal Features False\n",
"7 High Cardinality Features False\n",
"8 High Cardinality Method None\n",
"9 Transformed Train Set (4199, 28)\n",
"10 Transformed Test Set (1801, 28)\n",
"11 Shuffle Train-Test True\n",
"12 Stratify Train-Test False\n",
"13 Fold Generator KFold\n",
"14 Fold Number 10\n",
"15 CPU Jobs -1\n",
"16 Use GPU False\n",
"17 Log Experiment False\n",
"18 Experiment Name reg-default-name\n",
"19 USI 79cc\n",
"20 Imputation Type simple\n",
"21 Iterative Imputation Iteration None\n",
"22 Numeric Imputer mean\n",
"23 Iterative Imputation Numeric Model None\n",
"24 Categorical Imputer constant\n",
"25 Iterative Imputation Categorical Model None\n",
"26 Unknown Categoricals Handling least_frequent\n",
"27 Normalize False\n",
"28 Normalize Method None\n",
"29 Transformation False\n",
"30 Transformation Method None\n",
"31 PCA False\n",
"32 PCA Method None\n",
"33 PCA Components None\n",
"34 Ignore Low Variance False\n",
"35 Combine Rare Levels False\n",
"36 Rare Level Threshold None\n",
"37 Numeric Binning False\n",
"38 Remove Outliers False\n",
"39 Outliers Threshold None\n",
"40 Remove Multicollinearity False\n",
"41 Multicollinearity Threshold None\n",
"42 Remove Perfect Collinearity True\n",
"43 Clustering False\n",
"44 Clustering Iteration None\n",
"45 Polynomial Features False\n",
"46 Polynomial Degree None\n",
"47 Trignometry Features False\n",
"48 Polynomial Threshold None\n",
"49 Group Features False\n",
"50 Feature Selection False\n",
"51 Feature Selection Method classic\n",
"52 Features Selection Threshold None\n",
"53 Feature Interaction False\n",
"54 Feature Ratio False\n",
"55 Interaction Threshold None\n",
"56 Transform Target False\n",
"57 Transform Target Method box-cox"
]
},
"metadata": {}
}
]
},
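{
"cell_type": "markdown",
"source": [
"The summary above shows three things. `setup()` picked a random `session_id` (4417). It split the 6,000 rows into 4,199 training rows and 1,801 hold-out rows (roughly 70/30). It also configured 10-fold `KFold` cross-validation. Because the seed is random, the split and the scores below can change from run to run. The following sketch spells these settings out for a reproducible call. `session_id=123` is an arbitrary example value, and `train_size` and `fold` are shown at their default values:\n",
"\n",
"```python\n",
"exp = setup(\n",
"    dataset,\n",
"    target='Price',\n",
"    session_id=123,  # fixed random seed: same train/test split and CV folds on every run\n",
"    train_size=0.7,  # fraction of rows used for training; the rest is the hold-out set\n",
"    fold=10,         # number of cross-validation folds\n",
")\n",
"```\n",
"\n",
"PyCaret 2.x appears to have produced the output above. In that version, adding `silent=True` also skips the interactive data-type confirmation prompt; this parameter is specific to the 2.x API."
],
"metadata": {}
},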
{
"cell_type": "markdown",
"source": [
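"Now it's time to build models.\n",
"With a single call to `compare_models()`, PyCaret trains a broad set of regression models on the cross-validation folds configured in `setup()`. It then ranks them (by R2 by default) and returns the best one!"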
],
"metadata": {
"id": "MRIfbnlrgIQ3"
}
},
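{
"cell_type": "markdown",
"source": [
"`compare_models()` also takes parameters that change what is compared and what is returned. `sort`, `n_select` and `include` are `compare_models()` parameters, and the model IDs are the ones shown in the results table. The values below are illustrative:\n",
"\n",
"```python\n",
"# Rank by MAE instead of the default R2 and return the top three models as a list\n",
"top3 = compare_models(sort='MAE', n_select=3)\n",
"\n",
"# Compare only a hand-picked subset of models by their IDs\n",
"best = compare_models(include=['et', 'rf', 'lightgbm'])\n",
"```"
],
"metadata": {}
},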
{
"cell_type": "code",
"source": [
"compare_models()"
],
"metadata": {
"id": "vEbiedsGzEOW",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 741,
"referenced_widgets": [
"73596d1a1a904456b11da3fe4571731e",
"8181e471c2894b40a2aaf52e61a574b0",
"400d536c059a43279ddcb9f4bcb434d0"
]
},
"outputId": "4b58b013-8cf4-4d00-e0a5-e363f15c3a93"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Model</th>\n",
" <th>MAE</th>\n",
" <th>MSE</th>\n",
" <th>RMSE</th>\n",
" <th>R2</th>\n",
" <th>RMSLE</th>\n",
" <th>MAPE</th>\n",
" <th>TT (Sec)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>et</th>\n",
" <td>Extra Trees Regressor</td>\n",
" <td>7.253200e+02</td>\n",
" <td>2.089617e+06</td>\n",
" <td>1.412893e+03</td>\n",
" <td>9.809000e-01</td>\n",
" <td>0.0775</td>\n",
" <td>0.0584</td>\n",
" <td>1.451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>rf</th>\n",
" <td>Random Forest Regressor</td>\n",
" <td>7.194286e+02</td>\n",
" <td>2.380044e+06</td>\n",
" <td>1.492959e+03</td>\n",
" <td>9.785000e-01</td>\n",
" <td>0.0778</td>\n",
" <td>0.0575</td>\n",
" <td>1.451</td>\n",
" </tr>\n",
" <tr>\n",
" <th>gbr</th>\n",
" <td>Gradient Boosting Regressor</td>\n",
" <td>9.013263e+02</td>\n",
" <td>3.084118e+06</td>\n",
" <td>1.726864e+03</td>\n",
" <td>9.717000e-01</td>\n",
" <td>0.1013</td>\n",
" <td>0.0767</td>\n",
" <td>0.302</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lightgbm</th>\n",
" <td>Light Gradient Boosting Machine</td>\n",
" <td>7.622670e+02</td>\n",
" <td>3.445734e+06</td>\n",
" <td>1.771217e+03</td>\n",
" <td>9.692000e-01</td>\n",
" <td>0.0774</td>\n",
" <td>0.0562</td>\n",
" <td>0.114</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dt</th>\n",
" <td>Decision Tree Regressor</td>\n",
" <td>9.402904e+02</td>\n",
" <td>3.919769e+06</td>\n",
" <td>1.943954e+03</td>\n",
" <td>9.636000e-01</td>\n",
" <td>0.1006</td>\n",
" <td>0.0737</td>\n",
" <td>0.033</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ridge</th>\n",
" <td>Ridge Regression</td>\n",
" <td>2.522849e+03</td>\n",
" <td>1.553797e+07</td>\n",
" <td>3.911392e+03</td>\n",
" <td>8.557000e-01</td>\n",
" <td>0.6438</td>\n",
" <td>0.2985</td>\n",
" <td>0.038</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lasso</th>\n",
" <td>Lasso Regression</td>\n",
" <td>2.519646e+03</td>\n",
" <td>1.555177e+07</td>\n",
" <td>3.913367e+03</td>\n",
" <td>8.555000e-01</td>\n",
" <td>0.6405</td>\n",
" <td>0.2977</td>\n",
" <td>0.063</td>\n",
" </tr>\n",
" <tr>\n",
" <th>br</th>\n",
" <td>Bayesian Ridge</td>\n",
" <td>2.522436e+03</td>\n",
" <td>1.556019e+07</td>\n",
" <td>3.914523e+03</td>\n",
" <td>8.554000e-01</td>\n",
" <td>0.6393</td>\n",
" <td>0.2983</td>\n",
" <td>0.022</td>\n",
" </tr>\n",
" <tr>\n",
" <th>llar</th>\n",
" <td>Lasso Least Angle Regression</td>\n",
" <td>2.463327e+03</td>\n",
" <td>1.559329e+07</td>\n",
" <td>3.916103e+03</td>\n",
" <td>8.553000e-01</td>\n",
" <td>0.6665</td>\n",
" <td>0.2839</td>\n",
" <td>0.020</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lr</th>\n",
" <td>Linear Regression</td>\n",
" <td>2.535285e+03</td>\n",
" <td>1.558196e+07</td>\n",
" <td>3.917170e+03</td>\n",
" <td>8.552000e-01</td>\n",
" <td>0.6496</td>\n",
" <td>0.3017</td>\n",
" <td>0.598</td>\n",
" </tr>\n",
" <tr>\n",
" <th>huber</th>\n",
" <td>Huber Regressor</td>\n",
" <td>2.003904e+03</td>\n",
" <td>2.115274e+07</td>\n",
" <td>4.544653e+03</td>\n",
" <td>8.051000e-01</td>\n",
" <td>0.4180</td>\n",
" <td>0.1688</td>\n",
" <td>0.145</td>\n",
" </tr>\n",
" <tr>\n",
" <th>par</th>\n",
" <td>Passive Aggressive Regressor</td>\n",
" <td>2.014468e+03</td>\n",
" <td>2.291784e+07</td>\n",
" <td>4.721545e+03</td>\n",
" <td>7.896000e-01</td>\n",
" <td>0.4229</td>\n",
" <td>0.1610</td>\n",
" <td>0.051</td>\n",
" </tr>\n",
" <tr>\n",
" <th>ada</th>\n",
" <td>AdaBoost Regressor</td>\n",
" <td>4.232378e+03</td>\n",
" <td>2.571827e+07</td>\n",
" <td>5.048940e+03</td>\n",
" <td>7.565000e-01</td>\n",
" <td>0.4781</td>\n",
" <td>0.5518</td>\n",
" <td>0.263</td>\n",
" </tr>\n",
" <tr>\n",
" <th>omp</th>\n",
" <td>Orthogonal Matching Pursuit</td>\n",
" <td>3.020520e+03</td>\n",
" <td>2.692980e+07</td>\n",
" <td>5.143359e+03</td>\n",
" <td>7.506000e-01</td>\n",
" <td>0.3995</td>\n",
" <td>0.2722</td>\n",
" <td>0.017</td>\n",
" </tr>\n",
" <tr>\n",
" <th>knn</th>\n",
" <td>K Neighbors Regressor</td>\n",
" <td>3.041608e+03</td>\n",
" <td>3.021692e+07</td>\n",
" <td>5.488381e+03</td>\n",
" <td>7.132000e-01</td>\n",
" <td>0.3716</td>\n",
" <td>0.2806</td>\n",
" <td>0.080</td>\n",
" </tr>\n",
" <tr>\n",
" <th>en</th>\n",
" <td>Elastic Net</td>\n",
" <td>5.100566e+03</td>\n",
" <td>6.056820e+07</td>\n",
" <td>7.746799e+03</td>\n",
" <td>4.335000e-01</td>\n",
" <td>0.5389</td>\n",
" <td>0.5845</td>\n",
" <td>0.039</td>\n",
" </tr>\n",
" <tr>\n",
" <th>dummy</th>\n",
" <td>Dummy Regressor</td>\n",
" <td>7.345813e+03</td>\n",
" <td>1.063003e+08</td>\n",
" <td>1.028637e+04</td>\n",
" <td>-1.800000e-03</td>\n",
" <td>0.7600</td>\n",
" <td>0.8936</td>\n",
" <td>0.013</td>\n",
" </tr>\n",
" <tr>\n",
" <th>lar</th>\n",
" <td>Least Angle Regression</td>\n",
" <td>2.793011e+06</td>\n",
" <td>3.301584e+14</td>\n",
" <td>5.919992e+06</td>\n",
" <td>-2.375630e+06</td>\n",
" <td>1.7392</td>\n",
" <td>380.9053</td>\n",
" <td>0.023</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Model MAE MSE \\\n",
"et Extra Trees Regressor 7.253200e+02 2.089617e+06 \n",
"rf Random Forest Regressor 7.194286e+02 2.380044e+06 \n",
"gbr Gradient Boosting Regressor 9.013263e+02 3.084118e+06 \n",
"lightgbm Light Gradient Boosting Machine 7.622670e+02 3.445734e+06 \n",
"dt Decision Tree Regressor 9.402904e+02 3.919769e+06 \n",
"ridge Ridge Regression 2.522849e+03 1.553797e+07 \n",
"lasso Lasso Regression 2.519646e+03 1.555177e+07 \n",
"br Bayesian Ridge 2.522436e+03 1.556019e+07 \n",
"llar Lasso Least Angle Regression 2.463327e+03 1.559329e+07 \n",
"lr Linear Regression 2.535285e+03 1.558196e+07 \n",
"huber Huber Regressor 2.003904e+03 2.115274e+07 \n",
"par Passive Aggressive Regressor 2.014468e+03 2.291784e+07 \n",
"ada AdaBoost Regressor 4.232378e+03 2.571827e+07 \n",
"omp Orthogonal Matching Pursuit 3.020520e+03 2.692980e+07 \n",
"knn K Neighbors Regressor 3.041608e+03 3.021692e+07 \n",
"en Elastic Net 5.100566e+03 6.056820e+07 \n",
"dummy Dummy Regressor 7.345813e+03 1.063003e+08 \n",
"lar Least Angle Regression 2.793011e+06 3.301584e+14 \n",
"\n",
" RMSE R2 RMSLE MAPE TT (Sec) \n",
"et 1.412893e+03 9.809000e-01 0.0775 0.0584 1.451 \n",
"rf 1.492959e+03 9.785000e-01 0.0778 0.0575 1.451 \n",
"gbr 1.726864e+03 9.717000e-01 0.1013 0.0767 0.302 \n",
"lightgbm 1.771217e+03 9.692000e-01 0.0774 0.0562 0.114 \n",
"dt 1.943954e+03 9.636000e-01 0.1006 0.0737 0.033 \n",
"ridge 3.911392e+03 8.557000e-01 0.6438 0.2985 0.038 \n",
"lasso 3.913367e+03 8.555000e-01 0.6405 0.2977 0.063 \n",
"br 3.914523e+03 8.554000e-01 0.6393 0.2983 0.022 \n",
"llar 3.916103e+03 8.553000e-01 0.6665 0.2839 0.020 \n",
"lr 3.917170e+03 8.552000e-01 0.6496 0.3017 0.598 \n",
"huber 4.544653e+03 8.051000e-01 0.4180 0.1688 0.145 \n",
"par 4.721545e+03 7.896000e-01 0.4229 0.1610 0.051 \n",
"ada 5.048940e+03 7.565000e-01 0.4781 0.5518 0.263 \n",
"omp 5.143359e+03 7.506000e-01 0.3995 0.2722 0.017 \n",
"knn 5.488381e+03 7.132000e-01 0.3716 0.2806 0.080 \n",
"en 7.746799e+03 4.335000e-01 0.5389 0.5845 0.039 \n",
"dummy 1.028637e+04 -1.800000e-03 0.7600 0.8936 0.013 \n",
"lar 5.919992e+06 -2.375630e+06 1.7392 380.9053 0.023 "
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',\n",
" max_depth=None, max_features='auto', max_leaf_nodes=None,\n",
" max_samples=None, min_impurity_decrease=0.0,\n",
" min_impurity_split=None, min_samples_leaf=1,\n",
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
" n_estimators=100, n_jobs=-1, oob_score=False,\n",
" random_state=4417, verbose=0, warm_start=False)"
]
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "markdown",
"source": [
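"Here are the cross-validation scores of the candidate models. The table is sorted from best to worst by R2 (PyCaret's default sorting metric for regression), so we'll use the top entry, the Extra Trees Regressor. Note that the ranking depends on the metric: Random Forest, for example, has a slightly lower MAE (719.43 vs 725.32).\n"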
],
"metadata": {
"id": "EFhGloueMJuu"
}
},
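{
"cell_type": "markdown",
"source": [
"To give a rough idea of what compare_models() does, the next cell is a minimal scikit-learn sketch, not PyCaret's actual implementation. It cross-validates a few regressors with a shuffled 10-fold KFold (matching the Fold Generator and Fold Number in the setup output) and sorts them by mean R2. The synthetic make_regression data and the short model list are assumptions made for illustration, so the scores will not resemble the table above."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Minimal sketch: rank a few regressors by 10-fold cross-validated R2,\n",
"# roughly like compare_models() (synthetic data, not the diamond dataset)\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor\n",
"from sklearn.linear_model import LinearRegression, Ridge\n",
"from sklearn.model_selection import KFold, cross_val_score\n",
"\n",
"X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)\n",
"cv = KFold(n_splits=10, shuffle=True, random_state=0)\n",
"\n",
"candidates = {\n",
"    'et': ExtraTreesRegressor(n_estimators=50, random_state=0),\n",
"    'rf': RandomForestRegressor(n_estimators=50, random_state=0),\n",
"    'lr': LinearRegression(),\n",
"    'ridge': Ridge(alpha=1.0),\n",
"}\n",
"\n",
"# Mean R2 over the 10 folds for each candidate\n",
"scores = {name: cross_val_score(est, X, y, cv=cv, scoring='r2').mean()\n",
"          for name, est in candidates.items()}\n",
"\n",
"# Sort best-first, like the compare_models() table\n",
"leaderboard = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)\n",
"for name, r2 in leaderboard:\n",
"    print(f'{name:6s} R2={r2:.4f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},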
{
"cell_type": "code",
"source": [
"# Train an Extra Trees Regressor and score it with 10-fold cross-validation\n",
"model = create_model('et')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 426,
"referenced_widgets": [
"c96575810e9645879a74caf082ec51b4",
"01a6b7ac8c0446a58ad83864a77acf9a",
"0febddd05ca14cef901a0bd6548d3f0e"
]
},
"id": "smAZZaLMgCBN",
"outputId": "bfb573a5-4e8e-44f4-cd1d-4f6c619fb26f"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MAE</th>\n",
" <th>MSE</th>\n",
" <th>RMSE</th>\n",
" <th>R2</th>\n",
" <th>RMSLE</th>\n",
" <th>MAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>706.8477</td>\n",
" <td>1.851739e+06</td>\n",
" <td>1360.7860</td>\n",
" <td>0.9820</td>\n",
" <td>0.0792</td>\n",
" <td>0.0595</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>714.7612</td>\n",
" <td>1.574976e+06</td>\n",
" <td>1254.9804</td>\n",
" <td>0.9833</td>\n",
" <td>0.0794</td>\n",
" <td>0.0602</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>837.1378</td>\n",
" <td>3.162244e+06</td>\n",
" <td>1778.2700</td>\n",
" <td>0.9717</td>\n",
" <td>0.0836</td>\n",
" <td>0.0619</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>723.5708</td>\n",
" <td>1.655954e+06</td>\n",
" <td>1286.8388</td>\n",
" <td>0.9843</td>\n",
" <td>0.0777</td>\n",
" <td>0.0569</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>658.3075</td>\n",
" <td>1.331229e+06</td>\n",
" <td>1153.7888</td>\n",
" <td>0.9844</td>\n",
" <td>0.0773</td>\n",
" <td>0.0593</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>829.7322</td>\n",
" <td>4.622626e+06</td>\n",
" <td>2150.0294</td>\n",
" <td>0.9667</td>\n",
" <td>0.0807</td>\n",
" <td>0.0603</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>693.2810</td>\n",
" <td>1.520736e+06</td>\n",
" <td>1233.1813</td>\n",
" <td>0.9866</td>\n",
" <td>0.0731</td>\n",
" <td>0.0563</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>756.1417</td>\n",
" <td>2.342081e+06</td>\n",
" <td>1530.3860</td>\n",
" <td>0.9793</td>\n",
" <td>0.0764</td>\n",
" <td>0.0574</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>655.1617</td>\n",
" <td>1.369641e+06</td>\n",
" <td>1170.3167</td>\n",
" <td>0.9874</td>\n",
" <td>0.0729</td>\n",
" <td>0.0555</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>678.2579</td>\n",
" <td>1.464946e+06</td>\n",
" <td>1210.3496</td>\n",
" <td>0.9832</td>\n",
" <td>0.0749</td>\n",
" <td>0.0563</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mean</th>\n",
" <td>725.3200</td>\n",
" <td>2.089617e+06</td>\n",
" <td>1412.8927</td>\n",
" <td>0.9809</td>\n",
" <td>0.0775</td>\n",
" <td>0.0584</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SD</th>\n",
" <td>61.2087</td>\n",
" <td>9.973598e+05</td>\n",
" <td>305.5347</td>\n",
" <td>0.0063</td>\n",
" <td>0.0032</td>\n",
" <td>0.0020</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MAE MSE RMSE R2 RMSLE MAPE\n",
"0 706.8477 1.851739e+06 1360.7860 0.9820 0.0792 0.0595\n",
"1 714.7612 1.574976e+06 1254.9804 0.9833 0.0794 0.0602\n",
"2 837.1378 3.162244e+06 1778.2700 0.9717 0.0836 0.0619\n",
"3 723.5708 1.655954e+06 1286.8388 0.9843 0.0777 0.0569\n",
"4 658.3075 1.331229e+06 1153.7888 0.9844 0.0773 0.0593\n",
"5 829.7322 4.622626e+06 2150.0294 0.9667 0.0807 0.0603\n",
"6 693.2810 1.520736e+06 1233.1813 0.9866 0.0731 0.0563\n",
"7 756.1417 2.342081e+06 1530.3860 0.9793 0.0764 0.0574\n",
"8 655.1617 1.369641e+06 1170.3167 0.9874 0.0729 0.0555\n",
"9 678.2579 1.464946e+06 1210.3496 0.9832 0.0749 0.0563\n",
"Mean 725.3200 2.089617e+06 1412.8927 0.9809 0.0775 0.0584\n",
"SD 61.2087 9.973598e+05 305.5347 0.0063 0.0032 0.0020"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
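"The table shows the scores for each of the 10 cross-validation folds, followed by their mean (Mean) and standard deviation (SD). \n",
"\n",
"Let's take a quick look at the model's details, including its hyperparameters."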
],
"metadata": {
"id": "Zs6lDo2AhdDU"
}
},
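{
"cell_type": "markdown",
"source": [
"The Mean and SD rows can be reproduced from the per-fold values. The sketch below uses the ten MAE values from the table above. The SD matches PyCaret's output when computed as a population standard deviation (NumPy's default, ddof=0); this is inferred from the numbers, not stated anywhere in the notebook."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"# Per-fold MAE values copied from the create_model('et') table above\n",
"fold_mae = np.array([706.8477, 714.7612, 837.1378, 723.5708, 658.3075,\n",
"                     829.7322, 693.2810, 756.1417, 655.1617, 678.2579])\n",
"\n",
"mean_mae = fold_mae.mean()\n",
"sd_mae = fold_mae.std()  # ddof=0 (population SD), NumPy's default\n",
"\n",
"print(f'Mean {mean_mae:.4f}, SD {sd_mae:.4f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},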
{
"cell_type": "code",
"source": [
"model"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "OtWcuSSIhWd5",
"outputId": "d38fc1c7-65d3-4057-942b-5b24cdcb856d"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"ExtraTreesRegressor(bootstrap=False, ccp_alpha=0.0, criterion='mse',\n",
" max_depth=None, max_features='auto', max_leaf_nodes=None,\n",
" max_samples=None, min_impurity_decrease=0.0,\n",
" min_impurity_split=None, min_samples_leaf=1,\n",
" min_samples_split=2, min_weight_fraction_leaf=0.0,\n",
" n_estimators=100, n_jobs=-1, oob_score=False,\n",
" random_state=4417, verbose=0, warm_start=False)"
]
},
"metadata": {},
"execution_count": 12
}
]
},
{
"cell_type": "markdown",
"source": [
"Now let's check visually whether the model is well built. For regression models, plot_model() draws a residuals plot by default."
],
"metadata": {
"id": "sdn1Mu4DOpNu"
}
},
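{
"cell_type": "markdown",
"source": [
"For intuition about what a residuals plot shows, here is a small sketch on synthetic data (an assumption; not the diamond data). It fits a default ExtraTreesRegressor on a 70/30 train/test split, similar in proportion to the 4199/1801 split in the setup output, and compares the spread of the residuals (actual minus predicted) on both sets. With default settings (bootstrap=False, min_samples_leaf=1) the trees can fit the training data almost exactly, so the training residuals cluster very tightly around 0 while the test residuals do not."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: residuals (actual - predicted) on train and test data\n",
"# (synthetic data and a 70/30 split; illustrative only)\n",
"import numpy as np\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.ensemble import ExtraTreesRegressor\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"X, y = make_regression(n_samples=400, n_features=8, noise=20.0, random_state=0)\n",
"X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)\n",
"\n",
"et = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)\n",
"res_train = y_tr - et.predict(X_tr)\n",
"res_test = y_te - et.predict(X_te)\n",
"\n",
"# A residuals plot scatters these values against the predictions;\n",
"# here we only summarise how tightly each set clusters around 0\n",
"print('train residual SD:', round(float(np.std(res_train)), 4))\n",
"print('test residual SD :', round(float(np.std(res_test)), 4))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},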
{
"cell_type": "code",
"source": [
"# For regression, the default plot is a residuals plot\n",
"plot_model(model)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 376,
"referenced_widgets": [
"704fa978c6c04b49bc561e547da2b637",
"406a205a1c3f4e2e973777726d72da0f",
"cd2d0dd380854d108f3cca927fe05b49"
]
},
"id": "cmahKk9qOeC5",
"outputId": "50b29431-ddc2-425a-de81-c206207dff74"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 576x396 with 2 Axes>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
"Next, let's run hyperparameter optimization on the model we just built. This step took about 40 minutes in this run; the actual time depends on your hardware."
],
"metadata": {
"id": "O2n-LgzQOXWq"
}
},
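{
"cell_type": "markdown",
"source": [
"tune_model() searches a hyperparameter space for you. As a sketch of the general idea only (the grid below is an assumption, not PyCaret's internal search space), the next cell runs scikit-learn's RandomizedSearchCV over several ExtraTreesRegressor hyperparameters that also change in the tuned model shown further down. The parameter values, n_iter=5, 3 folds and the synthetic data are illustrative choices that keep the run short."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: random search over ExtraTreesRegressor hyperparameters\n",
"# (illustrative grid and synthetic data, not PyCaret's internal search space)\n",
"from sklearn.datasets import make_regression\n",
"from sklearn.ensemble import ExtraTreesRegressor\n",
"from sklearn.model_selection import KFold, RandomizedSearchCV\n",
"\n",
"X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)\n",
"\n",
"param_distributions = {\n",
"    'n_estimators': [50, 100, 250],\n",
"    'max_depth': [None, 5, 10],\n",
"    'min_samples_leaf': [1, 2, 4],\n",
"    'min_samples_split': [2, 5, 10],\n",
"    'bootstrap': [True, False],\n",
"}\n",
"\n",
"search = RandomizedSearchCV(\n",
"    ExtraTreesRegressor(random_state=0),\n",
"    param_distributions=param_distributions,\n",
"    n_iter=5,  # number of random configurations to try\n",
"    scoring='r2',\n",
"    cv=KFold(n_splits=3, shuffle=True, random_state=0),\n",
"    random_state=0,\n",
")\n",
"search.fit(X, y)\n",
"\n",
"print('best params:', search.best_params_)\n",
"print('best CV R2 :', round(search.best_score_, 4))"
],
"metadata": {},
"execution_count": null,
"outputs": []
},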
{
"cell_type": "code",
"source": [
"# Random search over a predefined hyperparameter grid (optimizes R2 by default)\n",
"tuned_model = tune_model(model)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 426,
"referenced_widgets": [
"a0267b8ea717436ebc4bdf02760f8ab1",
"3e6d4b76c994492f9167a1b4ea0416f1",
"94e5d288a37f45309d7b12e820d7bb87"
]
},
"id": "QP3zgOQPhi8Q",
"outputId": "7fb0e67d-55b8-4983-a4ad-a7531b13f6f4"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>MAE</th>\n",
" <th>MSE</th>\n",
" <th>RMSE</th>\n",
" <th>R2</th>\n",
" <th>RMSLE</th>\n",
" <th>MAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>796.6646</td>\n",
" <td>2.933526e+06</td>\n",
" <td>1712.7540</td>\n",
" <td>0.9716</td>\n",
" <td>0.0876</td>\n",
" <td>0.0675</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>745.6158</td>\n",
" <td>1.720768e+06</td>\n",
" <td>1311.7806</td>\n",
" <td>0.9818</td>\n",
" <td>0.0824</td>\n",
" <td>0.0622</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>867.8531</td>\n",
" <td>5.487956e+06</td>\n",
" <td>2342.6386</td>\n",
" <td>0.9508</td>\n",
" <td>0.0877</td>\n",
" <td>0.0637</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>924.7672</td>\n",
" <td>4.513364e+06</td>\n",
" <td>2124.4678</td>\n",
" <td>0.9573</td>\n",
" <td>0.0924</td>\n",
" <td>0.0669</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>708.7673</td>\n",
" <td>1.937939e+06</td>\n",
" <td>1392.0986</td>\n",
" <td>0.9772</td>\n",
" <td>0.0854</td>\n",
" <td>0.0652</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>978.4641</td>\n",
" <td>8.216249e+06</td>\n",
" <td>2866.4001</td>\n",
" <td>0.9409</td>\n",
" <td>0.0911</td>\n",
" <td>0.0679</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>732.4930</td>\n",
" <td>2.022965e+06</td>\n",
" <td>1422.3099</td>\n",
" <td>0.9821</td>\n",
" <td>0.0750</td>\n",
" <td>0.0583</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>934.0893</td>\n",
" <td>4.480752e+06</td>\n",
" <td>2116.7787</td>\n",
" <td>0.9603</td>\n",
" <td>0.0926</td>\n",
" <td>0.0702</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>751.4517</td>\n",
" <td>1.965643e+06</td>\n",
" <td>1402.0137</td>\n",
" <td>0.9819</td>\n",
" <td>0.0859</td>\n",
" <td>0.0656</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>774.1197</td>\n",
" <td>1.587920e+06</td>\n",
" <td>1260.1269</td>\n",
" <td>0.9818</td>\n",
" <td>0.0854</td>\n",
" <td>0.0673</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Mean</th>\n",
" <td>821.4286</td>\n",
" <td>3.486708e+06</td>\n",
" <td>1795.1369</td>\n",
" <td>0.9686</td>\n",
" <td>0.0866</td>\n",
" <td>0.0655</td>\n",
" </tr>\n",
" <tr>\n",
" <th>SD</th>\n",
" <td>91.8066</td>\n",
" <td>2.056709e+06</td>\n",
" <td>513.9958</td>\n",
" <td>0.0144</td>\n",
" <td>0.0050</td>\n",
" <td>0.0032</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" MAE MSE RMSE R2 RMSLE MAPE\n",
"0 796.6646 2.933526e+06 1712.7540 0.9716 0.0876 0.0675\n",
"1 745.6158 1.720768e+06 1311.7806 0.9818 0.0824 0.0622\n",
"2 867.8531 5.487956e+06 2342.6386 0.9508 0.0877 0.0637\n",
"3 924.7672 4.513364e+06 2124.4678 0.9573 0.0924 0.0669\n",
"4 708.7673 1.937939e+06 1392.0986 0.9772 0.0854 0.0652\n",
"5 978.4641 8.216249e+06 2866.4001 0.9409 0.0911 0.0679\n",
"6 732.4930 2.022965e+06 1422.3099 0.9821 0.0750 0.0583\n",
"7 934.0893 4.480752e+06 2116.7787 0.9603 0.0926 0.0702\n",
"8 751.4517 1.965643e+06 1402.0137 0.9819 0.0859 0.0656\n",
"9 774.1197 1.587920e+06 1260.1269 0.9818 0.0854 0.0673\n",
"Mean 821.4286 3.486708e+06 1795.1369 0.9686 0.0866 0.0655\n",
"SD 91.8066 2.056709e+06 513.9958 0.0144 0.0050 0.0032"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
"Let's look at the hyperparameters after optimization."
],
"metadata": {
"id": "f0vpQnVcOLFU"
}
},
{
"cell_type": "code",
"source": [
"tuned_model"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "5FKgWGxpxI_i",
"outputId": "7fe30185-912e-4099-cd4f-443abda953eb"
},
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"ExtraTreesRegressor(bootstrap=True, ccp_alpha=0.0, criterion='mse',\n",
" max_depth=10, max_features=1.0, max_leaf_nodes=None,\n",
" max_samples=None, min_impurity_decrease=0.0002,\n",
" min_impurity_split=None, min_samples_leaf=4,\n",
" min_samples_split=5, min_weight_fraction_leaf=0.0,\n",
" n_estimators=250, n_jobs=-1, oob_score=False,\n",
" random_state=4417, verbose=0, warm_start=False)"
]
},
"metadata": {},
"execution_count": 15
}
]
},
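{
"cell_type": "markdown",
"source": [
"Comparing this output with the untuned model printed earlier shows which hyperparameters the search changed. The sketch below copies a subset of the values from the two printed reprs into plain dictionaries and diffs them; in a live session you could build the same dictionaries from model.get_params() and tuned_model.get_params(). Note that for regressors in the older scikit-learn version used here, max_features='auto' meant all features, the same as 1.0, so that particular change is only nominal."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Hyperparameters copied from the two ExtraTreesRegressor reprs above\n",
"# (only the subset needed for the comparison)\n",
"before = {'bootstrap': False, 'max_depth': None, 'max_features': 'auto',\n",
"          'min_impurity_decrease': 0.0, 'min_samples_leaf': 1,\n",
"          'min_samples_split': 2, 'n_estimators': 100, 'random_state': 4417}\n",
"after = {'bootstrap': True, 'max_depth': 10, 'max_features': 1.0,\n",
"         'min_impurity_decrease': 0.0002, 'min_samples_leaf': 4,\n",
"         'min_samples_split': 5, 'n_estimators': 250, 'random_state': 4417}\n",
"\n",
"# Keep only the parameters whose values differ\n",
"changed = {k: (before[k], after[k]) for k in before if before[k] != after[k]}\n",
"for name, (old, new) in changed.items():\n",
"    print(f'{name}: {old} -> {new}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},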
{
"cell_type": "markdown",
"source": [
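"Now let's compare the model's predictions with the actual prices. When called without new data, predict_model() scores the hold-out set created by setup() (1,801 rows here). Price is the actual price and Label is the price predicted by the model."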
],
"metadata": {
"id": "0-NfoSzOPCoR"
}
},
{
"cell_type": "code",
"source": [
"# Without a data argument, predict on the hold-out set created by setup()\n",
"predict_model(tuned_model)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 525
},
"id": "RAf7rQjqmIUm",
"outputId": "2390b05d-ed46-4f2c-81eb-61f139a0c59b"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Model</th>\n",
" <th>MAE</th>\n",
" <th>MSE</th>\n",
" <th>RMSE</th>\n",
" <th>R2</th>\n",
" <th>RMSLE</th>\n",
" <th>MAPE</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>Extra Trees Regressor</td>\n",
" <td>802.8863</td>\n",
" <td>1.913203e+06</td>\n",
" <td>1383.1859</td>\n",
" <td>0.9804</td>\n",
" <td>0.0887</td>\n",
" <td>0.0678</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" Model MAE MSE ... R2 RMSLE MAPE\n",
"0 Extra Trees Regressor 802.8863 1.913203e+06 ... 0.9804 0.0887 0.0678\n",
"\n",
"[1 rows x 7 columns]"
]
},
"metadata": {}
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Carat Weight</th>\n",
" <th>Cut_Fair</th>\n",
" <th>Cut_Good</th>\n",
" <th>Cut_Ideal</th>\n",
" <th>Cut_Signature-Ideal</th>\n",
" <th>Cut_Very Good</th>\n",
" <th>Color_D</th>\n",
" <th>Color_E</th>\n",
" <th>Color_F</th>\n",
" <th>Color_G</th>\n",
" <th>Color_H</th>\n",
" <th>Color_I</th>\n",
" <th>Clarity_FL</th>\n",
" <th>Clarity_IF</th>\n",
" <th>Clarity_SI1</th>\n",
" <th>Clarity_VS1</th>\n",
" <th>Clarity_VS2</th>\n",
" <th>Clarity_VVS1</th>\n",
" <th>Clarity_VVS2</th>\n",
" <th>Polish_EX</th>\n",
" <th>Polish_G</th>\n",
" <th>Polish_ID</th>\n",
" <th>Polish_VG</th>\n",
" <th>Symmetry_EX</th>\n",
" <th>Symmetry_G</th>\n",
" <th>Symmetry_ID</th>\n",
" <th>Symmetry_VG</th>\n",
" <th>Report_GIA</th>\n",
" <th>Price</th>\n",
" <th>Label</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1.01</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>6825</td>\n",
" <td>7320.343803</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1.61</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>16205</td>\n",
" <td>16308.234711</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2.17</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>25707</td>\n",
" <td>25670.884109</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1.21</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>7810</td>\n",
" <td>7237.169775</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0.91</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>3525</td>\n",
" <td>3841.674563</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1796</th>\n",
" <td>1.32</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>9513</td>\n",
" <td>9058.079597</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1797</th>\n",
" <td>0.79</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>2888</td>\n",
" <td>2916.601843</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1798</th>\n",
" <td>1.06</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>6003</td>\n",
" <td>5460.893484</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1799</th>\n",
" <td>1.19</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>1.0</td>\n",
" <td>6335</td>\n",
" <td>6428.452253</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1800</th>\n",
" <td>0.75</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>3525</td>\n",
" <td>3712.824582</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1801 rows × 30 columns</p>\n",
"</div>"
],
"text/plain": [
" Carat Weight Cut_Fair Cut_Good ... Report_GIA Price Label\n",
"0 1.01 0.0 1.0 ... 1.0 6825 7320.343803\n",
"1 1.61 0.0 0.0 ... 1.0 16205 16308.234711\n",
"2 2.17 0.0 0.0 ... 1.0 25707 25670.884109\n",
"3 1.21 0.0 0.0 ... 0.0 7810 7237.169775\n",
"4 0.91 0.0 0.0 ... 1.0 3525 3841.674563\n",
"... ... ... ... ... ... ... ...\n",
"1796 1.32 0.0 0.0 ... 1.0 9513 9058.079597\n",
"1797 0.79 0.0 0.0 ... 1.0 2888 2916.601843\n",
"1798 1.06 0.0 0.0 ... 1.0 6003 5460.893484\n",
"1799 1.19 0.0 0.0 ... 1.0 6335 6428.452253\n",
"1800 0.75 0.0 0.0 ... 0.0 3525 3712.824582\n",
"\n",
"[1801 rows x 30 columns]"
]
},
"metadata": {},
"execution_count": 14
}
]
},
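{
"cell_type": "markdown",
"source": [
"The metrics at the top are computed by comparing Price with Label over all 1,801 hold-out rows. The sketch below applies the standard MAE, RMSE, R2, RMSLE and MAPE formulas with NumPy to only the ten rows visible in the preview. Its numbers will therefore differ from PyCaret's table; the sketch is only meant to show how each metric is defined."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"import numpy as np\n",
"\n",
"# Price (actual) and Label (predicted) for the ten rows shown above\n",
"price = np.array([6825, 16205, 25707, 7810, 3525, 9513, 2888, 6003, 6335, 3525], dtype=float)\n",
"label = np.array([7320.343803, 16308.234711, 25670.884109, 7237.169775, 3841.674563,\n",
"                  9058.079597, 2916.601843, 5460.893484, 6428.452253, 3712.824582])\n",
"\n",
"err = price - label\n",
"mae = np.mean(np.abs(err))\n",
"rmse = np.sqrt(np.mean(err ** 2))\n",
"r2 = 1 - np.sum(err ** 2) / np.sum((price - price.mean()) ** 2)\n",
"rmsle = np.sqrt(np.mean((np.log1p(label) - np.log1p(price)) ** 2))\n",
"mape = np.mean(np.abs(err) / price)  # a fraction, like the 0.0678 in the table\n",
"\n",
"print(f'MAE {mae:.2f}  RMSE {rmse:.2f}  R2 {r2:.4f}  RMSLE {rmsle:.4f}  MAPE {mape:.4f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},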
{
"cell_type": "markdown",
"source": [
"The predictions don't seem far off from the actual prices, but it's hard to judge by eye. \n",
"As before, let's check the model's performance visually."
],
"metadata": {
"id": "AdeicLiW135T"
}
},
{
"cell_type": "code",
"source": [
"# Residuals plot for the tuned model\n",
"plot_model(tuned_model)"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 376,
"referenced_widgets": [
"9f84f9037515423383eaba950fd5dcfc",
"80611e8365a147dd92e4c2050fb9f86b",
"7b21858d2e4947fc838f122705b19ae1"
]
},
"id": "6aha5m-YvjL9",
"outputId": "08bbe356-0229-4e32-b9d6-5e737ca41561"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 576x396 with 2 Axes>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
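"Blue points are the residuals of the predictions on the training data and green points are the residuals on the test data. The closer the points lie to 0, the more accurate the predictions. After tuning, the spread of the residuals looks somewhat better. Keep in mind, though, that the mean cross-validation scores did not improve (R2 0.9686 after tuning vs 0.9809 before, MAE 821.43 vs 725.32). Passing choose_better=True to tune_model() makes it return the original model when tuning does not improve the score. \n",
"Next, let's see which features had the biggest influence on price. All it takes is adding plot='feature' to plot_model."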
],
"metadata": {
"id": "Hv9aL08i2Ma0"
}
},
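{
"cell_type": "markdown",
"source": [
"For tree ensembles such as ExtraTreesRegressor, a feature-importance plot is usually based on the fitted model's feature_importances_ attribute; that this is what plot='feature' uses here is an assumption. The sketch below shows the idea on synthetic data in which one column drives the target far more than the others. The feature names are hypothetical."
],
"metadata": {}
},
{
"cell_type": "code",
"source": [
"# Sketch: impurity-based feature importances of an ExtraTreesRegressor\n",
"# (synthetic data; the feature names are hypothetical)\n",
"import numpy as np\n",
"from sklearn.ensemble import ExtraTreesRegressor\n",
"\n",
"rng = np.random.default_rng(0)\n",
"n = 500\n",
"X = rng.uniform(0, 1, size=(n, 3))\n",
"# Target depends strongly on column 0, weakly on column 1, not at all on column 2\n",
"y = 10 * X[:, 0] + 1 * X[:, 1] + rng.normal(0, 0.1, size=n)\n",
"\n",
"model_sketch = ExtraTreesRegressor(n_estimators=100, random_state=0).fit(X, y)\n",
"\n",
"names = ['carat_like', 'weak_feature', 'noise_feature']\n",
"ranking = sorted(zip(names, model_sketch.feature_importances_),\n",
"                 key=lambda t: t[1], reverse=True)\n",
"for name, imp in ranking:\n",
"    print(f'{name:14s} {imp:.3f}')"
],
"metadata": {},
"execution_count": null,
"outputs": []
},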
{
"cell_type": "code",
"source": [
"# Feature importance plot\n",
"plot_model(tuned_model, plot='feature')"
],
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 478,
"referenced_widgets": [
"7d6a997e72454d6e9010793540ba9ead",
"bed1185cf2694996ba4f382336054777",
"e57e4eb9160d4879be00834a1baca394"
]
},
"id": "ljGaNg8LzJXB",
"outputId": "d4e99ec8-b5b6-4056-e942-439c5e9b8c50"
},
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 800x500 with 1 Axes>"
]
},
"metadata": {}
}
]
},
{
"cell_type": "markdown",
"source": [
"How about that? With just a few lines of code, we built a model and even validated it.\n"
],
"metadata": {
"id": "sqouKcNuRylC"
}
},
{
"cell_type": "code",
"source": [
""
],
"metadata": {
"id": "J3srB-h3auA8"
},
"execution_count": null,
"outputs": []
}
]
}