@whan0623

whan0623/6.ipynb Secret

Created October 21, 2021 03:54
6. 머신러닝으로 업무 효율화하기.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "6. 머신러닝으로 업무 효율화하기.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyNkawQzTLzXHsvPwljRj7dM",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/whan0623/92d7cbe35ef29c594538f12c64485865/6.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WoJ8ubWfasPl"
},
"source": [
"# 6-1 Applying machine learning to business systems\n",
"- Run ETL (Extract/Transform/Load) processing against the DB at night, when there are almost no users"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "RkTzNgGca9Kd"
},
"source": [
"# 6-2 How to save and load trained models"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Cf_mJ0wsdnrf"
},
"source": [
"## Saving and loading trained models in scikit-learn"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "FnRht00FbP8N"
},
"source": [
"### Saving a trained model in scikit-learn"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "HOnuJwHGaMb5",
"outputId": "bd0ed4fa-0f13-418a-be92-9e36a31ff0e9"
},
"source": [
"from sklearn import datasets, svm\n",
"import joblib  # sklearn.externals.joblib was removed in scikit-learn 0.23\n",
"\n",
"# Load the iris dataset\n",
"iris = datasets.load_iris()\n",
"\n",
"# Train the classifier\n",
"clf = svm.SVC()\n",
"clf.fit(iris.data, iris.target)\n",
"\n",
"# Save the trained model\n",
"joblib.dump(clf, 'iris.pkl', compress=True)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['iris.pkl']"
]
},
"metadata": {},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "WPUGS8vYb_CU"
},
"source": [
"### Saving to Google Drive"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cavoph4gbkVm",
"outputId": "5ec3e27b-55d0-4090-83a6-1d802cef2a99"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"import joblib\n",
"joblib.dump(clf, '/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris.plk', compress=True)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris.plk']"
]
},
"metadata": {},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "xl2xm6EocD37"
},
"source": [
"### Loading a trained model in scikit-learn"
]
},
{
"cell_type": "code",
"metadata": {
"id": "5kyusomNcQVC"
},
"source": [
"import joblib\n",
"\n",
"# Load the previously saved trained model\n",
"clf = joblib.load('iris.pkl')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "vXkOq5QgcK7T"
},
"source": [
"### Loading from Google Drive"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "S22lRzQTchrK",
"outputId": "2ce7326d-dab7-4b69-f4f3-8e83ba96b635"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"import joblib\n",
"clf = joblib.load('/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris.plk')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "K71gmkLEfWEu"
},
"source": [
"### Evaluating"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "4jzYN9Z0cpJy",
"outputId": "25d2470d-8751-45bb-e6e8-ab7d93c51fbc"
},
"source": [
"from sklearn import datasets, svm\n",
"from sklearn.metrics import accuracy_score\n",
"\n",
"# Load the iris dataset\n",
"iris = datasets.load_iris()\n",
"# Predict (on the same data the model was trained on)\n",
"pre = clf.predict(iris.data)\n",
"# Check the accuracy\n",
"print(accuracy_score(iris.target, pre))"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"0.9733333333333334\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "slhqKgQgdyTm"
},
"source": [
"## Saving and loading trained models in TensorFlow and Keras"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BBjvwiw3d4So"
},
"source": [
"### Saving a trained model in TensorFlow and Keras"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "dQVQFXj_eMuG",
"outputId": "dae1227d-a35a-4cb6-c90a-b596ad1d9d6f"
},
"source": [
"from sklearn import datasets\n",
"import keras\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout\n",
"# keras.utils.np_utils is not available in recent Keras releases\n",
"from tensorflow.keras.utils import to_categorical\n",
"\n",
"# Load the iris dataset\n",
"iris = datasets.load_iris()\n",
"in_size = 4\n",
"nb_classes = 3\n",
"# Convert the label data to one-hot format\n",
"x = iris.data\n",
"y = to_categorical(iris.target, nb_classes)\n",
"\n",
"# Define the model --- (*1)\n",
"model = Sequential()\n",
"model.add(Dense(512, activation='relu', input_shape=(in_size,)))\n",
"model.add(Dense(512, activation='relu'))\n",
"model.add(Dropout(0.2))\n",
"model.add(Dense(nb_classes, activation='softmax'))\n",
"# Compile --- (*2)\n",
"model.compile(\n",
"    loss='categorical_crossentropy',\n",
"    optimizer='adam',\n",
"    metrics=['accuracy'])\n",
"# Run training --- (*3)\n",
"model.fit(x, y, batch_size=20, epochs=50)\n",
"\n",
"# Save the model --- (*4)\n",
"model.save('iris_model.h5')\n",
"# Save the trained weights --- (*5)\n",
"model.save_weights('iris_weight.h5')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/50\n",
"8/8 [==============================] - 1s 5ms/step - loss: 0.8119 - accuracy: 0.6267\n",
"Epoch 2/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.4872 - accuracy: 0.7533\n",
"Epoch 3/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.3727 - accuracy: 0.8000\n",
"Epoch 4/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.3047 - accuracy: 0.8533\n",
"Epoch 5/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.2168 - accuracy: 0.9333\n",
"Epoch 6/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.2075 - accuracy: 0.9533\n",
"Epoch 7/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1665 - accuracy: 0.9200\n",
"Epoch 8/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1430 - accuracy: 0.9600\n",
"Epoch 9/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1338 - accuracy: 0.9533\n",
"Epoch 10/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1604 - accuracy: 0.9333\n",
"Epoch 11/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1916 - accuracy: 0.9133\n",
"Epoch 12/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.1872 - accuracy: 0.9267\n",
"Epoch 13/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1294 - accuracy: 0.9467\n",
"Epoch 14/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1166 - accuracy: 0.9733\n",
"Epoch 15/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1040 - accuracy: 0.9600\n",
"Epoch 16/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1207 - accuracy: 0.9600\n",
"Epoch 17/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.1142 - accuracy: 0.9533\n",
"Epoch 18/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0797 - accuracy: 0.9733\n",
"Epoch 19/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1177 - accuracy: 0.9467\n",
"Epoch 20/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0944 - accuracy: 0.9533\n",
"Epoch 21/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0915 - accuracy: 0.9733\n",
"Epoch 22/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0876 - accuracy: 0.9467\n",
"Epoch 23/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0994 - accuracy: 0.9733\n",
"Epoch 24/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1277 - accuracy: 0.9533\n",
"Epoch 25/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.1238 - accuracy: 0.9333\n",
"Epoch 26/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0937 - accuracy: 0.9467\n",
"Epoch 27/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1045 - accuracy: 0.9667\n",
"Epoch 28/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0990 - accuracy: 0.9600\n",
"Epoch 29/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.0658 - accuracy: 0.9800\n",
"Epoch 30/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0906 - accuracy: 0.9733\n",
"Epoch 31/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0883 - accuracy: 0.9667\n",
"Epoch 32/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0833 - accuracy: 0.9733\n",
"Epoch 33/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0851 - accuracy: 0.9667\n",
"Epoch 34/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0748 - accuracy: 0.9733\n",
"Epoch 35/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.1006 - accuracy: 0.9533\n",
"Epoch 36/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0846 - accuracy: 0.9600\n",
"Epoch 37/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0923 - accuracy: 0.9733\n",
"Epoch 38/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0720 - accuracy: 0.9667\n",
"Epoch 39/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.0850 - accuracy: 0.9600\n",
"Epoch 40/50\n",
"8/8 [==============================] - 0s 6ms/step - loss: 0.0771 - accuracy: 0.9733\n",
"Epoch 41/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0665 - accuracy: 0.9733\n",
"Epoch 42/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0617 - accuracy: 0.9800\n",
"Epoch 43/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0659 - accuracy: 0.9733\n",
"Epoch 44/50\n",
"8/8 [==============================] - 0s 5ms/step - loss: 0.0768 - accuracy: 0.9667\n",
"Epoch 45/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0725 - accuracy: 0.9600\n",
"Epoch 46/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0778 - accuracy: 0.9600\n",
"Epoch 47/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0786 - accuracy: 0.9667\n",
"Epoch 48/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0792 - accuracy: 0.9867\n",
"Epoch 49/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0695 - accuracy: 0.9733\n",
"Epoch 50/50\n",
"8/8 [==============================] - 0s 4ms/step - loss: 0.0820 - accuracy: 0.9667\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "PFD3BvkweB1e"
},
"source": [
"### Saving to Google Drive"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FunaqbEqedBn",
"outputId": "ea1bda18-4af5-4400-a035-4df0190f520e"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"# Save the model --- (*4)\n",
"model.save('/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris_model.h5')\n",
"# Save the trained weights --- (*5)\n",
"model.save_weights('/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris_weight.h5')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "6fakAlQqd-zW"
},
"source": [
"### Loading a trained model in TensorFlow and Keras"
]
},
{
"cell_type": "code",
"metadata": {
"id": "8_4fFpaOe0BF"
},
"source": [
"from sklearn import datasets\n",
"import keras\n",
"from keras.models import load_model\n",
"# keras.utils.np_utils is not available in recent Keras releases\n",
"from tensorflow.keras.utils import to_categorical\n",
"\n",
"# Load the iris dataset\n",
"iris = datasets.load_iris()\n",
"in_size = 4\n",
"nb_classes = 3\n",
"# Convert the label data to one-hot format\n",
"x = iris.data\n",
"y = to_categorical(iris.target, nb_classes)\n",
"\n",
"# Load the model --- (*1)\n",
"model = load_model('iris_model.h5')\n",
"# Load the weights --- (*2)\n",
"# (optional here: load_model already restores the weights saved by model.save)\n",
"model.load_weights('iris_weight.h5')"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "oYd-PrxreI-l"
},
"source": [
"### Loading from Google Drive"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "yDMT1Pq2e99M",
"outputId": "5d710a2c-25cc-4a91-b948-8472d82691d4"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"# Load the model --- (*1)\n",
"model = load_model('/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris_model.h5')\n",
"# Load the weights --- (*2)\n",
"model.load_weights('/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/iris_weight.h5')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OQdNCs38faT3"
},
"source": [
"### Evaluating"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "xhGSnJKie-Bc",
"outputId": "0e09ed3f-ce93-4d7a-f774-49978c07013e"
},
"source": [
"# Evaluate the model --- (*3)\n",
"score = model.evaluate(x, y, verbose=1)\n",
"print(\"정답률=\", score[1])  # 정답률 = accuracy"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"5/5 [==============================] - 0s 3ms/step - loss: 0.0577 - accuracy: 0.9733\n",
"정답률= 0.9733333587646484\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "txJfW3JQfnVo"
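},
"source": [
"# 6-3 Determining the category of news articles\n",
"- BoW (Bag-of-Words): converts sentences into vector data\n",
"- TF-IDF: converts sentences into numbers, taking into account not only how often a word appears but also how important it is across the whole set of documents"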
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "e9hDVBkYgm0R"
},
"source": [
"## Building a TF-IDF module"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MmSxoWi-hLx5"
},
"source": [
"### Installing KoNLPy\n",
"- !pip install konlpy\n",
"  - In Colab, a shell command written after an exclamation mark (!) is executed in the notebook\n",
"- Reference: https://couplewith.tistory.com/entry/Python-KoNLPy-%ED%98%95%ED%83%9C%EC%86%8C-%EB%B6%84%EC%84%9D%EA%B8%B0-%EB%B9%84%EA%B5%90-Komoran-Okt-Kkma"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "IFAqcDQphK2o",
"outputId": "672a2ab7-1ba7-4885-d6b3-ecf66160b0d1"
},
"source": [
"!pip install konlpy"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Collecting konlpy\n",
" Downloading konlpy-0.5.2-py2.py3-none-any.whl (19.4 MB)\n",
"\u001b[K |████████████████████████████████| 19.4 MB 131 kB/s \n",
"\u001b[?25hRequirement already satisfied: lxml>=4.1.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (4.2.6)\n",
"Collecting beautifulsoup4==4.6.0\n",
" Downloading beautifulsoup4-4.6.0-py3-none-any.whl (86 kB)\n",
"\u001b[K |████████████████████████████████| 86 kB 6.6 MB/s \n",
"\u001b[?25hRequirement already satisfied: numpy>=1.6 in /usr/local/lib/python3.7/dist-packages (from konlpy) (1.19.5)\n",
"Collecting colorama\n",
" Downloading colorama-0.4.4-py2.py3-none-any.whl (16 kB)\n",
"Requirement already satisfied: tweepy>=3.7.0 in /usr/local/lib/python3.7/dist-packages (from konlpy) (3.10.0)\n",
"Collecting JPype1>=0.7.0\n",
" Downloading JPype1-1.3.0-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl (448 kB)\n",
"\u001b[K |████████████████████████████████| 448 kB 69.9 MB/s \n",
"\u001b[?25hRequirement already satisfied: typing-extensions in /usr/local/lib/python3.7/dist-packages (from JPype1>=0.7.0->konlpy) (3.7.4.3)\n",
"Requirement already satisfied: requests-oauthlib>=0.7.0 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (1.3.0)\n",
"Requirement already satisfied: requests[socks]>=2.11.1 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (2.23.0)\n",
"Requirement already satisfied: six>=1.10.0 in /usr/local/lib/python3.7/dist-packages (from tweepy>=3.7.0->konlpy) (1.15.0)\n",
"Requirement already satisfied: oauthlib>=3.0.0 in /usr/local/lib/python3.7/dist-packages (from requests-oauthlib>=0.7.0->tweepy>=3.7.0->konlpy) (3.1.1)\n",
"Requirement already satisfied: idna<3,>=2.5 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (2.10)\n",
"Requirement already satisfied: chardet<4,>=3.0.2 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (3.0.4)\n",
"Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (1.24.3)\n",
"Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (2021.5.30)\n",
"Requirement already satisfied: PySocks!=1.5.7,>=1.5.6 in /usr/local/lib/python3.7/dist-packages (from requests[socks]>=2.11.1->tweepy>=3.7.0->konlpy) (1.7.1)\n",
"Installing collected packages: JPype1, colorama, beautifulsoup4, konlpy\n",
" Attempting uninstall: beautifulsoup4\n",
" Found existing installation: beautifulsoup4 4.6.3\n",
" Uninstalling beautifulsoup4-4.6.3:\n",
" Successfully uninstalled beautifulsoup4-4.6.3\n",
"Successfully installed JPype1-1.3.0 beautifulsoup4-4.6.0 colorama-0.4.4 konlpy-0.5.2\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "q-60kS5Nfq4s"
},
"source": [
"# Module that converts text to vectors with TF-IDF\n",
"from konlpy.tag import Okt\n",
"import pickle\n",
"import numpy as np\n",
"\n",
"# Initialize KoNLPy's Okt object --- (*1)\n",
"okt = Okt()\n",
"# Global variables --- (*2)\n",
"word_dic = {'_id': 0} # word dictionary (word -> ID); '_id' holds the next free ID\n",
"dt_dic = {} # number of documents in which each word appears\n",
"id_files = [] # list of documents, each stored as a list of word IDs\n",
"\n",
"def tokenize(text):\n",
"    '''Run morphological analysis with KoNLPy''' # --- (*3)\n",
"    result = []\n",
"    word_s = okt.pos(text, norm=True, stem=True)\n",
"    for n, h in word_s:\n",
"        # Keep only nouns, verbs, and adjectives\n",
"        if h not in ['Noun', 'Verb', 'Adjective']: continue\n",
"        result.append(n)\n",
"    return result\n",
"\n",
"def words_to_ids(words, auto_add=True):\n",
"    '''Convert words to IDs''' # --- (*4)\n",
"    result = []\n",
"    for w in words:\n",
"        if w in word_dic:\n",
"            result.append(word_dic[w])\n",
"            continue\n",
"        elif auto_add:\n",
"            wid = word_dic[w] = word_dic['_id']\n",
"            word_dic['_id'] += 1\n",
"            result.append(wid)\n",
"    return result\n",
"\n",
"def add_text(text):\n",
"    '''Convert text to a list of IDs and add it''' # --- (*5)\n",
"    ids = words_to_ids(tokenize(text))\n",
"    id_files.append(ids)\n",
"\n",
"def add_file(path):\n",
"    '''Add a text file as training data''' # --- (*6)\n",
"    with open(path, \"r\", encoding=\"utf-8\") as f:\n",
"        s = f.read()\n",
"        add_text(s)\n",
"\n",
"def calc_files():\n",
"    '''Compute TF-IDF vectors for all added documents''' # --- (*7)\n",
"    global dt_dic\n",
"    result = []\n",
"    doc_count = len(id_files)\n",
"    dt_dic = {}\n",
"    # Count word occurrences --- (*8)\n",
"    for words in id_files:\n",
"        used_word = {}\n",
"        data = np.zeros(word_dic['_id'])\n",
"        for wid in words:\n",
"            data[wid] += 1\n",
"            used_word[wid] = 1\n",
"        # If word t is used in this document, add 1 to dt_dic --- (*9)\n",
"        for wid in used_word:\n",
"            if wid not in dt_dic: dt_dic[wid] = 0\n",
"            dt_dic[wid] += 1\n",
"        # Normalize by document length (guard against empty documents) --- (*10)\n",
"        data = data / max(len(words), 1)\n",
"        result.append(data)\n",
"    # Compute TF-IDF --- (*11)\n",
"    for i, doc in enumerate(result):\n",
"        for wid, v in enumerate(doc):\n",
"            idf = np.log(doc_count / dt_dic[wid]) + 1\n",
"            doc[wid] = min([doc[wid] * idf, 1.0])\n",
"        result[i] = doc\n",
"    return result\n",
"\n",
"def save_dic(fname):\n",
"    '''Save the dictionary to a file''' # --- (*12)\n",
"    with open(fname, \"wb\") as f:\n",
"        pickle.dump([word_dic, dt_dic, id_files], f)\n",
"\n",
"def load_dic(fname):\n",
"    '''Load the dictionary from a file''' # --- (*13)\n",
"    global word_dic, dt_dic, id_files\n",
"    with open(fname, 'rb') as f:\n",
"        word_dic, dt_dic, id_files = pickle.load(f)\n",
"\n",
"def calc_text(text):\n",
"    '''Convert a sentence to a vector''' # --- (*14)\n",
"    data = np.zeros(word_dic['_id'])\n",
"    words = words_to_ids(tokenize(text), False)\n",
"    for w in words:\n",
"        data[w] += 1\n",
"    # Guard against text that contains no known words\n",
"    data = data / max(len(words), 1)\n",
"    for wid, v in enumerate(data):\n",
"        idf = np.log(len(id_files) / dt_dic[wid]) + 1\n",
"        data[wid] = min([data[wid] * idf, 1.0])\n",
"    return data\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "3DyOJleeRatH"
},
"source": [
"### Testing the module --- (*15)\n",
"- Running the code below adds 4 documents to id_files, which causes an error later in the news-article step\n",
"    if __name__ == '__main__':\n",
"        add_text('비')\n",
"        add_text('오늘은 비가 내렸어요.')\n",
"        add_text('오늘은 더웠지만 오후부터 비가 내렸다.')\n",
"        add_text('비가 내리는 일요일이다.')\n",
"        print(calc_files())\n",
"        print(word_dic)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "lQSnU9Z3kFs8"
},
"source": [
"## Using the data from 2,400 news articles in Colab\n",
"- Connect Google Drive\n",
"- Copy the genre.tar.gz file\n",
"- Extract the archive"
]
},
{
"cell_type": "code",
"metadata": {
"id": "SOdM6Xiukvm8"
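},
"source": [
"import os\n",
"\n",
"# Remove article directories left over from a previous extraction\n",
"paths = ['./100/','./101/','./103/','./105/',]\n",
"for path in paths:\n",
"    if not os.path.isdir(path): continue  # nothing to clean up on the first run\n",
"    files = os.listdir(path)\n",
"    for file in files:\n",
"        if file.endswith(\".txt\"):\n",
"            os.remove(path + file)\n",
"    os.rmdir(path)"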
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 53
},
"id": "qajBE3c5lffq",
"outputId": "5b4332e5-83b4-422e-c67c-0143a44410fd"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"file = \"genre.tar.gz\"\n",
"\n",
"import shutil\n",
"shutil.copy(path+file, file)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
},
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'genre.tar.gz'"
]
},
"metadata": {},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "TrQNporUjEic"
},
"source": [
"!tar -zxvf genre.tar.gz"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "IMUxaEUnR8c2"
},
"source": [
"### Confirm that the list id_files is empty before proceeding"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "RugMoCgWKb2P",
"outputId": "ce00b6e6-968d-48b3-e25b-5f7a073dd427"
},
"source": [
"id_files\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[]"
]
},
"metadata": {},
"execution_count": 40
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Y87Hp3iZuSKt"
},
"source": [
"### Classifying text\n",
"- Run time: 16 minutes"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AIhy-gj3ihuR",
"outputId": "3e8f6d45-09e6-4800-bce0-e897cd0ec73a"
},
"source": [
"import os, glob, pickle\n",
"#import tfidf # not needed in Colab: the functions are defined in an earlier cell\n",
"\n",
"# Initialize variables\n",
"y = []\n",
"x = []\n",
"\n",
"# Process every file in the directory --- (*1)\n",
"def read_files(path, label):\n",
"    print(\"read_files=\", path)\n",
"    files = glob.glob(path + \"/*.txt\")\n",
"    for f in files:\n",
"        if os.path.basename(f) == 'LICENSE.txt': continue\n",
"        #tfidf.add_file(f)\n",
"        add_file(f) # defined in an earlier cell, so no tfidf. prefix in Colab\n",
"        y.append(label)\n",
"\n",
"# Read the directories that contain the articles --- (*2)\n",
"#read_files('text/100', 0)\n",
"#read_files('text/101', 1)\n",
"#read_files('text/103', 2)\n",
"#read_files('text/105', 3)\n",
"\n",
"read_files('./100', 0)\n",
"read_files('./101', 1)\n",
"read_files('./103', 2)\n",
"read_files('./105', 3)\n",
"\n",
"\n",
"# Convert to TF-IDF vectors --- (*3)\n",
"#x = tfidf.calc_files()\n",
"x = calc_files() # defined in an earlier cell, so no tfidf. prefix in Colab"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"read_files= ./100\n",
"read_files= ./101\n",
"read_files= ./103\n",
"read_files= ./105\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "mHfErRwaW3OV",
"outputId": "b2661337-b242-4701-d32d-3d993130baee"
},
"source": [
"len(x), len(y)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(3197, 3197)"
]
},
"metadata": {},
"execution_count": 42
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "YdmbgweTDJHY"
},
"source": [
"#### Saving"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ntp37QlNpfM4",
"outputId": "82cb87e7-a6e9-4a06-819a-3cb25701e1f4"
},
"source": [
"# Save --- (*4)\n",
"with open('genre.pickle', 'wb') as f:\n",
"    pickle.dump([y, x], f)\n",
"#tfidf.save_dic('text/genre-tdidf.dic')\n",
"save_dic('genre-tdidf.dic') # defined in an earlier cell, so no tfidf. prefix in Colab\n",
"print('ok')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"ok\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "8OdYGTI0DN1_"
},
"source": [
"#### Saving to Google Drive\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "pmAZKVZ6pOAA",
"outputId": "6e5bb66a-30b2-4f17-899a-b9f572544a9e"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"\n",
"with open(path + 'genre.pickle', 'wb') as f:\n",
"    pickle.dump([y, x], f)\n",
"#tfidf.save_dic('text/genre-tdidf.dic')\n",
"save_dic(path + 'genre-tdidf.dic') # defined in an earlier cell, so no tfidf. prefix in Colab\n",
"print('ok')"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n",
"ok\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-I_upKd7DQ7H"
},
"source": [
"#### Loading from Google Drive\n"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "CkBmXbjJDUyn",
"outputId": "9e6eb62f-d606-4458-f94a-7d5a2a7a57a6"
},
"source": [
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"\n",
"import pickle\n",
"with open(path + 'genre.pickle', 'rb') as f:\n",
"    y, x = pickle.load(f)\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "-sIuUamHEttA"
},
"source": [
"#### Loading"
]
},
{
"cell_type": "code",
"metadata": {
"id": "hMdyEqzgEiVJ"
},
"source": [
"path = \"./\"\n",
"import pickle\n",
"with open(path + 'genre.pickle', 'rb') as f:\n",
"    y, x = pickle.load(f)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "6BY63bECE6qf"
},
"source": [
"## Training naive Bayes on the TF-IDF vectors\n",
"- Typical machine learning evaluation metrics: precision, recall, f1-score\n",
"- Reference: https://gaussian37.github.io/ml-concept-ml-evaluation/"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "Ece10yE0E-R_",
"outputId": "f9d8ba65-9142-46e5-bd47-18aff3d3936b"
},
"source": [
"import pickle\n",
"from sklearn.naive_bayes import GaussianNB\n",
"from sklearn.model_selection import train_test_split\n",
"import sklearn.metrics as metrics\n",
"import numpy as np\n",
"\n",
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"\n",
"# Load the TF-IDF database --- (*1)\n",
"with open(path+\"genre.pickle\", \"rb\") as f:\n",
"    data = pickle.load(f)\n",
"y = data[0] # labels\n",
"x = data[1] # TF-IDF vectors\n",
"\n",
"# Split into training and test sets --- (*2)\n",
"x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2)\n",
"\n",
"# Train with naive Bayes --- (*3)\n",
"model = GaussianNB()\n",
"model.fit(x_train, y_train)\n",
"\n",
"# Evaluate and print the results --- (*4)\n",
"y_pred = model.predict(x_test)\n",
"acc = metrics.accuracy_score(y_test, y_pred)\n",
"rep = metrics.classification_report(y_test, y_pred)\n",
"\n",
"print(\"정답률=\", acc)  # 정답률 = accuracy\n",
"print(rep)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n",
"정답률= 0.8265625\n",
" precision recall f1-score support\n",
"\n",
" 0 0.90 0.87 0.88 167\n",
" 1 0.86 0.74 0.80 164\n",
" 2 0.76 0.90 0.82 165\n",
" 3 0.80 0.80 0.80 144\n",
"\n",
" accuracy 0.83 640\n",
" macro avg 0.83 0.83 0.83 640\n",
"weighted avg 0.83 0.83 0.83 640\n",
"\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "r-kxf8DuasU8"
},
"source": [
"## Improving accuracy with deep learning\n",
"- Switching from scikit-learn to deep learning"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uioQwBScayCL",
"outputId": "62e3caac-8a33-45b2-e28b-50feeae9aa49"
},
"source": [
"import pickle\n",
"from sklearn.model_selection import train_test_split\n",
"import sklearn.metrics as metrics\n",
"import keras\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout\n",
"from tensorflow.keras.optimizers import RMSprop\n",
"# keras.utils.np_utils is not available in recent Keras releases\n",
"from tensorflow.keras.utils import to_categorical\n",
"import matplotlib.pyplot as plt\n",
"import numpy as np\n",
"import h5py\n",
"\n",
"# Number of labels to classify --- (*1)\n",
"nb_classes = 4\n",
"\n",
"# Load the database --- (*2)\n",
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"with open(path+\"genre.pickle\", \"rb\") as f:\n",
"    data = pickle.load(f)\n",
"y = data[0] # labels\n",
"x = data[1] # TF-IDF vectors\n",
"# Convert the label data to one-hot format --- (*3)\n",
"y = to_categorical(y, nb_classes)\n",
"in_size = x[0].shape[0]\n",
"\n",
"# Split into training and test sets --- (*4)\n",
"x_train, x_test, y_train, y_test = train_test_split(\n",
"    np.array(x), np.array(y), test_size=0.2)\n",
"\n",
"# Define the structure of the MLP model --- (*5)\n",
"model = Sequential()\n",
"model.add(Dense(512, activation='relu', input_shape=(in_size,)))\n",
"model.add(Dropout(0.2))\n",
"model.add(Dense(512, activation='relu'))\n",
"model.add(Dropout(0.2))\n",
"model.add(Dense(nb_classes, activation='softmax'))\n",
"\n",
"# Compile the model --- (*6)\n",
"model.compile(\n",
"    loss='categorical_crossentropy',\n",
"    optimizer=RMSprop(),\n",
"    metrics=['accuracy'])\n",
"\n",
"# Run training --- (*7)\n",
"hist = model.fit(x_train, y_train,\n",
"                 batch_size=128,\n",
"                 epochs=20,\n",
"                 verbose=1,\n",
"                 validation_data=(x_test, y_test))\n",
"\n",
"# Evaluate --- (*8)\n",
"score = model.evaluate(x_test, y_test, verbose=1)\n",
"print(\"정답률=\", score[1], 'loss=', score[0])  # 정답률 = accuracy\n",
"\n",
"# Save the weights --- (*9)\n",
"model.save_weights('./genre-model.hdf5')\n",
"model.save_weights(path + 'genre-model.hdf5')\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n",
"Epoch 1/20\n",
"20/20 [==============================] - 7s 318ms/step - loss: 0.8204 - accuracy: 0.7454 - val_loss: 0.4381 - val_accuracy: 0.8500\n",
"Epoch 2/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 0.2233 - accuracy: 0.9320 - val_loss: 0.3576 - val_accuracy: 0.8766\n",
"Epoch 3/20\n",
"20/20 [==============================] - 6s 292ms/step - loss: 0.0966 - accuracy: 0.9699 - val_loss: 0.3964 - val_accuracy: 0.8797\n",
"Epoch 4/20\n",
"20/20 [==============================] - 6s 288ms/step - loss: 0.0437 - accuracy: 0.9879 - val_loss: 0.3888 - val_accuracy: 0.8813\n",
"Epoch 5/20\n",
"20/20 [==============================] - 6s 287ms/step - loss: 0.0193 - accuracy: 0.9957 - val_loss: 0.4514 - val_accuracy: 0.8734\n",
"Epoch 6/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 0.0087 - accuracy: 0.9977 - val_loss: 0.4943 - val_accuracy: 0.8828\n",
"Epoch 7/20\n",
"20/20 [==============================] - 6s 287ms/step - loss: 0.0070 - accuracy: 0.9988 - val_loss: 0.5501 - val_accuracy: 0.8797\n",
"Epoch 8/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 0.0050 - accuracy: 0.9973 - val_loss: 0.5541 - val_accuracy: 0.8734\n",
"Epoch 9/20\n",
"20/20 [==============================] - 6s 306ms/step - loss: 0.0038 - accuracy: 0.9980 - val_loss: 0.5626 - val_accuracy: 0.8828\n",
"Epoch 10/20\n",
"20/20 [==============================] - 6s 288ms/step - loss: 0.0016 - accuracy: 0.9992 - val_loss: 0.6297 - val_accuracy: 0.8859\n",
"Epoch 11/20\n",
"20/20 [==============================] - 6s 287ms/step - loss: 0.0027 - accuracy: 0.9992 - val_loss: 0.6161 - val_accuracy: 0.8734\n",
"Epoch 12/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 0.0028 - accuracy: 0.9984 - val_loss: 0.6291 - val_accuracy: 0.8781\n",
"Epoch 13/20\n",
"20/20 [==============================] - 6s 290ms/step - loss: 7.0816e-04 - accuracy: 1.0000 - val_loss: 0.6514 - val_accuracy: 0.8828\n",
"Epoch 14/20\n",
"20/20 [==============================] - 6s 288ms/step - loss: 0.0012 - accuracy: 0.9992 - val_loss: 0.6843 - val_accuracy: 0.8766\n",
"Epoch 15/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 8.6862e-04 - accuracy: 0.9996 - val_loss: 0.6957 - val_accuracy: 0.8781\n",
"Epoch 16/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 3.6986e-04 - accuracy: 1.0000 - val_loss: 0.7144 - val_accuracy: 0.8828\n",
"Epoch 17/20\n",
"20/20 [==============================] - 6s 288ms/step - loss: 3.6865e-04 - accuracy: 1.0000 - val_loss: 0.7641 - val_accuracy: 0.8750\n",
"Epoch 18/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 9.1236e-04 - accuracy: 0.9996 - val_loss: 0.8215 - val_accuracy: 0.8844\n",
"Epoch 19/20\n",
"20/20 [==============================] - 6s 289ms/step - loss: 2.1180e-04 - accuracy: 1.0000 - val_loss: 0.7527 - val_accuracy: 0.8766\n",
"Epoch 20/20\n",
"20/20 [==============================] - 6s 288ms/step - loss: 3.1430e-04 - accuracy: 1.0000 - val_loss: 0.7906 - val_accuracy: 0.8750\n",
"20/20 [==============================] - 1s 32ms/step - loss: 0.7906 - accuracy: 0.8750\n",
"정답률= 0.875 loss= 0.7905627489089966\n"
]
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 281
},
"id": "xiIZ3GAobbX1",
"outputId": "7e014aec-9085-4bc4-fdcf-cfb2679b769e"
},
"source": [
"# 학습 상태를 그래프로 그리기 --- (*10)\n",
"plt.plot(hist.history['accuracy'])\n",
"plt.plot(hist.history['val_accuracy'])\n",
"plt.title('Accuracy')\n",
"plt.legend(['train', 'test'], loc='upper left')\n",
"plt.show()"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0j9bBua6c7-j"
},
"source": [
"## 직접 문장을 지정해 판정하기"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "FRg8FXNFdkwE",
"outputId": "6a1517ca-69ec-4583-967c-09af99229dd6"
},
"source": [
"import pickle\n",
"import numpy as np\n",
"import keras\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout\n",
"from tensorflow.keras.optimizers import RMSprop\n",
"from keras.models import model_from_json\n",
"\n",
"# 텍스트 준비하기 --- ( ※ 1)\n",
"text1 = \"\"\"\n",
"대통령이 북한과 관련된 이야기로 한미 정상회담을 준비하고 있습니다.\n",
"\"\"\"\n",
"text2 = \"\"\"\n",
"iPhone과 iPad를 모두 가지고 다니므로 USB를 2개 연결할 수 있는 휴대용 배터리를 선호합니다.\n",
"\"\"\"\n",
"text3 = \"\"\"\n",
"이번 주에는 미세먼지가 많을 것으로 예상되므로 노약자는 외출을 자제하는 것이 좋습니다.\n",
"\"\"\"\n",
"\n",
"# TF-IDF 사전 읽어 들이기 --- (*2)\n",
"from google.colab import drive\n",
"drive.mount('/gdrive', force_remount=True)\n",
"path = \"/gdrive/My Drive/Colab Notebooks/파이썬을 이용한 머신러닝,딥러닝 실전앱개발/\"\n",
"#tfidf.load_dic(path + \"genre-tdidf.dic\")\n",
"load_dic(path + \"genre-tdidf.dic\")\n",
"\n",
"# Keras 모델 정의하고 가중치 데이터 읽어 들이기 --- (*3)\n",
"nb_classes = 4\n",
"model = Sequential()\n",
"#model.add(Dense(512, activation='relu', input_shape=(52800,)))\n",
"model.add(Dense(512, activation='relu', input_shape=(36120,)))\n",
"model.add(Dropout(0.2))\n",
"model.add(Dense(512, activation='relu'))\n",
"model.add(Dropout(0.2))\n",
"model.add(Dense(nb_classes, activation='softmax'))\n",
"model.compile(\n",
" loss='categorical_crossentropy',\n",
" optimizer=RMSprop(),\n",
" metrics=['accuracy'])\n",
"model.load_weights(path + 'genre-model.hdf5')\n",
"\n",
"# 텍스트 지정해서 판별하기 --- (*4)\n",
"def check_genre(text):\n",
" # 레이블 정의하기\n",
" LABELS = [\"정치\", \"경제\", \"생활 \", \"IT/과학\"]\n",
" # TF-IDF 벡터로 변환하기 -- (*5)\n",
"# data = tfidf.calc_text(text)\n",
" data = calc_text(text)\n",
" # MLP로 예측하기 --- (*6)\n",
" pre = model.predict(np.array([data]))[0]\n",
" n = pre.argmax()\n",
" print(LABELS[n], \"(\", pre[n], \")\")\n",
" return LABELS[n], float(pre[n]), int(n) \n",
"\n",
"if __name__ == '__main__':\n",
" check_genre(text1)\n",
" check_genre(text2)\n",
" check_genre(text3)\n",
"\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Mounted at /gdrive\n",
"정치 ( 1.0 )\n",
"IT/과학 ( 0.9999517 )\n",
"생활 ( 0.99997985 )\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ZZekGxjDf55H"
},
"source": [
"# 6-4 웹에서 사용할 수 있는 뉴스 카테고리 판정 애플리케이션 만들기\n",
"- 콘솔에서 작업해야 하고\n",
"- 버전 오류로 추후 작업"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "fvUSlVnn6Jtp"
},
"source": [
"#6-5 머신러닝에 데이터베이스(RDBMS) 사용하기"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HdwgqZqtXs1U"
},
"source": [
"## sqlite DB에 테이블 만들기"
]
},
{
"cell_type": "code",
"metadata": {
"id": "IZbqDPnMgCgL"
},
"source": [
"import sqlite3\n",
"\n",
"dbpath = \"./hw.sqlite3\"\n",
"sql = '''\n",
" CREATE TABLE IF NOT EXISTS person (\n",
" id INTEGER PRIMARY KEY,\n",
" height NUMBER,\n",
" weight NUMBER,\n",
" typeNo INTEGER\n",
" )\n",
"'''\n",
"with sqlite3.connect(dbpath) as conn:\n",
" conn.execute(sql)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "1KHjJJthYAmU"
},
"source": [
"## sqlite DB에 데이터 입력하기"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "-QY5qSvlYE7k",
"outputId": "1dc6d09e-4b29-45d3-c8ec-242f84224d0a"
},
"source": [
"import sqlite3\n",
"import random\n",
"\n",
"dbpath = \"./hw.sqlite3\"\n",
"\n",
"def insert_db(conn):\n",
" # 더미 데이터 만들기 --- (*1)\n",
" height = random.randint(130, 180)\n",
" weight = random.randint(30, 100)\n",
" # 더미 데이터를 기반으로 체형 데이터 생성하기 --- (*2)\n",
" type_no = 1\n",
" bmi = weight / (height / 100) ** 2\n",
" if bmi < 18.5:\n",
" type_no = 0\n",
" elif bmi < 25:\n",
" type_no = 1\n",
" elif bmi < 30:\n",
" type_no = 2\n",
" elif bmi < 35:\n",
" type_no = 3\n",
" elif bmi < 40:\n",
" type_no = 4\n",
" else:\n",
" type_no = 5\n",
" # 데이터베이스에 저장하기 --- (*3)\n",
" sql = '''\n",
" INSERT INTO person (height, weight, typeNo) \n",
" VALUES (?,?,?)\n",
" '''\n",
" values = (height,weight, type_no)\n",
" print(values)\n",
" conn.executemany(sql,[values])\n",
"\n",
"# 100개의 데이터 삽입하기\n",
"with sqlite3.connect(dbpath) as conn:\n",
" # 데이터 100개 삽입하기 --- (*4)\n",
" for i in range(100):\n",
" insert_db(conn)\n",
" # 확인하기 --- (*5)\n",
" c = conn.execute('SELECT count(*) FROM person')\n",
" cnt = c.fetchone()\n",
" print(cnt[0])\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"(173, 57, 1)\n",
"(171, 52, 0)\n",
"(159, 70, 2)\n",
"(141, 53, 2)\n",
"(174, 64, 1)\n",
"(132, 80, 5)\n",
"(132, 34, 1)\n",
"(173, 58, 1)\n",
"(151, 78, 3)\n",
"(151, 78, 3)\n",
"(151, 40, 0)\n",
"(164, 59, 1)\n",
"(178, 40, 0)\n",
"(171, 51, 0)\n",
"(158, 70, 2)\n",
"(164, 63, 1)\n",
"(137, 76, 5)\n",
"(148, 64, 2)\n",
"(143, 58, 2)\n",
"(130, 85, 5)\n",
"(147, 41, 1)\n",
"(136, 43, 1)\n",
"(175, 84, 2)\n",
"(158, 86, 3)\n",
"(131, 78, 5)\n",
"(167, 91, 3)\n",
"(153, 91, 4)\n",
"(178, 59, 1)\n",
"(140, 55, 2)\n",
"(159, 40, 0)\n",
"(149, 73, 3)\n",
"(166, 64, 1)\n",
"(135, 57, 3)\n",
"(164, 87, 3)\n",
"(179, 91, 2)\n",
"(144, 55, 2)\n",
"(146, 59, 2)\n",
"(165, 54, 1)\n",
"(175, 46, 0)\n",
"(160, 88, 3)\n",
"(158, 43, 0)\n",
"(171, 69, 1)\n",
"(145, 91, 5)\n",
"(179, 76, 1)\n",
"(158, 85, 3)\n",
"(145, 94, 5)\n",
"(158, 35, 0)\n",
"(131, 46, 2)\n",
"(139, 67, 3)\n",
"(145, 97, 5)\n",
"(145, 66, 3)\n",
"(164, 64, 1)\n",
"(145, 70, 3)\n",
"(151, 94, 5)\n",
"(159, 89, 4)\n",
"(156, 55, 1)\n",
"(131, 90, 5)\n",
"(166, 99, 4)\n",
"(146, 70, 3)\n",
"(165, 38, 0)\n",
"(180, 80, 1)\n",
"(179, 89, 2)\n",
"(148, 99, 5)\n",
"(155, 76, 3)\n",
"(168, 68, 1)\n",
"(169, 44, 0)\n",
"(169, 63, 1)\n",
"(137, 90, 5)\n",
"(171, 46, 0)\n",
"(152, 34, 0)\n",
"(149, 47, 1)\n",
"(140, 61, 3)\n",
"(158, 65, 2)\n",
"(148, 74, 3)\n",
"(176, 42, 0)\n",
"(135, 91, 5)\n",
"(177, 85, 2)\n",
"(167, 73, 2)\n",
"(154, 83, 3)\n",
"(148, 100, 5)\n",
"(150, 86, 4)\n",
"(171, 90, 3)\n",
"(177, 100, 3)\n",
"(159, 55, 1)\n",
"(170, 96, 3)\n",
"(177, 41, 0)\n",
"(130, 76, 5)\n",
"(177, 69, 1)\n",
"(139, 77, 4)\n",
"(147, 42, 1)\n",
"(142, 51, 2)\n",
"(145, 41, 1)\n",
"(155, 76, 3)\n",
"(163, 49, 0)\n",
"(180, 75, 1)\n",
"(151, 68, 2)\n",
"(146, 53, 1)\n",
"(179, 88, 2)\n",
"(169, 50, 0)\n",
"(148, 57, 2)\n",
"100\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ywOB9E2_YOs8"
},
"source": [
"## 키, 체중, 체형 학습하기"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "AWtp6d8aYSXN",
"outputId": "7d962a11-f59b-4a17-a855-aca402ab5c07"
},
"source": [
"import keras\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout\n",
"from tensorflow.keras.optimizers import RMSprop\n",
"\n",
"in_size = 2 # 체중과 키를 입력으로\n",
"nb_classes = 6 # 체형은 6단계로 구별\n",
"\n",
"# MLP모델의 구조 정의하기\n",
"model = Sequential()\n",
"model.add(Dense(512, activation='relu', input_shape=(in_size,)))\n",
"model.add(Dropout(0.5))\n",
"model.add(Dense(nb_classes, activation='softmax'))\n",
"\n",
"# 모델 컴파일하기\n",
"model.compile(\n",
" loss='categorical_crossentropy',\n",
" optimizer=RMSprop(),\n",
" metrics=['accuracy'])\n",
"\n",
"model.save('hw_model.h5')\n",
"print(\"saved\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"saved\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "bMwfmR75Yxp1"
},
"source": [
"## DB에서 값을 읽어 학습하기"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "EweKriQhYpvE",
"outputId": "30178029-a4f0-4b3b-9cde-894f1e9d6e58"
},
"source": [
"import keras\n",
"from keras.models import load_model\n",
"from keras.utils.np_utils import to_categorical\n",
"import numpy as np\n",
"import sqlite3\n",
"import os\n",
"\n",
"# 데이터베이스에서 데이터 100개 읽어 들이기 --- (*1)\n",
"dbpath = \"./hw.sqlite3\"\n",
"select_sql = \"SELECT * FROM person ORDER BY id DESC LIMIT 100\"\n",
"# 읽어 들인 데이터를 리스트에 추가하기 --- (*2)\n",
"x = []\n",
"y = []\n",
"with sqlite3.connect(dbpath) as conn:\n",
" for row in conn.execute(select_sql):\n",
" id, height, weight, type_no = row\n",
" # 데이터를 정규화하기 --- (*3)\n",
" height = height / 200\n",
" weight = weight / 150\n",
" y.append(type_no)\n",
" x.append(np.array([height, weight]))\n",
"\n",
"# 모델 읽어 들이기 --- (*4)\n",
"model = load_model('hw_model.h5')\n",
"\n",
"# 이미 학습 데이터가 있는 경우 읽어 들이기 --- (*5)\n",
"if os.path.exists('hw_weights.h5'):\n",
" model.load_weights('hw_weights.h5')\n",
"\n",
"nb_classes = 6 # 체형은 6단계로 구별\n",
"y = to_categorical(y, nb_classes) # One-hot 벡터로 변환하기\n",
"\n",
"# 학습하기 --- (*6)\n",
"model.fit(np.array(x), y,\n",
" batch_size=50,\n",
" epochs=100)\n",
"\n",
"# 결과 저장하기 --- (*7)\n",
"model.save_weights('hw_weights.h5')\n",
"\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"Epoch 1/100\n",
"2/2 [==============================] - 1s 8ms/step - loss: 1.8102 - accuracy: 0.0900\n",
"Epoch 2/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.7673 - accuracy: 0.2800\n",
"Epoch 3/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.7499 - accuracy: 0.3000\n",
"Epoch 4/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.7398 - accuracy: 0.2500\n",
"Epoch 5/100\n",
"2/2 [==============================] - 0s 10ms/step - loss: 1.7235 - accuracy: 0.2900\n",
"Epoch 6/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.7292 - accuracy: 0.2400\n",
"Epoch 7/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.7179 - accuracy: 0.2900\n",
"Epoch 8/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.7175 - accuracy: 0.2600\n",
"Epoch 9/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.7044 - accuracy: 0.2300\n",
"Epoch 10/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6993 - accuracy: 0.2700\n",
"Epoch 11/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6972 - accuracy: 0.2700\n",
"Epoch 12/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6986 - accuracy: 0.2700\n",
"Epoch 13/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6756 - accuracy: 0.2400\n",
"Epoch 14/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.6900 - accuracy: 0.2400\n",
"Epoch 15/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.6787 - accuracy: 0.2900\n",
"Epoch 16/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6681 - accuracy: 0.2800\n",
"Epoch 17/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.6769 - accuracy: 0.2700\n",
"Epoch 18/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6572 - accuracy: 0.2700\n",
"Epoch 19/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6685 - accuracy: 0.2800\n",
"Epoch 20/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.6665 - accuracy: 0.2900\n",
"Epoch 21/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.6613 - accuracy: 0.2800\n",
"Epoch 22/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.6565 - accuracy: 0.2900\n",
"Epoch 23/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.6542 - accuracy: 0.2800\n",
"Epoch 24/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6533 - accuracy: 0.2800\n",
"Epoch 25/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.6515 - accuracy: 0.3000\n",
"Epoch 26/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6505 - accuracy: 0.3000\n",
"Epoch 27/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6340 - accuracy: 0.2900\n",
"Epoch 28/100\n",
"2/2 [==============================] - 0s 3ms/step - loss: 1.6530 - accuracy: 0.2900\n",
"Epoch 29/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6493 - accuracy: 0.2800\n",
"Epoch 30/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6416 - accuracy: 0.2700\n",
"Epoch 31/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6340 - accuracy: 0.2900\n",
"Epoch 32/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.6299 - accuracy: 0.2900\n",
"Epoch 33/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6374 - accuracy: 0.2900\n",
"Epoch 34/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6313 - accuracy: 0.2800\n",
"Epoch 35/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6288 - accuracy: 0.3600\n",
"Epoch 36/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.6222 - accuracy: 0.3400\n",
"Epoch 37/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6153 - accuracy: 0.3000\n",
"Epoch 38/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.6144 - accuracy: 0.3500\n",
"Epoch 39/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6030 - accuracy: 0.3800\n",
"Epoch 40/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6114 - accuracy: 0.3400\n",
"Epoch 41/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6092 - accuracy: 0.3300\n",
"Epoch 42/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6095 - accuracy: 0.3300\n",
"Epoch 43/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6095 - accuracy: 0.3200\n",
"Epoch 44/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6030 - accuracy: 0.3700\n",
"Epoch 45/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.6091 - accuracy: 0.3400\n",
"Epoch 46/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.6019 - accuracy: 0.3100\n",
"Epoch 47/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.5766 - accuracy: 0.3700\n",
"Epoch 48/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.5911 - accuracy: 0.3700\n",
"Epoch 49/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5911 - accuracy: 0.3500\n",
"Epoch 50/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5820 - accuracy: 0.3600\n",
"Epoch 51/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5691 - accuracy: 0.3600\n",
"Epoch 52/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5629 - accuracy: 0.3900\n",
"Epoch 53/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5712 - accuracy: 0.3400\n",
"Epoch 54/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5642 - accuracy: 0.3800\n",
"Epoch 55/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5683 - accuracy: 0.3500\n",
"Epoch 56/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5630 - accuracy: 0.4100\n",
"Epoch 57/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5581 - accuracy: 0.3700\n",
"Epoch 58/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5495 - accuracy: 0.3700\n",
"Epoch 59/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5441 - accuracy: 0.4200\n",
"Epoch 60/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5447 - accuracy: 0.3500\n",
"Epoch 61/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.5363 - accuracy: 0.3600\n",
"Epoch 62/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.5434 - accuracy: 0.3800\n",
"Epoch 63/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5348 - accuracy: 0.4000\n",
"Epoch 64/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5346 - accuracy: 0.3800\n",
"Epoch 65/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.5393 - accuracy: 0.3800\n",
"Epoch 66/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.5240 - accuracy: 0.4200\n",
"Epoch 67/100\n",
"2/2 [==============================] - 0s 11ms/step - loss: 1.5110 - accuracy: 0.4200\n",
"Epoch 68/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.5150 - accuracy: 0.3900\n",
"Epoch 69/100\n",
"2/2 [==============================] - 0s 10ms/step - loss: 1.5031 - accuracy: 0.4100\n",
"Epoch 70/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.5057 - accuracy: 0.4100\n",
"Epoch 71/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.4972 - accuracy: 0.4400\n",
"Epoch 72/100\n",
"2/2 [==============================] - 0s 17ms/step - loss: 1.4966 - accuracy: 0.4100\n",
"Epoch 73/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.4998 - accuracy: 0.4000\n",
"Epoch 74/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.5072 - accuracy: 0.4300\n",
"Epoch 75/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.5017 - accuracy: 0.4000\n",
"Epoch 76/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4846 - accuracy: 0.4100\n",
"Epoch 77/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4822 - accuracy: 0.3800\n",
"Epoch 78/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4723 - accuracy: 0.4200\n",
"Epoch 79/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4755 - accuracy: 0.4000\n",
"Epoch 80/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4819 - accuracy: 0.3600\n",
"Epoch 81/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4739 - accuracy: 0.4300\n",
"Epoch 82/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4784 - accuracy: 0.4000\n",
"Epoch 83/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.4562 - accuracy: 0.4400\n",
"Epoch 84/100\n",
"2/2 [==============================] - 0s 6ms/step - loss: 1.4505 - accuracy: 0.4200\n",
"Epoch 85/100\n",
"2/2 [==============================] - 0s 7ms/step - loss: 1.4425 - accuracy: 0.4700\n",
"Epoch 86/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4608 - accuracy: 0.4000\n",
"Epoch 87/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4614 - accuracy: 0.4400\n",
"Epoch 88/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4396 - accuracy: 0.4100\n",
"Epoch 89/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4385 - accuracy: 0.4400\n",
"Epoch 90/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4416 - accuracy: 0.4300\n",
"Epoch 91/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.4369 - accuracy: 0.4300\n",
"Epoch 92/100\n",
"2/2 [==============================] - 0s 9ms/step - loss: 1.4261 - accuracy: 0.4200\n",
"Epoch 93/100\n",
"2/2 [==============================] - 0s 8ms/step - loss: 1.4423 - accuracy: 0.4300\n",
"Epoch 94/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4230 - accuracy: 0.4600\n",
"Epoch 95/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4228 - accuracy: 0.4500\n",
"Epoch 96/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4027 - accuracy: 0.4600\n",
"Epoch 97/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4114 - accuracy: 0.4600\n",
"Epoch 98/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.4030 - accuracy: 0.4700\n",
"Epoch 99/100\n",
"2/2 [==============================] - 0s 5ms/step - loss: 1.4049 - accuracy: 0.4400\n",
"Epoch 100/100\n",
"2/2 [==============================] - 0s 4ms/step - loss: 1.3888 - accuracy: 0.4500\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_7Uo03AnZH1U"
},
"source": [
"## 정답률 확인하기(임의의 데이터로 테스트)"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "owF0feuQYXXz",
"outputId": "c66852e6-3ddd-4a49-a0d3-79c96caa420f"
},
"source": [
"from keras.models import load_model\n",
"import numpy as np\n",
"\n",
"# 학습하기모델 읽어 들이기 --- (*1)\n",
"model = load_model('hw_model.h5')\n",
"# 학습한 데이터 읽어 들이기 --- (*2)\n",
"model.load_weights('hw_weights.h5')\n",
"# 레이블\n",
"LABELS = [\n",
" '저체중', '표준 체중 ', '1비만(1도)',\n",
" '비만(2도)', '비만(3도)', '비만(4도)' \n",
"]\n",
"\n",
"# 테스트 데이터 지정하기 --- (*3)\n",
"height = 160\n",
"weight = 50\n",
"# 정규화하기 --- (*4)\n",
"test_x = [height / 200, weight / 150]\n",
"# 예측하기 --- (*5)\n",
"pre = model.predict(np.array([test_x]))\n",
"idx = pre[0].argmax()\n",
"print(LABELS[idx], '/ 가능성', pre[0][idx])\n",
"\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"표준 체중 / 가능성 0.33491102\n"
]
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "MAYcUAcBZlAs"
},
"source": [
"### 분류 정답률 확인하기"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "C6i-_ZwWZoEb",
"outputId": "8871f022-fb60-48ca-d786-860f18714fa8"
},
"source": [
"from keras.models import load_model\n",
"import numpy as np\n",
"import random\n",
"from keras.utils.np_utils import to_categorical\n",
"\n",
"# 학습하기모델 읽어 들이기 --- (*1)\n",
"model = load_model('hw_model.h5')\n",
"# 학습한 데이터 읽어 들이기 --- (*2)\n",
"model.load_weights('hw_weights.h5')\n",
"\n",
"# 정답 데이터를 1000개 만들기 --- (*3)\n",
"x = []\n",
"y = []\n",
"for i in range(1000):\n",
" h = random.randint(130, 180)\n",
" w = random.randint(30, 100)\n",
" bmi = w / ((h / 100) ** 2)\n",
" type_no = 1\n",
" if bmi < 18.5:\n",
" type_no = 0\n",
" elif bmi < 25:\n",
" type_no = 1\n",
" elif bmi < 30:\n",
" type_no = 2\n",
" elif bmi < 35:\n",
" type_no = 3\n",
" elif bmi < 40:\n",
" type_no = 4\n",
" else:\n",
" type_no = 5\n",
" x.append(np.array([h / 200, w / 150]))\n",
" y.append(type_no)\n",
"\n",
"# 형식 변환하기 --- (*4)\n",
"x = np.array(x)\n",
"y = to_categorical(y, 6)\n",
"# 정답률 확인하기 --- (*5)\n",
"score = model.evaluate(x, y, verbose=1)\n",
"print(\"정답률=\", score[1], \"손실 =\", score[0])\n",
"\n"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
"32/32 [==============================] - 0s 1ms/step - loss: 1.4534 - accuracy: 0.3600\n",
"정답률= 0.36000001430511475 손실 = 1.4534181356430054\n"
]
}
]
}
]
}