@KyoungHa-Park
Last active October 19, 2018 02:29
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "SBNsM6Y6pfqP",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Team Vitamin (Btamin) Midterm Presentation"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "PnlgE5clXgmE",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"**Korea Artificial Intelligence Laboratory**\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "y6oU844h7MW_",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"\n",
"![](https://static.wixstatic.com/media/a27d24_352f25e117c849fa90516caded3945bb~mv2_d_6000_4000_s_4_2.jpg/v1/fill/w_740,h_493,al_c,q_90,usm_0.66_1.00_0.01/a27d24_352f25e117c849fa90516caded3945bb~mv2_d_6000_4000_s_4_2.webp)\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6FFszGZmx_fB",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"**Team name**: Vitamin (Btamin, successor to AURA)\n",
"\n",
"**Goals**\n",
"+ Given a Korean sentence, classify it by emotion (a multi-class classification problem).\n",
"+ Score each individual word (token) for emotion.\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "iJFKFR05lOVV",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"**Researchers** (in Korean alphabetical order): "
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1711,
"status": "ok",
"timestamp": 1539828750189,
"user": {
"displayName": "박경하",
"photoUrl": "",
"userId": "14970276928355419245"
},
"user_tz": -540
},
"id": "-pULohKRx9R7",
"outputId": "6d832b7c-4e4a-4e40-deab-5c8dde7af3a1",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>1.Name</th>\n",
" <th>2.Research area (NLP)</th>\n",
" <th>3.Industry/Field</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>김현우</td>\n",
" <td>Emotion analysis (sentence)</td>\n",
" <td>Algorithm research</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>박경하</td>\n",
" <td>Emotion analysis (word)</td>\n",
" <td>Data analysis</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>오윤석</td>\n",
" <td>Emotion analysis (word)</td>\n",
" <td>Web development</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>임동조</td>\n",
" <td>Deep learning basics summary</td>\n",
" <td>Content discovery/production</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>허형완</td>\n",
" <td>Emotion analysis (sentence)</td>\n",
" <td>Client development</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 1.Name 2.Research area (NLP) 3.Industry/Field\n",
"0 김현우 Emotion analysis (sentence) Algorithm research\n",
"1 박경하 Emotion analysis (word) Data analysis\n",
"2 오윤석 Emotion analysis (word) Web development\n",
"3 임동조 Deep learning basics summary Content discovery/production\n",
"4 허형완 Emotion analysis (sentence) Client development"
]
},
"execution_count": 1,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"# A dict keeps only the last value assigned to a repeated key, so each column gets one key.\n",
"# Industry of each member, in row order (not a table column): IT, Food, IT, Education, IT\n",
"member = {'1.Name' : ['김현우', '박경하', '오윤석', '임동조', '허형완'],\n",
"          '2.Research area (NLP)' : ['Emotion analysis (sentence)', 'Emotion analysis (word)', 'Emotion analysis (word)', 'Deep learning basics summary', 'Emotion analysis (sentence)'],\n",
"          '3.Industry/Field' : ['Algorithm research', 'Data analysis', 'Web development', 'Content discovery/production', 'Client development']\n",
"         }\n",
"pd.DataFrame(member)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "xCk5CFTf1ETj",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"**Progress**: \n",
"<table> \n",
" <tbody> \n",
" <tr> \n",
" <td style=\"background-color: #ccc\">Date</td> \n",
" <td style=\"background-color: #ccc\">Discussion</td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-09-15</strong></td> \n",
" <td > \n",
" <ul> Classification of noun phrases</ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-09-29</strong></td> \n",
" <td > \n",
" <ul> (Holiday) </ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-10-08</strong></td> \n",
" <td > \n",
" <ul> Shared first-round results / identified additional topics</ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-10-20</strong></td> \n",
" <td > \n",
" <ul> Midterm presentation</ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-11-03</strong></td> \n",
" <td > \n",
" <ul> </ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-11-17</strong></td> \n",
" <td > \n",
" <ul> </ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-12-01</strong></td> \n",
" <td > \n",
" <ul> </ul>\n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>2018-12-15</strong></td> \n",
" <td > \n",
" <ul> </ul>\n",
" </td> \n",
" </tr> \n",
" </tbody> \n",
"</table>"
]
},
{
"cell_type": "raw",
"metadata": {
"colab_type": "text",
"id": "LlKrKyzccygU",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"<history>\n",
"- Team introduction : https://www.ai-lab.kr/labs/btamin-raebjang-heohyeongwan/team-btamin\n",
"- Meeting, day 1 : https://www.ai-lab.kr/labs/btamin-raebjang-gimhyeonu/team-btamin-1st-meeting-summary\n",
"- Meeting, day 2 : https://www.ai-lab.kr/labs/btamin-raebjang-heohyeongwan/team-btamin-2nd-meeting-summary"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "canLMslDpGP4",
"slideshow": {
"slide_type": "slide"
}
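},
"source": [
"# Topic 1: \"Word\"-based emotion classification (micro-level approach)\n",
"\n",
"1. Idea: apply a word lexicon that has already been labeled with emotions\n",
"2. Approach: use Kaggle (English-language) data\n",
"3. Model: data labeled with 7 emotions is available\n",
"4. Result: poor, because the word lexicon is small\n",
"5. Next steps: 2-D visualization / obtaining other emotion lexicons"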
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "oUArfgeY1ETp",
"slideshow": {
"slide_type": "slide"
}
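},
"source": [
"## 1. Idea: apply a word lexicon that has already been labeled with emotions\n",
"\n",
"+ For English, emotion analysis is feasible.\n",
"+ Use the emotion-labeled word lexicon to score sentences, then inspect the results.\n",
"+ [NLP] In a classification problem, the emotions (targets) must be independent of one another (mutually exclusive)."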
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1330,
"status": "ok",
"timestamp": 1539828790788,
"user": {
"displayName": "박경하",
"photoUrl": "",
"userId": "14970276928355419245"
},
"user_tz": -540
},
"id": "Jt4Qx0uk1ETs",
"outputId": "589ff85c-1c81-4ab5-9a9b-8ebd90b23507",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>word</th>\n",
" <th>disgust</th>\n",
" <th>surprise</th>\n",
" <th>neutral</th>\n",
" <th>anger</th>\n",
" <th>sad</th>\n",
" <th>happy</th>\n",
" <th>fear</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1099</th>\n",
" <td>wrong</td>\n",
" <td>0.001244</td>\n",
" <td>0.004231</td>\n",
" <td>0.000083</td>\n",
" <td>0.116725</td>\n",
" <td>0.009209</td>\n",
" <td>0.002904</td>\n",
" <td>0.008877</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1100</th>\n",
" <td>yeah</td>\n",
" <td>0.001401</td>\n",
" <td>0.037815</td>\n",
" <td>0.001401</td>\n",
" <td>0.021008</td>\n",
" <td>0.026611</td>\n",
" <td>0.032213</td>\n",
" <td>0.029412</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1101</th>\n",
" <td>yesterday</td>\n",
" <td>0.002841</td>\n",
" <td>0.028003</td>\n",
" <td>0.000406</td>\n",
" <td>0.015016</td>\n",
" <td>0.032062</td>\n",
" <td>0.042614</td>\n",
" <td>0.023945</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1102</th>\n",
" <td>zero</td>\n",
" <td>0.004762</td>\n",
" <td>0.033333</td>\n",
" <td>0.004762</td>\n",
" <td>0.033333</td>\n",
" <td>0.033333</td>\n",
" <td>0.004762</td>\n",
" <td>0.052381</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1103</th>\n",
" <td>zoom</td>\n",
" <td>0.017857</td>\n",
" <td>0.053571</td>\n",
" <td>0.017857</td>\n",
" <td>0.017857</td>\n",
" <td>0.017857</td>\n",
" <td>0.089286</td>\n",
" <td>0.017857</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" word disgust surprise neutral anger sad happy \\\n",
"1099 wrong 0.001244 0.004231 0.000083 0.116725 0.009209 0.002904 \n",
"1100 yeah 0.001401 0.037815 0.001401 0.021008 0.026611 0.032213 \n",
"1101 yesterday 0.002841 0.028003 0.000406 0.015016 0.032062 0.042614 \n",
"1102 zero 0.004762 0.033333 0.004762 0.033333 0.033333 0.004762 \n",
"1103 zoom 0.017857 0.053571 0.017857 0.017857 0.017857 0.089286 \n",
"\n",
" fear \n",
"1099 0.008877 \n",
"1100 0.029412 \n",
"1101 0.023945 \n",
"1102 0.052381 \n",
"1103 0.017857 "
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Import modules\n",
"import pandas as pd\n",
"import numpy as np\n",
"import nltk\n",
"\n",
"# Read data: emotion lexicon\n",
"prob = pd.read_csv('./Andbrain_DataSet.csv')\n",
"prob.tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "iVcVVcQ8p0FJ",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Approach: use Kaggle (English-language) data\n",
"\n",
"+ Because the lexicon is in English, we start with English sentences.\n",
"+ Later, we plan to proceed via English-to-Korean translation."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 267
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1339,
"status": "ok",
"timestamp": 1539828989584,
"user": {
"displayName": "박경하",
"photoUrl": "",
"userId": "14970276928355419245"
},
"user_tz": -540
},
"id": "iY_MK7mQ1ET1",
"outputId": "032970db-3aeb-4cc9-988a-9134a816dbdb",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>hmid</th>\n",
" <th>wid</th>\n",
" <th>reflection_period</th>\n",
" <th>original_hm</th>\n",
" <th>cleaned_hm</th>\n",
" <th>modified</th>\n",
" <th>num_sentence</th>\n",
" <th>ground_truth_category</th>\n",
" <th>predicted_category</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>100533</th>\n",
" <td>128765</td>\n",
" <td>1629</td>\n",
" <td>24h</td>\n",
" <td>I had a great meeting yesterday at work with m...</td>\n",
" <td>I had a great meeting yesterday at work with m...</td>\n",
" <td>True</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>bonding</td>\n",
" </tr>\n",
" <tr>\n",
" <th>100534</th>\n",
" <td>128766</td>\n",
" <td>141</td>\n",
" <td>24h</td>\n",
" <td>I had a great workout last night.</td>\n",
" <td>I had a great workout last night.</td>\n",
" <td>True</td>\n",
" <td>1</td>\n",
" <td>NaN</td>\n",
" <td>exercise</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" hmid wid reflection_period \\\n",
"100533 128765 1629 24h \n",
"100534 128766 141 24h \n",
"\n",
" original_hm \\\n",
"100533 I had a great meeting yesterday at work with m... \n",
"100534 I had a great workout last night. \n",
"\n",
" cleaned_hm modified \\\n",
"100533 I had a great meeting yesterday at work with m... True \n",
"100534 I had a great workout last night. True \n",
"\n",
" num_sentence ground_truth_category predicted_category \n",
"100533 1 NaN bonding \n",
"100534 1 NaN exercise "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Read data: sentences to analyze\n",
"sentense = pd.read_csv('./cleaned_hm.csv')\n",
"sentense.tail(2)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 71
},
"colab_type": "code",
"executionInfo": {
"elapsed": 1033,
"status": "ok",
"timestamp": 1539828993857,
"user": {
"displayName": "박경하",
"photoUrl": "",
"userId": "14970276928355419245"
},
"user_tz": -540
},
"id": "h3mVsXfa1ET7",
"outputId": "13be3772-a681-41a4-eccd-780ff7f62f9a",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/plain": [
"100533 I had a great meeting yesterday at work with m...\n",
"100534 I had a great workout last night.\n",
"Name: cleaned_hm, dtype: object"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Extract only the relevant column\n",
"s1 = sentense['cleaned_hm']\n",
"s1.tail(2)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "IX7Vmoac1EUA",
"slideshow": {
"slide_type": "slide"
}
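},
"source": [
"## 3. Model, approach 1: scoring with the existing word lexicon\n",
"+ Approach: look up the lexicon words that occur in each sentence and aggregate their scores (mean, sum, etc.)\n",
"+ Result: poor (the separate code for this step has not been located)\n",
"+ Cause: the emotion lexicon is too small to cover the sentences\n",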
"![test](https://i.imgur.com/NcE7HjA.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Kc1W3ekx7VOo",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 3. Model, approach 2: scoring via Word2Vec\n",
"\n",
"+ Result: poor (the separate code for this step has not been located)\n",
"+ Cause: the emotion lexicon is too small to cover the sentences"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "uXEdmRHjZoMo",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"#### 1. Data preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 107
},
"colab_type": "code",
"executionInfo": {
"elapsed": 5810,
"status": "ok",
"timestamp": 1539828807018,
"user": {
"displayName": "박경하",
"photoUrl": "",
"userId": "14970276928355419245"
},
"user_tz": -540
},
"id": "DzD-vYQ58yNw",
"outputId": "84b9513a-8b81-41d8-9eb9-2490c0687d3e",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Requirement already satisfied: nltk in /usr/local/lib/python3.6/dist-packages (3.2.5)\n",
"Requirement already satisfied: six in /usr/local/lib/python3.6/dist-packages (from nltk) (1.11.0)\n",
"[nltk_data] Downloading package stopwords to /root/nltk_data...\n",
"[nltk_data] Unzipping corpora/stopwords.zip.\n"
]
},
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 4,
"metadata": {
"tags": []
},
"output_type": "execute_result"
}
],
"source": [
"# !pip install nltk\n",
"# nltk.download('stopwords')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "Sua9EtIJYWMy",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Import modules\n",
"import pandas as pd\n",
"import numpy as np\n",
"import nltk\n",
"\n",
"# Read data\n",
"sentense = pd.read_csv('./cleaned_hm.csv')\n",
"# sentense.head(2)\n",
"\n",
"\n",
"# Extract only the relevant column\n",
"s1 = sentense['cleaned_hm']\n",
"# s1.head(2)\n",
"\n",
"from nltk.corpus import stopwords\n",
"stopword_eng = stopwords.words('english')\n",
"\n",
"import string\n",
"# One list item per punctuation character: '!\"#$%&\\'()*+,-./:;<=>?@[\\\\]^_`{|}~'\n",
"punct = list(string.punctuation)\n",
"punct = punct + stopword_eng + ['\\n'] # add English stopwords and the newline to the exclusion list\n",
"# len(punct)\n",
"\n",
"# Join the per-row sentences into a single lowercase document string\n",
"texts = [txt.lower() for txt in s1 if txt.lower() not in punct]\n",
"document = ' '.join(texts)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "sEk9acoaYWUq",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Wall time: 53min 49s\n"
]
}
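],
"source": [
"%%time\n",
"# Split the document into words and filter against a precomputed stopword set\n",
"# (calling stopwords.words() inside the loop re-reads the list for every item).\n",
"stop_set = set(stopwords.words('english'))\n",
"tokens = [word for word in document.split() if word not in stop_set]"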
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "vkxz1hTbYkLS",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# document is a single string, so split() yields words;\n",
"# \" \".join(document) would put a space between every character and yield characters.\n",
"word_sequence = document.split()\n",
"word_list = list(set(word_sequence))\n",
"word_dict = {w: i for i, w in enumerate(word_list)} # map each word to an integer id"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "Y9ULNyD_ZhZK",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 2. Key parameter settings"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "izi68dVSYlja",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"import tensorflow as tf\n",
"import numpy as np\n",
"from datetime import datetime, timedelta\n",
"\n",
"skip_grams = []\n",
"for i in range(1, len(word_sequence) - 1):\n",
"    target = word_dict[word_sequence[i]]\n",
"    context = [word_dict[word_sequence[i-1]], word_dict[word_sequence[i+1]]]\n",
"    for w in context:\n",
"        skip_grams.append([target, w])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "tKXF63AfYqFS",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"# Draw a random batch of (input, label) pairs from the skip-gram data\n",
"def random_batch(data, size):\n",
"    random_inputs, random_labels = [], []\n",
"    random_index = np.random.choice(len(data), size, replace=False)\n",
"    for i in random_index:\n",
"        random_inputs.append(data[i][0]) # target\n",
"        random_labels.append([data[i][1]]) # context word\n",
"    return random_inputs, random_labels\n",
"\n",
"training_epoch = 300 # number of training steps (one random batch per step)\n",
"learning_rate = 0.1\n",
"batch_size = 20\n",
"embedding_size = 2 # word-vector embedding dimension (2, so vectors can be plotted as x, y)\n",
"num_sampled = 15 # number of negative samples drawn by nce_loss (set below batch_size here)\n",
"voc_size = len(word_list) # total number of distinct words"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "F2wrHIEDZbNw",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 3. Model definition (TensorFlow)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "lrPnZZVBYw4q",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"inputs = tf.placeholder(tf.int32, shape=[batch_size])\n",
"labels = tf.placeholder(tf.int32, shape=[batch_size, 1])\n",
"\n",
"embeddings = tf.Variable(tf.random_uniform([voc_size, embedding_size], -1.0, 1.0))\n",
"selected_embed = tf.nn.embedding_lookup(embeddings, inputs)\n",
"nce_weights = tf.Variable(tf.random_uniform([voc_size, embedding_size], -1.0, 1.0))\n",
"nce_biases = tf.Variable(tf.zeros([voc_size]))\n",
"\n",
"loss = tf.reduce_mean(\n",
" tf.nn.nce_loss(nce_weights, nce_biases, labels, selected_embed, num_sampled, voc_size))\n",
"train_op = tf.train.AdamOptimizer(learning_rate).minimize(loss)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "FuytpP7iZAiT",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 4. Model training (Session)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "JA9LvgPVY1tK",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"loss at step : 30 / Loss = 8.111506 / Time = 12:35:51\n",
"loss at step : 60 / Loss = 3.015723 / Time = 12:37:26\n",
"loss at step : 90 / Loss = 2.5067964 / Time = 12:39:01\n",
"loss at step : 120 / Loss = 2.0653539 / Time = 12:40:38\n",
"loss at step : 150 / Loss = 3.4744956 / Time = 12:42:16\n",
"loss at step : 180 / Loss = 3.2406578 / Time = 12:43:52\n",
"loss at step : 210 / Loss = 2.9685848 / Time = 12:45:25\n",
"loss at step : 240 / Loss = 2.515349 / Time = 12:46:59\n",
"loss at step : 270 / Loss = 3.9161484 / Time = 12:48:32\n",
"loss at step : 300 / Loss = 2.20857 / Time = 12:50:05\n",
"Wall time: 15min 50s\n"
]
}
],
"source": [
"%%time\n",
"with tf.Session() as sess:\n",
"    init = tf.global_variables_initializer()\n",
"    sess.run(init)\n",
"    for step in range(1, training_epoch + 1):\n",
"        batch_inputs, batch_labels = random_batch(skip_grams, batch_size)\n",
"        _, loss_val = sess.run([train_op, loss],\n",
"                               feed_dict={inputs: batch_inputs,\n",
"                                          labels: batch_labels})\n",
"        if step % 30 == 0:\n",
"            now = datetime.today()\n",
"            print(\"loss at step : \", step, \"/ Loss = \", loss_val, ' / Time =', now.strftime('%H:%M:%S'))\n",
"    # Inside the with block, eval() can be used as a shorthand for sess.run().\n",
"    trained_embeddings = embeddings.eval()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ST9qR8voZGVB",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 5. Training results (visualization)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "QsguU35HY3tZ",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"C:\\Users\\khpark\\Anaconda3\\lib\\site-packages\\matplotlib\\font_manager.py:1320: UserWarning: findfont: Font family ['D2Coding'] not found. Falling back to DejaVu Sans\n",
" (prop.get_family(), self.defaultFamily[fontext]))\n"
]
},
{
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAe8AAAHVCAYAAADYaHMGAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMS4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvNQv5yAAAIABJREFUeJzs3XtcVXW+//HX4o6okBcE1DIdU7wQomlmlkiKaWSpY5meaqay09QZa0ZndCpjrEYry3S6jaeL2c36qWmMTTahnMzsomlkKZpEKZdQCQQE3MD6/YEwohsB2Zu1L+/n49Fjs7977bU+K5X3/q7vd3+XYZomIiIi4j58rC5AREREmkfhLSIi4mYU3iIiIm5G4S0iIuJmFN4iIiJuRuEtIiLiZhTeIiIibkbhLSIi4mYU3iIiIm7Gz+oCzqZTp05mjx49rC6jRUpLSwkJCbG6jFbnrecN3nvu3nre4L3nrvN2vB07dhwxTbNzY9u5dHj36NGD7du3W11Gi6SlpTFq1Ciry2h13nre4L3n7q3nDd577jpvxzMM48embKfL5iIiIm5G4S0iIuJmFN4iIiJuRuEtIiLiZhTeIiIibkbhLSIi4mYcEt6GYbxsGEa+YRi7G3jdMAxjmWEY3xuGkW4YRpwjjisiIuKNHNXzXgGMO8vrVwO9T/43E3jeQccVERHxOg4Jb9M0PwYKzrLJRGClWeMzIMwwjEhHHFtERMTbGKZpOmZHhtED+KdpmgPsvPZPYJFpmp+cfJ4K/Nk0zTOWTzMMYyY1vXO6dOkyeNWqVQ6pzyolJSW0bdvW6jJanbeeN3jvuXvreYP3nvvp57169Wo2bNiAaZpcc801TJkyxcLqnMeZf97x8fE7TNMc0th2rbU8qmGnze6nBtM0lwPLAYYMGWK6+9J7Wj7Q+3jruXvreYP3nvup5717927S0tL49ttvCQgIYNy4cXTt2pXevXtbW6QTuMKfd2vNNj8EdD/leTcgp5WOLSIiTrZnzx4uvfRS2rRpg5+fH1deeSXvvvuu1WV5rNYK7/eAm0/OOr8UKDJNM7eVji0iIk42YMAAPv74Y44ePcrx48d5//33OXjwoNVleSyHXDY3DOMtYBTQyTCMQ8BDgD+AaZovAO8D44HvgePAbxxxXBERb3DnnXdy8803M2LECKtLaVB0dDR//vOfGTNmDG3btuXiiy/Gz8+lb1zp1hzyf9Y0zWmNvG4CdzviWCIi3ubzzz/nueees7qMRt12223cdtttAPzlL3+hW7duFlfkufSxSETEhe3Zs4eLLroIX19fq0tpVH5+PuHh4fz000+sXbuWbdu2WV2Sx1J4i4i4sH/961+MG3e2NbBcx+TJkzl69Cj+/v48++yznHfeeVaX5LEU3iIiLmzjxo288sorVpdxVut2ZvPExgxyRswlKiyYOYl9SBjU1eqyPJrCW0TERR0/fpzCwkKioqKsLqVB63ZmM2/tN5TZqgDILixj3tpvALhOAe40uquYiIiL2rx5M/Hx8VaXcVZPbMyoC+5aZbYqntiYYVFF3kHhLSLiotxhvDunsKxZ7eIYumwuIuIi6saOC8uICgsmL/X/WLJkidVlnVVUWDDZdoI6KizYgmq8h3reIiIuoHbsOLuwDJOaseOAKU+wYXe+1aWd1ZzEPgT71/8aW7C/L3MS+1hUkXdQeIuIuAB3HTu+blBXFk4aSNewYAyga1gwCycN1GQ1J9NlcxERF+DOY8fXDeqqsG5l6nmLiLiAhsaIz2XsuLCwkClTptC3b1+io6O10pkHUniLeLgTJ05QWlpqdRnSCEeOHc+aNYtx48axd+9evv76a6Kjox1VprgIhbeIh9qzZw9//OMf6dOnD/v27bO6HGmEo8aOjx07xscff1x3g5CAgADCwsKcULFYSWPeIh6ktLSUd955h5deegnTNPnNb35Deno67dq1s7o0aQJHjB1nZmbSuX
NnfvOb3/D1118zePBgli5dSkhIiIOqFFegnreIB4mMjOSll17ixRdfZOvWrdx+++0Kbi9TWVnJV199xV133cXOnTsJCQlh0aJFVpfVoFGjRpGVlWV1GW5H4S3iQVavXk3Xrl25/vrrWbBgAT/++KPVJUkr69atG926dWPYsGEATJkyha+++sriqsTRFN4iHmTs2LG8/fbbfPLJJ4SGhjJx4kSuuuoq9Wy8SEREBN27dycjo+b74ampqfTr18/iqsTRNOYt4kFKd+ZzbGMWVYUVTA0bzu2vTONbWxa+vr6Nv1k8xt///nemT5/OiRMn6Nmzp8vfUlSaT+Et4iFKd+ZTuHY/pq0agKrCCgrX7qf/pN6EdA+3uDppzMiRIykuLj6jffHixVx11VWNvn9D5gaWfrWUvNI8IkIi+Os7f2VCzwnOKFVcgMJbxEMc25hVF9y1TFs1xzZmETJI4e3qtmzZcs7v3ZC5geRPkymvKgcgtzSX5E+TARTgHkrhLeIhqgormtUurqUlPe+lXy2tC+5a5VXlLP1qqcLbQym8RTyEb1ig3aD2DQu0oBpprpb0vPNK85rVLu5Ps81FPET7xB4Y/vX/SRv+PrRP7GFNQdJqIkIimtVumfR3YMkASA6reUx/x+qK3JbCW8RDhAwKJ2xS77qetm9YIGGTemu8252cY7jNiptFkG9QvbYg3yBmxc1yRpXnJv0dSPk9FB0EzJrHlN9D6WGrK3NLumwu4kFCBoUrrN1VbbjZTt4CtDbcAGKmnvWttePap842nxU3y7XGu1MX/OfcatnKuPVXhVp7/RwovEVEXEED4UbqgkbDG2oC3KXC+nRFh+w239qnFBTezabL5iIirqCBcGuw3d2Edmteu5yVwltExBV4erglzAf/4Ppt/sE17dJsCm8REVfg6eEWMxWSlkFod8CoeUxa1qQhATmTxrxFRFxBbYilLqi5VB7arSa4PSncYqZ61vlYSOEtIuIqFG7SRLpsLiLiRUzTBCA5Obnec3Ev6nmLiHiRN954g5ycHMrLy3n88ceJiopixowZVpclzaSet4iIF5kxYwbdu3fn8ccf5/zzz1dwuymFt4iIF3nzzTc5ePAgf/rTn/jpp5948803rS5JzoEum4uIeJFp06ZhGAbJycn86U9/0pi3m1LPW0TEixiGAfxnwlrtc3EvCm8RERE3o/AWERFxMxrzFhHxcOnp6aSmplJUVERoaCgJCQnExMRYXZa0gMJbRMSDpaenk5KSgs1mA6CoqIiUlBQABbgb02VzEREPlpqaWhfctWw2G6mpqRZVJI6g8BYR8WBFRUXNahf3oPAWEfFgoaGhzWoX96DwFhHxYAkJCfj7+9dr8/f3JyEhwaKKxBE0YU1ExIPVTkrTbHPPovAWEfFwMTExCmsP45DL5oZhjDMMI8MwjO8Nw5hr5/VbDcM4bBjGrpP/3e6I44qIiHijFve8DcPwBZ4FxgCHgC8Nw3jPNM3vTtv0bdM072np8URERLydI3reQ4HvTdPMNE3zBLAKmOiA/YqIiIgdjgjvrsDBU54fOtl2usmGYaQbhrHaMIzuDjiuiIiIVzJaei9XwzB+DSSapnn7yef/BQw1TfN/TtmmI1BimmaFYRj/DUw1TXN0A/ubCcwE6NKly+BVq1a1qD6rlZSU0LZtW6vLaHXeet7gvefurecN3nvuOm/Hi4+P32Ga5pDGtnPEbPNDwKk96W5AzqkbmKZ59JSn/ws81tDOTNNcDiwHGDJkiDlq1CgHlGidtLQ03P0czoW3njd477l763mD9567zts6jrhs/iXQ2zCMCw3DCABuBN47dQPDMCJPeXotsMcBxxUREfFKLe55m6ZZaRjGPcBGwBd42TTNbw3DWABsN03zPeD3hmFcC1QCBcCtLT2uiDfYunUr1dXVjBw50upSRMSFOGSRFtM03wfeP61t/ik/zwPmOeJYIt5i586dvPLKKzz//PNWlyIiLkYrrIm4qEGDBvHiiy
9aXYaIuCDdmETEBb3++usMHTqU2NhY7rzzTqqqqqwuSURciMJbxMXs2bOHt99+m61bt7Jr1y58fX154403rC5LRFyILpuLuJjU1FR27NjBJZdcAkBZWRnh4eEWVyUirkThLeJiTNPklltuYeHChVaXIiIuSpfNRVxMQkICq1evJj8/H4CCggJ+/PFHi6sSd5SWlsatt95qdRniBApvERfTr18/HnnkEcaOHUtMTAxjxowhNzfX6rJExIXosrmIC1mTV8DCzFyyw/vQ9bk3mNczkskRHawuS0RcjMJbxEWsyStgdsZByqprbhZ0qMLG7IyaG/YpwKU5hg0bRkVFBSUlJRQUFBAbGwvAY489RmJiosXViSMovEVcxMLM3LrgrlVWbbIwM1fhLc3y+eefAzVj3itWrGDFihXWFiQOpzFvEReRXWFrVruIeC/1vEWcbN68eSQmJlJYWMjevXuZO3eu3e26BvpzyE5Qdw30d3aJIuJm1PMWcbLPP/+cYcOG8X//939nvTvYvJ6RBPsY9dqCfQzm9Yxs4B0i9hWlpLB/dAJd7vod9/90kKKUFKtLEgdTz1vESebMmcPGjRv54YcfGD58OAcOHCA1NZUpU6Ywf/78M7avHddemJlLdoWNroH+mm0uzVaUkkLug/Mxy8sBqMzJIffBmr9voUlJVpYmDqTwFnGSJ554gl//+te89tprPPXUU4waNYqtW7ee9T2TIzoorKVF8pc8XRfctczycvKXPK3w9iC6bC7iRDt37iQ2Npa9e/fSr18/q8sRL1DZwII+DbWLe1LPW8QJdu3axa233sqhQ4fo1KkTx48fxzRNYmNj2bZtG8HBwVaXKB7KLzKSypwcu+0tVVlZyfjx41myZAn9+/dv8f7k3KnnLeIEsbGx7Nq1i4suuojvvvuO0aNHs3HjRnbt2qXgFqcKv+9ejKCgem1GUBDh993b4n37+fnx+uuv85e//AWbTV9htJLCW8RJDh8+zHnnnYePj48um0urCU1KIvLhBfhFRYFh4BcVReTDC5o83j1+/Hhy7PTca4WHh7N+/Xr8/fUVRivpsrmIk3Tu3JkNGzYA8Nlnn1lcjXiT0KSkc56c9v777zu4GnEGhbeIk5TuzOfYxiyqCivwDQukfWIPQgaFW12WiHgAhbeIE5TuzKdw7X5MWzUAVYUVFK7dD6AAF5EW05i3iBMc25hVF9y1TFs1xzZmWVOQSBM1NuYtrkE9bxEnqCqsaFa7iBXS09NJTU2lqKiI0NBQEhISNObtJhTeIk7gGxZoN6h9wwItqEbkTOnp6aSkpNR95auoqIiUk2ugx8TEWFmaNIHCW8QJ2if2qDfmDWD4+9A+sYd1RYmcIjU19YzvattsNlJTU+uF954tm9myaiXFR4/QrmMnRt54M9Ej41u7XDmNwlvECWonpWm2ubiqoqKiRtv3bNnMh8ufofJEzVWk4iOH+XD5MwAKcIspvEWcJGRQuMJaXFZoaKjdAA8NDa37ecuqlXXBXavyRAVbVq1UeFtMs81FRLxQQkLCGauk+fv7k5CQUPe8+OgRu+9tqF1aj3reIiJeqHZc+/TZ5qeOd7fr2IniI4fPeG+7jp1arU6xT+EtIuKlYmJizjqzfOSNN9cb8wbwCwhk5I03t0Z5chYKbxERsat2XFuzzV2PwltERBoUPTJeYe2CNGFNREQsU1lZyYQJE+jUqRO7d++2uhy3ofAWERHL3HXXXfTp04f169dzww03cOjQIatLcgsKbxERscRf//pXQkNDeeqppxgxYgQvvvgi06ZNa3ABGfkPjXmLiIglHnrooXrPhw8fzpYtWyyqxr2o5y0iIuJmFN4iIiJuRuEtIiLiZhTeIiIibkYT1kRExBJr8gpYmJlLdoWNroH+zOsZyeSIDlaX5RYU3iIi0urW5BUwO+MgZdUmAIcqbMzOOAigAG8CXTYXEZFWtzAzty64a5VVmyzMzLWoIvei8BYRkVaXXWFrVrvUp/AWEf
FwcXFx2GyuFYpdA/2b1S71KbxFRDzcZZddxtatW60uo555PSMJ9jHqtQX7GMzrGWlRRe5F4S0i4uGuvvpqPvjgA6vLqGdyRAcW9+lOt0B/DKBboD+L+3TXZLUmcshsc8MwxgFLAV/gRdM0F532eiCwEhgMHAVuME0zyxHHFhGRs4uPj2fBggVWl3GGyREdFNbnqMU9b8MwfIFngauBfsA0wzD6nbbZbcAvpmn+ClgCPNbS44qISNO0adOGsLAwcnJyrC7FEnv37uWyyy5j4MCBXHnllRw5csTqklrMEZfNhwLfm6aZaZrmCWAVMPG0bSYCr578eTWQYBiGgYiItIrExESXu3Teml5//XW++eYbLrvsMl544QWry2kxR1w27wocPOX5IWBYQ9uYpllpGEYR0BE44+OPYRgzgZkAXbp0IS0tzQElWqekpMTtz+FceOt5g/eeu7eeN7jHuXfs2JFXXnmFnj17Omyf7nDep/rpp5/4/vvv6dSpU4vqdoXzdkR42+tBm+ewTU2jaS4HlgMMGTLEHDVqVIuKs1paWhrufg7nwlvPG7z33L31vMF9zn3JkiWMHDkSX19fh+zPXc671saNG9m9ezfbtm0jLCzsnPfjCuftiMvmh4DupzzvBpw+sFK3jWEYfkAoUOCAY4uISBNdeumlfPbZZ1aXYYnq6mpuu+023nvvvRYFt6twRM/7S6C3YRgXAtnAjcBNp23zHnALsA2YAmwyTdNuz1tERFouN289mQcWU16RS1BgJD17zfaIsd5zlZOTQ2hoKL1797a6FIdocXifHMO+B9hIzVfFXjZN81vDMBYA203TfA94CXjNMIzvqelx39jS44qIiH25eevZu/d+qqvLACivyGHv3vsBiIw4fT6xdzjvvPN48sknrS7DYRzyPW/TNN8H3j+tbf4pP5cDv3bEsURE5OwyDyyuC+5a1dVlZB5Y7LXhXVRUxIsvvsi4ceOsLsUhdEtQEREPU15h/85cDbV7og2ZG1j61VLySvOICIlgVtwsVq9ebXVZDqPwFhHxMEGBkZRXnLkgS1Cgd6wbviFzA8mfJlNeVQ5AbmkuyZ8mAzCh5wQLK3McrW0uIuJhevaajY9PcL02H59gevaabVFFrWvpV0vrgrtWeVU5S79aalFFjqeet4iIh6kd1z59trm3jHfnleY1q90dKbxFRDxQZMRErwnr00WERJBbeub4fkRIhAXVOIcum4uIiEeZFTeLIN+gem1BvkHMiptlUUWOp563iIh4lNpJaafPNveUyWqg8BYREQ80oecEjwrr0+myuYiIiJtReIuIiLgZhbeIiIibUXiLiIi4GYW3iIiIm1F4i4jLOnLkCPHx8cTExDB06FBKSkqsLknEJSi8RcRlPf/881xxxRWkp6ezbt06AgICrC5JxCXoe94i4rICAgLIysoCICoqytpiRFyIet4i4rJ69erFmjVreOGFF6wuRcSlKLxFxCVlZ2fz6KOPkpGRwYsvvsiaNWsAiImJ4dixYxZXJ2ItXTYXEZe0detWLr74Yrp06cKGDRtISEjg559/pkePHrRv397q8kQspZ63iLikmJgYNm/eTE5ODl26dGHJkiXcfffd3HTTTVaXJmI59bxFxCX17duXRx99lMTERPz9/enSpQurVq1i7ty5xMXFcdFFF1ldoohlFN4i4pKKUlIY9vIrvFNZhV/ncMJ/9ztCk5K44YYbrC5NxHIKbxFxOUUpKeQ+OB+zvByAypwcch+cD0BoUpKVpYm4BI15i4jLyV/ydF1w1zLLy8lf8rRFFYm4FoW3iLicytzcZrWLeBuFt4i4HL/IyGa1i3gbhbeIuJzw++7FCAqq12YEBRF+370WVSTiWjRhTURcTu2ktPwlT1OZm4tfZCTh992ryWoiJym8RcQlhSYlKaxFGqDL5iIiIm5G4S0iIuJmFN4iIiJuRuEtIiLiZhTeIiIibkbhLSIi4mYU3iIiIm5G4S0iIuJmFN7Addddx+
DBg+nfvz/Lly9vteM+/PDD9O3blzFjxjBt2jQWL17cascWERH3pRXWgJdffpkOHTpQVlbGJZdcwuTJk+nYsaNTj7l9+3bWrFnDzp07qaysJC4ujsGDBzv1mCIi4hkU3sCyZct49913ATh48CD79+93enh/8sknTJw4keDgYACStAykiIg0kddfNk9LS+Ojjz5i27ZtfP311wwaNIjy8nKnH9c0TacfQ6Q5srKyCA4OJjY21upSRKQRXh/eRUVFnHfeebRp04a9e/fy2WeftcpxL7/8clJSUigvL6ekpIQNGza0ynFFzqZXr17s2rXL6jKkmX755RerS5BW5vXhPW7cOCorK4mJieHBBx/k0ksvbZXjXnLJJVx77bVcfPHFTJo0iSFDhhAaGtoqxxYRzzJkyBBuuukmNm3apKt6XsKrw3vPls28+of/5qpQP+4ZOZgFv/8daWlpjBo1yqnHXZNXwJBPv+X5oWNp98pabn3hZTIyMjRhTUTOyb59+7jpppt45pln6NevH3/729/IycmxuiyXsmTJEvr378+AAQOYNm1aqwyPOpPXhveeLZv5cPkzFB85DKZJ8ZHDfLj8GfZs2ezU467JK2B2xkEOVdg49uTDfH3zJG4ZNZI+Y68mLi7OqccWEc/k6+vLNddcw9q1a/n444/JzMzk/PPP54svvrC6NJeQnZ3NsmXL2L59O7t376aqqopVq1ZZXVaLeO1s8y2rVlJ5oqJeW+WJCrasWkn0yHinHXdhZi5l1TWXtUIfWFjXnhHo77RjiojnKyoq4u233+aVV17B39+fl156iZiYGKvLchmVlZWUlZXh7+/P8ePHiYqKsrqkFvHannfx0SN223Ozs7n88ssZMGAA69atq2ufOHGiQy5DZVfYmtUuItYqKysjNjaWgIAAjhyx/3vDajNmzCAuLo7MzExWrlzJxx9/zC233EJQUJDVpbmErl27Mnv2bM4//3wiIyMJDQ1l7NixVpfVIl4b3u06drLbvudoEbfccgvbtm3jiSeeACAlJYW4uDiHfFLr2kAPu6F2EbFWcHAwu3btcume2tSpU8nIyGDRokX07t3b6nJczi+//ML69ev54YcfyMnJobS0lNdff93qslqkReFtGEYHwzD+bRjG/pOP5zWwXZVhGLtO/vdeS47pKCNvvBm/gMB6bX4BgfQZOpyysjIqKirw8fGhsrKSp59+mjlz5pzTcfLy8hgwYEDd834b11D+6gv1tjmx+nXyfzuFAQMG8PTTT5/TcUTORXp6OkuWLCE5OZmXXnrJ7SfxeKtrr72Ww0c2sHXrSFI3/YqtW0eSm7fe6rJcxkcffcSFF15I586d8ff3Z9KkSXz66adWl9UiLR3zngukmqa5yDCMuSef/9nOdmWmabrUyg+149pbVq2k+OgR2nXsxMgbbyYqJo6bbrqJlStX8thjj/Hcc89x880306ZNG4ccN6ZdG8o7hZIZ6E92hY2wH/ZxfNP7fLdjO6ZpMmzYMK688koGDRrkkOOJNCQ9PZ2UlBRstpohm5KSEg4fPkzfvn3Zu3evxdVJc+TmrWfv3vupri4DoLwih7177wcgMmKilaW5hPPPP5/PPvuM48ePExwcTGpqKkOGDLG6rBZpaXhPBEad/PlVIA374e2SokfG252cVrtgyi+//MJjjz3G2rVrueOOO/jll1/44x//yPDhw1t03H5tg3nnsv4ALE3/P47+egohISEATJo0iS1btii8xelSU1PrghsgNDSU++67T+sNuKHMA4vrgrtWdXUZmQcWK7yBYcOGMWXKFOLi4vDz82PQoEHMnDnT6rJaxGjJF/oNwyg0TTPslOe/mKZ5xqVzwzAqgV1AJbDINM11p29zyrYzgZkAXbp0GWzldP5nn32WESNGcOjQIaqrq0lISOCBBx5gyZIlTd5HVlYWycnJrFixAoDXXnuNqqoqbr31VgBWr17NsWPH+O1vfwvU3CQlNDSUyZMnO/p0WlVJSQlt27
a1ugxLuMu55+bmNvhaZGRks/fnLud9rm688Ub+8Y9/2P1wY/W5FxfvbvC1du0GNPhaS1l93k1RdKKI/NJ8bNU2/H38CQ8JJzSgZR9QnXne8fHxO0zTbPSyQKM9b8MwPgIi7Lx0fzPqOd80zRzDMHoCmwzD+MY0zQP2NjRNczmwHGDIkCGmsxdMacj+/fvx8fHh3nvvZenSpQQHB3PFFVcQFBTUrEVcPvroI0pKShg4cCBt27Zl7ty5jBs3rm4f7du3Z+TIkbzwwguYpsk999zDa6+95vY979ZY7MZVucu5L1myhKKiojPaQ0NDmTZtWrP35y7nfa6CgoIYMWIEnTqdOdnV6nPfuvVByivO/DZMUGAUI0bc47TjWn3ejdmQuYFHPn2E8qr/zOUIKg0i+bJkJvSccM77dYXzbnTCmmmaV5mmOcDOf+uBnw3DiAQ4+ZjfwD5yTj5mUnNp3eWT6f777+eRRx4BYNq0aaxYsYJLL72U2bNnN2s/fn5+zJ8/n2HDhnHNNdfQt2/feq/HxcXx8MMPM3ToUIYNG8btt9/u9sEt7iEhIQF///rfcvD39ychIcGiiqyXm7feLSd99ew1Gx+f4HptPj7B9OzVvN9XnmbpV0vrBTdAeVU5S79aalFFjtPSMe/3gFuARScfz/ibfnIG+nHTNCsMw+gEjAAeb+FxnW7psulkHvgtPx3MJSgwkjVr/9zssaN1O7P5Oa+YJTm9iLp9OXcn9uG6QV3rvf7Exgy2zb+f4Qs2MOe018VaGRkZ3HDDDXXPMzMzWbBgAffee6+FVTlO7QIeqampFBUVERoaSkJCgtcu7OHOk75q68s8sJjyiprfWT17zXb5up0trzSvWe3upKXhvQh4xzCM24CfgF8DGIYxBPhv0zRvB6KBfxiGUU1NT3+RaZrftfC4TuWIf8TrdmYzb+03/K5vNSY+ZBeWMW/tNwBcN6hr3etltiqAM14X6/Xp06fuDltVVVV07dqV66+/3uKqHCsmJsZrw/p0DU362vPdY1w97iFsNhs+Pq67NEZkxESvD+vTRYREkFt65tyOiBB7I8HupUV/E03TPGqaZoJpmr1PPhacbN9+MrgxTfNT0zQHmqZ58cnHlxxRuDOdbeZmUz3l7mMiAAAgAElEQVSxMaMumGuV2ap4YmNGk14X15KamkqvXr244IILrC5FnKS8wv4EPpN8du3aRXZ2Nh06dGjlqqQlZsXNIsi3/ipzQb5BzIqbZVFFjuO1a5ufTUP/iBtqtyensOys7Y29Lq5l1apV5zSJS9xHUGBkA5O+mj/zXlxD7aS0pV8tJa80j4iQCGbFzWrRZDVXofC2wxH/iKPCgsm2E8RRYcFNel1cx4kTJ3jvvfdYuHBh4xuL2+rZa3a94TLQpC9PMKHnBI8I69O57gCOhRwxc3NOYh+C/X3rtQX7+zInsU+TXhfX8a9//Yu4uDi6dOlidSniRJERE+nb91GCAqMAg6DAKPr2fVTjyOKS1PO2wxEzN2snnf2c8RUGNT3qU2eT1z4+sTED4w+raVf8I8c+/Dvj7t9JaWkpQ4cO5e233663LrpY46233tIlcy+hSV/iLhTeDXDEP+LrBnUlrWg/PywaBdQsGDB29W/qjb1snfufyzkPPJDDAw88QFlZGTNmzFBwO9EHH3zArFmzqKqq4vbbb2fu3Ll2tzt+/Dj//ve/+cc//tHKFYqINEzh3Uo2ZG4g+dPkugUDcktzSf40GfjPpIr58+dzySWXEBQUxLJly6wq1eNVVVVx99138+9//5tu3bpxySWXcO2119KvX7+6bfZ9nse29QcoKajgqTvf4+e9ZYQO05rfIuIaNObdSpqy0k9BQQElJSUUFxfr1oxO9MUXX/CrX/2Knj17EhAQwI033sj69f9ZX2jf53lsfmMvJQUVAJQUVLD5jb3s+9z9F3YQEc+g8G4lTVnpZ+bMmTz88MNMnz6dP//ZbW7O5nays7Pp3r173fNu3bqRnZ1d93
zb+gNUnqiu957KE9VsW293OX4RkVan8G4lDa3oU9u+cuVK/Pz8uOmmm5g7dy5ffvklmzZtas0SW+zBBx9k6dL/XEm4//77XfLyv7076RmGUfdzbY/7dA21i0h9ZWVlXHnllVRVVTW+sZwThXcraWyln5tvvpm1a9cC4Ovry+eff87o0aNbvc6WuO2223j11VcBqK6uZtWqVUyfPt3iqs7UrVs3Dh48WPf80KFDREVF1T1v2yHQ7vsaaheR+l5++WUmTZqEr69v4xvLOVF4t5IJPSeQfFkykSGRGBhEhkSSfFkyAGNXjyXm1RjGrh7LhswN1hbaAj169KBjx47s3LmT7du3M2jQIDp27Gh1WWe45JJL2L9/Pz/88AMnTpxg1apVXHvttXWvD5/YC7+A+v80/AJ8GD6xV2uXKtIszz77LMOGDWPkyJG8/PLL7N+/n4ULF/Lpp5+2ah1vvPEGEyfqK3fOpNnmrej0lX6aMgPd3dx+++2sWLGC9PR05syZY3U5dvn5+fHMM8+QmJhIVVUVv/3tb+nfv3/d6xcNqxnKqJ1t3rZDIMMn9qprF3FVBw8eZOvWrRw4cIBHHnmExx9/nKlTp3LppZe2Wg0nTpwgMzOTHj16tNoxvZHC20Jnm4HuruF9/fXXM3/+fIqLi0lMTLS6nDPs2bKZLatWUnz0CLPHjGDkjTcTPTL+jO0uGhahsBa3s2jRIqDmjnivvfaaJTUcOXKEsLAwS47tTRTeFvLEe80GBAQQHx/PsWPHXG68a8+WzXy4/BkqT9RMPCs+cpgPlz8DYDfARaT5goOD9VXXVqAxbws1NgPdXRSlpLB/dAJ7ovuRET+arR9+yPjx460u6wxbVq2sC+5alScq2LJqpUUViXie8847j6qqKgW4kym8LeQJ95otSkkh98H5VObk8H15OVdt/YQhJSVEtmtndWlnKD56pFntIm4n/R1YMgCSw2oe09+xpIyxY8fyySefWHJsb6HL5hY69V6zW+dtZcTCEW53r9n8JU9jnvyE/avAQD7sWTMj+6eff7ayLLvadexE8ZHDdttF3F76O5Dye7CdvKVp0cGa5wAxU5122FOXEq6d3HnPPffw1FNPcdVVVzntuN5OPW+LTeg5gQ+nfEjp/lI+nPKhWwU3QGVurt1202Zr5UoaN/LGm/ELqP9dbb+AQEbeeLNFFYk4UOqC/wR3LVtZTbuTVJRW2l1KOOREJPHx8VqkxYkU3i6ibdu2VpdwTvwiI+22G/7+rVxJ46JHxjN25j2069QZDIN2nTozduY9mqzmAFVVVdxxxx1cc801VpfivYoONa/dAUoKKxpcSvi3v/2ty01a9SS6bC4tEn7fveQ+OL/u0jmAERSEX5cuFlbVsOiR8QprJ1i6dCnnn3++1WV4t9BuNZfK7bU7SXVVtd12T1xK+Oeff+Zvf/sbmzdvpry8nCuuuIKHHnqo3n0SWpN63tIkr7/+OkOHDiU2NpY777yz7nJYaFISkQ8vwC8qCgwDv6goIh9egG+obp/pLQ4dOsSGDRuYMMG9hnw8TsJ88A+u3+YfXNPuJD6+9iPE05YSPnDgAOPGjWPEiBFs376d5cuXM23aNK6//noOHLDmhkUKb2nUnj17ePvtt9m6dSu7du3C19eXN954o+710KQkem9KJXrPd/TelEpoUpKF1Upru/fee3n88cfx8dGvE0vFTIWkZRDaHTBqHpOWOXWyWtuwQK9YSviuu+7i1VdfZerUqQQEBACQkJDA66+/zh//+EdLatJlc2lUamoqO3bs4JJLLgFq7hgUHh5ucVXiCv75z38SHh7O4MGD2bJli9XlSMxUp4b16QJD/Iif3tejlxLet28fnTt3JiYmhn/+85/Mnz+fdu3a0alTJ9asWYOPjw9HjhyhU6fW/daKwttFnHpLSldjmia33HILCxcutLoUcTFbt27lvffe4/333+fYsWOUl5czY8YMXn/9datLk1bi6UsJf/3111x66aVUVV
Xx17/+lU2bNvHBBx9wxx13ANC7d29++OGHVg9vXedqZet2ZjNi0SYunLuBEYs2sW5nNkePHqVDhw5Wl9aghIQEVq9eTX5+PgAFBQX8+OOPFlclrmDhwoUcOnSIrKws5s+fz+jRoxXc4lFM08TX15cjR47Qq1cvwsLCiIiIoF+/fgDk5+dbciVS4d2K1u3MZt7ab8guLMMEsgvLmP1qGgPjLmH27NlWl9egfv368cgjjzB27FhiYmIYM2YMuQ18v1tExJMMHDiQbdu20alTJw4cOEBRURE///wze/bs4ZtvviE/P58LLrig1evSZfNW9MTGDMps9RctqAwK48K7XuR//me0RVU1Iv0dSF3ADUWHuOGWbjUzV1txTE3cR2xsLPfee6/VZYg4VHR0NFlZWXz99dc88MADxMfH065dO6699loWL17Myy+/bEldCu9WlFNY1qx2y1m03KK4h9OXxYwYUWl1SSJOsXz5cqZPn86f/jyRZcvaUFAwhcOHX8Y/YAJdLFrTQpfNW1FUWHCz2i1nwXKLnuzgwYPEx8cTHR1N//79Wbp0qdUlnbN9n+edsSxmcUE5+z5339vZijQkOjqa/33xbt58Yym3/uZL5sx5kvfeyyQkZDW5eestqUk971Y0J7EP89Z+U+/SebC/L3MS+1hY1VlYsNyiJ/Pz8+PJJ58kLi6O4uJiBg8ezJgxY+omvriTbesPnLEspmmabFt/wKNnHov3Ki9bwax7zwPO43jpfbQJWQLYyDywmMiIia1ej3rerei6QV1ZOGkgXcOCMYCuYcEsnDSQ6wZ1tbo0+xpaVtGJyy16ssjISOLi4gBo164d0dHRZGdnW1zVuWlo+UtPXBZTBKC8wv4k3YbanU0971Z23aCurhvWp0uYX3/MG5y+3KK3yMrKYufOnQwbNszqUs5J2w6BdoPa05bFFKkVFBhJeUWO3XYrqOctDbNguUVvUFJSwuTJk3n66adp37691eWck+ETe52xLKZhGB63LKZIrZ69ZuPjU39+ko9PMD17WfM1X/W83VhhYSFvvvkmv/vd75x3kFZebtHT2Ww2Jk+ezPTp05k0aZLV5Zyz2nHtU2ebt+tQrfFu8Vi149qZBxZzvBSCAqPo2Wu2JePdoPB2a4WFhTz33HPODW9xGNM0ue2224iOjuYPf/iD1eW02OnLYqalpVlXjEgriIyYSGTERNLS0hgx4h5La9Flczc2d+5cDhw4QGxsLHPmzLG6HGnE1q1bee2119i0aROxsbHExsby/vvvW12WiLgh9bzd2KJFi9i9eze7du2yuhQ5i/T0dFJTUykqKuKpp54iISGBmJgYq8sSETem8BZxovT0dFJSUrDZbAAUFRWRkpICoAAXkXOmy+YiTpSamloX3LVsNhupqakWVSQinkDh7cbatWtHcXGx1WXIWRQVFTWrXUSkKRTebqxjx46MGDGCAQMGaMKaiwoNDW1Wu4hIU2jM292cvEUnRYcgtBtvzp0PMW9aXZU0ICEhod6YN4C/vz8JCQkWViUi7k49b3dSe4vOooOA+Z9bdKa/06LdnjhxgiuuuILKSt3S0dFiYmJISkqq62mHhoaSlJTkUZPVxo8fT07OmctGuppHH32UPn36cNVVVzFt2jQWL15sdUki50w9b3dytlt0tmAVtICAABISEnj77beZPn16C4uU08XExHhUWJ+u9rvq+/bts7iShu3YsYNVq1axc+dOKisriYuLY/DgwVaXJXLO1PN2J068Red1113HG2+80eL9SI0ePXqQlZXFqFGjrC5FgC1btnD99dfTpk0b2rdvz7XXXmt1SSItovB2J068ReeAAQP48ssvW7wfEVdlGIbVJYg4jMLbnSTMr7kl56kcdItOX19fAgIC9NUzB+ncuTO+vr506NDB6lKcLiEhweXvS37FFVfw7rvvUlZWRnFxcd1COSLuSmPe7qR2XPuU2eYkzHfYXb8qKioICgpyyL68Xe1VjLVr11pciXNVV1
fz/fffu/yHlLi4OG644QZiY2O54IILGDlypNUlibRIi3rehmH82jCMbw3DqDYMY8hZthtnGEaGYRjfG4YxtyXH9HoxU+G+3ZBcWPPooOA+evQonTt3xt/f3yH7E89TlJLC/tEJ7Inux/7RCRSlpPDdd98xefJkgoODG9+Bxe6//34yMjL48MMPOf/8860uR6RFWtrz3g1MAv7R0AaGYfgCzwJjgEPAl4ZhvGea5nctPLY40ObNmxk/frzVZYiLKkpJIffB+Zjl5QBU5uSQ++B8uj+8gKeeesri6hpWujOfYxuzqCqswDcskPaJPQgZFG51WSIt1qLwNk1zDzQ6EWQo8L1pmpknt10FTAQU3hax9wvtzTffZOHChVaXJi4qf8nTdcFdyywvJ3/J04QmJVlU1dmV7syncO1+TFs1AFWFFRSu3Q9AcnKyhZWJtJxhmmbLd2IYacBs0zS323ltCjDONM3bTz7/L2CYaZp272RuGMZMYCZAly5dBq9atarF9VmppKSEtm3bWl1GneqySqp+qYBT/txtVZV8nL6Ncddc7bDjuNp5O1tFaSUlhRVUV1Xj37aa4MAQAkM8Z0pJ+bffNvhaUP/+gOv9mdvyjkNV9Zkv+PrgH9HGocdytXNvLTpvx4uPj99hmmaDw9C1Gv3tYhjGR0CEnZfuN01zfRNqsdctb/ATg2may4HlAEOGDDHd/XuyaWlpLvVd39xFX1BVWHFG+6BOk4kcNdRhx3G183amfZ/nsXntXipP1EwhCb+slJ/T/Iif3peLhtn7p+N+9i94mEo7q6j5RUXR++67Adf7Mz80d0uDr3Vb5NgJa6527q1F522dRiesmaZ5lWmaA+z815Tghppx7u6nPO8GuP5aih7KXnCfrV0at239ASpP1O/hVZ6oZtv6AxZV5Hjh992Lcdo3EYygIMLvu9eiihrnGxbYrHYRd9Ia3/P+EuhtGMaFhmEEADcC77XCccUO/UJzvJIC+x98Gmp3R6FJSUQ+vAC/qCgwDPyiooh8eIHLjncDtE/sgeFf/1ec4e9D+8Qe1hQk4kAtGpQzDON64O9AZ2CDYRi7TNNMNAwjCnjRNM3xpmlWGoZxD7AR8AVeNk2z4QE0car2iT3qTeIB/UJrqbYdAu0GddsOnvWBKDQpyaXD+nS1s8o121w8UUtnm78LvGunPQcYf8rz94H3W3IscQz9QnO84RN7sfmNvfUunfsF+DB8Yi8LqxKo+fuuv9viiTxnOqw0mX6hOVbtpLRt6w9QUlCBj6+PR01WExHXo/AWcYCLhkXUhXVaWpqCW0ScSjcmERERcTMKbxEhKyuLFStWWF2GiDSRwlvEyz3//PMkJiby4IMPMmrUKPLy8qwuSUQaoTFvES9WXFzMQw89REpKCnv27GHUqFGEhIRYXZaINELhLeLFfHx8OHHiBMeOHQOgR48e1hYkIk2iy+ZuLCsriwEDBlhdhrixkJAQVq5cyV/+8hcefPBBZs+ezfHjx60uS0QaofAW8XLXXnst/+///T/+9Kc/cfjwYZ588kmrSxKRRii8PURmZiaDBg3iyy+/tLoUcSMlJSX8+OOPALRr147o6GiKi4strkpEGqMxbw+QkZHBjTfeyCuvvEJsbKzV5Ygbsdls3HnnnRw5coSjR49y/vnn8+abb1pdlog0QuHt5g4fPszEiRNZs2YN/fv3t7occRPrdmbzxMYMcgrLiBr1J2YObEPA4b3ceuutVpcmIk2gy+ZuLjQ0lO7du7N161arSxE3sW5nNvPWfkN2YRkmkF1YxuK0Q/wSFGV1aSLSRApvNxcQEMC6detYuXJl3eXOrKws+vbtyy233EJMTAxTpkzRDGKp88TGDMpsVfXaTvgGszpLF+JE3IXC2wOEhITwz3/+kyVLlrB+/XqgZhx85syZpKen0759e5577jmLqxRXkVNY1qx2EXE9Cm83tCavgCGffsvwHwoJWv42a/IKCAsL48svv2TixIkAdO/enREjRg
AwY8YMPvnkEytLFhcSFRbcrHYRcT0KbzezJq+A2RkHOVRhwwQOVdiYnXGQNXkF9bYzDOOsz8V7zUnsQ7C/b722YH9f5iT2sagiEWkuhbebWZiZS1m1Wa+trNpkYWZuvbaffvqJbdu2AfDWW29x+eWXt1qNtbTUpmu6blBXFk4aSNewYAyga1gwCycN5LpBXa0uTUSaSDNU3Ex2ha1J7dHR0bz66qvceeed9O7dm7vuuqs1yhM3cd2grgprETemnreb6Rrof9b2DZkb+K/3/4vMY5lkXpXJwnULWbNmDW3atGnNMgHo3Lmzw/bVtm1bh+1LRMTdKbzdzLyekQT71B+/DvYxmNczkg2ZG0j+NJn84/kA5JbmkvxpMhsyN1hRqpZqFRFxEoW3m5kc0YHFfbrTLdAfA+gW6M/iPt2ZHNGBpV8tpbyqnIDOAfR+tDcA5VXlLP1qqbVFi4iIQ2nM2w1NjujA5IgOZ7TnlebZ3b6hdhERcU/qeXuQiJCIZrWLiIh7Us/bg8yKm0Xyp8mUV5XXtQX5BjErbpbTj71ny2a2rFpJ8dEjtOvYifOvnuT0Y4qIeCv1vD3IhJ4TSL4smciQSAwMIkMiSb4smQk9Jzj1uHu2bObD5c9QfOQwmCbFRw7z2GOP0fG88xgwYIBTjy0i4o3U8/YwE3pOcHpYn27LqpVUnqio13blZZcxIrID677LbNVaRES8gcJbAHjhhRd44YUXACgqKqJHjx5s3ry5Se8tPnrkjLboPheR/t2Oc6olPT2d1NRUioqKCA0NJSEhgZKSknPal4iIJ9JlcwHgv//7v9m1axdffvkl3bp14w9/+EOT39uuYye77SFh5zW7jvT0dFJSUigqKgJqPkikpKSQnp7e7H2JiHgqhbfUM2vWLEaPHk1SUlKT3zPyxpvxCwis12YYBkOTJjf7+Kmpqdhs9Zd6tdlspKamNntfIiKeSuEtdVasWMGPP/5ImzZtWLZsGQD33Xcfo0ePBmqCdcaMGWe8L3pkPGNn3kO7Tp3BMGjXqTPtO4fzq6HDm11DbY+7qe0iIt5IY94CwI4dO1i8eDFbtmwhIyODJ598kt///vds376diooKbDYbn3zyCSNHjrT7/uiR8USPjK97npaWdk51hIaG2g3q0NDQc9qfiIgnUs9bAHjmmWcoKCggPj6eO++8k40bN1JcXExgYCDDhw9n+/btbNmypcHwPt3DDz/M8OHDycjIoFu3brz00ktNel9CQgL+/vVvvuLv709CQkKzz0lExFOp5y0AvPLKK/Wejx49mldeeYXLLruMmJgYNm/ezIEDB4iOjj7rfmpnit9+++3k5eWRkJBATExMk+uo3fb02ebN2YeIiKdTeFPT21u5ciVdu3rX/Y2LUlLIX/I0lbm5+EVGEn7fvYSenKh2xRVXsHjxYl5++WUGDhzIH/7wBwYPHoxhGA3ur3amuM1mIyIiom6mONDsAFdYi4g0zOsvm1dXV/P999/TocOZN/rwZEUpKeQ+OJ/KnBwwTSpzcsh9cD5FJ8N25MiR5ObmMnz4cLp06UJQUFCjl8ztzRT/97//zQMPPOC08xAR8Ube1fNOfwdSF0DRIQjtBgnz+c6nH5MnTyY4ONjq6lpV/pKnMcvL67WZ5eXkL3ma0KQkEhIS6gXxvn37Gt1nQzPCy087joiItIz3hHf6O5Dye7CV1TwvOggpv2dA0jKeeuopa2uzQGVu7lnb1+QVsDAzl+wKG10D/ZnXM9LubUhPVTtT/OOPP2b58uUEBQUREhLChRde6PD6RUS8mfdcNk9d8J/grmUrq2n3Qn6RkQ22r8krYHbGQQ5V2DCBQxU2ZmccZE1ewVn3mZCQQH5+Pt9++y1/+9vfuOGGG8jJyaF3795OOAMREe/lPeFddKh57R4u/L57MYKC6rUZQUGE33cvCzNzKas2671WVm2yMNN+b71WTEwMbdu25eKLLyYwMJDw8HDGjx9PZAMfFJqjR48eDBw4kNjYWI
YMGdLi/YmIuDPvuWwe2q3mUrm9di9UO6vc3mzz7M277L4nu8Jmt/1UUVFRBAcHExkZybRp05q1RnpjNm/eTKdO9tdRFxHxJt7T806YD/6nTUrzD65p91KhSUn03pRK9J7v6L0ptS7Quwb6292+ofZTXXHFFbz77rtUVFRQXFxc91Wx1vbss88SGxtLbGwsOTk5ltQgIuIs3hPeMVMhaRmEdgeMmsekZTXtUs+8npEE+/zn+9xmWRnH/vI/HLnjBgYMGMDbb7/d4Hvj4uK44YYbuOOOO5g8eXKTV2RrjGEYjB07lsGDB7N8+fJGt7/77rvZtWsXu3btIioqyiE1iIi4Cu+5bA41Qa2wblTtrPLa2eYhOz9nYM8L+eiNLUDjNwm5//77GTFiBKNGjXJYTVu3biUqKor8/HzGjBlD3759ueKKKxy2fxERd+I9PW9plskRHdh+WX9y42P516/Hc+DTLfz5z39my5YtDd4kpCglhf2jE9gT3Y+KffvqFnxxhNrec3h4ONdffz1ffPGFw/YtIuJuFN7SqIsuuogdO3YwcOBA5s2bx4IFZ3697vQV20ybrd6KbS1RWlpKcXFx3c8ffvghAwYMqLfNhswNjF09lphXYxi7eiwbMje0+LgiIq7Kuy6byznJycmhQ4cOzJgxg7Zt27JixYoztmlsxbaW+Pnnn7n++usBqKys5KabbmLcuHF1r2/I3EDyp8mUV9UcP7c0l+RPkwGY0HNCi44tIuKKWhTehmH8GkgGooGhpmlub2C7LKAYqAIqTdPUF3XdyDfffMOcOXPw8fHB39+f559//oxtGlux7VyU7szn2MYsAgor+GDa/9I+sQchg8LP2G7pV0vrgrtWeVU5S79aqvAWEY/U0p73bmAS8I8mbBtvmuaRFh5PWlHdEqkBXej6wltnXSLVLzKy5pK5nfZzUbozn8K1+zFt1QBUFVZQuHY/wBkBnleaZ3cfDbWLiLi7Fo15m6a5xzTNDEcVI66juUuknm3FtnNxbGNWXXDXMm3VHNuYdca2ESERdvfRULuIiLtrrQlrJvChYRg7DMOY2UrHlBZo7hKpoUlJRD68AL+oKDAMDH9/Ih9ecM7j3VWFFU1unxU3iyDf+h8cgnyDmBU365yOLSLi6gzTNM++gWF8BNjrwtxvmub6k9ukAbPPMuYdZZpmjmEY4cC/gf8xTfPjBradCcwE6NKly+BVq1Y19VxcUklJCW3btrW6jGZLLy5r8LWYdo3fPrWl523LOw5V1We+4OuDf0SbM5qLThSRX5qPrdqGv48/4SHhhAbY/0qbs7nrn3lLeet5g/eeu87b8eLj43c0ZV5Yo+HdFI2F92nbJgMlpmkubmzbIUOGmNu3N7pLl5aWlubQxUpay5BPv+WQnbXMuwX6s/2y/o2+v6XnffqYN4Dh70PYpN52J625Enf9M28pbz1v8N5z13k7nmEYTQpvp182NwwjxDCMdrU/A2OpmegmLuz0JVIBgn0M5vVs+R3CmiJkUDhhk3rjGxYIgG9YoEODu7y8nKFDh3LxxRfTv39/HnroIYfsV0SkNbT0q2LXA38HOgMbDMPYZZpmomEYUcCLpmmOB7oA7xqGUXu8N03T/KCFdYuTnb5EatdA/7PONm8u0zQxTRMfn4Y/P4YMCndaLzswMJBNmzbRtm1bbDYbl19+OVdffTWXXnqpU44nIuJILQpv0zTfBd61054DjD/5cyZwcUuOI9aYHNHBYWENkJWVxdVXX018fDzbtm1j3bp1XHDBBQ7bf3MYhlE3ZmWz2bDZbJz8gCki4vK0PKq0qoyMDG6++WZ27txpWXDXqqqqIjY2lvDwcMaMGcOwYcMsrUdEpKkU3tKqLrjgApe5NO3r68uuXbs4dOgQX3zxBbt3ayqGiLgHhbc0Kisr64wbgZyrkJAQh+zHkcLCwhg1ahQffKCpGCLiHhTe4rGysrLo27cvt99+OwMGDGD69Ol89NFHjBgxgp49e5KamgpAWV
kZH330EX379rW4YhGRplF4S5NUVVVxxx130L9/f8aOHUtZWcOLuLiS77//nlmzZpGens7eva6aBbgAABHaSURBVHt58803+eSTT5g1axaTJ08mJiaGSy65hDFjxnDNNddYXa6ISJPolqDSJPv37+ett97if//3f5k6dSpr1qxhxowZZ33Pvs/z2Lb+ACUFFbTtEMjwib1afVz5wgsvZODAgQD079+fhIQEju86zLDs8+ka2JmNN73Y4N3KRERclcJbmuTCCy8kNjYWgMGDB5OVlXXW7StKK9m8di+VJ2pWSCspqGDzG3sBuGhY690wJDAwsO5nHx8fzJxyCvfuxyy2UVlddda7lYmIuCpdNpcmOTUEfX19qaysPOv2JYUVdcFdq/JENdvWH3BKfU1Vtiu/yXcrExFxVQpvcYpqezcVoaYHbqXqUvsfOhq6i5mIiCtSeItT+Pja/6vVtkOg3XZn6NGjR70x9hUrVnDtpYl2t61dQ11ExB1ozFsalv4OpC6gR9Ehdt/WreZ5zFRmz57d6FvbhgXiF+BT79K5X4APwyf2cmbFZzh90tzlcZ0J+vpwvW0Mfx/aJ/Zo1bpERFpCPW+xL/0dSPk9FB0EzJrHlN/XtDdBYIgf8dP71vW023YIJH5631adrLbv8zw2v7G37lJ9SUEFH32cQ/nFnfFtH0BVdRV/+HARiavu4JZHf8fx48dbrTYRkZZQeIt9qQvAdtp3uW1lNe1NdNGwCG752wjufmE0t/xtRKsGN8C29QfsTpr75KvDhN8Vy4GCn5j197+we/93tG/fnueee65V6xMROVcKb7Gv6FDz2l1QQ5Pjatu7d+/OiBEjAJgxYwaffPJJq9UmItISCm+xL7Rb89pdUEOT42rbT78FqG4JKiLuQuEt9iXMB//g+m3+wTXtbmL4xF74BdT/K37qpLmffvqJbdu2AfDWW29x+eWXt3qNIiLnQuEt9sVMhaRlENodMGoek5bVtLuJi4ZFnHXSXHR0NK+++ioxMTEUFBRw1113WVmuiEiT6ati0rCYqW4V1vZcNCyi3kS5dTuz+c2iTeQUlhF18zOMS+zDCy90tbBCEZHmU3iL11i3M5t5a7+hzFYFQHZhGfPWfgPAdYMU4CLiPnTZXLzGExsz6oK7Vpmtiic2ZlhUkYjIuVF4i9fIKbR/D/KG2kVEXJXCW7xGVFhws9pFRFyVwltc0vjx48nJyXHoPuck9iHY37deW7C/L3MS+zj0OCIizqYJa2K5028eMnxiL95//32HH6d2UtoTGzNqZpuHBTMnsY8mq4mI21F4i6Vqbx5SuwZ5SUEFm9/YC+CUtdCvG9RVYS0ibk+XzcVSDd08ZNv6AxZVJCLi+hTeYqnGbh4iIiJnUniLpRq7eYiIiJxJ4S2WauzmIe4kOTmZxYsXW12GiHgBTVgTS9VOSjt9trkzJquJiHgKhbdY7vSbh6xcuZIpdyzGMAxiYmJ47bXXLKxORMT1KLzFpXz77bc8+uijbN26lU6dOlFQUGB1SSIiLkdj3uJSNm3axJQpU+jUqRMAHTp0sLgiERHXo563uBTTNDEMw+oyzklycjIAaWlpltYhIp5PPW9xKQkJCf+/vfuPkbq+8zj+fAuB1bJnleXHquUOq4lo05Nz412uEtlqCBiFux6abfTw1EpIjks0MaFm08Zc0tw11uAZ5Tg0d5a0caEag+TUgi0oF+O1mGjVSotwXM7sWljachAFuvK5P3aWW3GWHTo/vvvZeT6Syc73O5/9ft9vPrO8Zr7znRk2btzIwYMHATxsLkll+MxbY8oVV1xBd3c31157LRMmTGDu3Lk8+eSTI47ft28fixYt4pprruHVV1/lwgsvZNOmTZx9duO/KWzt2rWcc845zJo1q+H7ltRcDG+NCX0fbGLvnu9w9Fgfl1zSztaXvkX7zCUV/e7u3bt56qmnePzxx7nlllt45plnuO222+pcMby7Yxs7etZz+GA/rVPbuLZrGXPmdX
rYXFLdGd4qXN8Hm9i1q5sTJz4C4OixXnbt6gaoKMBnz57NlVdeCcBVV13Fvn376lbrkHd3bGPLukcZOD74Ma6H+w+wZd2jpVvzfM1eUj58zVuF27vnOyeDe8iJEx+xd09ln1Y2efL/f5TqhAkTGBgYqGl95ezoWX8yuIcMHD/Gjp71dd+3JBneKtzRY31ntH4sOHyw/4zWS1ItGd4qXMvk9jNaPxa0Tm07o/WSVEuGtwp38efv46yzPnl2+Flnnc3Fn7/vtL93aPNmfnfnXfzg4xPs/vJ1HNq8mfvuu+/k+63raV7XMiZO+uQ3n02cNJl5Xcvqvm9JMrxVuPaZS7jssm/RMvkCIGiZfAGXXXb6s80Pbd5M3ze+yUBvL6TEQG8vfd/4Joc2by47/pFHHmHOnDnceuutNal5zrxOFixfSWvbNIigtW0aC5avZM68zppsX5JOx7PNNSa0z1xS8VvDAPavfph09Ogn1qWjR9m/+mHOvemmT41fs2YNL7zwArNnz6661iFz5nUa1pIK4TNvZWmgr/zJbOXWr1ixgr1797J48WJWr15d79Ikqe585q0sTWxvHzxkXmb9qdauXcuLL77Itm3bTn7hiSTlzGfeytL0e+8hWlo+sS5aWph+7z0FVSRJjeMzb2Vp6HXt/asfZqCvj4nt7Uy/956yr3dL0nhTVXhHxIPATcBxYA9wR0rpt2XGLQT+CZgAPJFS+sdq9ivBYIAb1pKaUbWHzbcCX0gpfRH4JXD/qQMiYgLwGLAIuBz4akRcXuV+JUlqWlU9804pbRm2+BqwtMywq4H3Ukp7ASKiB1gC/LyafUsjOfXbvuZ1LWvIl5VIUqNESqk2G4rYDGxIKX3vlPVLgYUppa+Vlv8a+NOU0soRtrMcWA4wY8aMq3p6empSX1GOHDnClClTii6j4Yrq++iRw/zvgf0Mv19HBH8wbTotU1obUoNz3nyatXf7rr3Ozs7XU0odo40b9Zl3RLwEzCxzU3dKaVNpTDcwAHy/3CbKrBvxEUNKaR2wDqCjoyPNnz9/tBLHtO3bt5N7D7+Povpe97d3cLj/wKfWt7ZNY/lj/9aQGpzz5tOsvdt3cUYN75TS9ae7PSJuB24Erkvln8a/D3xu2PJFwKffoCvVgN/2JakZVHXCWuks8lXA4pTShyMM+ylwaUTMjohJQBfwXDX7lUbit31JagbVnm3+KNAKbI2INyJiLUBEXBARzwOklAaAlcAPgXeBjSmld6rcr1SW3/YlqRlUe7b5JSOs7wVuGLb8PPB8NfuSKjH0RSGnnm3uF4hIGk/8hDWNO37bl6Txzs82lyQpM4a3JEmZMbwlScqM4S1JUmYMb0mSMmN4S5KUGcNbkqTMGN6SJGXG8JYkKTOGtyRJmTG8JUnKjOEtSVJmDG9JkjJjeEuSlBnDW5KkzBjekiRlxvCWJCkzhrckSZkxvCVJyozhLUlSZgxvSZIyY3hLkpQZw1uSpMwY3pIkZcbwliQpM4a3JEmZMbwlScqM4S1JUmYMb0mSMmN4S5KUGcNbkqTMGN6SJGXG8JYkKTOGtyRJmTG8JUnKjOEtSVJmDG9JkjJjeEuSlBnDW5KkzBjekiRlxvCWJCkzhrckSZkxvCVJyozhLUlSZgxvSZIyY3ir7latWsWaNWtOLj/wwAM89NBDBVYkSXkzvFV3XV1dbNiw4eTyxo0bufnmmwusSJLyNrGaX46IB4GbgOPAHuCOlNJvy4zbBxwGPgYGUkod1exXeZk7dy779++nt7eXAwcOcN555zFr1qyiy5KkbFUV3sBW4P6U0kBEfBu4H1g1wtjOlFJ/lftTppYuXcrTTz/NBx98QFdXV9HlSFLWqgrvlNKWYYuvAUurK0fjVVdXF3fffTf9/f28/PLLRZcjSVmLlFJtNhSxGdiQUvpemdv+C/gNkIB/SSmtO812lgPLAWbMmHFVT09PTeorypEjR5gyZUrRZTRcub7vvPNOzj33XFavXl1QVY3hnD
efZu3dvmuvs7Pz9UpeWh41vCPiJWBmmZu6U0qbSmO6gQ7gK6nMBiPigpRSb0RMZ/BQ+9+llF4ZrbiOjo60c+fO0YaNadu3b2f+/PlFl9Fw27dvZ8aExI6e9Rw+2E/r1DbmdS1jzrzOokuru2ae82bsG5q3d/uuvYioKLxHPWyeUrp+lB3dDtwIXFcuuEvb6C393B8RzwJXA6OGt/J19Mhhtmx4koHjxwA43H+ALeseBWiKAJekeqrqrWIRsZDBE9QWp5Q+HGHMZyKideg6sAB4u5r9auw78uuDJ4N7yMDxY+zoWV9QRZI0flT7Pu9HgVZga0S8ERFrYfAweUQ8XxozA/iPiHgT+Anw7ymlF6vcr8a4jwcGyq4/fNA3HEhStao92/ySEdb3AjeUru8F/ria/Sg/EyaWv2u1Tm1rcCWSNP74CWuqiynnT2XipMmfWDdx0mTmdS0rqCJJGj8Mb9VFy5RWFixfSWvbNIigtW0aC5av9GQ1SaqBaj9hTRrRnHmdhrUk1YHPvCVJyozhLUlSZgxvSZIyY3hLkpQZw1uSpMwY3pIkZcbwliQpM4a3JEmZMbwlScqM4S1JUmYMb0mSMmN4S5KUGcNbkqTMGN6SJGXG8JYkKTOGtyRJmTG8JUnKjOEtSVJmIqVUdA0jiogDwH8XXUeV2oD+oosoQLP2Dc3be7P2Dc3bu33X3h+mlKaNNmhMh/d4EBE7U0odRdfRaM3aNzRv783aNzRv7/ZdHA+bS5KUGcNbkqTMGN71t67oAgrSrH1D8/berH1D8/Zu3wXxNW9JkjLjM29JkjJjeEuSlBnDu8Yi4sGI2BURP4uIZyPisyOMWxgRv4iI9yLi642us9Yi4uaIeCciTkTEiG+hiIh9EfFWRLwRETsbWWO9nEHv423Oz4+IrRGxu/TzvBHGfVya7zci4rlG11kro81fREyOiA2l2/8zIv6o8VXWRwW9/01EHBg2z18ros5ai4h/jYj9EfH2CLdHRDxS+nf5WUT8SaNqM7xrbyvwhZTSF4FfAvefOiAiJgCPAYuAy4GvRsTlDa2y9t4GvgK8UsHYzpTSlUW/T7KGRu19nM7514EfpZQuBX5UWi7no9J8X5lSWty48mqnwvm7C/hNSukSYDXw7cZWWR9ncN/dMGyen2hokfXzJLDwNLcvAi4tXZYD/9yAmgDDu+ZSSltSSgOlxdeAi8oMuxp4L6W0N6V0HOgBljSqxnpIKb2bUvpF0XUUocLex92cM1j/d0vXvwv8RYG11Fsl8zf83+Np4LqIiAbWWC/j8b5bkZTSK8CvTzNkCbA+DXoN+GxEtDeiNsO7vu4EXiiz/kLgf4Ytv19a1wwSsCUiXo+I5UUX00Djcc5npJT6AEo/p48wriUidkbEaxGRa8BXMn8nx5QewB8Cpjakuvqq9L77V6VDx09HxOcaU1rhCvu7ntiInYw3EfESMLPMTd0ppU2lMd3AAPD9cpsos27Mv2evkr4r8KWUUm9ETAe2RsSu0qPbMa0GvY+7OT+DzcwqzfnFwI8j4q2U0p7aVNgwlcxflnNcgUr62gw8lVI6FhErGDwC8eW6V1a8wubc8P49pJSuP93tEXE7cCNwXSr/Rvr3geGPTC8CemtXYX2M1neF2+gt/dwfEc8yeEhuzId3DXofd3MeEb+KiPaUUl/pUOH+EbYxNOd7I2I7MBfILbwrmb+hMe9HxETgXE5/yDUXo/aeUjo4bPFxxsnr/RUo7O/aw+Y1FhELgVXA4pTShyMM+ylwaUTMjohJQBeQ7Vm4lYqIz0RE69B1YAGDJ3s1g/E4588Bt5eu3w586ghERJwXEZNL19uALwE/b1iFtVPJ/A3/91gK/HiEB++5GbX3U17nXQy828D6ivQcsKx01vmfAYeGXkqqu5SSlxpegPcYfA3kjdJlbWn9BcDzw8bdwODZ6HsYPPRaeO1V9v2XDD4KPQb8CvjhqX0DFwNvli7vjIe+K+19nM75VAbPMt9d+nl+aX
0H8ETp+p8Db5Xm/C3grqLrrqLfT80f8PcMPlAHaAF+UPo/4CfAxUXX3MDe/6H0N/0msA24rOiaa9T3U0Af8LvS3/hdwApgRen2YPBM/D2l+3dHo2rz41ElScqMh80lScqM4S1JUmYMb0mSMmN4S5KUGcNbkqTMGN6SJGXG8JYkKTP/B0qpa3P8kyeYAAAAAElFTkSuQmCC\n",
"text/plain": [
"<matplotlib.figure.Figure at 0x66c7bcc0>"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"%matplotlib inline\n",
"from matplotlib import font_manager, rc\n",
"# Register a font with Hangul glyphs so the Korean labels render in the plot\n",
"font_fname = './data/D2Coding.ttf'\n",
"font_name = font_manager.FontProperties(fname=font_fname).get_name()\n",
"rc('font', family=font_name)\n",
"\n",
"import matplotlib.pyplot as plt\n",
"plt.figure(figsize=(8,8))\n",
"for i, label in enumerate(word_list):\n",
"    x, y = trained_embeddings[i]\n",
"    plt.scatter(x, y)\n",
"    plt.annotate(label, xy=(x, y), xytext=(5, 2),\n",
"                 textcoords='offset points', ha='right', va='bottom')\n",
"plt.grid(True)\n",
"plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"![](https://i.imgur.com/eWw448c.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "0z6obd8j_MD4",
"slideshow": {
"slide_type": "slide"
}
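},
"source": [
"## 4. Interim results summary\n",
"\n",
"+ Cause of the error: it occurs when a tool optimized for Korean data is fed English data\n",
"+ Fix: translate the data into Korean (or feed in Korean data directly)\n",
"+ Verification: http://nbviewer.jupyter.org/gist/KyoungHa-Park/ee4339362f63de7ab80335035794795b\n",
"\n",
" (Summary: the Naver movie review data also appears to be too small a DB)"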
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LZW7PUJV1EUD",
"slideshow": {
"slide_type": "slide"
}
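},
"source": [
"## 5. Next steps: 2D visualization / obtaining another emotion lexicon\n",
"\n",
"+ 2D visualization (TEB)\n",
"    + Sentiment classification presumably follows some specific classification scheme (criteria).\n",
"    + Psychology arranges emotions in layouts such as the [wheel of emotions].\n",
"    + Emotion classes are easier to read when laid out in 2D, like the Word2Vec results.\n",
"    + The AAE results show a pattern similar to the [wheel of emotions].\n",
"    + We therefore plan to apply the AAE logic to the emotion classification (Word2Vec results).\n",
"    ![Alt text](https://i.imgur.com/WrTiUqN.jpg)"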
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "qlZT_kR-1EUF",
"slideshow": {
"slide_type": "slide"
}
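},
"source": [
"+ Obtaining another emotion lexicon (TEB)\n",
"    + If the lexicon is too small, look for a larger one.\n",
"    + Seoul National University has built one for research use (Korean Sentiment Lexicon).\n",
"    + Using this lexicon also requires POS tagging, particle classification, and an n-gram model applied to the words."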
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "HGF2X7Ib1EUH",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ngram</th>\n",
" <th>freq</th>\n",
" <th>Agreement</th>\n",
" <th>Argument</th>\n",
" <th>Emotion</th>\n",
" <th>Intention</th>\n",
" <th>Judgment</th>\n",
" <th>Others</th>\n",
" <th>Speculation</th>\n",
" <th>max.value</th>\n",
" <th>max.prop</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>가*/JKS</td>\n",
" <td>1</td>\n",
" <td>0.000000</td>\n",
" <td>1.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>1.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>가*/VV</td>\n",
" <td>3</td>\n",
" <td>0.000000</td>\n",
" <td>0.333333</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.666667</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>Judgment</td>\n",
" <td>0.666667</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>가/JKC</td>\n",
" <td>17</td>\n",
" <td>0.058824</td>\n",
" <td>0.352941</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.588235</td>\n",
" <td>0.000000</td>\n",
" <td>0.0</td>\n",
" <td>Judgment</td>\n",
" <td>0.588235</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>가/JKS</td>\n",
" <td>112</td>\n",
" <td>0.008929</td>\n",
" <td>0.330357</td>\n",
" <td>0.053571</td>\n",
" <td>0.008929</td>\n",
" <td>0.571429</td>\n",
" <td>0.026786</td>\n",
" <td>0.0</td>\n",
" <td>Judgment</td>\n",
" <td>0.571429</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>가/VV</td>\n",
" <td>11</td>\n",
" <td>0.000000</td>\n",
" <td>0.727273</td>\n",
" <td>0.000000</td>\n",
" <td>0.000000</td>\n",
" <td>0.181818</td>\n",
" <td>0.090909</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>0.727273</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ngram freq Agreement Argument Emotion Intention Judgment Others \\\n",
"0 가*/JKS 1 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000 \n",
"1 가*/VV 3 0.000000 0.333333 0.000000 0.000000 0.666667 0.000000 \n",
"2 가/JKC 17 0.058824 0.352941 0.000000 0.000000 0.588235 0.000000 \n",
"3 가/JKS 112 0.008929 0.330357 0.053571 0.008929 0.571429 0.026786 \n",
"4 가/VV 11 0.000000 0.727273 0.000000 0.000000 0.181818 0.090909 \n",
"\n",
" Speculation max.value max.prop \n",
"0 0.0 Argument 1.000000 \n",
"1 0.0 Judgment 0.666667 \n",
"2 0.0 Judgment 0.588235 \n",
"3 0.0 Judgment 0.571429 \n",
"4 0.0 Argument 0.727273 "
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Data source: http://word.snu.ac.kr/kosac/lexicon.php\n",
"\n",
"import pandas as pd\n",
"df = pd.read_csv('./subjectivity-type.csv')\n",
"df.head(5)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "b6PDcZU41EUO",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>ngram</th>\n",
" <th>freq</th>\n",
" <th>Agreement</th>\n",
" <th>Argument</th>\n",
" <th>Emotion</th>\n",
" <th>Intention</th>\n",
" <th>Judgment</th>\n",
" <th>Others</th>\n",
" <th>Speculation</th>\n",
" <th>max.value</th>\n",
" <th>max.prop</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>16357</th>\n",
" <td>힘겹/VA;게/EC;버티/VV</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16358</th>\n",
" <td>힘들/VA;고/EC;외롭/VA</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16359</th>\n",
" <td>힘들/VA;ㄹ/ETM;것/NNB</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16360</th>\n",
" <td>힘들/VA;ㄹ/ETM;때/NNG</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Argument</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16361</th>\n",
" <td>힘차/VA;ㄴ/ETM;붓/NNG</td>\n",
" <td>1</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>1.0</td>\n",
" <td>0.0</td>\n",
" <td>0.0</td>\n",
" <td>Judgment</td>\n",
" <td>1.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" ngram freq Agreement Argument Emotion Intention \\\n",
"16357 힘겹/VA;게/EC;버티/VV 1 0.0 1.0 0.0 0.0 \n",
"16358 힘들/VA;고/EC;외롭/VA 1 0.0 1.0 0.0 0.0 \n",
"16359 힘들/VA;ㄹ/ETM;것/NNB 1 0.0 1.0 0.0 0.0 \n",
"16360 힘들/VA;ㄹ/ETM;때/NNG 1 0.0 1.0 0.0 0.0 \n",
"16361 힘차/VA;ㄴ/ETM;붓/NNG 1 0.0 0.0 0.0 0.0 \n",
"\n",
" Judgment Others Speculation max.value max.prop \n",
"16357 0.0 0.0 0.0 Argument 1.0 \n",
"16358 0.0 0.0 0.0 Argument 1.0 \n",
"16359 0.0 0.0 0.0 Argument 1.0 \n",
"16360 0.0 0.0 0.0 Argument 1.0 \n",
"16361 1.0 0.0 0.0 Judgment 1.0 "
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail(5)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "oXvCQDMlpGTB",
"slideshow": {
"slide_type": "slide"
}
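},
"source": [
"# Topic 2: \"Sentence\"-based emotion classification (macro approach)\n",
"\n",
"1. Idea: find sentences that already carry emotion labels\n",
"2. Approach: merge sentences to raise accuracy (works around the data shortage)\n",
"3. Models: a basic NN model / a CNN model\n",
"4. Results: accuracy exceeds 95% when merging 20 sentences for the NN model and 10 sentences for the CNN model\n",
"5. Next steps: add an RNN model / build a separate model for each emotion"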
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "gYh-b6sv7L3Y",
"slideshow": {
"slide_type": "slide"
}
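},
"source": [
"## 1. Idea: find sentences that already carry emotion labels\n",
"\n",
"+ Use the sentence data provided by IBM\n",
"    + Since it is in English, the first step is to translate it into Korean\n",
"\n",
"2. Approach: merge sentences to raise accuracy\n",
"3. Model: run the analysis with a basic NN model"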
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "EqfQxe_750r9",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2. Approach: merging sentences to tune accuracy\n",
"\n",
"1. A single sentence is not enough\n",
"    + Merge 20 sentences into one and treat the result as a single data set\n",
"    + Training on single sentences gives an accuracy of only about 40%\n",
"\n",
"\n",
"2. The NN model gives high accuracy\n",
"\n",
"    + Compare across models and sentence sets to check the accuracy of each\n",
"\n",
"\n",
"3. Code written by researcher 허형완 (only excerpts are shown in this presentation)\n",
"\n",
"    + http://nbviewer.jupyter.org/gist/KyoungHa-Park/3de95a6245e4e9522ec2c937a785a607"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "ZmIvXzgZ4BtN",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"\n",
"## 3. Models: NN and CNN\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LPpLYEYW-BmA",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"#### 1. Data preprocessing"
]
},
{
"cell_type": "code",
"execution_count": 31,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "1fGV1hWYp9Ti",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Loading complete!\n"
]
}
],
"source": [
"# Keras and scikit-learn imports used by the NN classifier below\n",
"from keras.models import Sequential\n",
"from keras.layers import Dense, Dropout, Activation\n",
"from keras.wrappers.scikit_learn import KerasClassifier\n",
"from keras.utils import np_utils\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn import model_selection, metrics\n",
"import json\n",
"import numpy as np\n",
"\n",
"print('Loading complete!')"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"285 285\n"
]
}
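],
"source": [
"# Load the preprocessed data: X = integer feature vectors, Y = emotion class indices\n",
"with open(\"./data.json\") as f:\n",
"    data = json.load(f)\n",
"X = data[\"X\"]\n",
"Y = data[\"Y\"]\n",
"\n",
"X_train, X_test, Y_train, Y_test = train_test_split(X, Y)\n",
"\n",
"# nb_classes is set in the parameter cell (section 2 below); run that cell first\n",
"Y_train = np_utils.to_categorical(Y_train, nb_classes)\n",
"# Y_test stays as class indices because model.predict() returns class indices\n",
"#Y_test = np_utils.to_categorical(Y_test, nb_classes)\n",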
"print(len(X_train), len(Y_train))\n",
"\n",
"X_train = np.array(X_train)\n",
"X_test = np.array(X_test)\n",
"\n",
"s = np.arange(X_train.shape[0])\n",
"np.random.shuffle(s)\n",
"\n",
"X_train = X_train[s]\n",
"Y_train = Y_train[s]"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "vd8XUlQRAvD0",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>X</th>\n",
" <th>Y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>376</th>\n",
" <td>[0, 1, 0, 0, 0, 6, 18, 0, 0, 0, 0, 0, 0, 0, 0,...</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>377</th>\n",
" <td>[0, 4, 0, 0, 0, 11, 26, 0, 0, 1, 0, 0, 0, 0, 0...</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>378</th>\n",
" <td>[0, 1, 0, 1, 0, 5, 18, 0, 0, 0, 0, 0, 0, 0, 0,...</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>379</th>\n",
" <td>[0, 1, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
" <td>6</td>\n",
" </tr>\n",
" <tr>\n",
" <th>380</th>\n",
" <td>[0, 0, 0, 0, 0, 2, 8, 0, 0, 0, 0, 0, 0, 0, 0, ...</td>\n",
" <td>6</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" X Y\n",
"376 [0, 1, 0, 0, 0, 6, 18, 0, 0, 0, 0, 0, 0, 0, 0,... 6\n",
"377 [0, 4, 0, 0, 0, 11, 26, 0, 0, 1, 0, 0, 0, 0, 0... 6\n",
"378 [0, 1, 0, 1, 0, 5, 18, 0, 0, 0, 0, 0, 0, 0, 0,... 6\n",
"379 [0, 1, 0, 0, 0, 3, 2, 0, 0, 0, 0, 0, 0, 0, 0, ... 6\n",
"380 [0, 0, 0, 0, 0, 2, 8, 0, 0, 0, 0, 0, 0, 0, 0, ... 6"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pd.DataFrame(data).tail()"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-eFQKq0A-Fli",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 2. Key parameter settings"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "SLYOiqqo3dbu",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"285 285\n"
]
}
],
"source": [
"max_words = 20259\n",
"nb_classes = 7\n",
"\n",
"batch_size = 64\n",
"nb_epoch = 10"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "-n0KxyLD-LWB",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"#### 3. Model definition (Kaggle)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "8tJwv-6T5rvM",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"def build_nn():\n",
"    # Single-hidden-layer classifier over max_words-dimensional input vectors\n",
"    model = Sequential()\n",
"    model.add(Dense(512, input_shape=(max_words,)))\n",
"    model.add(Activation('relu'))\n",
"    model.add(Dropout(0.05))\n",
"    model.add(Dense(nb_classes))\n",
"    model.add(Activation('softmax'))  # probabilities over the nb_classes emotions\n",
"    model.compile(loss='categorical_crossentropy',\n",
"                  optimizer='adam',\n",
"                  metrics=['accuracy'])\n",
"\n",
"    return model"
]
},
{
"cell_type": "code",
"execution_count": 34,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "JrlRyPJi3ieW",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Epoch 1/10\n",
"285/285 [==============================] - 4s 14ms/step - loss: 1.6972 - acc: 0.4351\n",
"Epoch 2/10\n",
"285/285 [==============================] - 2s 7ms/step - loss: 0.1728 - acc: 1.0000\n",
"Epoch 3/10\n",
"285/285 [==============================] - 2s 6ms/step - loss: 0.0160 - acc: 1.0000\n",
"Epoch 4/10\n",
"285/285 [==============================] - 2s 6ms/step - loss: 0.0037 - acc: 1.0000\n",
"Epoch 5/10\n",
"285/285 [==============================] - 2s 6ms/step - loss: 0.0012 - acc: 1.0000\n",
"Epoch 6/10\n",
"285/285 [==============================] - 2s 6ms/step - loss: 6.1784e-04 - acc: 1.0000\n",
"Epoch 7/10\n",
"285/285 [==============================] - 2s 8ms/step - loss: 3.7409e-04 - acc: 1.0000\n",
"Epoch 8/10\n",
"285/285 [==============================] - 2s 7ms/step - loss: 3.0957e-04 - acc: 1.0000\n",
"Epoch 9/10\n",
"285/285 [==============================] - 2s 7ms/step - loss: 1.7786e-04 - acc: 1.0000\n",
"Epoch 10/10\n",
"285/285 [==============================] - 2s 7ms/step - loss: 1.4692e-04 - acc: 1.0000\n",
"Wall time: 23.6 s\n"
]
}
],
"source": [
"%%time\n",
"model = KerasClassifier(build_fn=build_nn)\n",
"model.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch)"
]
},
{
"cell_type": "code",
"execution_count": 36,
"metadata": {
"scrolled": false,
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"정답률 = 0.9479166666666666\n",
"리포트 = precision recall f1-score support\n",
"\n",
" 0 1.00 0.88 0.94 17\n",
" 1 0.92 0.92 0.92 13\n",
" 2 1.00 1.00 1.00 16\n",
" 3 1.00 0.94 0.97 16\n",
" 4 1.00 1.00 1.00 17\n",
" 5 1.00 1.00 1.00 9\n",
" 6 0.64 0.88 0.74 8\n",
"\n",
"avg / total 0.96 0.95 0.95 96\n",
"\n"
]
}
],
"source": [
"y = model.predict(X_test)\n",
"ac_score = metrics.accuracy_score(Y_test, y)\n",
"cl_report = metrics.classification_report(Y_test, y)\n",
"\n",
"print(\"정답률 =\", ac_score)\n",
"print(\"리포트 =\", cl_report)"
]
},
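{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The report above lists precision, recall and F1 per class. As a rough sketch of how such numbers are derived, the pure-Python snippet below recomputes them from label lists. The helper names `per_class_report` and `accuracy` and the toy labels are assumptions made for illustration; they are not part of scikit-learn and not the notebook's test set.\n",
"\n",
"```python\n",
"from collections import Counter\n",
"\n",
"\n",
"def per_class_report(y_true, y_pred):\n",
"    # Count true positives, predicted totals and actual totals per class.\n",
"    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)\n",
"    predicted = Counter(y_pred)\n",
"    actual = Counter(y_true)\n",
"    report = {}\n",
"    for label in sorted(actual):\n",
"        precision = tp[label] / predicted[label] if predicted[label] else 0.0\n",
"        recall = tp[label] / actual[label]\n",
"        denom = precision + recall\n",
"        f1 = 2 * precision * recall / denom if denom else 0.0\n",
"        # (precision, recall, f1, support)\n",
"        report[label] = (precision, recall, f1, actual[label])\n",
"    return report\n",
"\n",
"\n",
"def accuracy(y_true, y_pred):\n",
"    # Fraction of samples whose predicted label equals the true label.\n",
"    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)\n",
"\n",
"\n",
"# Toy labels, assumed for illustration only.\n",
"y_true = [0, 0, 1, 1, 2, 2]\n",
"y_pred = [0, 1, 1, 1, 2, 0]\n",
"print('accuracy =', accuracy(y_true, y_pred))\n",
"for label, row in per_class_report(y_true, y_pred).items():\n",
"    print(label, row)\n",
"```\n",
"\n",
"Read this way, class 6 in the report (support 8, recall 0.88) corresponds to 7 of 8 test sentences classified correctly, and its precision of 0.64 is consistent with 7 correct out of 11 sentences predicted as class 6."
]
},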
{
"cell_type": "code",
"execution_count": 42,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "qJhI2rqgD8yd",
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"emotion_str_arr = ['Anger', 'Disgust', 'Fear', 'Guilt', 'Joy', 'Sadness', 'Shame']\n",
"\n",
"TEST_SENTENCE = '이번 크리스마스에는 여자친구와 함께 스키장에 가서 스키를 타고, 와인을 마시며 분위기를 낼것이다.'\n",
"X_input = wordDictionary.GetInputData(TEST_SENTENCE)\n",
"X_input = np.array(X_input)\n",
"X_input = X_input.reshape(1, X_input.size)\n",
"\n",
"if LEARNING_MODEL_NAME == 'cnn':\n",
" X_input = d_2d_to_3d(X_input)\n",
"elif LEARNING_MODEL_NAME == 'rnn':\n",
" X_input = d_2d_to_3d(X_input, False)\n",
"\n",
"y = model.predict(X_input)\n",
"print(\"예측 = \", emotion_str_arr[y[0]])"
]
},
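{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The cell above prints only the single most likely emotion. A minimal sketch of how raw class scores could instead be turned into ranked probabilities is shown below. It assumes per-class scores are available for a sentence; the `scores` values are invented for illustration and `top_emotions` is a hypothetical helper, not part of Keras.\n",
"\n",
"```python\n",
"import math\n",
"\n",
"EMOTIONS = ['Anger', 'Disgust', 'Fear', 'Guilt', 'Joy', 'Sadness', 'Shame']\n",
"\n",
"\n",
"def softmax(scores):\n",
"    # Subtract the max for numerical stability before exponentiating.\n",
"    m = max(scores)\n",
"    exps = [math.exp(s - m) for s in scores]\n",
"    total = sum(exps)\n",
"    return [e / total for e in exps]\n",
"\n",
"\n",
"def top_emotions(scores, k=3):\n",
"    # Return the k most probable (label, probability) pairs, best first.\n",
"    probs = softmax(scores)\n",
"    ranked = sorted(zip(EMOTIONS, probs), key=lambda pair: pair[1], reverse=True)\n",
"    return ranked[:k]\n",
"\n",
"\n",
"# Hypothetical raw scores for one sentence (not real model output).\n",
"scores = [0.1, -1.2, 0.3, 0.0, 2.5, 0.4, -0.5]\n",
"for label, prob in top_emotions(scores):\n",
"    print(label, round(prob, 3))\n",
"```\n",
"\n",
"Showing the top few emotions rather than one label can make borderline sentences easier to inspect."
]
},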
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "T7UZn41S6F2W",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 4.중간결과 요약\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "fBO8gdne-w0Z",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"- 문장 set / 모형을 다르게 한 경우, Accuracy를 비교하면 아래와 같음"
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "y1YpW7KF5fR8",
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"감정분석 데이터 유형별 정확도 비교\n",
"\n",
"========================================\n",
"NN (장점 : 학습시간이 빠르다. 단점 : 데이터의 작은 변화에도 학습이 잘 안된다.)\n",
"=====\n",
"\n",
"[1줄] 정답률 = 0.4715\n",
"[2줄] 정답률 = 0.6384\n",
"[5줄] 정답률 = 0.8151\n",
"[10줄] 정답률 = 0.9015\n",
"[20줄] 정답률 = 0.9690\n",
"[30줄] 정답률 = 1.0000"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"========================================\n",
"CNN (장점 : 데이터의 작은 변화에도 학습이 잘된다. 단점 : 학습시간이 오래걸린다)\n",
"=====\n",
"[1줄] 정답률 = 0.6382\n",
"[2줄] 정답률 = 0.8014\n",
"[5줄] 정답률 = 0.9010\n",
"[10줄] 정답률 = 0.9689\n",
"[20줄] 정답률 = 0.9587\n",
"[30줄] 정답률 = 0.9846"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"outputs": [],
"source": [
"========================================\n",
"RNN (학습자체가 안되는 문제 발생)\n",
"====="
]
},
{
"cell_type": "code",
"execution_count": 0,
"metadata": {
"colab": {},
"colab_type": "code",
"id": "rV2KTo2cD3Hn",
"scrolled": true,
"slideshow": {
"slide_type": "-"
}
},
"outputs": [],
"source": [
"# tokenizing된 사전\n",
"\n",
"data_raw = json.load(open(\"./word-dic.json\"))\n",
"# data_raw \n",
"# {'누군가와': 2765,\n",
"# '순찰': 11024,\n",
"# '겪어야하는': 4659,\n",
"# '버림.': 18423,\n",
"# '공중에': 6278,\n",
"# '치료할': 7055,\n",
"# '사랑을': 6918,\n",
"# '친애하는': 1844,\n",
"# '장착': 8912,"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "HVeOJZiK-0PN",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 5.활용 : 개별 감정에 대한 분류모델 설정 / RNN 모형 추가\n",
"\n",
"+ 한개의 모델로, 7가지 감정을 분류함\n",
" + 7개의 모델로, 각각의 특화된 감정분류를 진행, 결과를 비교하고자 함\n",
" + 이를 통해, 데이터(사전)의 부족한 점을 극복하고자 함\n",
"+ RNN방법을 추가로 적용하고, 결과를 추가로 비교하고자 함\n",
" + 금번의 경우 RNN 적용시 error발생으로 결과를 얻지 못함\n",
" + RNN의 경우, 구체적으로 \"학습\" 자체가 안되는 문제 발생함\n"
]
},
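{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "-"
}
},
"source": [
"The plan of training one specialized model per emotion is a one-vs-rest setup. The sketch below shows only the label preparation and the final decision step, in pure Python. The function names, the toy labels and the confidence scores are all assumptions for illustration; the 7 models themselves are not implemented here.\n",
"\n",
"```python\n",
"EMOTIONS = ['Anger', 'Disgust', 'Fear', 'Guilt', 'Joy', 'Sadness', 'Shame']\n",
"\n",
"\n",
"def binary_labels(labels, target):\n",
"    # 1 where the sample belongs to the target emotion, 0 otherwise.\n",
"    return [1 if label == target else 0 for label in labels]\n",
"\n",
"\n",
"def one_vs_rest_targets(labels, n_classes):\n",
"    # One binary label vector per emotion, used to train one model per emotion.\n",
"    return {c: binary_labels(labels, c) for c in range(n_classes)}\n",
"\n",
"\n",
"def combine(scores_per_model):\n",
"    # Pick the emotion whose specialized model reports the highest score.\n",
"    return max(range(len(scores_per_model)), key=lambda c: scores_per_model[c])\n",
"\n",
"\n",
"# Toy class indices, assumed for illustration only.\n",
"labels = [4, 0, 6, 4, 2]\n",
"targets = one_vs_rest_targets(labels, len(EMOTIONS))\n",
"print(targets[4])\n",
"\n",
"# Hypothetical confidence of each of the 7 binary models for one sentence.\n",
"hypothetical_scores = [0.10, 0.05, 0.20, 0.15, 0.80, 0.30, 0.25]\n",
"print(EMOTIONS[combine(hypothetical_scores)])\n",
"```\n",
"\n",
"Each binary model sees far more negative than positive examples, so class imbalance would need attention when comparing against the single 7-class model."
]
},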
{
"cell_type": "markdown",
"metadata": {
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"![](https://i.imgur.com/962frJq.png)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "w1-jhaA2plKE",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# Deep Learning 기본내용 정리"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "8DTqOuvjqYte",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"## 1.NLP 내용 및 사례"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "9SA7U4kJqiGH",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"### 1. NLP 목적별 활용사례"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "E06jMPOb9dEa",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"<table> \n",
" <tbody> \n",
" <tr> \n",
" <td style=\"background-color: #ccc\">목적</td> \n",
" <td style=\"background-color: #ccc\">사례</td> \n",
" <td style=\"background-color: #ccc\">target</td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>품사태깅</strong></td> \n",
" <td > \n",
" <ul> 명사구에 대한 분류</ul>\n",
" </td> \n",
" <td > \n",
" <ul> </ul> \n",
" </td> \n",
" </tr> .\n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>파싱(구문분석)</strong></td> \n",
" <td > \n",
" <ul> 고유명사 인지\t</ul>\n",
" </td> \n",
" <td > \n",
" <ul> check </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>개체명 인식</strong></td> \n",
" <td > \n",
" <ul> 사전 만들기</ul>\n",
" </td> \n",
" <td > \n",
" <ul> check </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>의미역 결정</strong></td> \n",
" <td > \n",
" <ul> 신문기사 headline 추출</ul>\n",
" </td> \n",
" <td > \n",
" <ul> </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>감성분류</strong></td> \n",
" <td > \n",
" <ul> 긍정/부정에 대한 스코어링</ul>\n",
" </td> \n",
" <td > \n",
" <ul> check </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>번역\t</strong></td> \n",
" <td > \n",
" <ul> 영어-한글 번역</ul>\n",
" </td> \n",
" <td > \n",
" <ul> </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>질의응답\t</strong></td> \n",
" <td > \n",
" <ul> QA 데이터셋 기반 반응</ul>\n",
" </td> \n",
" <td > \n",
" <ul> </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>대화 시스템\t\t</strong></td> \n",
" <td > \n",
" <ul> 쳇봇</ul>\n",
" </td> \n",
" <td > \n",
" <ul> </ul> \n",
" </td> \n",
" </tr> "
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "a3L_20rFqSX-",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 2.한글/영문 처리모듈 비교"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "B1IL6y_KnWlA",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"<table> \n",
" <tbody> \n",
" <tr> \n",
" <td style=\"background-color: #ccc\">구분</td> \n",
" <td style=\"background-color: #ccc\">English</td> \n",
" <td style=\"background-color: #ccc\">한글</td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>Tokenize</strong></td> \n",
" <td > \n",
" <ul> nltk(regexp_tokenize)</ul>\n",
" </td> \n",
" <td > \n",
" <ul> KoNLPy(Twitter) </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>POS(tagging)</strong></td> \n",
" <td > \n",
" <ul> nltk.pos_tag\t</ul>\n",
" </td> \n",
" <td > \n",
" <ul> KoNLPy(Twitter) </ul> \n",
" </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>POS(chunking)\t</strong></td> \n",
" <td > \n",
" <ul> nltk.RegexpParser\t\t</ul>\n",
" </td> \n",
" <td > \n",
" <ul> nltk.RegexpParser </ul> \n",
" </td> \n",
" </tr> \n",
"</table>\n",
"\n",
" *출처 : https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP/"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "LH037tGVqm2Q",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"### 3.프래임별 내용비교"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "_8bvsKbDnjQ2",
"slideshow": {
"slide_type": "-"
}
},
"source": [
"<!-- <table style=\"height: 882px\" border=\"1\" width=\"702\"> -->\n",
"<table> \n",
" <tbody> \n",
" <tr> \n",
" <td style=\"background-color: #ccc\">프레임워크</td> \n",
" <td style=\"background-color: #ccc\">장점</td> \n",
" <td style=\"background-color: #ccc\">단점</td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\"><strong>Theano</strong></td> \n",
" <td> \n",
" <ul> \n",
" <li>Python 지원</li> \n",
" <li>Wrapper 를 통한 높은 추상화로 사용성 편리</li> \n",
" <li>여러 에코시스템이 존재</li> \n",
" <li>&nbsp;연구용으로 많이 사용됨</li> \n",
" </ul> </td> \n",
" <td > \n",
" <ul> \n",
" <li>Theano자체는 로우레벨 라이브러리</li> \n",
" <li>큰 규모 모델에 많은 컴파일 시간</li> \n",
" <li>torch에 비해 매우 큰 라이브러리</li> \n",
" <li>에러메시지가 부정확</li> \n",
" </ul> </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>Torch</strong></td> \n",
" <td > \n",
" <ul> \n",
" <li>모듈화된 라이브러리로 상호 연계가 쉬움</li> \n",
" <li>GPU지원, 본인 레이어 타입 작성이 편리</li> \n",
" <li>선훈련된 모델들이 많음</li> \n",
" </ul> </td> \n",
" <td > \n",
" <ul> \n",
" <li>Lua 기반</li> \n",
" <li>회귀 뉴럴 네트워크에 적합하지 않음</li> \n",
" <li>문서화 부실</li> \n",
" </ul> </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>TensorFlow</strong></td> \n",
" <td> \n",
" <ul> \n",
" <li>Python + Numpy</li> \n",
" <li>컴퓨팅 그래프 추상화</li> \n",
" <li>Theano보다 빠른 컴파일</li> \n",
" <li>시각화를 위한 TensorBoard</li> \n",
" <li>데이터와 모델의 병렬화</li> \n",
" </ul> </td> \n",
" <td> \n",
" <ul> \n",
" <li>다른 프레임워크보다 느림</li> \n",
" <li>Torch보다 훨씬 큰 라이브러리</li> \n",
" <li>선 훈련된 모델이 적음</li> \n",
" <li>계산 그래프가 Python으로 되어 있어서 느림</li> \n",
" <li>도구로서의 기능이 약함</li> \n",
" </ul> </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>Caffe</strong></td> \n",
" <td > \n",
" <ul> \n",
" <li>이미지 프로세싱에 적합</li> \n",
" <li>잘 튜닝된 네트워크</li> \n",
" <li>코드 작성없이 모델 트레이닝 가능</li> \n",
" <li>Python인터페이스가 유용</li> \n",
" </ul> </td> \n",
" <td > \n",
" <ul> \n",
" <li>GPU를 위해서는 C++/CUDA작성 필요</li> \n",
" <li>회귀 네트워크에는 부적합</li> \n",
" <li>큰 네트워크에는 부적절</li> \n",
" <li>확장성이 떨어짐</li> \n",
" </ul> </td> \n",
" </tr> \n",
" <tr> \n",
" <td style=\"background-color: #eee\" ><strong>MxNet</strong></td> \n",
" <td> \n",
" <ul> \n",
" <li>혼합 패러다임 지원(symbolic/imperative)</li> \n",
" <li>자동 미분화</li> \n",
" <li>GPU, mobile에서도 동작</li> \n",
" <li>여러 언어 지원<br>\n",
" (C++, Python, R, Scala, Julia, Matlab and Javascript)</li> \n",
" <li>최적화된 C++ 엔진으로 좋은 성능</li> \n",
" </ul> </td> \n",
" <td> \n",
" <ul> \n",
" <li>로우 레벨 텐서 연산자가 적음</li> \n",
" <li>흐름 제어 연산자 지원하지 않음</li> \n",
" <li>컴파일 세팅에 따라 결과가 달라짐.</li> \n",
" <li>자신의 커스컴 레이어 생성을 위해서는<br>\n",
" 어느정도 백엔드 텐서 라이브러리 이해가 필요</li> \n",
" </ul> </td> \n",
" </tr> \n",
" </tbody> \n",
" </table>\n",
" \n",
" *출처 :AWS로 딥 러닝을 위한 프레임워크 MxNet 활용하기\n",
" \n",
"(https://aws.amazon.com/ko/blogs/korea/aws-deep-learning-framework-mxnet/)"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "s97nuR3P7eo9",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"## 2.딥러닝 기본 : Tensor Flow 기반\n",
"\n",
"+ 임동조 연구원님 작성 code\n",
"\n",
"+ http://nbviewer.jupyter.org/gist/KyoungHa-Park/7151f5e44e78e63033a970b89ae478cc/01_tensorflow_basic.ipynb"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "2LSvWvqy8f7M",
"slideshow": {
"slide_type": "slide"
}
},
"source": [
"# 출처/참조\n",
"\n",
"+ http://hero4earth.com/blog/learning/2018/01/17/NLP_Basics_01/\n",
"+ https://www.lucypark.kr/courses/2015-dm/text-mining.html\n",
"+ http://freesearch.pe.kr/archives/4828\n",
"+ https://ratsgo.github.io/natural%20language%20processing/2017/08/16/deepNLP\n",
"+ http://blog.naver.com/PostView.nhn?blogId=samsjang&logNo=220985170721&categoryNo=0&parentCategoryNo=0&viewDate=&currentPage=1&postListTopCurrentPage=1&from=postView"
]
}
],
"metadata": {
"accelerator": "GPU",
"celltoolbar": "Slideshow",
"colab": {
"collapsed_sections": [],
"name": "emotion alaysis(KOR)_1_v3.ipynb",
"provenance": [],
"toc_visible": true,
"version": "0.3.2"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}