Last active
August 24, 2017 11:09
https://github.com/dongsam/CryptoCurrency-Analysis [Sentiment analysis of chat text data]
{ | |
"cells": [ | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sentiment Analysis by Training on Chat Text"
] | |
}, | |
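{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Goal\n",
"- Label chat data from Poloniex, the world's largest cryptocurrency exchange, with NLTK-based sentiment scores, then train doc2vec on the labeled text to classify positive and negative sentiment and to build a domain-specific sentiment lexicon"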
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Based on Python 3.6.1"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'3.6.1 |Continuum Analytics, Inc.| (default, May 11 2017, 13:04:09) \\n[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)]'" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import sys\n", | |
"sys.version" | |
] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Sentiment classification (using NLTK's SentimentIntensityAnalyzer)\n",
"### Based on [vaderSentiment](https://github.com/cjhutto/vaderSentiment)\n",
"VADER Sentiment Analysis. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon and rule-based sentiment analysis tool that is specifically attuned to sentiments expressed in social media, and works well on texts from other domains." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"import nltk" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'3.2.4'" | |
] | |
}, | |
"execution_count": 3, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"nltk.__version__" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from nltk.sentiment.vader import SentimentIntensityAnalyzer" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"sid = SentimentIntensityAnalyzer()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'compound': 0.4404, 'neg': 0.0, 'neu': 0.58, 'pos': 0.42}" | |
] | |
}, | |
"execution_count": 6, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
],
"source": [
"sid.polarity_scores(\"Bitcoin Is Better Than Gold\")\n",
"# Headline from https://www.forbes.com/sites/panosmourdoukoutas/2017/03/04/bitcoin-is-better-than-gold/#52d2468c5f04"
] | |
}, | |
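{
"cell_type": "markdown",
"metadata": {},
"source": [
"* [compound] the compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to lie between -1 (most negative) and +1 (most positive)\n",
"* [pos], [neu], [neg] are the proportions of the text that fall into the positive, neutral, and negative categories; they are not thresholds\n",
"* To assign a single label, threshold the compound score. The cutoffs listed in this notebook are:\n",
"    * positive sentiment: compound score >= 0.5\n",
"    * neutral sentiment: (compound score > -0.5) and (compound score < 0.5)\n",
"    * negative sentiment: compound score <= -0.5\n",
"* The vaderSentiment README currently suggests +/-0.05 as the typical cutoffs, so 0.5 is a much stricter choice that leaves more texts labeled neutral"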
] | |
}, | |
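{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cutoffs above can be turned into a single label per text. This is a minimal sketch: `label_from_compound` and its `threshold` parameter are hypothetical names that are not part of NLTK, and the default of 0.5 simply follows the cutoffs listed above."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def label_from_compound(compound, threshold=0.5):\n",
"    # Hypothetical helper (not part of NLTK): map a compound score to a label.\n",
"    if compound >= threshold:\n",
"        return 'pos'\n",
"    if compound <= -threshold:\n",
"        return 'neg'\n",
"    return 'neu'\n",
"\n",
"# With the compound score shown above (0.4404) the headline falls below 0.5, so it is labeled 'neu'.\n",
"label_from_compound(sid.polarity_scores(\"Bitcoin Is Better Than Gold\")['compound'])"
]
},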
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Sentiment test on the titles and bodies of about 5,000 news articles and community posts"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"with open('coin_text_list.txt') as fp:  # file containing the target texts\n",
"    text_list = fp.read().split('\\n')  # one text is stored per line, so split on newlines to load them"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"5304" | |
] | |
}, | |
"execution_count": 8, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(text_list)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"res_list = []  # compute the sentiment polarity of each of the ~5,000 texts and store every text together with its scores\n",
"for i in text_list:\n",
"    res = sid.polarity_scores(i)\n",
"    res['text'] = i\n",
"    res_list.append(res)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"5304" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(res_list)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'compound': 0.4404,\n", | |
" 'neg': 0.0,\n", | |
" 'neu': 0.828,\n", | |
" 'pos': 0.172,\n", | |
" 'text': 'BAT ICO gasPrice analysis, good timing and reasonable gasPrice is enough to get you in'}" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
],
"source": [
"sorted(res_list, key=lambda x: x['pos'], reverse=True)[1000]  # sort by positive (or negative) score to inspect the distribution and counts"
] | |
}, | |
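{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sorting shows individual examples; a count per label gives the overall distribution more directly. This is a sketch under assumptions: the cutoff of 0.5 is taken from the thresholds listed earlier in this notebook, and `cutoff` and `label_counts` are names introduced only for this illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"from collections import Counter\n",
"\n",
"cutoff = 0.5  # assumed cutoff, taken from the thresholds listed earlier\n",
"label_counts = Counter(\n",
"    'pos' if r['compound'] >= cutoff else 'neg' if r['compound'] <= -cutoff else 'neu'\n",
"    for r in res_list\n",
")\n",
"label_counts  # number of texts per label"
]
},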
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Crawling the chat data\n",
"### Parsing the HTML structure with BeautifulSoup\n",
"Read and parse HTML files saved page by page from [polonibox.com](http://www.polonibox.com/), a site that archives dumps of the Poloniex exchange's chat data."
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from bs4 import BeautifulSoup\n", | |
"import os\n", | |
"import dateutil.parser" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
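},
"outputs": [],
"source": [
"html_dir_path = '/Users/dongsamb/coin_data/polonibox_html'  # directory that holds the saved HTML files\n",
"html_file_list = os.listdir(html_dir_path)[1:]  # [1:] drops the first directory entry; os.listdir order is arbitrary, so filtering by file name would be more robust"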
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"html_list = []  # read every HTML file and keep only its text, as a list\n",
"for i in html_file_list:\n",
"    full_path = \"{}/{}\".format(html_dir_path, i)\n",
"    with open(full_path) as fp:\n",
"        text = fp.read()\n",
"    if len(text) > 1000:  # pages shorter than 1000 characters are error pages saved when the site's server failed during the download, so skip them\n",
"        html_list.append(text)"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"17247" | |
] | |
}, | |
"execution_count": 15, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
],
"source": [
"len(html_list)  # total number of HTML pages"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [] | |
}, | |
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A function that takes the extracted HTML text, parses each chat message's username, message, date, and user reputation, and returns them structured as a list of dicts"
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
},
"outputs": [],
"source": [
"def get_chat_list(html):\n",
"    \"\"\"Parse one saved page into a list of chat-message dicts.\"\"\"\n",
"    try:\n",
"        soup = BeautifulSoup(html, \"html.parser\")\n",
"        res_list = []\n",
"    except Exception as e:\n",
"        print(e)\n",
"        return []\n",
"    tbody = soup.find('tbody')\n",
"    if tbody is None:  # page without a chat table: nothing to parse\n",
"        return []\n",
"    for i in tbody.find_all('tr'):\n",
"        try:\n",
"            res_dic = {}\n",
"            tds = i.find_all('td')\n",
"            res_dic['reputation'] = int(tds[0].span.text.strip())\n",
"            res_dic['username'] = tds[0].a.text.strip()\n",
"            res_dic['message'] = tds[1].text.strip()\n",
"            res_dic['date_str'] = tds[0].find_all('span')[1]['title']\n",
"            res_dic['date'] = dateutil.parser.parse(res_dic['date_str'])\n",
"            res_dic['message_id'] = int(tds[0].find_all('a')[1]['href'].split('=')[1])\n",
"            res_list.append(res_dic)\n",
"        except Exception as e:  # skip rows that do not match the expected layout\n",
"            print(e)\n",
"            continue\n",
"    return res_list"
] | |
}, | |
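{
"cell_type": "markdown",
"metadata": {},
"source": [
"The dicts returned by `get_chat_list` can be combined with the analyzer created earlier (`sid`) to attach a sentiment score to each message, which is the labeling step named in the goal. This is a minimal sketch: `add_sentiment` is a hypothetical helper name, and whether the rest of the notebook labels messages exactly this way is an assumption."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def add_sentiment(chat_list, analyzer=sid):\n",
"    # Hypothetical helper: copy each parsed message and add VADER's compound score to it.\n",
"    labeled = []\n",
"    for chat in chat_list:\n",
"        scores = analyzer.polarity_scores(chat['message'])\n",
"        labeled.append(dict(chat, compound=scores['compound']))\n",
"    return labeled"
]
},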
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[{'date': datetime.datetime(2017, 6, 4, 8, 8, 37, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:37 UTC',\n", | |
" 'message': 'chrisjlabrie, i agree:)',\n", | |
" 'message_id': 18568553,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'sysoyoung333'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 37, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:37 UTC',\n", | |
" 'message': 'FlatEarth, nah i had them for long time...',\n", | |
" 'message_id': 18568552,\n", | |
" 'reputation': 20,\n", | |
" 'username': 'Larillo'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 35, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:35 UTC',\n", | |
" 'message': 'Please, keep the language in the TrollBox to English. Thank you for your understanding. A message from your local MOD SQUAD.',\n", | |
" 'message_id': 18568551,\n", | |
" 'reputation': 1170,\n", | |
" 'username': 'Popcorntime'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 33, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:33 UTC',\n", | |
" 'message': 'This is a trollbox not a rocket launch command centre, please avoid rockets/fly/UP/Moon as much as possible',\n", | |
" 'message_id': 18568550,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 32, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:32 UTC',\n", | |
" 'message': 'ex7231, SC for real and DGB for speculation bc kids wants quick money',\n", | |
" 'message_id': 18568549,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'CyberKing'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 32, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:32 UTC',\n", | |
" 'message': 'FlatEarth, They dropped it down the back of the sofa.',\n", | |
" 'message_id': 18568548,\n", | |
" 'reputation': 485,\n", | |
" 'username': 'AutoWhale'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 28, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:28 UTC',\n", | |
" 'message': 'khota kay bacho dgb buy kero',\n", | |
" 'message_id': 18568547,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'mas.exchanging'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 28, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:28 UTC',\n", | |
" 'message': 'Larillo, shorter ? I long time Bro !!!',\n", | |
" 'message_id': 18568546,\n", | |
" 'reputation': 67,\n", | |
" 'username': 'FlatEarth'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 26, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:26 UTC',\n", | |
" 'message': 'zaizakitano banned for 1 days, 0 hours, and 0 minutes by Popcorntime.',\n", | |
" 'message_id': 18568545,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Banhammer'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 24, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:24 UTC',\n", | |
" 'message': 'SC is rounding 800 soon',\n", | |
" 'message_id': 18568544,\n", | |
" 'reputation': 11,\n", | |
" 'username': 'chrisjlabrie'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 23, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:23 UTC',\n", | |
" 'message': 'SC next king',\n", | |
" 'message_id': 18568543,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'rubensimpson'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 21, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:21 UTC',\n", | |
" 'message': 'this is funny man, you guys know hwo to pump',\n", | |
" 'message_id': 18568542,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Donator'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 19, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:19 UTC',\n", | |
" 'message': 'ok here is the real test for BCN, folks.',\n", | |
" 'message_id': 18568541,\n", | |
" 'reputation': 7,\n", | |
" 'username': 'supahflyninja'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 19, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:19 UTC',\n", | |
" 'message': 'a159357 banned for 1 days, 0 hours, and 0 minutes by Popcorntime.',\n", | |
" 'message_id': 18568540,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Banhammer'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 19, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:19 UTC',\n", | |
" 'message': 'bcn up',\n", | |
" 'message_id': 18568539,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'zaizakitano'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 15, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:15 UTC',\n", | |
" 'message': 'hope itll provide me now the promised 3k',\n", | |
" 'message_id': 18568538,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Gimpel'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 12, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:12 UTC',\n", | |
" 'message': 'hello people!',\n", | |
" 'message_id': 18568537,\n", | |
" 'reputation': 64,\n", | |
" 'username': 'AlienX101'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 10, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:10 UTC',\n", | |
" 'message': 'ludovic.palmisano, Lets not be suggesting trades please. Thank You !',\n", | |
" 'message_id': 18568536,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 10, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:10 UTC',\n", | |
" 'message': 'AlienX101, ok thanks bro.',\n", | |
" 'message_id': 18568535,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Randeepchopra'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:08 UTC',\n", | |
" 'message': 'FlatEarth, DGB GAME SC',\n", | |
" 'message_id': 18568534,\n", | |
" 'reputation': 20,\n", | |
" 'username': 'Larillo'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 6, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:06 UTC',\n", | |
" 'message': 'dash dump',\n", | |
" 'message_id': 18568533,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'a159357'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 6, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:06 UTC',\n", | |
" 'message': \"ex7231, Sorry I don't like give wrong information, have no clue. I'm good in DGB, SC, Golem, ETH, and BTC\",\n", | |
" 'message_id': 18568532,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'CyberKing'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 4, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:04 UTC',\n", | |
" 'message': 'Sia is a very promising company. Siacoin smart investment.',\n", | |
" 'message_id': 18568531,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'lindormusai'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 8, 4, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:04 UTC',\n", | |
" 'message': 'come guys lets do yesterday DGB',\n", | |
" 'message_id': 18568530,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'michal.sedlek'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 59, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:59 UTC',\n", | |
" 'message': 'I trusted in that bitchy DGB',\n", | |
" 'message_id': 18568529,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Gimpel'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 57, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:57 UTC',\n", | |
" 'message': 'Hello! happy trades to you Donator, thank you o/',\n", | |
" 'message_id': 18568528,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 55, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:55 UTC',\n", | |
" 'message': 'i love you all',\n", | |
" 'message_id': 18568527,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'oikigbeme'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 55, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:55 UTC',\n", | |
" 'message': \"CyberKing, I don't think they will. I'd consider Sia more of a backend solution.\",\n", | |
" 'message_id': 18568526,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ea96b'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 55, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:55 UTC',\n", | |
" 'message': 'anyone called the odds correctly on DGB and made a lot of cash today?',\n", | |
" 'message_id': 18568525,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'bmalslamrod69'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 52, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:52 UTC',\n", | |
" 'message': 'Can a mod please look at this tx, been over 30 minutes, TY 0x71ba86010f7d30b04cb262ae82148b94fcc76b93cb7c9d943e9246a0d3b3e5ee',\n", | |
" 'message_id': 18568524,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'callummontgomery'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 52, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:52 UTC',\n", | |
" 'message': 'exp',\n", | |
" 'message_id': 18568523,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ludovic.palmisano'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 52, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:52 UTC',\n", | |
" 'message': 'Larillo, ETH ?',\n", | |
" 'message_id': 18568522,\n", | |
" 'reputation': 67,\n", | |
" 'username': 'FlatEarth'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 50, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:50 UTC',\n", | |
" 'message': 'You dont understand that GAMERs will be pushing the price of GAME up simply buy buying a game on the store...',\n", | |
" 'message_id': 18568521,\n", | |
" 'reputation': 115,\n", | |
" 'username': 'btcnerd'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 48, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:48 UTC',\n", | |
" 'message': 'can we turn off the auto-scroll on this window?',\n", | |
" 'message_id': 18568520,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'gabrieldib'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 45, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:45 UTC',\n", | |
" 'message': 'CyberKing, for realz? or just speculation?',\n", | |
" 'message_id': 18568519,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ex7231'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 43, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:43 UTC',\n", | |
" 'message': \"Let's try to make posts with more substance than just a coin name and one word. Thank you\",\n", | |
" 'message_id': 18568518,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 43, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:43 UTC',\n", | |
" 'message': 'jockeh_1 and josechacinu banned for 1 days, 0 hours, and 0 minutes by Xoblort.',\n", | |
" 'message_id': 18568517,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Banhammer'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 43, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:43 UTC',\n", | |
" 'message': 'FlatEarth, quite good, made some profits last night',\n", | |
" 'message_id': 18568516,\n", | |
" 'reputation': 20,\n", | |
" 'username': 'Larillo'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 43, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:43 UTC',\n", | |
" 'message': 'mitchelreedijk67, ah, well when ever the sell price matches the buy price of your currency',\n", | |
" 'message_id': 18568515,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'PlasmaHydra'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 43, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:43 UTC',\n", | |
" 'message': 'Chinese don t have ETH anymore ???',\n", | |
" 'message_id': 18568514,\n", | |
" 'reputation': 67,\n", | |
" 'username': 'FlatEarth'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 41, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:41 UTC',\n", | |
" 'message': 'btcnerd, no thanks',\n", | |
" 'message_id': 18568513,\n", | |
" 'reputation': 33,\n", | |
" 'username': 'WhaleOnMe'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 38, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:38 UTC',\n", | |
" 'message': 'Xoblort, ok',\n", | |
" 'message_id': 18568512,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Donator'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 36, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:36 UTC',\n", | |
" 'message': 'GA880820, origi587, Sorry we mods do not have access to tickets on the support side. please lets wait for support to respond, appreciate the patience',\n", | |
" 'message_id': 18568511,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 29, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:29 UTC',\n", | |
" 'message': 'DGB HIGH',\n", | |
" 'message_id': 18568510,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'josechacinu'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 27, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:27 UTC',\n", | |
" 'message': 'Donator, Please avoid the capslock, Thank you',\n", | |
" 'message_id': 18568509,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 27, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:27 UTC',\n", | |
" 'message': 'Xoblort, Please help me ticker 197387',\n", | |
" 'message_id': 18568508,\n", | |
" 'reputation': 2,\n", | |
" 'username': 'origi587'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 25, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:25 UTC',\n", | |
" 'message': 'Randeepchopra, safe khelna hai to 2800 tak ya fir 3000 ke upar tak ruk jao',\n", | |
" 'message_id': 18568507,\n", | |
" 'reputation': 64,\n", | |
" 'username': 'AlienX101'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 25, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:25 UTC',\n", | |
" 'message': 'WhaleOnMe, then dont buy it man.....But mark my words you will see a massive rise...',\n", | |
" 'message_id': 18568506,\n", | |
" 'reputation': 115,\n", | |
" 'username': 'btcnerd'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 25, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:25 UTC',\n", | |
" 'message': 'Bitcoin went up to 2500 http://coinhaunt.com/',\n", | |
" 'message_id': 18568505,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'jay.mokashi'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 25, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:25 UTC',\n", | |
" 'message': 'by end of the month',\n", | |
" 'message_id': 18568504,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ivafamanesh'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 20, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:20 UTC',\n", | |
" 'message': 'maybe finish at 20 cents',\n", | |
" 'message_id': 18568503,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ivafamanesh'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 20, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:20 UTC',\n", | |
" 'message': 'mitchelreedijk67, depends on the price',\n", | |
" 'message_id': 18568502,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'nathan.z.b'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 18, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:18 UTC',\n", | |
" 'message': 'burst the next SC',\n", | |
" 'message_id': 18568501,\n", | |
" 'reputation': 64,\n", | |
" 'username': 'Memegod'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 18, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:18 UTC',\n", | |
" 'message': 'Larillo, market ?',\n", | |
" 'message_id': 18568500,\n", | |
" 'reputation': 67,\n", | |
" 'username': 'FlatEarth'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 17, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:17 UTC',\n", | |
" 'message': 'just put coin towards dgb',\n", | |
" 'message_id': 18568499,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ivafamanesh'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 15, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:15 UTC',\n", | |
" 'message': \"TurbineBase, ex7231, Don't list to me bro, go read about SC they will take over Amazon and Google dropbox services\",\n", | |
" 'message_id': 18568498,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'CyberKing'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 13, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:13 UTC',\n", | |
" 'message': 'lol',\n", | |
" 'message_id': 18568497,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'GhostOf2016'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 10, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:10 UTC',\n", | |
" 'message': 'PlasmaHydra, damn but im buying it',\n", | |
" 'message_id': 18568496,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'mitchelreedijk67'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:08 UTC',\n", | |
" 'message': 'dgb maybe intersting',\n", | |
" 'message_id': 18568495,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ivafamanesh'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:08 UTC',\n", | |
" 'message': 'Xoblort, Why is my Level 2 Verification always incomplete???',\n", | |
" 'message_id': 18568494,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'GA880820'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:08 UTC',\n", | |
" 'message': 'lol ish just crazy out here',\n", | |
" 'message_id': 18568493,\n", | |
" 'reputation': 79,\n", | |
" 'username': 'DongQuixote'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:08 UTC',\n", | |
" 'message': 'Managothic, XPR Is instant not all are',\n", | |
" 'message_id': 18568492,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Monsterskater'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 8, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:08 UTC',\n", | |
" 'message': 'Suck up to pink for Mark she huh stroking anyone',\n", | |
" 'message_id': 18568491,\n", | |
" 'reputation': 7,\n", | |
" 'username': 'BlinkUBroke'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 6, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:06 UTC',\n", | |
" 'message': 'sc go',\n", | |
" 'message_id': 18568490,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'seokhwa89'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 3, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:03 UTC',\n", | |
" 'message': 'mobilego app released for game?',\n", | |
" 'message_id': 18568489,\n", | |
" 'reputation': 248,\n", | |
" 'username': 'mybestestfriend'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 3, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:03 UTC',\n", | |
" 'message': 'or GAME',\n", | |
" 'message_id': 18568488,\n", | |
" 'reputation': 115,\n", | |
" 'username': 'btcnerd'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 3, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:03 UTC',\n", | |
" 'message': '12et2pq-5233, NOW',\n", | |
" 'message_id': 18568487,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Donator'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 3, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:03 UTC',\n", | |
" 'message': 'For those who keep asking about SIA + wallet issues on POLO -read up here: https://twitter.com/SiaTechHQ/status/870482084896354308',\n", | |
" 'message_id': 18568486,\n", | |
" 'reputation': 21,\n", | |
" 'username': 'Enoch'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 1, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:01 UTC',\n", | |
" 'message': 'mitchelreedijk67, Ask dormammu',\n", | |
" 'message_id': 18568485,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'gabrieldib'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 1, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:01 UTC',\n", | |
" 'message': 'CyberKing, i tried to say sheet coin?',\n", | |
" 'message_id': 18568484,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ex7231'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 1, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:01 UTC',\n", | |
" 'message': 'Xoblort, Could you cancel my 0.8165BTC cash register? or hope to deal with BTC withdrawals as soon as possible',\n", | |
" 'message_id': 18568483,\n", | |
" 'reputation': 0,\n", | |
" 'username': '18501535715'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 1, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:01 UTC',\n", | |
" 'message': 'FlatEarth, marrekt? xd',\n", | |
" 'message_id': 18568482,\n", | |
" 'reputation': 20,\n", | |
" 'username': 'Larillo'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, 1, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:01 UTC',\n", | |
" 'message': 'btcnerd, 63 million coins available at 4.63 per coin. no thanks.',\n", | |
" 'message_id': 18568481,\n", | |
" 'reputation': 33,\n", | |
" 'username': 'WhaleOnMe'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 7, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:07:00 UTC',\n", | |
" 'message': 'Why is it taking so long to complete a withdrawl?',\n", | |
" 'message_id': 18568480,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'CryptoSentient'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 56, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:56 UTC',\n", | |
" 'message': 'watch DGB',\n", | |
" 'message_id': 18568479,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'jockeh_1'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 54, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:54 UTC',\n", | |
" 'message': 'hi',\n", | |
" 'message_id': 18568478,\n", | |
" 'reputation': 0,\n", | |
" 'username': '2ahmadiar'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 54, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:54 UTC',\n", | |
" 'message': 'mitchelreedijk67, when ever the buy price matches what youre selling it for',\n", | |
" 'message_id': 18568477,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'PlasmaHydra'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 51, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:51 UTC',\n", | |
" 'message': 'guck0101 banned for 1 days, 0 hours, and 0 minutes by Popcorntime.',\n", | |
" 'message_id': 18568476,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Banhammer'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 49, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:49 UTC',\n", | |
" 'message': 'guck0101, where we ehaded',\n", | |
" 'message_id': 18568475,\n", | |
" 'reputation': 78,\n", | |
" 'username': 'BIGFKNFATE'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 49, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:49 UTC',\n", | |
" 'message': 'whats news on game ?',\n", | |
" 'message_id': 18568474,\n", | |
" 'reputation': 248,\n", | |
" 'username': 'mybestestfriend'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 49, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:49 UTC',\n", | |
" 'message': 'omg isnt cryptocurrency transfer instant??? why do i have to wait so long for my Polo account to be credited',\n", | |
" 'message_id': 18568473,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Managothic'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 49, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:49 UTC',\n", | |
" 'message': 'thanks for the ride SIA',\n", | |
" 'message_id': 18568472,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'sqrtx'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 47, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:47 UTC',\n", | |
" 'message': 'Xoblort, Popcorntime. I messed up a NEM deposit with not including the identifing #. That a mod fix or something for support?(i got a ticket in already)',\n", | |
" 'message_id': 18568471,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'tigerbombtrading'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 47, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:47 UTC',\n", | |
" 'message': 'Popcorntime, My account frozen 7day',\n", | |
" 'message_id': 18568470,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'TurbineBase'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 45, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:45 UTC',\n", | |
" 'message': 'sia buy order added anothe 300 btc in 30 mins',\n", | |
" 'message_id': 18568469,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'dixon.zim'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 44, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:44 UTC',\n", | |
" 'message': \"sc price in bittrex is 10% cheaper it's 660 now\",\n", | |
" 'message_id': 18568468,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'seodh1229'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 42, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:42 UTC',\n", | |
" 'message': 'TurbineBase, Sorry i cant view tickets. Please allow support more time to respond to your ticket',\n", | |
" 'message_id': 18568467,\n", | |
" 'reputation': 1170,\n", | |
" 'username': 'Popcorntime'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 42, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:42 UTC',\n", | |
" 'message': 'strat up up up up up',\n", | |
" 'message_id': 18568466,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'guck0101'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 38, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:38 UTC',\n", | |
" 'message': 'huaaahahaha thank you everyone :D',\n", | |
" 'message_id': 18568465,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'komurasaki'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 38, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:38 UTC',\n", | |
" 'message': 'Okaykoko, support tickets and verifications may be taking longer due to larger than normal queues, we do apologize',\n", | |
" 'message_id': 18568464,\n", | |
" 'reputation': 6558,\n", | |
" 'username': 'Xoblort'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 38, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:38 UTC',\n", | |
" 'message': 'RStrayer, and??? if it makes money, its good xd',\n", | |
" 'message_id': 18568463,\n", | |
" 'reputation': 20,\n", | |
" 'username': 'Larillo'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 36, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:36 UTC',\n", | |
" 'message': 'BossBasso, old to;;?',\n", | |
" 'message_id': 18568462,\n", | |
" 'reputation': 3,\n", | |
" 'username': 'msmicromax'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 35, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:35 UTC',\n", | |
" 'message': 'when is the best time to sell DGB ?',\n", | |
" 'message_id': 18568461,\n", | |
" 'reputation': 0,\n", | |
" 'username': '12et2pq-5233'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 35, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:35 UTC',\n", | |
" 'message': 'Larillo, waza ? marekt ??',\n", | |
" 'message_id': 18568460,\n", | |
" 'reputation': 67,\n", | |
" 'username': 'FlatEarth'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 35, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:35 UTC',\n", | |
" 'message': 'srotman, i dont have eta sorry',\n", | |
" 'message_id': 18568459,\n", | |
" 'reputation': 1170,\n", | |
" 'username': 'Popcorntime'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 33, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:33 UTC',\n", | |
" 'message': 'AlienX101, khan tak jayega? means kab bech du?',\n", | |
" 'message_id': 18568458,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'Randeepchopra'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 33, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:33 UTC',\n", | |
" 'message': 'wow,super growing sc',\n", | |
" 'message_id': 18568457,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'bjork271828'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 31, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:31 UTC',\n", | |
" 'message': 'mycool57, SC long run real Technology',\n", | |
" 'message_id': 18568456,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'CyberKing'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 31, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:31 UTC',\n", | |
" 'message': 'Randeepchopra, 2600?',\n", | |
" 'message_id': 18568455,\n", | |
" 'reputation': 64,\n", | |
" 'username': 'AlienX101'},\n", | |
" {'date': datetime.datetime(2017, 6, 4, 8, 6, 27, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:06:27 UTC',\n", | |
" 'message': 'Xoblort, thanks anyway!',\n", | |
" 'message_id': 18568454,\n", | |
" 'reputation': 2,\n", | |
" 'username': 'Gramsci'}]" | |
] | |
}, | |
"execution_count": 17, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"get_chat_list(html_list[4])" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"Parse all chat data by running the get_chat_list function defined above over every HTML page (there are well over ten thousand pages, so this took a long time)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# total_res_list = [] \n", | |
"# for html in html_list:\n", | |
"#     total_res_list += get_chat_list(html)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": false, | |
"scrolled": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Collect messages keyed by message_id so duplicates across pages are dropped\n", | |
"total_res_dic = {}\n", | |
"cnt = 0\n", | |
"for html in html_list:\n", | |
"    res = get_chat_list(html)\n", | |
"    for i in res:\n", | |
"        if i['message_id'] in total_res_dic:\n", | |
"            continue\n", | |
"        else:\n", | |
"            total_res_dic[i['message_id']] = i\n", | |
"    if cnt%500 == 0:  # progress indicator\n", | |
"        print(cnt)\n", | |
"    cnt += 1" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import pickle # parsing was expensive, so serialize the result with pickle and save it; about 300MB for 17,000 items\n", | |
"pickle.dump(total_res_dic, open('total_res_dic.p', 'wb'))\n", | |
"# total_res_dic = pickle.load(open('total_res_dic.p','rb'))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 21, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1658602" | |
] | |
}, | |
"execution_count": 21, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(total_res_dic)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 26, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'date': datetime.datetime(2017, 6, 4, 8, 8, 35, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:08:35 UTC',\n", | |
" 'message': 'Please, keep the language in the TrollBox to English. Thank you for your understanding. A message from your local MOD SQUAD.',\n", | |
" 'message_id': 18568551,\n", | |
" 'reputation': 1170,\n", | |
" 'username': 'Popcorntime'}" | |
] | |
}, | |
"execution_count": 26, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"total_res_dic[18568551]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 36, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"total_res_dic_with_senti = []" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 41, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Attach the VADER scores (neg, neu, pos, compound) to a copy of every message\n", | |
"for k,v in iter(total_res_dic.items()):\n", | |
"    tmp_dic = v.copy()\n", | |
"    res = sid.polarity_scores(tmp_dic['message'])\n", | |
"    tmp_dic['neg'] = res['neg']\n", | |
"    tmp_dic['neu'] = res['neu']\n", | |
"    tmp_dic['pos'] = res['pos']\n", | |
"    tmp_dic['compound'] = res['compound']\n", | |
"    total_res_dic_with_senti.append(tmp_dic)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 49, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"{'compound': 0.0,\n", | |
" 'date': datetime.datetime(2017, 6, 4, 8, 33, 3, tzinfo=tzutc()),\n", | |
" 'date_str': '2017-06-04 08:33:03 UTC',\n", | |
" 'message': 'what is the minimum for etc?',\n", | |
" 'message_id': 18569509,\n", | |
" 'neg': 0.0,\n", | |
" 'neu': 1.0,\n", | |
" 'pos': 0.0,\n", | |
" 'reputation': 0,\n", | |
" 'username': 'ronald.narvasa.rn'}" | |
] | |
}, | |
"execution_count": 49, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"total_res_dic_with_senti" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"import pickle\n", | |
"# pickle.dump(total_res_dic_with_senti, open('total_res_dic_with_senti.p', 'wb'))\n", | |
"total_res_dic_with_senti = pickle.load(open('total_res_dic_with_senti.p','rb'))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1658602" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(total_res_dic_with_senti)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## word2vec and Doc2vec with Gensim" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'2.1.0'" | |
] | |
}, | |
"execution_count": 11, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import gensim # use the gensim library for doc2vec\n", | |
"gensim.__version__" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"'3.2.4'" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import nltk\n", | |
"nltk.__version__" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Use the Snowball stemmer bundled with nltk\n", | |
"from nltk.stem.snowball import SnowballStemmer\n", | |
"stemmer = SnowballStemmer(\"english\")" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"run\n" | |
] | |
} | |
], | |
"source": [ | |
"# Stemmer example: reduces progressive, past-tense, and plural forms to their stem\n", | |
"print(stemmer.stem(\"running\"))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"from nltk import word_tokenize" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['Bitcoin',\n", | |
" 'has',\n", | |
" 'left',\n", | |
" 'gold',\n", | |
" 'in',\n", | |
" 'the',\n", | |
" 'dust',\n", | |
" 'in',\n", | |
" 'recent',\n", | |
" 'months',\n", | |
" '.']" | |
] | |
}, | |
"execution_count": 16, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Test sentence tokenization with nltk's default word_tokenize\n", | |
"word_tokenize('Bitcoin has left gold in the dust in recent months.')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Tokenize the input sentence, stem each token, and return the list of stemmed tokens\n", | |
"def stemming(text):\n", | |
"    return [stemmer.stem(x) for x in word_tokenize(text)]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['it', 'run']" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stemming('its running')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['bitcoin', 'is', 'better', 'than', 'gold']" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stemming('Bitcoin Is Better Than Gold')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"['talk',\n", | |
" 'about',\n", | |
" 'pump/dump',\n", | |
" 'group',\n", | |
" 'or',\n", | |
" 'announc',\n", | |
" 'pumps/dump',\n", | |
" 'is',\n", | |
" 'not',\n", | |
" 'want',\n", | |
" 'here',\n", | |
" '.',\n", | |
" 'thank',\n", | |
" 'you',\n", | |
" 'for',\n", | |
" 'your',\n", | |
" 'understand',\n", | |
" '.',\n", | |
" 'a',\n", | |
" 'messag',\n", | |
" 'from',\n", | |
" 'your',\n", | |
" 'local',\n", | |
" 'mod',\n", | |
" 'squad',\n", | |
" '.']" | |
] | |
}, | |
"execution_count": 20, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"stemming(total_res_dic_with_senti[0]['message'])" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 42, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from collections import namedtuple\n", | |
"TaggedDocument = namedtuple('TaggedDocument', 'words tags')\n", | |
"# from gensim.models.doc2vec import TaggedDocument\n", | |
"\n", | |
"# Tag each message as positive (1) or negative (0) by threshold and store it in the document form Doc2vec expects\n", | |
"tagged_document_list = []\n", | |
"pos_count = 0\n", | |
"neg_count = 0\n", | |
"for i in total_res_dic_with_senti[:]: # sample\n", | |
"    label = []\n", | |
"    if i['pos'] > 0.5 and i['neg'] < 0.4:\n", | |
"        label = [1]\n", | |
"        pos_count += 1\n", | |
"    elif i['neg'] > 0.5 and i['pos'] < 0.4:\n", | |
"        label = [0]\n", | |
"        neg_count += 1\n", | |
"    else:\n", | |
"        continue\n", | |
"    tagged_document_list.append(TaggedDocument(stemming(i['message']), label))\n" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 43, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"1658602" | |
] | |
}, | |
"execution_count": 43, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(total_res_dic_with_senti)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 51, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Temporarily save and load the tagged data\n", | |
"import pickle\n", | |
"# pickle.dump(tagged_document_list, open('tagged_document_list.p','wb'))\n", | |
"tagged_document_list = pickle.load(open('tagged_document_list.p','rb'))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 52, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"179412" | |
] | |
}, | |
"execution_count": 52, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(tagged_document_list)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 61, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Shuffle, then split into a 20% test set and an 80% training set so both are drawn at random\n", | |
"from random import shuffle\n", | |
"from math import ceil\n", | |
"shuffle(tagged_document_list)\n", | |
"percent = ceil(float(len(tagged_document_list))/100.0)\n", | |
"test_set_percentage = 20\n", | |
"test_set = tagged_document_list[:percent*test_set_percentage]\n", | |
"train_set = tagged_document_list[percent*test_set_percentage:]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 62, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"35900" | |
] | |
}, | |
"execution_count": 62, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(test_set)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 63, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"143512" | |
] | |
}, | |
"execution_count": 63, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"len(train_set)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 64, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"from gensim.models import doc2vec\n", | |
"# Build the vocabulary\n", | |
"doc_vectorizer = doc2vec.Doc2Vec(size=300, alpha=0.025, min_alpha=0.025, seed=1234)\n", | |
"doc_vectorizer.build_vocab(train_set)\n", | |
"# Train doc2vec (note: this trains on tagged_document_list, which still includes the test_set documents)\n", | |
"for epoch in range(10):\n", | |
"    doc_vectorizer.train(tagged_document_list,total_examples=doc_vectorizer.corpus_count, epochs=doc_vectorizer.iter)\n", | |
"    doc_vectorizer.alpha -= 0.002 # decrease the learning rate\n", | |
"    doc_vectorizer.min_alpha = doc_vectorizer.alpha # fix the learning rate, no decay" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 66, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('dash', 0.41827628016471863),\n", | |
" ('xrp', 0.41451185941696167),\n", | |
" ('etc', 0.40095293521881104),\n", | |
" ('ltc', 0.39470094442367554),\n", | |
" ('zcash', 0.38932809233665466),\n", | |
" ('zec', 0.389003723859787),\n", | |
" ('usdt', 0.38660886883735657),\n", | |
" ('xmr', 0.3814651668071747),\n", | |
" ('dab', 0.38086938858032227),\n", | |
" ('vtc', 0.378325879573822)]" | |
] | |
}, | |
"execution_count": 66, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# For a trained document or word, print the entries with the most similar vectors\n", | |
"doc_vectorizer.most_similar('eth')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 81, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('btc', 0.4619801342487335),\n", | |
" ('etc', 0.394686222076416),\n", | |
" ('buterin', 0.38618335127830505),\n", | |
" ('ether', 0.3723365068435669),\n", | |
" ('digibyt', 0.35357969999313354),\n", | |
" ('zcrash', 0.3457624912261963),\n", | |
" ('foldingcoin', 0.3446189761161804),\n", | |
" ('eur', 0.33972662687301636),\n", | |
" ('emc', 0.33606672286987305),\n", | |
" ('dab', 0.33002081513404846)]" | |
] | |
}, | |
"execution_count": 81, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"doc_vectorizer.most_similar('bitcoin')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 72, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# The gensim model can be saved and loaded\n", | |
"# doc_vectorizer.save('doc_vectorizer')\n", | |
"# doc_vectorizer2 = doc2vec.Doc2Vec.load('doc_vectorizer')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 50, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"[('maid', 0.42228561639785767),\n", | |
" ('xrp', 0.4165087044239044),\n", | |
" ('ltc', 0.4158686399459839),\n", | |
" ('etc', 0.41268616914749146),\n", | |
" ('xmr', 0.4046849012374878),\n", | |
" ('dash', 0.39771586656570435),\n", | |
" ('bitstamp', 0.38785219192504883),\n", | |
" ('sbd', 0.3800865113735199),\n", | |
" ('usdt', 0.37259727716445923),\n", | |
" ('vtc', 0.3709067702293396)]" | |
] | |
}, | |
"execution_count": 50, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"doc_vectorizer2.most_similar('eth')" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 67, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"# Infer a vector for every training-set document, for use in training, evaluation, and classification\n", | |
"train_x = [doc_vectorizer.infer_vector(doc.words) for doc in train_set]\n", | |
"train_y = [doc.tags[0] for doc in train_set]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [ | |
"# Infer a vector for every test-set document, for use in training, evaluation, and classification\n", | |
"test_x = [doc_vectorizer.infer_vector(doc.words) for doc in test_set]\n", | |
"test_y = [doc.tags[0] for doc in test_set]" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 70, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"143512\n", | |
"300\n", | |
"35900\n", | |
"300\n" | |
] | |
} | |
], | |
"source": [ | |
"print(len(train_x))\n", | |
"print(len(train_x[0]))\n", | |
"\n", | |
"print(len(test_x))\n", | |
"print(len(test_x[0]))\n", | |
"# => 300" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 71, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"0.82038997214484677" | |
] | |
}, | |
"execution_count": 71, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"# Classify with scikit-learn's logistic regression and check the accuracy\n", | |
"from sklearn.linear_model import LogisticRegression\n", | |
"classifier = LogisticRegression(random_state=1234)\n", | |
"classifier.fit(train_x, train_y)\n", | |
"classifier.score(test_x, test_y)" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"# Conclusion: a high accuracy of 82% on the test set" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": null, | |
"metadata": { | |
"collapsed": true | |
}, | |
"outputs": [], | |
"source": [] | |
} | |
], | |
"metadata": { | |
"anaconda-cloud": {}, | |
"kernelspec": { | |
"display_name": "tensorflow", | |
"language": "python", | |
"name": "venv3" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 3 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython3", | |
"version": "3.6.1" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 2 | |
} |
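
The labeling and splitting steps in the notebook above can be sketched in isolation, without nltk or gensim. This is a minimal sketch, not the author's code: the function names `label_from_scores` and `split_train_test`, the `seed` parameter, and the toy score dictionaries are assumptions made for illustration. Only the thresholds (0.5 and 0.4) and the ceil-based 20% split come from the notebook.

```python
from math import ceil
from random import Random


def label_from_scores(scores, strong=0.5, weak=0.4):
    """Map a VADER-style score dict to 1 (positive), 0 (negative), or None (skip).

    Mirrors the notebook's rule: keep a message only when one polarity is
    clearly dominant, and drop everything else.
    """
    if scores['pos'] > strong and scores['neg'] < weak:
        return 1
    if scores['neg'] > strong and scores['pos'] < weak:
        return 0
    return None


def split_train_test(items, test_percentage=20, seed=1234):
    """Shuffle a copy of items and split it the way the notebook does.

    One "percent" is ceil(len / 100), so the test set can be slightly
    larger than an exact 20%. The seed is a hypothetical addition that
    makes the shuffle reproducible.
    """
    shuffled = list(items)
    Random(seed).shuffle(shuffled)
    cut = ceil(len(shuffled) / 100.0) * test_percentage
    return shuffled[cut:], shuffled[:cut]  # (train, test)


# Toy scores, made up for illustration only.
toy_scores = [
    {'neg': 0.0, 'neu': 0.3, 'pos': 0.7},  # clearly positive
    {'neg': 0.6, 'neu': 0.4, 'pos': 0.0},  # clearly negative
    {'neg': 0.0, 'neu': 1.0, 'pos': 0.0},  # neutral, skipped
]
labels = [label_from_scores(s) for s in toy_scores]
print(labels)  # [1, 0, None]

# With the notebook's 179412 tagged documents the split sizes match its output.
train, test = split_train_test(range(179412))
print(len(train), len(test))  # 143512 35900
```

Because most chat messages are scored as neutral, a strict rule like this keeps only a small share of the corpus (179,412 of 1,658,602 messages in the notebook), trading coverage for cleaner labels.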