@esuji5
Created September 19, 2017 09:43
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"※ This material is excerpted from the doujinshi released at C92 in order to explain its program sections.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Techniques for Turning OCR Results from Yuyushiki Panel Images into \"Japanese\""
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview\n",
"With the programs from the previous installments, Yuyushiki panel images can now be extracted almost cleanly. As the next step, in order to turn the dialogue into data, I want to run OCR using the Google Cloud Vision API.\n",
"Among existing OCR services that handle vertically written Japanese, this API is known for producing very good results.\n",
"\n",
"However, using the results as-is causes the following problems:\n",
"- Parts of the artwork that are not text get recognized as spurious characters\n",
"- Vertical Japanese is recognized without specifying a language, but text is sometimes recognized as horizontal writing instead, which is a nuisance\n",
"- Accuracy on handwritten text outside speech balloons is poor\n",
"- When dialogue is split between an upper and a lower balloon, the scan runs from the top right downward, so the reading order comes out wrong\n",
"- Unusual fonts have a high rate of misrecognized text\n",
"- Symbols such as dash rules and ellipsis leaders are poorly distinguished\n",
"\n",
"The last two points are largely limited by the API's own performance, so for now I give up and correct them by hand, but for the first four I improved things so that image processing and related techniques handle them automatically.\n",
"Niche as the topic is, this book covers those techniques for simplifying the data collection."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Extracting Dialogue from Manga Panels with OCR via the Google Cloud Vision API\n",
"This section explains the preparation needed for the program listings in the second half."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"(omitted)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Program"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"First, each panel image is opened with PIL, encoded to base64, and sent to the API. Since repeated API calls cost money, each returned OCR result is saved in pickle format together with the image path.\n",
"\n",
"\n",
"### Preparation\n",
"- Install Python 3.6\n",
"- Install the libraries: `pip install --upgrade google-api-python-client pillow`\n",
"- Create a Google Cloud Platform account and download a key in JSON format (details omitted here)\n",
"- Download [utils.py](https://github.com/esuji5/yonkoma2data/blob/master/src/utils.py)\n",
"\n",
"### Before running\n",
"- Set pickle_file to the name of the pickle file to save\n",
"- Set image_dir to the directory containing the panel images\n",
"\n",
"### After running\n",
"- The pickle file holds data in the form [[image path, text annotation result], ...]\n",
"- A directory named out is created inside the panel image directory, and copies of the images with the text annotations outlined are saved there"
]
},
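{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `utils.pickle_dump` and `utils.pickle_load` helpers used below come from the downloaded utils.py and are not shown in this excerpt. A minimal sketch of what they are assumed to do (only the names are from utils.py; this implementation is hypothetical):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import pickle\n",
"\n",
"\n",
"def pickle_dump(obj, filename):\n",
"    # Serialize obj to filename, overwriting any existing file\n",
"    with open(filename, 'wb') as f:\n",
"        pickle.dump(obj, f)\n",
"\n",
"\n",
"def pickle_load(filename):\n",
"    # Restore whatever pickle_dump saved\n",
"    with open(filename, 'rb') as f:\n",
"        return pickle.load(f)\n",
"\n",
"\n",
"# Round-trip a [[image path, annotations], ...] list like the one saved below\n",
"pickle_dump([['001.png', [{'description': 'ゆゆ式'}]]], 'ta_test.pickle')\n",
"print(pickle_load('ta_test.pickle')[0][0])  # prints 001.png\n"
]
},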
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import base64\n",
"import os\n",
"import sys\n",
"from io import BytesIO\n",
"\n",
"from PIL import Image\n",
"from PIL import ImageDraw\n",
"from googleapiclient import discovery\n",
"from oauth2client.client import GoogleCredentials\n",
"\n",
"import utils\n",
"\n",
"# User settings: path to the key JSON downloaded from GCP\n",
"KEY_JSON = 'gcp-esuji-api-d811bac799d1.json'\n",
"DISCOVERY_URL = 'https://{api}.googleapis.com/$discovery/rest?version={apiVersion}'\n",
"\n",
"\n",
"# Build the Google Cloud Vision API service object\n",
"def get_gcv_service():\n",
"    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = KEY_JSON\n",
"    credentials = GoogleCredentials.get_application_default()\n",
"    service = discovery.build('vision', 'v1', credentials=credentials,\n",
"                              discoveryServiceUrl=DISCOVERY_URL)\n",
"    return service\n",
"\n",
"# Fetch the text_annotations result for one image\n",
"def fetch_text_annotations(image_data, gcv_service):\n",
"    image_content = base64.b64encode(image_data)\n",
"    service_request = gcv_service.images().annotate(body={\n",
"        'requests': [{\n",
"            'image': {\n",
"                'content': image_content.decode('UTF-8')\n",
"            },\n",
"            'features': [{\n",
"                'type': 'TEXT_DETECTION',\n",
"                # 'maxResults': 5\n",
"            }]\n",
"        }]\n",
"    })\n",
"    response = service_request.execute()\n",
"    if 'textAnnotations' in response['responses'][0]:\n",
"        return response['responses'][0]['textAnnotations']\n",
"\n",
"# Save the Pillow image into a buffer so it can be passed to base64.b64encode\n",
"def get_img_data(pl_img):\n",
"    output = BytesIO()\n",
"    pl_img.save(output, format='JPEG')\n",
"    return output.getvalue()\n",
"\n",
"# Draw a red outline around each detected text region\n",
"def highlight_texts(img, response):\n",
"    draw = ImageDraw.Draw(img)\n",
"    for text in response[1:]:\n",
"        color = '#ff0000'\n",
"        box = [(v.get('x', 0.0), v.get('y', 0.0)) for v in text['boundingPoly']['vertices']]\n",
"        draw.line(box + [box[0]], width=2, fill=color)\n",
"    return img\n",
"\n",
"# Print the text_annotations result in a readable form\n",
"def print_text_annotations(text_annotations):\n",
"    for idx, res in enumerate(text_annotations):\n",
"        print(idx)\n",
"        try:\n",
"            print(res['locale'], res['boundingPoly']['vertices'])\n",
"        except KeyError:\n",
"            print(res['description'])\n",
"            print(res['boundingPoly']['vertices'])\n",
"\n",
"\n",
"if __name__ == '__main__':\n",
"    pickle_file = 'yuyu8.pickle'\n",
"    image_dir = '/Users/esuji/image/yuyu8/koma/'\n",
"    image_path_list = utils.get_path_list(image_dir, 'png')\n",
"    print('{} files'.format(len(image_path_list)))\n",
"    gcv_service = get_gcv_service()\n",
"\n",
"    ta_list = []\n",
"    for img_path in image_path_list[30:40]:\n",
"        img = Image.open(img_path)\n",
"        img_gray = img.convert('L')\n",
"\n",
"        text_annotations = fetch_text_annotations(get_img_data(img_gray), gcv_service)\n",
"\n",
"        if text_annotations:\n",
"            # print_text_annotations(text_annotations)\n",
"            highlighted_img = highlight_texts(img, text_annotations)\n",
"\n",
"            ta_list.append([img_path, text_annotations])\n",
"            # Save intermediate results every 10 images to avoid repeating API calls\n",
"            if len(ta_list) % 10 == 0:\n",
"                utils.pickle_dump(ta_list, filename=pickle_file)\n",
"            out_path = utils.make_outdir_of_imgfile(img_path)\n",
"            print(out_path)\n",
"            highlighted_img.save(out_path)\n",
"\n",
"    utils.pickle_dump(ta_list, filename=pickle_file)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Assembling Japanese Text from the OCR Results\n",
"\n",
"### Preparation\n",
"- Install the libraries: `pip install numpy scikit-learn`\n",
"- Install OpenCV 3.x\n",
"- Download [joyo_kanji.py](https://github.com/esuji5/yonkoma2data/blob/master/src/joyo_kanji.py)\n",
"\n",
"### Before running\n",
"- Pass the path of the saved pickle file to ta_list = utils.pickle_load('')\n",
"\n",
"### After running\n",
"The assembled Japanese text is printed"
]
},
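{
"cell_type": "markdown",
"metadata": {},
"source": [
"The program below splits the text boxes of a tall balloon into an upper and a lower group by running 2-cluster KMeans on their y-coordinates. A standalone sketch of just that step, with hypothetical y values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"from sklearn.cluster import KMeans\n",
"\n",
"# Hypothetical y-coordinates of text boxes in one balloon:\n",
"# three boxes near the top, three near the bottom\n",
"y_positions = [[20], [30], [40], [150], [160], [170]]\n",
"clf = KMeans(n_clusters=2, n_init=10).fit(y_positions)\n",
"classes = clf.predict(y_positions)\n",
"# The first three boxes share one label and the last three the other,\n",
"# so the balloon's text can be reassembled group by group\n",
"print(classes)\n"
]
},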
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"import unicodedata\n",
"import re\n",
"from itertools import combinations\n",
"from operator import itemgetter\n",
"\n",
"from PIL import Image\n",
"import cv2\n",
"import numpy as np\n",
"from sklearn.cluster import KMeans\n",
"\n",
"import utils\n",
"from joyo_kanji import JOYO_KANJI\n",
"\n",
"\n",
"SKIP_SIZE_EVAL_LIST = {'!', '?', '一', '...', '…', 'が', 'か', 'く', 'へ', 'ヘ', 'と'}\n",
"SKIP_LONG_WIDTH_LIST = {'一', 'へ', 'ヘ', 'が', 'か'}\n",
"SKIP_NG_WORD_LIST = {'1', '2', '3', '8', 'UFO'}\n",
"RE_BALLOON_NG_MATCH = re.compile('(^[0-9]{2,}$|^[a-zA-Z()\\-\\\\\\/%:;*\\\"\\'\\.,_\"]{1,}$|^フ$|^℃$|^ù$)',\n",
" re.UNICODE)\n",
"\n",
"\n",
"class Koma:\n",
"    def __init__(self, img_path=''):\n",
"        if img_path:\n",
"            self.img_path = img_path\n",
"            self.img = cv2.imread(img_path)\n",
"            if self.img is not None:\n",
"                self.img_height, self.img_width = self.img.shape[:2]\n",
"\n",
"\n",
"class Balloons(Koma):\n",
"    NG_HEIGHT_RATIO = 0.03\n",
"    NG_LONG_WIDTH_RATIO = 1.3\n",
"    ta_rect_list = []\n",
"\n",
"    def define_rect(self, positions):\n",
"        '''positions = [left_top, right_top, right_bottom, left_bottom], left_top = {x: x_val, y: y_val}'''\n",
"        lt, rt, rb, lb = positions  # old API ordering\n",
"        # rt, rb, lb, lt = positions\n",
"        # Return rect as x, y, w, h. The x or y key is sometimes missing, so default to 0\n",
"        rect = [lt.get('x', 0), lt.get('y', 0),\n",
"                rt.get('x', 0) - lt.get('x', 0), lb.get('y', 0) - lt.get('y', 0)]\n",
"        return rect\n",
"\n",
"    def get_ta_value(self, text_annotation):\n",
"        pos = self.define_rect(text_annotation['boundingPoly']['vertices'])\n",
"        text = text_annotation['description'].replace('\\n', '')\n",
"        # Normalize characters the OCR frequently confuses\n",
"        text = '…' if text == '...' else text\n",
"        # text = 'ー' if text == '-' else text\n",
"        text = 'か' if text == '力' else text\n",
"        text = 'か' if text == 'カ' else text\n",
"        return {'rect': pos, 'text': text}\n",
"\n",
"    def detect_balloon(self):\n",
"        h, w = self.img.shape[:2]\n",
"        mask = np.zeros((h + 2, w + 2), np.uint8)\n",
"        flooded_try = self.img.copy()\n",
"        rect_set = set()\n",
"        for ta in self.ta_list:\n",
"            # if ta['text'].startswith((\"亍\", \"宁\", \"佇\", \"■\", \"𠆤\", \"枣\", \"令ひ\")):\n",
"            #     continue\n",
"            target = (ta['rect'][0], ta['rect'][1])\n",
"\n",
"            # Flood fill from the text position and get the filled region's rect\n",
"            _, _, _, rect = cv2.floodFill(\n",
"                flooded_try, mask, target, (0, 0, 220), (1, 1, 1), (250, 250, 250))\n",
"            x, y, rw, rh = rect\n",
"            area = rw * rh\n",
"\n",
"            # Treat the filled region as a balloon if its size and whiteness look right\n",
"            is_nice_x_range = w * 0.1 < rw < w * 0.7\n",
"            is_nice_y_range = h * 0.1 < rh < h * 0.999\n",
"            is_nice_area = 0.03 < area / (w * h) < 0.45\n",
"            mean_val = (self.img[y:y + rh, x:x + rw] > 200).mean()\n",
"            is_white = mean_val > 0.68\n",
"            if area and is_nice_x_range and is_nice_y_range and is_nice_area and is_white:\n",
"                rect_set.add(rect)\n",
"            else:\n",
"                pass\n",
"                # print('not in fukidashi:', ta)\n",
"                # print(target, rect, area, bp['text'], area / (w * h), mean_val)\n",
"        # Sort by x descending, then y ascending, so balloons run from the top right downward\n",
"        s = sorted(rect_set, key=itemgetter(1), reverse=False)\n",
"        rect_list = sorted(s, key=itemgetter(0), reverse=True)\n",
"        self.ta_rect_list = rect_list\n",
"\n",
"    def detect_duprect(self, a, b):\n",
"        ax, ay, aw, ah = a['rect']\n",
"        bx, by, bw, bh = b['rect']\n",
"\n",
"        # Intersection: top-left is the max of the origins,\n",
"        # bottom-right is the min of the far corners\n",
"        sx = max(ax, bx)\n",
"        sy = max(ay, by)\n",
"        ex = min(ax + aw, bx + bw)\n",
"        ey = min(ay + ah, by + bh)\n",
"\n",
"        w = ex - sx\n",
"        h = ey - sy\n",
"        if w > 0 and h > 0:\n",
"            return True\n",
"\n",
"    def define_balloon(self, text_annotations):\n",
"        # print(text_annotations)\n",
"        # Drop text_annotation results that look invalid\n",
"        ta_list = [self.get_ta_value(ta) for ta in text_annotations[1:] if self.is_good_ta(ta)]\n",
"\n",
"        # Resolve regions where annotations overlap\n",
"        for combi in list(combinations(ta_list, 2)):\n",
"            is_duprect = self.detect_duprect(*combi)\n",
"            if is_duprect:\n",
"                # Remove the ta with the shorter text\n",
"                remove_ta = None\n",
"                if len(combi[1]['text']) > len(combi[0]['text']):\n",
"                    remove_ta = combi[0]\n",
"                elif len(combi[0]['text']) > len(combi[1]['text']):\n",
"                    remove_ta = combi[1]\n",
"                # Same text: remove the earlier ta\n",
"                if not remove_ta and combi[0]['text'] == combi[1]['text']:\n",
"                    remove_ta = combi[0]\n",
"                # Still undecided: keep the one with the larger region\n",
"                if not remove_ta:\n",
"                    w0, h0 = combi[0]['rect'][2:]\n",
"                    w1, h1 = combi[1]['rect'][2:]\n",
"                    area0, area1 = w0 * h0, w1 * h1\n",
"                    remove_ta = combi[1] if area0 > area1 else combi[0]\n",
"                try:\n",
"                    ta_list.remove(remove_ta)\n",
"                    print('duplicate:', combi)\n",
"                    # duplicate: ({'rect': [151, 79, 15, 14], 'text': 'で'}, {'rect': [153, 78, 14, 14], 'text': 'で'})\n",
"                except ValueError:\n",
"                    print('already removed:', remove_ta)\n",
"        self.ta_list = ta_list\n",
"\n",
"        # Detect the balloon regions\n",
"        self.detect_balloon()\n",
"        # Group the tas by balloon\n",
"        self.ta_rect_dict = {}\n",
"        for rect in self.ta_rect_list:\n",
"            x, y, w, h = rect\n",
"            for ta in self.ta_list:\n",
"                ta_x, ta_y, _, _ = ta['rect']\n",
"                if x < ta_x < x + w and y < ta_y < y + h:\n",
"                    self.ta_rect_dict.setdefault(rect, []).append(ta)\n",
"\n",
"        # If a balloon's text is split vertically, put it back in order\n",
"        for rect, ta_list in self.ta_rect_dict.items():\n",
"            # Use balloon height / image height to guess whether the text is split vertically\n",
"            balloon_height_rate = rect[3] / self.img.shape[0]\n",
"            if len(ta_list) >= 2 and balloon_height_rate > 0.9:\n",
"                # Split into 2 clusters by y-coordinate\n",
"                y_positions = [[i['rect'][1]] for i in ta_list]\n",
"                clf = KMeans(n_clusters=2).fit(y_positions)\n",
"                classes = clf.predict(y_positions)  # each ta is labeled 0 or 1\n",
"\n",
"                # Split ta_list according to the cluster labels\n",
"                ta_list_1 = [ta for ta, cls in zip(ta_list, classes) if cls == 0]\n",
"                ta_list_2 = [ta for ta, cls in zip(ta_list, classes) if cls == 1]\n",
"                y1 = ta_list_1[0]['rect'][1]\n",
"                y2 = ta_list_2[0]['rect'][1]\n",
"                # Build a new ta list, upper cluster first, and store it\n",
"                if y1 < y2:\n",
"                    new_ta_list = ta_list_1 + [{'text': '\\n'}] + ta_list_2\n",
"                else:\n",
"                    new_ta_list = ta_list_2 + [{'text': '\\n'}] + ta_list_1\n",
"                self.ta_rect_dict[rect] = new_ta_list\n",
"\n",
"        # Join the tas for each balloon id\n",
"        self.text_list = []\n",
"        for idx, ta_list in enumerate(self.ta_rect_dict.values()):\n",
"            text = \"\".join([i['text'] for i in ta_list])\n",
"            print(str(idx + 1) + \":\", text)\n",
"            self.text_list.append(text)\n",
"\n",
"    def is_good_ta(self, text_annotation):\n",
"        ta_value = self.get_ta_value(text_annotation)\n",
"        rect, text = ta_value['rect'], ta_value['text']\n",
"\n",
"        # Characters matching the NG patterns are probably bogus\n",
"        if text not in SKIP_NG_WORD_LIST:\n",
"            if RE_BALLOON_NG_MATCH.search(text):\n",
"                print('NG Word: {}'.format(text))\n",
"                return False\n",
"            elif len(text) == 1:\n",
"                if not is_japanese_char(text):\n",
"                    print('NG char: {}'.format(text))\n",
"                    return False\n",
"                elif \"CJK UNIFIED\" in unicodedata.name(text) and text not in JOYO_KANJI:\n",
"                    print('NG KANJI: {}'.format(text))\n",
"                    return False\n",
"\n",
"        area_width, area_height = rect[2:]\n",
"\n",
"        # A region that is too small is probably bogus, but characters in\n",
"        # SKIP_SIZE_EVAL_LIST are easy to misjudge, so this check is disabled\n",
"        # if text not in SKIP_SIZE_EVAL_LIST:\n",
"        #     if area_height < self.img_height * self.NG_HEIGHT_RATIO:\n",
"        #         print('NG small area height: {}'.format(text))\n",
"        #         return False\n",
"        #     if area_width < self.img_width * (self.NG_HEIGHT_RATIO - 0.01):\n",
"        #         print('NG small area width: {}'.format(text))\n",
"        #         return False\n",
"\n",
"        # A wide, flat region is probably bogus\n",
"        if text not in SKIP_LONG_WIDTH_LIST:\n",
"            if area_width > area_height * self.NG_LONG_WIDTH_RATIO:\n",
"                print('NG long width: {}'.format(text))\n",
"                print('{} > {} * {}'.format(area_width, area_height, self.NG_LONG_WIDTH_RATIO))\n",
"                return False\n",
"\n",
"        return True\n",
"\n",
"\n",
"def is_japanese_char(char):\n",
"    # ref. http://minus9d.hatenablog.com/entry/2015/07/16/231608\n",
"    name = unicodedata.name(char)\n",
"    japanese_char = (\"CJK UNIFIED\", \"HIRAGANA\", \"KATAKANA\")\n",
"    if name.startswith(japanese_char) or \"MARK\" in name or 'HORIZONTAL ELLIPSIS' in name:\n",
"        return True\n",
"    return False\n",
"\n",
"if __name__ == '__main__':\n",
"    ta_list = utils.pickle_load('yuyu8.pickle')\n",
"    bal_list = []\n",
"    for idx, ta in enumerate(ta_list[70:]):\n",
"        ta[0] = ta[0].replace('cut_images_wide/knife_cut/knife_cut-', 'koma/')\n",
"        print(ta[0])\n",
"        bal = Balloons(ta[0])\n",
"        if bal.img is not None:\n",
"            # bal.NG_HEIGHT_RATIO = 0.03\n",
"            display(Image.open(ta[0].replace('koma/', 'koma/out/').replace('.png', '_out.png')))\n",
"            bal.define_balloon(ta[1])\n",
"            bal_list.append(bal)\n"
]
},
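{
"cell_type": "markdown",
"metadata": {},
"source": [
"The detect_duprect check in the code above is an axis-aligned rectangle intersection test: two x, y, w, h rects overlap exactly when the intersection has positive width and height. A standalone sketch with made-up sample rects:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def rects_overlap(a, b):\n",
"    # a and b are (x, y, w, h) rectangles\n",
"    ax, ay, aw, ah = a\n",
"    bx, by, bw, bh = b\n",
"    # Intersection: left/top edge is the max of the origins,\n",
"    # right/bottom edge is the min of the far corners\n",
"    w = min(ax + aw, bx + bw) - max(ax, bx)\n",
"    h = min(ay + ah, by + bh) - max(ay, by)\n",
"    return w > 0 and h > 0\n",
"\n",
"\n",
"# Two annotations for the same glyph, offset by a couple of pixels\n",
"print(rects_overlap((151, 79, 15, 14), (153, 78, 14, 14)))  # True\n",
"print(rects_overlap((0, 0, 10, 10), (20, 20, 5, 5)))        # False\n"
]
},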
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Announcements\n",
"A free PDF of 『真・ゆゆ式MANIAC』, which collects an interview with director Kaori of the Yuyushiki anime together with the previous issues, Yuyushiki MANIAC vol. 1-3, is available. See the following URL for details:\n",
"- http://d.hatena.ne.jp/esuji5/20161201\n",
"\n",
"As a regular book rather than a doujinshi, I co-authored a book that aims to get programming beginners building something that works in Python. It went on sale on August 7, so if you would like to start programming, please pick up a copy. The image processing and machine learning done by this circle are all written in Python as well, so the book should also help with understanding them.\n",
"\n",
"**Shoeisha, 『スラスラわかるPython』, 2484 yen**\n",
"\n",
"\n",
"\n",
"## Colophon\n",
"- Circle: ポストモダンのポリアネス\n",
"- Published: August 13, 2017\n",
"- Author: S治\n",
"- Contact: esuji5@gmail.com or [@esuji](https://twitter.com/esuji) on Twitter\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 2
}