@Cartman0
Last active May 7, 2016 08:59
NLP 100 Exercises (言語処理100本ノック), Chapter 1: Warm-up, with Python
{
"cells": [
{
"metadata": {
"toc": "true"
},
"cell_type": "markdown",
"source": "# Table of Contents\n <p><div class=\"lev1\"><a href=\"#1章.-準備運動-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>1章. 準備運動</a></div><div class=\"lev2\"><a href=\"#00.-文字列の逆順-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>00. 文字列の逆順</a></div><div class=\"lev2\"><a href=\"#01.-「パタトクカシーー」-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>01. 「パタトクカシーー」</a></div><div class=\"lev2\"><a href=\"#02.-「パトカー」+「タクシー」=「パタトクカシーー」-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>02. 「パトカー」+「タクシー」=「パタトクカシーー」</a></div><div class=\"lev2\"><a href=\"#03.-円周率-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>03. 円周率</a></div><div class=\"lev2\"><a href=\"#04.-元素記号-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>04. 元素記号</a></div><div class=\"lev2\"><a href=\"#05.-n-gram-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>05. n-gram</a></div><div class=\"lev2\"><a href=\"#06.-集合-1.7\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>06. 集合</a></div><div class=\"lev2\"><a href=\"#07.-テンプレートによる文生成-1.8\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>07. テンプレートによる文生成</a></div><div class=\"lev2\"><a href=\"#08.-暗号文-1.9\"><span class=\"toc-item-num\">1.9&nbsp;&nbsp;</span>08. 暗号文</a></div><div class=\"lev2\"><a href=\"#09.-Typoglycemia-1.10\"><span class=\"toc-item-num\">1.10&nbsp;&nbsp;</span>09. Typoglycemia</a></div><div class=\"lev2\"><a href=\"#参考リンク-1.11\"><span class=\"toc-item-num\">1.11&nbsp;&nbsp;</span>参考リンク</a></div>"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# 1章. 準備運動 "
},
{
"metadata": {},
"cell_type": "markdown",
"source": "http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch1"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 00. 文字列の逆順"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "文字列\"stressed\"の文字を逆に(末尾から先頭に向かって)並べた文字列を得よ"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s = 'stressed'\nrev = s[::-1]\nprint(rev)",
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": "desserts\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 01. 「パタトクカシーー」"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "「パタトクカシーー」という文字列の1,3,5,7文字目を取り出して連結した文字列を得よ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s = 'パタトクカシーー'\ns1357 = s[1-1] + s[3-1] + s[5-1] + s[7-1]\nprint(s1357)",
"execution_count": 2,
"outputs": [
{
"output_type": "stream",
"text": "パトカー\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s = 'パタトクカシーー'\ns1357 = s[::2]\nprint(s1357)",
"execution_count": 3,
"outputs": [
{
"output_type": "stream",
"text": "パトカー\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 02. 「パトカー」+「タクシー」=「パタトクカシーー」\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "「パトカー」+「タクシー」の文字を先頭から交互に連結して文字列「パタトクカシーー」を得よ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s1 = 'パトカー'\ns2 = 'タクシー'\n\ns = ''.join([p + t for p, t in zip(s1, s2)])\nprint(s)",
"execution_count": 4,
"outputs": [
{
"output_type": "stream",
"text": "パタトクカシーー\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 03. 円周率"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "\"Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.\"という文を単語に分解し,各単語の(アルファベットの)文字数を先頭から出現順に並べたリストを作成せよ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "import re\nsentense = 'Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics.'\n\nwords = re.sub(r'[.|,]', '', sentense).split()\ncounts = [len(w) for w in words]\nprint(counts)",
"execution_count": 5,
"outputs": [
{
"output_type": "stream",
"text": "[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 04. 元素記号"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "\"Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.\"という文を単語に分解し,\n1, 5, 6, 7, 8, 9, 15, 16, 19番目の単語は先頭の1文字,それ以外の単語は先頭に2文字を取り出し,取り出した文字列から単語の位置(先頭から何番目の単語か)への連想配列(辞書型もしくはマップ型)を作成せよ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "sentense = \"Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can.\"\nwords = re.sub(r'[.|,]', '', sentense).split()\nidx_list_first = [1, 5, 6, 7, 8, 9, 15, 16, 19]\n\n# 参考:http://qiita.com/tanaka0325/items/08831b96b684d7ecb2f7\ndic = {w[:2 - int(i in idx_list_first)]:i for i, w in enumerate(words, 1)}\ndic",
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "{'Al': 13,\n 'Ar': 18,\n 'B': 5,\n 'Be': 4,\n 'C': 6,\n 'Ca': 20,\n 'Cl': 17,\n 'F': 9,\n 'H': 1,\n 'He': 2,\n 'K': 19,\n 'Li': 3,\n 'Mi': 12,\n 'N': 7,\n 'Na': 11,\n 'Ne': 10,\n 'O': 8,\n 'P': 15,\n 'S': 16,\n 'Si': 14}"
},
"metadata": {},
"execution_count": 6
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 05. n-gram"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "与えられたシーケンス(文字列やリストなど)からn-gramを作る関数を作成せよ.この関数を用い,\"I am an NLPer\"という文から単語bi-gram,文字bi-gramを得よ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "def n_gram(_in, n):\n return [_in[i:i+n] for i in range(len(_in)) if len(_in[i:i+n]) >= n]\n\ns = \"I am an NLPer\"\n\n# 単語bigram\nn_gram(s.split(), 2)",
"execution_count": 7,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]"
},
"metadata": {},
"execution_count": 7
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# 文字bi-gram\nn_gram(s, 2)",
"execution_count": 8,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']"
},
"metadata": {},
"execution_count": 8
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "print('mono-gram(word)', n_gram(s.split(), 1))\nprint('mono-gram(str)', n_gram(s, 1))",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "mono-gram(word) [['I'], ['am'], ['an'], ['NLPer']]\nmono-gram(str) ['I', ' ', 'a', 'm', ' ', 'a', 'n', ' ', 'N', 'L', 'P', 'e', 'r']\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "print('tri-gram(word)', n_gram(s.split(), 3))\nprint('tri-gram(str)', n_gram(s, 3))",
"execution_count": 10,
"outputs": [
{
"output_type": "stream",
"text": "tri-gram(word) [['I', 'am', 'an'], ['am', 'an', 'NLPer']]\ntri-gram(str) ['I a', ' am', 'am ', 'm a', ' an', 'an ', 'n N', ' NL', 'NLP', 'LPe', 'Per']\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 06. 集合"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "\"paraparaparadise\"と\"paragraph\"に含まれる文字bi-gramの集合を,それぞれ, XとYとして求め,XとYの和集合,積集合,差集合を求めよ.さらに,'se'というbi-gramがXおよびYに含まれるかどうかを調べよ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s1 = \"paraparaparadise\"\ns2 = \"paragraph\"\n\nX = set(n_gram(s1, 2))\nprint(X)\nY = set(n_gram(s2, 2))\nprint(Y)",
"execution_count": 11,
"outputs": [
{
"output_type": "stream",
"text": "{'se', 'ap', 'ra', 'di', 'is', 'ar', 'pa', 'ad'}\n{'ap', 'ag', 'ph', 'ra', 'gr', 'ar', 'pa'}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# 和集合\nunion = X.union(Y)\nprint(union)",
"execution_count": 12,
"outputs": [
{
"output_type": "stream",
"text": "{'se', 'ap', 'ag', 'ph', 'ra', 'gr', 'di', 'is', 'ar', 'pa', 'ad'}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# 積集合\nintersec = X.intersection(Y)\nprint(intersec)",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": "{'ap', 'ra', 'ar', 'pa'}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# 差集合\ndiff_X_Y = X.difference(Y)\nprint(diff_X_Y)\ndiff_Y_X = Y.difference(X)\nprint(diff_Y_X)",
"execution_count": 14,
"outputs": [
{
"output_type": "stream",
"text": "{'di', 'ad', 'se', 'is'}\n{'ag', 'ph', 'gr'}\n",
"name": "stdout"
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "'se' in X",
"execution_count": 15,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "True"
},
"metadata": {},
"execution_count": 15
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "'se' in Y",
"execution_count": 16,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "False"
},
"metadata": {},
"execution_count": 16
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 07. テンプレートによる文生成\n"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "引数x, y, zを受け取り「x時のyはz」という文字列を返す関数を実装せよ.さらに,x=12, y=\"気温\", z=22.4として,実行結果を確認せよ.\n"
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "def create_sentense_temp(x, y, z):\n return '{}時の{}は{}'.format(x, y, z)\n\ncreate_sentense_temp(12, '気温', 22.4)",
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'12時の気温は22.4'"
},
"metadata": {},
"execution_count": 17
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "## 08. 暗号文"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "与えられた文字列の各文字を,以下の仕様で変換する関数cipherを実装せよ.\n\n英小文字ならば(219 - 文字コード)の文字に置換\nその他の文字はそのまま出力\nこの関数を用い,英語のメッセージを暗号化・復号化せよ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "s='aあ'\ns.islower()",
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "True"
},
"metadata": {},
"execution_count": 18
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "def cipher(s:str):\n # chr: アスキーコードから文字へ\n # 219-ord: 文字から219アスキーコードへ\n return ''.join([chr(219-ord(c)) if 'a' <= c <= 'z' else c for c in s]) \n \ncipher('Hello World')",
"execution_count": 19,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'Hvool Wliow'"
},
"metadata": {},
"execution_count": 19
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "cipher('abcdefghijkelomnopqrstuvwxyz')",
"execution_count": 20,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'zyxwvutsrqpvolnmlkjihgfedcba'"
},
"metadata": {},
"execution_count": 20
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true,
"scrolled": true
},
"cell_type": "code",
"source": "cipher(cipher('Hello World'))",
"execution_count": 21,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "'Hello World'"
},
"metadata": {},
"execution_count": 21
}
]
},
{
"metadata": {
"collapsed": true
},
"cell_type": "markdown",
"source": "## 09. Typoglycemia"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "スペースで区切られた単語列に対して,各単語の先頭と末尾の文字は残し,それ以外の文字の順序をランダムに並び替えるプログラムを作成せよ.ただし,長さが4以下の単語は並び替えないこととする.適当な英語の文(例えば\"I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .\")を与え,その実行結果を確認せよ."
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "import re\nimport random\ns = \"I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .\"\n\ndef typoglycemia(not_sort_word_length=4):\n def typo(s):\n words = re.sub(r'[.|,|:]', '', s).split()\n return [w[0] + ''.join(random.sample(w[1:-1], len(w[1:-1]))) + w[-1] if len(w) > not_sort_word_length else w for w in words]\n return typo\n \n# [w[0] + random.shuffle(list(w[1:-2])) + w[-1] for w in words if len(w) > 4]\ntypo = typoglycemia(not_sort_word_length=4)\nt = typo(s)\n' '.join(t)",
"execution_count": 22,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "\"I culod'nt beeilve that I culod aultclay unrsetadnd what I was raeding the peonhaemnl peowr of the hmaun mind\""
},
"metadata": {},
"execution_count": 22
}
]
},
{
"metadata": {
"collapsed": false,
"trusted": true
},
"cell_type": "code",
"source": "# random shuffleは返り値なし\ns = 'abcdefg'\nl = list(s)\nrandom.shuffle(l)\nl",
"execution_count": 23,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": "['b', 'g', 'f', 'e', 'c', 'a', 'd']"
},
"metadata": {},
"execution_count": 23
}
]
},
{
"metadata": {
"collapsed": true
},
"cell_type": "markdown",
"source": "## 参考リンク"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "- [言語処理100本ノック with Python(第1章)](http://qiita.com/gamma1129/items/37bf660cf4e4b21d4267)\n- [言語処理100本ノック 第1章 in Python](http://qiita.com/piyo56/items/eb72b496669f541055c3)"
}
],
"metadata": {
"toc": {
"toc_window_display": false,
"toc_cell": true,
"toc_number_sections": true,
"toc_threshold": "6"
},
"gist": {
"id": "77c669b28f674179e459869881da7a56",
"data": {
"description": "言語処理100本ノック 1章 準備運動 with python",
"public": true
}
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"nbconvert_exporter": "python",
"name": "python",
"file_extension": ".py",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"mimetype": "text/x-python",
"pygments_lexer": "ipython3",
"version": "3.5.1"
},
"hide_input": false,
"_draft": {
"nbviewer_url": "https://gist.github.com/77c669b28f674179e459869881da7a56"
}
},
"nbformat": 4,
"nbformat_minor": 0
}