@reiyw
Created October 25, 2016 07:06
Solutions to the NLP 100 Exercise (言語処理100本ノック). They are aimed at Python beginners, so they avoid needless tricks while trying to be as Pythonic as possible.
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T14:02:19.377555",
"start_time": "2016-10-25T14:02:19.370809"
}
},
"source": [
"# [Chapter 2: UNIX Command Basics](http://www.cl.ecei.tohoku.ac.jp/nlp100/#ch2)"
]
},
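{
"cell_type": "markdown",
"metadata": {},
"source": [
"> [hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt) is a file that stores records of the highest temperatures in Japan in tab-delimited format, with the columns \"prefecture\", \"location\", \"℃\", and \"date\". Write programs that perform the following tasks and run them with [hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt) as the input file. Also run the same processing with UNIX commands and check the programs' results against them."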
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. Counting lines\n",
"> Count the number of lines. Use the wc command to verify."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.009334",
"start_time": "2016-10-25T16:05:19.985059"
},
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"24"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for i, _ in enumerate(open('hightemp.txt'), start=1): pass\n",
"i"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.148078",
"start_time": "2016-10-25T16:05:20.011600"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"24 hightemp.txt\r\n"
]
}
],
"source": [
"!wc -l hightemp.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11. Replacing tabs with spaces\n",
"> Replace each tab character with a single space. Use the sed, tr, or expand command to verify."
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.161224",
"start_time": "2016-10-25T16:05:20.150889"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県 江川崎 41 2013-08-12\n",
"埼玉県 熊谷 40.9 2007-08-16\n",
"岐阜県 多治見 40.9 2007-08-16\n",
"山形県 山形 40.8 1933-07-25\n",
"山梨県 甲府 40.7 2013-08-10\n",
"和歌山県 かつらぎ 40.6 1994-08-08\n",
"静岡県 天竜 40.6 1994-08-04\n",
"山梨県 勝沼 40.5 2013-08-10\n",
"埼玉県 越谷 40.4 2007-08-16\n",
"群馬県 館林 40.3 2007-08-16\n",
"群馬県 上里見 40.3 1998-07-04\n",
"愛知県 愛西 40.3 1994-08-05\n",
"千葉県 牛久 40.2 2004-07-20\n",
"静岡県 佐久間 40.2 2001-07-24\n",
"愛媛県 宇和島 40.2 1927-07-22\n",
"山形県 酒田 40.1 1978-08-03\n",
"岐阜県 美濃 40 2007-08-16\n",
"群馬県 前橋 40 2001-07-24\n",
"千葉県 茂原 39.9 2013-08-11\n",
"埼玉県 鳩山 39.9 1997-07-05\n",
"大阪府 豊中 39.9 1994-08-08\n",
"山梨県 大月 39.9 1990-07-19\n",
"山形県 鶴岡 39.9 1978-08-03\n",
"愛知県 名古屋 39.9 1942-08-02\n"
]
}
],
"source": [
"for line in open('hightemp.txt'):\n",
"    print line.strip().replace('\\t', ' ')"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.319459",
"start_time": "2016-10-25T16:05:20.165968"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県 江川崎 41 2013-08-12\r\n",
"埼玉県 熊谷 40.9 2007-08-16\r\n",
"岐阜県 多治見 40.9 2007-08-16\r\n",
"山形県 山形 40.8 1933-07-25\r\n",
"山梨県 甲府 40.7 2013-08-10\r\n",
"和歌山県 かつらぎ 40.6 1994-08-08\r\n",
"静岡県 天竜 40.6 1994-08-04\r\n",
"山梨県 勝沼 40.5 2013-08-10\r\n",
"埼玉県 越谷 40.4 2007-08-16\r\n",
"群馬県 館林 40.3 2007-08-16\r\n",
"群馬県 上里見 40.3 1998-07-04\r\n",
"愛知県 愛西 40.3 1994-08-05\r\n",
"千葉県 牛久 40.2 2004-07-20\r\n",
"静岡県 佐久間 40.2 2001-07-24\r\n",
"愛媛県 宇和島 40.2 1927-07-22\r\n",
"山形県 酒田 40.1 1978-08-03\r\n",
"岐阜県 美濃 40 2007-08-16\r\n",
"群馬県 前橋 40 2001-07-24\r\n",
"千葉県 茂原 39.9 2013-08-11\r\n",
"埼玉県 鳩山 39.9 1997-07-05\r\n",
"大阪府 豊中 39.9 1994-08-08\r\n",
"山梨県 大月 39.9 1990-07-19\r\n",
"山形県 鶴岡 39.9 1978-08-03\r\n",
"愛知県 名古屋 39.9 1942-08-02\r\n"
]
}
],
"source": [
"!sed 's/\\t/ /g' hightemp.txt"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.472269",
"start_time": "2016-10-25T16:05:20.324079"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県 江川崎 41 2013-08-12\r\n",
"埼玉県 熊谷 40.9 2007-08-16\r\n",
"岐阜県 多治見 40.9 2007-08-16\r\n",
"山形県 山形 40.8 1933-07-25\r\n",
"山梨県 甲府 40.7 2013-08-10\r\n",
"和歌山県 かつらぎ 40.6 1994-08-08\r\n",
"静岡県 天竜 40.6 1994-08-04\r\n",
"山梨県 勝沼 40.5 2013-08-10\r\n",
"埼玉県 越谷 40.4 2007-08-16\r\n",
"群馬県 館林 40.3 2007-08-16\r\n",
"群馬県 上里見 40.3 1998-07-04\r\n",
"愛知県 愛西 40.3 1994-08-05\r\n",
"千葉県 牛久 40.2 2004-07-20\r\n",
"静岡県 佐久間 40.2 2001-07-24\r\n",
"愛媛県 宇和島 40.2 1927-07-22\r\n",
"山形県 酒田 40.1 1978-08-03\r\n",
"岐阜県 美濃 40 2007-08-16\r\n",
"群馬県 前橋 40 2001-07-24\r\n",
"千葉県 茂原 39.9 2013-08-11\r\n",
"埼玉県 鳩山 39.9 1997-07-05\r\n",
"大阪府 豊中 39.9 1994-08-08\r\n",
"山梨県 大月 39.9 1990-07-19\r\n",
"山形県 鶴岡 39.9 1978-08-03\r\n",
"愛知県 名古屋 39.9 1942-08-02\r\n"
]
}
],
"source": [
"!cat hightemp.txt | tr '\\t' ' '"
]
},
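{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12. Saving column 1 to col1.txt and column 2 to col2.txt\n",
"> Extract only the first column of each line and save it as col1.txt, and extract only the second column and save it as col2.txt. Use the cut command to verify."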
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.499376",
"start_time": "2016-10-25T16:05:20.476580"
},
"collapsed": true
},
"outputs": [],
"source": [
"with open('col1.txt', 'w') as f1, open('col2.txt', 'w') as f2:\n",
"    for line in open('hightemp.txt'):\n",
"        fields = line.strip().split()\n",
"        f1.write(fields[0] + '\\n')\n",
"        f2.write(fields[1] + '\\n')"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.672480",
"start_time": "2016-10-25T16:05:20.502212"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\r\n",
"埼玉県\r\n",
"岐阜県\r\n",
"山形県\r\n",
"山梨県\r\n",
"和歌山県\r\n",
"静岡県\r\n",
"山梨県\r\n",
"埼玉県\r\n",
"群馬県\r\n",
"群馬県\r\n",
"愛知県\r\n",
"千葉県\r\n",
"静岡県\r\n",
"愛媛県\r\n",
"山形県\r\n",
"岐阜県\r\n",
"群馬県\r\n",
"千葉県\r\n",
"埼玉県\r\n",
"大阪府\r\n",
"山梨県\r\n",
"山形県\r\n",
"愛知県\r\n"
]
}
],
"source": [
"!cat col1.txt"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.824272",
"start_time": "2016-10-25T16:05:20.677091"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\r\n",
"埼玉県\r\n",
"岐阜県\r\n",
"山形県\r\n",
"山梨県\r\n",
"和歌山県\r\n",
"静岡県\r\n",
"山梨県\r\n",
"埼玉県\r\n",
"群馬県\r\n",
"群馬県\r\n",
"愛知県\r\n",
"千葉県\r\n",
"静岡県\r\n",
"愛媛県\r\n",
"山形県\r\n",
"岐阜県\r\n",
"群馬県\r\n",
"千葉県\r\n",
"埼玉県\r\n",
"大阪府\r\n",
"山梨県\r\n",
"山形県\r\n",
"愛知県\r\n"
]
}
],
"source": [
"!cut hightemp.txt -f1"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:20.972060",
"start_time": "2016-10-25T16:05:20.829988"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"江川崎\r\n",
"熊谷\r\n",
"多治見\r\n",
"山形\r\n",
"甲府\r\n",
"かつらぎ\r\n",
"天竜\r\n",
"勝沼\r\n",
"越谷\r\n",
"館林\r\n",
"上里見\r\n",
"愛西\r\n",
"牛久\r\n",
"佐久間\r\n",
"宇和島\r\n",
"酒田\r\n",
"美濃\r\n",
"前橋\r\n",
"茂原\r\n",
"鳩山\r\n",
"豊中\r\n",
"大月\r\n",
"鶴岡\r\n",
"名古屋\r\n"
]
}
],
"source": [
"!cat col2.txt"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.120461",
"start_time": "2016-10-25T16:05:20.976163"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"江川崎\r\n",
"熊谷\r\n",
"多治見\r\n",
"山形\r\n",
"甲府\r\n",
"かつらぎ\r\n",
"天竜\r\n",
"勝沼\r\n",
"越谷\r\n",
"館林\r\n",
"上里見\r\n",
"愛西\r\n",
"牛久\r\n",
"佐久間\r\n",
"宇和島\r\n",
"酒田\r\n",
"美濃\r\n",
"前橋\r\n",
"茂原\r\n",
"鳩山\r\n",
"豊中\r\n",
"大月\r\n",
"鶴岡\r\n",
"名古屋\r\n"
]
}
],
"source": [
"!cut hightemp.txt -f2"
]
},
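{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 13. Merging col1.txt and col2.txt\n",
"> Combine the col1.txt and col2.txt created in 12 to produce a text file in which the first and second columns of the original file are placed side by side, separated by a tab. Use the paste command to verify."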
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.137976",
"start_time": "2016-10-25T16:05:21.125577"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\n",
"埼玉県\t熊谷\n",
"岐阜県\t多治見\n",
"山形県\t山形\n",
"山梨県\t甲府\n",
"和歌山県\tかつらぎ\n",
"静岡県\t天竜\n",
"山梨県\t勝沼\n",
"埼玉県\t越谷\n",
"群馬県\t館林\n",
"群馬県\t上里見\n",
"愛知県\t愛西\n",
"千葉県\t牛久\n",
"静岡県\t佐久間\n",
"愛媛県\t宇和島\n",
"山形県\t酒田\n",
"岐阜県\t美濃\n",
"群馬県\t前橋\n",
"千葉県\t茂原\n",
"埼玉県\t鳩山\n",
"大阪府\t豊中\n",
"山梨県\t大月\n",
"山形県\t鶴岡\n",
"愛知県\t名古屋\n"
]
}
],
"source": [
"for line1, line2 in zip(open('col1.txt'), open('col2.txt')):\n",
"    print '{}\\t{}'.format(line1.strip(), line2.strip())"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.286998",
"start_time": "2016-10-25T16:05:21.141270"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\r\n",
"埼玉県\t熊谷\r\n",
"岐阜県\t多治見\r\n",
"山形県\t山形\r\n",
"山梨県\t甲府\r\n",
"和歌山県\tかつらぎ\r\n",
"静岡県\t天竜\r\n",
"山梨県\t勝沼\r\n",
"埼玉県\t越谷\r\n",
"群馬県\t館林\r\n",
"群馬県\t上里見\r\n",
"愛知県\t愛西\r\n",
"千葉県\t牛久\r\n",
"静岡県\t佐久間\r\n",
"愛媛県\t宇和島\r\n",
"山形県\t酒田\r\n",
"岐阜県\t美濃\r\n",
"群馬県\t前橋\r\n",
"千葉県\t茂原\r\n",
"埼玉県\t鳩山\r\n",
"大阪府\t豊中\r\n",
"山梨県\t大月\r\n",
"山形県\t鶴岡\r\n",
"愛知県\t名古屋\r\n"
]
}
],
"source": [
"!paste col1.txt col2.txt"
]
},
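{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 14. Printing the first N lines\n",
"> Receive a natural number N by some means such as a command-line argument, and display only the first N lines of the input. Use the head command to verify."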
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.302357",
"start_time": "2016-10-25T16:05:21.293557"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting p014.py\n"
]
}
],
"source": [
"%%writefile p014.py\n",
"import sys\n",
"\n",
"for i, line in enumerate(open('hightemp.txt'), start=1):\n",
"    print line.strip()\n",
"    if i == int(sys.argv[1]):\n",
"        break"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.501334",
"start_time": "2016-10-25T16:05:21.305324"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\r\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\r\n",
"岐阜県\t多治見\t40.9\t2007-08-16\r\n",
"山形県\t山形\t40.8\t1933-07-25\r\n",
"山梨県\t甲府\t40.7\t2013-08-10\r\n"
]
}
],
"source": [
"!python p014.py 5"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.647588",
"start_time": "2016-10-25T16:05:21.506144"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\r\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\r\n",
"岐阜県\t多治見\t40.9\t2007-08-16\r\n",
"山形県\t山形\t40.8\t1933-07-25\r\n",
"山梨県\t甲府\t40.7\t2013-08-10\r\n"
]
}
],
"source": [
"!head -n 5 hightemp.txt"
]
},
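{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 15. Printing the last N lines\n",
"> Receive a natural number N by some means such as a command-line argument, and display only the last N lines of the input. Use the tail command to verify."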
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.671065",
"start_time": "2016-10-25T16:05:21.651278"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting p015.py\n"
]
}
],
"source": [
"%%writefile p015.py\n",
"from collections import deque\n",
"import sys\n",
"\n",
"d = deque(open('hightemp.txt'), int(sys.argv[1]))\n",
"for line in d:\n",
"    print line.strip()"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:21.865527",
"start_time": "2016-10-25T16:05:21.674099"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"埼玉県\t鳩山\t39.9\t1997-07-05\r\n",
"大阪府\t豊中\t39.9\t1994-08-08\r\n",
"山梨県\t大月\t39.9\t1990-07-19\r\n",
"山形県\t鶴岡\t39.9\t1978-08-03\r\n",
"愛知県\t名古屋\t39.9\t1942-08-02\r\n"
]
}
],
"source": [
"!python p015.py 5"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.017014",
"start_time": "2016-10-25T16:05:21.869511"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"埼玉県\t鳩山\t39.9\t1997-07-05\r\n",
"大阪府\t豊中\t39.9\t1994-08-08\r\n",
"山梨県\t大月\t39.9\t1990-07-19\r\n",
"山形県\t鶴岡\t39.9\t1978-08-03\r\n",
"愛知県\t名古屋\t39.9\t1942-08-02\r\n"
]
}
],
"source": [
"!tail -n 5 hightemp.txt"
]
},
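{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 16. Splitting a file into N parts\n",
"> Receive a natural number N by some means such as a command-line argument, and split the input file into N parts line by line. Achieve the same processing with the split command."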
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.045498",
"start_time": "2016-10-25T16:05:22.020249"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\n",
"岐阜県\t多治見\t40.9\t2007-08-16\n",
"山形県\t山形\t40.8\t1933-07-25\n",
"山梨県\t甲府\t40.7\t2013-08-10\n",
"和歌山県\tかつらぎ\t40.6\t1994-08-08\n",
"静岡県\t天竜\t40.6\t1994-08-04\n",
"山梨県\t勝沼\t40.5\t2013-08-10\n",
"==========\n",
"埼玉県\t越谷\t40.4\t2007-08-16\n",
"群馬県\t館林\t40.3\t2007-08-16\n",
"群馬県\t上里見\t40.3\t1998-07-04\n",
"愛知県\t愛西\t40.3\t1994-08-05\n",
"千葉県\t牛久\t40.2\t2004-07-20\n",
"静岡県\t佐久間\t40.2\t2001-07-24\n",
"愛媛県\t宇和島\t40.2\t1927-07-22\n",
"山形県\t酒田\t40.1\t1978-08-03\n",
"==========\n",
"岐阜県\t美濃\t40\t2007-08-16\n",
"群馬県\t前橋\t40\t2001-07-24\n",
"千葉県\t茂原\t39.9\t2013-08-11\n",
"埼玉県\t鳩山\t39.9\t1997-07-05\n",
"大阪府\t豊中\t39.9\t1994-08-08\n",
"山梨県\t大月\t39.9\t1990-07-19\n",
"山形県\t鶴岡\t39.9\t1978-08-03\n",
"愛知県\t名古屋\t39.9\t1942-08-02\n",
"==========\n"
]
}
],
"source": [
"def distribute(amount, n_pieces):\n",
"    \"\"\"Split amount into n_pieces near-equal integer parts.\n",
"\n",
"    >>> distribute(12, 5)\n",
"    [3, 3, 2, 2, 2]\n",
"    >>> distribute(24, 3)\n",
"    [8, 8, 8]\n",
"    \"\"\"\n",
"    div, mod = divmod(amount, n_pieces)\n",
"    # The first `mod` pieces each take one extra item.\n",
"    return [div + 1 if i < mod else div for i in range(n_pieces)]\n",
"\n",
"n = 3\n",
"for n_lines, _ in enumerate(open('hightemp.txt'), start=1): pass\n",
"\n",
"with open('hightemp.txt') as f:\n",
"    for n_lines_partly in distribute(n_lines, n):\n",
"        for _ in range(n_lines_partly):\n",
"            print f.readline().strip()\n",
"        print '=' * 10"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.228150",
"start_time": "2016-10-25T16:05:22.049648"
},
"collapsed": true
},
"outputs": [],
"source": [
"!split -n l/3 hightemp.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 17. Distinct strings in the first column\n",
"> Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands to verify."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.240221",
"start_time": "2016-10-25T16:05:22.231879"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"愛知県\n",
"山形県\n",
"岐阜県\n",
"千葉県\n",
"埼玉県\n",
"高知県\n",
"群馬県\n",
"山梨県\n",
"和歌山県\n",
"愛媛県\n",
"大阪府\n",
"静岡県\n"
]
}
],
"source": [
"words = set(line.split()[0] for line in open('hightemp.txt'))\n",
"for word in words:\n",
"    print word"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.378922",
"start_time": "2016-10-25T16:05:22.242917"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"千葉県\r\n",
"埼玉県\r\n",
"大阪府\r\n",
"山形県\r\n",
"山梨県\r\n",
"岐阜県\r\n",
"愛媛県\r\n",
"愛知県\r\n",
"群馬県\r\n",
"静岡県\r\n",
"高知県\r\n",
"和歌山県\r\n"
]
}
],
"source": [
"!cut -f1 hightemp.txt | sort | uniq"
]
},
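{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 18. Sorting lines in descending order of the numeric value in column 3\n",
"> Sort the lines in reverse order of the numeric value in the third column (note: reorder the lines without changing their contents). Use the sort command to verify (for this problem, the result need not match the output of the command)."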
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.392527",
"start_time": "2016-10-25T16:05:22.382701"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\n",
"岐阜県\t多治見\t40.9\t2007-08-16\n",
"山形県\t山形\t40.8\t1933-07-25\n",
"山梨県\t甲府\t40.7\t2013-08-10\n",
"和歌山県\tかつらぎ\t40.6\t1994-08-08\n",
"静岡県\t天竜\t40.6\t1994-08-04\n",
"山梨県\t勝沼\t40.5\t2013-08-10\n",
"埼玉県\t越谷\t40.4\t2007-08-16\n",
"群馬県\t館林\t40.3\t2007-08-16\n",
"群馬県\t上里見\t40.3\t1998-07-04\n",
"愛知県\t愛西\t40.3\t1994-08-05\n",
"千葉県\t牛久\t40.2\t2004-07-20\n",
"静岡県\t佐久間\t40.2\t2001-07-24\n",
"愛媛県\t宇和島\t40.2\t1927-07-22\n",
"山形県\t酒田\t40.1\t1978-08-03\n",
"岐阜県\t美濃\t40\t2007-08-16\n",
"群馬県\t前橋\t40\t2001-07-24\n",
"千葉県\t茂原\t39.9\t2013-08-11\n",
"埼玉県\t鳩山\t39.9\t1997-07-05\n",
"大阪府\t豊中\t39.9\t1994-08-08\n",
"山梨県\t大月\t39.9\t1990-07-19\n",
"山形県\t鶴岡\t39.9\t1978-08-03\n",
"愛知県\t名古屋\t39.9\t1942-08-02\n"
]
}
],
"source": [
"lines = open('hightemp.txt').readlines()\n",
"lines.sort(key=lambda s: float(s.split()[2]), reverse=True)\n",
"for line in lines:\n",
"    print line.strip()"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.527345",
"start_time": "2016-10-25T16:05:22.395355"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\r\n",
"岐阜県\t多治見\t40.9\t2007-08-16\r\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\r\n",
"山形県\t山形\t40.8\t1933-07-25\r\n",
"山梨県\t甲府\t40.7\t2013-08-10\r\n",
"和歌山県\tかつらぎ\t40.6\t1994-08-08\r\n",
"静岡県\t天竜\t40.6\t1994-08-04\r\n",
"山梨県\t勝沼\t40.5\t2013-08-10\r\n",
"埼玉県\t越谷\t40.4\t2007-08-16\r\n",
"群馬県\t上里見\t40.3\t1998-07-04\r\n",
"群馬県\t館林\t40.3\t2007-08-16\r\n",
"愛知県\t愛西\t40.3\t1994-08-05\r\n",
"静岡県\t佐久間\t40.2\t2001-07-24\r\n",
"愛媛県\t宇和島\t40.2\t1927-07-22\r\n",
"千葉県\t牛久\t40.2\t2004-07-20\r\n",
"山形県\t酒田\t40.1\t1978-08-03\r\n",
"岐阜県\t美濃\t40\t2007-08-16\r\n",
"群馬県\t前橋\t40\t2001-07-24\r\n",
"愛知県\t名古屋\t39.9\t1942-08-02\r\n",
"千葉県\t茂原\t39.9\t2013-08-11\r\n",
"埼玉県\t鳩山\t39.9\t1997-07-05\r\n",
"大阪府\t豊中\t39.9\t1994-08-08\r\n",
"山梨県\t大月\t39.9\t1990-07-19\r\n",
"山形県\t鶴岡\t39.9\t1978-08-03\r\n"
]
}
],
"source": [
"!sort -k3 -nr hightemp.txt"
]
},
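{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 19. Counting the frequency of the first-column strings and sorting by descending frequency\n",
"> Find the frequency of each string in the first column, and display them in descending order of frequency. Use the cut, uniq, and sort commands to verify."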
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.542168",
"start_time": "2016-10-25T16:05:22.531552"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"3 山形県\n",
"3 埼玉県\n",
"3 群馬県\n",
"3 山梨県\n",
"2 愛知県\n",
"2 岐阜県\n",
"2 千葉県\n",
"2 静岡県\n",
"1 高知県\n",
"1 和歌山県\n",
"1 愛媛県\n",
"1 大阪府\n"
]
}
],
"source": [
"from collections import Counter\n",
"\n",
"c = Counter(line.split()[0] for line in open('hightemp.txt'))\n",
"for word, count in c.most_common():\n",
"    print count, word"
]
},
{
"cell_type": "code",
"execution_count": 26,
"metadata": {
"ExecuteTime": {
"end_time": "2016-10-25T16:05:22.682128",
"start_time": "2016-10-25T16:05:22.544959"
},
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 2 千葉県\r\n",
" 3 埼玉県\r\n",
" 1 大阪府\r\n",
" 3 山形県\r\n",
" 3 山梨県\r\n",
" 2 岐阜県\r\n",
" 1 愛媛県\r\n",
" 2 愛知県\r\n",
" 3 群馬県\r\n",
" 2 静岡県\r\n",
" 1 高知県\r\n",
" 1 和歌山県\r\n"
]
}
],
"source": [
"!cut -f1 hightemp.txt | sort | uniq -c"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.11"
}
},
"nbformat": 4,
"nbformat_minor": 1
}