GitHub gist reiyw/5fb693951d4b41282a0b, last active November 16, 2015 03:11
{ | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"http://www.cl.ecei.tohoku.ac.jp/nlp100/" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
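"# Chapter 2: UNIX Command Basics\n", | |
"\n", | |
"[hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt) is a file that records the highest temperatures observed in Japan, stored in tab-separated format with the columns prefecture, location, °C, and date. Write programs that perform the following tasks and run them with [hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt) as the input file. Then perform the same tasks with UNIX commands and check the programs' output against them." | |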
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 1, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"File ‘hightemp.txt’ already there; not retrieving.\r\n", | |
"\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!wget -nc http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 10. Line count\n", | |
"Count the number of lines. Use the wc command to check the result." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 2, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Overwriting 010.py\n" | |
] | |
} | |
], | |
"source": [ | |
"%%file 010.py\n", | |
"#!/usr/bin/env python\n", | |
"# -*- coding: utf-8 -*-\n", | |
"import pandas as pd\n", | |
"\n", | |
"df = pd.read_table('hightemp.txt', header=None,\n", | |
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n", | |
"print(len(df))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 3, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"diff <(python 010.py) <(grep -c '' 'hightemp.txt')" | |
] | |
}, | |
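 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A minimal standard-library sketch for comparison (not part of the original solution). It streams `hightemp.txt` once instead of building a DataFrame. Like the `grep -c ''` check above, it also counts a final line that lacks a trailing newline, which `wc -l` would not count."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "# Count lines by iterating over the file; no parsing, nothing kept in memory.\n",
 "with open('hightemp.txt') as f:\n",
 "    line_count = sum(1 for _ in f)\n",
 "print(line_count)"
 ]
 },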
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
"## 11. Replace tabs with spaces\n", | |
"Replace each tab character with a single space. Use the sed, tr, or expand command to check the result." | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 4, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Overwriting 011.py\n" | |
] | |
} | |
], | |
"source": [ | |
"%%file 011.py\n", | |
"#!/usr/bin/env python\n", | |
"# -*- coding: utf-8 -*-\n", | |
"import sys\n", | |
"import pandas as pd\n", | |
"\n", | |
"df = pd.read_table('hightemp.txt', header=None, dtype={'temp': str},\n", | |
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n", | |
"df.to_csv(sys.stdout, sep=' ', header=False, index=False)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 5, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"diff <(python 011.py) <(sed 's/\\t/ /g' hightemp.txt)" | |
] | |
}, | |
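 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A standard-library sketch that works on the raw text instead of a DataFrame (an alternative, not the original solution). Each tab is replaced with one space, as `sed 's/\\t/ /g'` does. Values such as `41` are written exactly as they appear in the file, so the `dtype={'temp': str}` workaround in `011.py` is not needed."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "import sys\n",
 "\n",
 "# Replace every tab with a single space, line by line, leaving all other text untouched.\n",
 "with open('hightemp.txt') as f:\n",
 "    for line in f:\n",
 "        sys.stdout.write(line.replace('\\t', ' '))"
 ]
 },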
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
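"## 12. Save column 1 to col1.txt and column 2 to col2.txt\n", | |
"Extract only the first column of each line and save it to col1.txt, and extract only the second column and save it to col2.txt. Use the cut command to check the result." | |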
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 6, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Overwriting 012.py\n" | |
] | |
} | |
], | |
"source": [ | |
"%%file 012.py\n", | |
"#!/usr/bin/env python\n", | |
"# -*- coding: utf-8 -*-\n", | |
"import pandas as pd\n", | |
"\n", | |
"df = pd.read_table('hightemp.txt', header=None,\n", | |
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n", | |
"df.to_csv('col1.txt', columns=['pref'], header=False, index=False)\n", | |
"df.to_csv('col2.txt', columns=['city'], header=False, index=False)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 7, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"python 012.py && diff <(cat col1.txt) <(cut -f1 hightemp.txt) && diff <(cat col2.txt) <(cut -f2 hightemp.txt)" | |
] | |
}, | |
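 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A pandas-free sketch of the same task (an alternative, not the original solution). Each line is split on tabs and the first two fields are written to separate files. It overwrites `col1.txt` and `col2.txt`. Because it writes the fields verbatim, it should pass the same `cut`-based diff as `012.py`."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "# Split each line on tabs and write field 1 and field 2 to their own files.\n",
 "with open('hightemp.txt') as src:\n",
 "    with open('col1.txt', 'w') as col1, open('col2.txt', 'w') as col2:\n",
 "        for line in src:\n",
 "            fields = line.rstrip('\\n').split('\\t')\n",
 "            col1.write(fields[0] + '\\n')\n",
 "            col2.write(fields[1] + '\\n')"
 ]
 },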
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
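"## 13. Merge col1.txt and col2.txt\n", | |
"Combine col1.txt and col2.txt, created in problem 12, into a text file that places the first and second columns of the original file side by side, separated by a tab. Use the paste command to check the result." | |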
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 8, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"Overwriting 013.py\n" | |
] | |
} | |
], | |
"source": [ | |
"%%file 013.py\n", | |
"#!/usr/bin/env python\n", | |
"# -*- coding: utf-8 -*-\n", | |
"import sys\n", | |
"import pandas as pd\n", | |
"\n", | |
"col1 = pd.read_table('col1.txt', header=None, names=['pref'])\n", | |
"col2 = pd.read_table('col2.txt', header=None, names=['city'])\n", | |
"pd.concat([col1, col2], axis=1).to_csv(sys.stdout, sep='\\t',\n", | |
" header=False, index=False)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 9, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [], | |
"source": [ | |
"%%bash\n", | |
"diff <(python 013.py) <(paste col1.txt col2.txt)" | |
] | |
}, | |
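 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A pandas-free sketch that mimics `paste col1.txt col2.txt` (an alternative, not the original solution). It pairs lines from the two files and joins each pair with a tab. One difference: `zip` stops at the shorter file, whereas `paste` keeps going and leaves the missing column empty."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "import sys\n",
 "\n",
 "# Join line i of col1.txt and line i of col2.txt with a tab; the newline comes from col2.\n",
 "with open('col1.txt') as f1, open('col2.txt') as f2:\n",
 "    for left, right in zip(f1, f2):\n",
 "        sys.stdout.write(left.rstrip('\\n') + '\\t' + right)"
 ]
 },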
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
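"## 14. Output the first N lines\n", | |
"Receive a natural number N by some means, such as a command-line argument, and display only the first N lines of the input. Use the head command to check the result." | |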
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 10, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>pref</th>\n", | |
" <th>city</th>\n", | |
" <th>temp</th>\n", | |
" <th>date</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>高知県</td>\n", | |
" <td>江川崎</td>\n", | |
" <td>41.0</td>\n", | |
" <td>2013-08-12</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>埼玉県</td>\n", | |
" <td>熊谷</td>\n", | |
" <td>40.9</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>岐阜県</td>\n", | |
" <td>多治見</td>\n", | |
" <td>40.9</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>山形県</td>\n", | |
" <td>山形</td>\n", | |
" <td>40.8</td>\n", | |
" <td>1933-07-25</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>山梨県</td>\n", | |
" <td>甲府</td>\n", | |
" <td>40.7</td>\n", | |
" <td>2013-08-10</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" pref city temp date\n", | |
"0 高知県 江川崎 41.0 2013-08-12\n", | |
"1 埼玉県 熊谷 40.9 2007-08-16\n", | |
"2 岐阜県 多治見 40.9 2007-08-16\n", | |
"3 山形県 山形 40.8 1933-07-25\n", | |
"4 山梨県 甲府 40.7 2013-08-10" | |
] | |
}, | |
"execution_count": 10, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"import pandas as pd\n", | |
"\n", | |
"df = pd.read_table('hightemp.txt', header=None,\n", | |
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n", | |
"df.head()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 11, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"高知県\t江川崎\t41\t2013-08-12\r\n", | |
"埼玉県\t熊谷\t40.9\t2007-08-16\r\n", | |
"岐阜県\t多治見\t40.9\t2007-08-16\r\n", | |
"山形県\t山形\t40.8\t1933-07-25\r\n", | |
"山梨県\t甲府\t40.7\t2013-08-10\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!head -5 hightemp.txt" | |
] | |
}, | |
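 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "The cells above hard-code N = 5. The following minimal sketch takes N from the command line, as the problem asks. The script name `014.py` follows this notebook's naming pattern but is hypothetical. The input file is fixed to `hightemp.txt` for simplicity. The script should run on this notebook's Python 2 kernel, and also on Python 3 assuming a UTF-8 locale."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%file 014.py\n",
 "#!/usr/bin/env python\n",
 "# -*- coding: utf-8 -*-\n",
 "import sys\n",
 "from itertools import islice\n",
 "\n",
 "n = int(sys.argv[1])  # N, taken from the first command-line argument\n",
 "with open('hightemp.txt') as f:\n",
 "    # islice stops after n lines, so the remaining lines are never iterated.\n",
 "    for line in islice(f, n):\n",
 "        sys.stdout.write(line)"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%bash\n",
 "diff <(python 014.py 5) <(head -n 5 hightemp.txt)"
 ]
 },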
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
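"## 15. Output the last N lines\n", | |
"Receive a natural number N by some means, such as a command-line argument, and display only the last N lines of the input. Use the tail command to check the result." | |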
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 12, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>pref</th>\n", | |
" <th>city</th>\n", | |
" <th>temp</th>\n", | |
" <th>date</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>埼玉県</td>\n", | |
" <td>鳩山</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1997-07-05</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>大阪府</td>\n", | |
" <td>豊中</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1994-08-08</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>山梨県</td>\n", | |
" <td>大月</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1990-07-19</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>山形県</td>\n", | |
" <td>鶴岡</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1978-08-03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>愛知県</td>\n", | |
" <td>名古屋</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1942-08-02</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" pref city temp date\n", | |
"19 埼玉県 鳩山 39.9 1997-07-05\n", | |
"20 大阪府 豊中 39.9 1994-08-08\n", | |
"21 山梨県 大月 39.9 1990-07-19\n", | |
"22 山形県 鶴岡 39.9 1978-08-03\n", | |
"23 愛知県 名古屋 39.9 1942-08-02" | |
] | |
}, | |
"execution_count": 12, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.tail()" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 13, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"埼玉県\t鳩山\t39.9\t1997-07-05\r\n", | |
"大阪府\t豊中\t39.9\t1994-08-08\r\n", | |
"山梨県\t大月\t39.9\t1990-07-19\r\n", | |
"山形県\t鶴岡\t39.9\t1978-08-03\r\n", | |
"愛知県\t名古屋\t39.9\t1942-08-02\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!tail -5 hightemp.txt" | |
] | |
}, | |
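 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "As in problem 14, the cells above use the default N = 5. The following sketch takes N from the command line instead. The script name `015.py` is hypothetical, and the input file is fixed to `hightemp.txt`. A `collections.deque` with `maxlen=n` keeps only the most recent n lines, so the whole file never has to be held in memory."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%file 015.py\n",
 "#!/usr/bin/env python\n",
 "# -*- coding: utf-8 -*-\n",
 "import sys\n",
 "from collections import deque\n",
 "\n",
 "n = int(sys.argv[1])  # N, taken from the first command-line argument\n",
 "with open('hightemp.txt') as f:\n",
 "    # A deque with maxlen=n drops older lines as new ones arrive,\n",
 "    # so at most n lines are held in memory.\n",
 "    last_lines = deque(f, maxlen=n)\n",
 "sys.stdout.write(''.join(last_lines))"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%bash\n",
 "diff <(python 015.py 5) <(tail -n 5 hightemp.txt)"
 ]
 },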
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
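"## 16. Split a file into N parts\n", | |
"Receive a natural number N by some means, such as a command-line argument, and split the input file into N parts at line boundaries. Achieve the same result with the split command." | |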
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 14, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" pref city temp date\n", | |
"0 高知県 江川崎 41.0 2013-08-12\n", | |
"1 埼玉県 熊谷 40.9 2007-08-16\n", | |
"2 岐阜県 多治見 40.9 2007-08-16\n", | |
"3 山形県 山形 40.8 1933-07-25\n", | |
"4 山梨県 甲府 40.7 2013-08-10\n", | |
" pref city temp date\n", | |
"0 和歌山県 かつらぎ 40.6 1994-08-08\n", | |
"1 静岡県 天竜 40.6 1994-08-04\n", | |
"2 山梨県 勝沼 40.5 2013-08-10\n", | |
"3 埼玉県 越谷 40.4 2007-08-16\n", | |
"4 群馬県 館林 40.3 2007-08-16\n", | |
" pref city temp date\n", | |
"0 群馬県 上里見 40.3 1998-07-04\n", | |
"1 愛知県 愛西 40.3 1994-08-05\n", | |
"2 千葉県 牛久 40.2 2004-07-20\n", | |
"3 静岡県 佐久間 40.2 2001-07-24\n", | |
"4 愛媛県 宇和島 40.2 1927-07-22\n", | |
" pref city temp date\n", | |
"0 山形県 酒田 40.1 1978-08-03\n", | |
"1 岐阜県 美濃 40.0 2007-08-16\n", | |
"2 群馬県 前橋 40.0 2001-07-24\n", | |
"3 千葉県 茂原 39.9 2013-08-11\n", | |
"4 埼玉県 鳩山 39.9 1997-07-05\n", | |
" pref city temp date\n", | |
"0 大阪府 豊中 39.9 1994-08-08\n", | |
"1 山梨県 大月 39.9 1990-07-19\n", | |
"2 山形県 鶴岡 39.9 1978-08-03\n", | |
"3 愛知県 名古屋 39.9 1942-08-02\n" | |
] | |
} | |
], | |
"source": [ | |
"N = 5\n", | |
"n_lines = -(-len(df) // N)\n", | |
"reader = pd.read_table('hightemp.txt', header=None, chunksize=n_lines,\n", | |
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n", | |
"for chunk in reader:\n", | |
"    print(chunk)" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 15, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"高知県\t江川崎\t41\t2013-08-12\n", | |
"埼玉県\t熊谷\t40.9\t2007-08-16\n", | |
"岐阜県\t多治見\t40.9\t2007-08-16\n", | |
"山形県\t山形\t40.8\t1933-07-25\n", | |
"山梨県\t甲府\t40.7\t2013-08-10\n", | |
"和歌山県\tかつらぎ\t40.6\t1994-08-08\n", | |
"静岡県\t天竜\t40.6\t1994-08-04\n", | |
"山梨県\t勝沼\t40.5\t2013-08-10\n", | |
"埼玉県\t越谷\t40.4\t2007-08-16\n", | |
"群馬県\t館林\t40.3\t2007-08-16\n", | |
"群馬県\t上里見\t40.3\t1998-07-04\n", | |
"愛知県\t愛西\t40.3\t1994-08-05\n", | |
"千葉県\t牛久\t40.2\t2004-07-20\n", | |
"静岡県\t佐久間\t40.2\t2001-07-24\n", | |
"愛媛県\t宇和島\t40.2\t1927-07-22\n", | |
"山形県\t酒田\t40.1\t1978-08-03\n", | |
"岐阜県\t美濃\t40\t2007-08-16\n", | |
"群馬県\t前橋\t40\t2001-07-24\n", | |
"千葉県\t茂原\t39.9\t2013-08-11\n", | |
"埼玉県\t鳩山\t39.9\t1997-07-05\n", | |
"大阪府\t豊中\t39.9\t1994-08-08\n", | |
"山梨県\t大月\t39.9\t1990-07-19\n", | |
"山形県\t鶴岡\t39.9\t1978-08-03\n", | |
"愛知県\t名古屋\t39.9\t1942-08-02\n" | |
] | |
} | |
], | |
"source": [ | |
"%%bash\n", | |
"split -n l/5 hightemp.txt && cat xa{a..e}" | |
] | |
}, | |
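 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "The pandas cell above hard-codes N = 5 and prints the chunks rather than writing files. The following sketch reads N from the command line and writes N files. The names `016.py` and `part_00.txt`, `part_01.txt`, ... are hypothetical. It divides the lines as evenly as possible, so part sizes differ by at most one. Its boundaries can therefore differ from the ceiling-sized chunks above and from `split -n l/N`, which in GNU split sizes parts by bytes while keeping lines intact. The check removes old parts, shows each part's line count, and confirms that concatenating the parts reproduces the original file."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%file 016.py\n",
 "#!/usr/bin/env python\n",
 "# -*- coding: utf-8 -*-\n",
 "import sys\n",
 "\n",
 "n = int(sys.argv[1])  # number of parts, N\n",
 "with open('hightemp.txt') as f:\n",
 "    lines = f.readlines()\n",
 "\n",
 "total = len(lines)\n",
 "for i in range(n):\n",
 "    # Part i takes lines [i*total//n, (i+1)*total//n), so sizes differ by at most one.\n",
 "    start, end = i * total // n, (i + 1) * total // n\n",
 "    with open('part_%02d.txt' % i, 'w') as out:\n",
 "        out.writelines(lines[start:end])"
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "%%bash\n",
 "rm -f part_*.txt && python 016.py 5 && wc -l part_*.txt && cat part_*.txt | diff - hightemp.txt"
 ]
 },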
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
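"## 17. Distinct strings in the first column\n", | |
"Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands to check the result." | |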
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 16, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"[\n", | |
" \"愛知県\", \n", | |
" \"山形県\", \n", | |
" \"岐阜県\", \n", | |
" \"千葉県\", \n", | |
" \"埼玉県\", \n", | |
" \"高知県\", \n", | |
" \"群馬県\", \n", | |
" \"山梨県\", \n", | |
" \"和歌山県\", \n", | |
" \"愛媛県\", \n", | |
" \"大阪府\", \n", | |
" \"静岡県\"\n", | |
"]\n" | |
] | |
} | |
], | |
"source": [ | |
"from prettyprint.prettyprint import pp\n", | |
"\n", | |
"pp(set(df['pref']))" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 17, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
"千葉県\r\n", | |
"和歌山県\r\n", | |
"埼玉県\r\n", | |
"大阪府\r\n", | |
"山形県\r\n", | |
"山梨県\r\n", | |
"岐阜県\r\n", | |
"愛媛県\r\n", | |
"愛知県\r\n", | |
"群馬県\r\n", | |
"静岡県\r\n", | |
"高知県\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!cut -f1 hightemp.txt | sort | uniq" | |
] | |
}, | |
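 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A standard-library sketch that avoids the third-party `prettyprint` package used above (an alternative, not the original solution). A `set` collects the distinct first-column values and `sorted` orders them by code point. That order should match `LC_ALL=C sort`; `sort` under other locales may collate differently."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "# Collect the distinct first-column values, then print them one per line in sorted order.\n",
 "with open('hightemp.txt') as f:\n",
 "    prefs = set(line.split('\\t')[0] for line in f)\n",
 "for pref in sorted(prefs):\n",
 "    print(pref)"
 ]
 },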
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
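"## 18. Sort lines in descending order of the third column\n", | |
"Sort the lines in descending order of the numeric value in the third column (note: reorder the lines without changing their contents). Use the sort command to check the result (for this problem, the result does not have to match the command's output)." | |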
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 18, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/html": [ | |
"<div>\n", | |
"<table border=\"1\" class=\"dataframe\">\n", | |
" <thead>\n", | |
" <tr style=\"text-align: right;\">\n", | |
" <th></th>\n", | |
" <th>pref</th>\n", | |
" <th>city</th>\n", | |
" <th>temp</th>\n", | |
" <th>date</th>\n", | |
" </tr>\n", | |
" </thead>\n", | |
" <tbody>\n", | |
" <tr>\n", | |
" <th>0</th>\n", | |
" <td>高知県</td>\n", | |
" <td>江川崎</td>\n", | |
" <td>41.0</td>\n", | |
" <td>2013-08-12</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>2</th>\n", | |
" <td>岐阜県</td>\n", | |
" <td>多治見</td>\n", | |
" <td>40.9</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>1</th>\n", | |
" <td>埼玉県</td>\n", | |
" <td>熊谷</td>\n", | |
" <td>40.9</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>3</th>\n", | |
" <td>山形県</td>\n", | |
" <td>山形</td>\n", | |
" <td>40.8</td>\n", | |
" <td>1933-07-25</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>4</th>\n", | |
" <td>山梨県</td>\n", | |
" <td>甲府</td>\n", | |
" <td>40.7</td>\n", | |
" <td>2013-08-10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>5</th>\n", | |
" <td>和歌山県</td>\n", | |
" <td>かつらぎ</td>\n", | |
" <td>40.6</td>\n", | |
" <td>1994-08-08</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>6</th>\n", | |
" <td>静岡県</td>\n", | |
" <td>天竜</td>\n", | |
" <td>40.6</td>\n", | |
" <td>1994-08-04</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>7</th>\n", | |
" <td>山梨県</td>\n", | |
" <td>勝沼</td>\n", | |
" <td>40.5</td>\n", | |
" <td>2013-08-10</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>8</th>\n", | |
" <td>埼玉県</td>\n", | |
" <td>越谷</td>\n", | |
" <td>40.4</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>9</th>\n", | |
" <td>群馬県</td>\n", | |
" <td>館林</td>\n", | |
" <td>40.3</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>10</th>\n", | |
" <td>群馬県</td>\n", | |
" <td>上里見</td>\n", | |
" <td>40.3</td>\n", | |
" <td>1998-07-04</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>11</th>\n", | |
" <td>愛知県</td>\n", | |
" <td>愛西</td>\n", | |
" <td>40.3</td>\n", | |
" <td>1994-08-05</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>14</th>\n", | |
" <td>愛媛県</td>\n", | |
" <td>宇和島</td>\n", | |
" <td>40.2</td>\n", | |
" <td>1927-07-22</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>12</th>\n", | |
" <td>千葉県</td>\n", | |
" <td>牛久</td>\n", | |
" <td>40.2</td>\n", | |
" <td>2004-07-20</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>13</th>\n", | |
" <td>静岡県</td>\n", | |
" <td>佐久間</td>\n", | |
" <td>40.2</td>\n", | |
" <td>2001-07-24</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>15</th>\n", | |
" <td>山形県</td>\n", | |
" <td>酒田</td>\n", | |
" <td>40.1</td>\n", | |
" <td>1978-08-03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>16</th>\n", | |
" <td>岐阜県</td>\n", | |
" <td>美濃</td>\n", | |
" <td>40.0</td>\n", | |
" <td>2007-08-16</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>17</th>\n", | |
" <td>群馬県</td>\n", | |
" <td>前橋</td>\n", | |
" <td>40.0</td>\n", | |
" <td>2001-07-24</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>18</th>\n", | |
" <td>千葉県</td>\n", | |
" <td>茂原</td>\n", | |
" <td>39.9</td>\n", | |
" <td>2013-08-11</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>19</th>\n", | |
" <td>埼玉県</td>\n", | |
" <td>鳩山</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1997-07-05</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>20</th>\n", | |
" <td>大阪府</td>\n", | |
" <td>豊中</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1994-08-08</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>21</th>\n", | |
" <td>山梨県</td>\n", | |
" <td>大月</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1990-07-19</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>22</th>\n", | |
" <td>山形県</td>\n", | |
" <td>鶴岡</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1978-08-03</td>\n", | |
" </tr>\n", | |
" <tr>\n", | |
" <th>23</th>\n", | |
" <td>愛知県</td>\n", | |
" <td>名古屋</td>\n", | |
" <td>39.9</td>\n", | |
" <td>1942-08-02</td>\n", | |
" </tr>\n", | |
" </tbody>\n", | |
"</table>\n", | |
"</div>" | |
], | |
"text/plain": [ | |
" pref city temp date\n", | |
"0 高知県 江川崎 41.0 2013-08-12\n", | |
"2 岐阜県 多治見 40.9 2007-08-16\n", | |
"1 埼玉県 熊谷 40.9 2007-08-16\n", | |
"3 山形県 山形 40.8 1933-07-25\n", | |
"4 山梨県 甲府 40.7 2013-08-10\n", | |
"5 和歌山県 かつらぎ 40.6 1994-08-08\n", | |
"6 静岡県 天竜 40.6 1994-08-04\n", | |
"7 山梨県 勝沼 40.5 2013-08-10\n", | |
"8 埼玉県 越谷 40.4 2007-08-16\n", | |
"9 群馬県 館林 40.3 2007-08-16\n", | |
"10 群馬県 上里見 40.3 1998-07-04\n", | |
"11 愛知県 愛西 40.3 1994-08-05\n", | |
"14 愛媛県 宇和島 40.2 1927-07-22\n", | |
"12 千葉県 牛久 40.2 2004-07-20\n", | |
"13 静岡県 佐久間 40.2 2001-07-24\n", | |
"15 山形県 酒田 40.1 1978-08-03\n", | |
"16 岐阜県 美濃 40.0 2007-08-16\n", | |
"17 群馬県 前橋 40.0 2001-07-24\n", | |
"18 千葉県 茂原 39.9 2013-08-11\n", | |
"19 埼玉県 鳩山 39.9 1997-07-05\n", | |
"20 大阪府 豊中 39.9 1994-08-08\n", | |
"21 山梨県 大月 39.9 1990-07-19\n", | |
"22 山形県 鶴岡 39.9 1978-08-03\n", | |
"23 愛知県 名古屋 39.9 1942-08-02" | |
] | |
}, | |
"execution_count": 18, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df.sort_values(by='temp', ascending=False)" | |
] | |
}, | |
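 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "The problem asks for the lines to be reordered without changing their contents, but the DataFrame above displays `41` as `41.0`. The following minimal sketch sorts the raw lines instead, keyed on the third tab-separated field read as a float. Python's sort is stable, so lines with equal temperatures keep their file order even with `reverse=True`. By contrast, `sort_values` defaults to `kind='quicksort'`, which is not guaranteed to be stable. That is presumably why row 2 appears before row 1 above; passing `kind='mergesort'` should keep ties in file order."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "import sys\n",
 "\n",
 "# Sort whole lines by the numeric third field, highest first; lines are printed verbatim.\n",
 "with open('hightemp.txt') as f:\n",
 "    lines = f.readlines()\n",
 "lines.sort(key=lambda line: float(line.split('\\t')[2]), reverse=True)\n",
 "sys.stdout.write(''.join(lines))"
 ]
 },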
{ | |
"cell_type": "markdown", | |
"metadata": {}, | |
"source": [ | |
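"## 19. Count the frequency of first-column strings and sort by frequency\n", | |
"Find how often each string in the first column occurs and display the strings in descending order of frequency. Use the cut, uniq, and sort commands to check the result." | |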
] | |
}, | |
{ | |
"cell_type": "code", | |
"execution_count": 19, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"data": { | |
"text/plain": [ | |
"山形県 3\n", | |
"群馬県 3\n", | |
"埼玉県 3\n", | |
"山梨県 3\n", | |
"千葉県 2\n", | |
"岐阜県 2\n", | |
"愛知県 2\n", | |
"静岡県 2\n", | |
"愛媛県 1\n", | |
"高知県 1\n", | |
"和歌山県 1\n", | |
"大阪府 1\n", | |
"Name: pref, dtype: int64" | |
] | |
}, | |
"execution_count": 19, | |
"metadata": {}, | |
"output_type": "execute_result" | |
} | |
], | |
"source": [ | |
"df['pref'].value_counts()" | |
] | |
}, | |
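 {
 "cell_type": "markdown",
 "metadata": {},
 "source": [
 "A standard-library counterpart using `collections.Counter` (a sketch, not the original solution). Called with no argument, `most_common()` returns every value from most to least frequent. Values with equal counts may come out in a different order than with `sort -nr`."
 ]
 },
 {
 "cell_type": "code",
 "execution_count": null,
 "metadata": {
 "collapsed": false
 },
 "outputs": [],
 "source": [
 "from collections import Counter\n",
 "\n",
 "# Count first-column values and print count and value, most frequent first.\n",
 "with open('hightemp.txt') as f:\n",
 "    counts = Counter(line.split('\\t')[0] for line in f)\n",
 "for pref, n in counts.most_common():\n",
 "    print('%d %s' % (n, pref))"
 ]
 },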
{ | |
"cell_type": "code", | |
"execution_count": 20, | |
"metadata": { | |
"collapsed": false | |
}, | |
"outputs": [ | |
{ | |
"name": "stdout", | |
"output_type": "stream", | |
"text": [ | |
" 3 群馬県\r\n", | |
" 3 山梨県\r\n", | |
" 3 山形県\r\n", | |
" 3 埼玉県\r\n", | |
" 2 静岡県\r\n", | |
" 2 愛知県\r\n", | |
" 2 岐阜県\r\n", | |
" 2 千葉県\r\n", | |
" 1 高知県\r\n", | |
" 1 愛媛県\r\n", | |
" 1 大阪府\r\n", | |
" 1 和歌山県\r\n" | |
] | |
} | |
], | |
"source": [ | |
"!cut -f1 hightemp.txt | sort | uniq -c | sort -nr" | |
] | |
} | |
], | |
"metadata": { | |
"kernelspec": { | |
"display_name": "Python 2", | |
"language": "python", | |
"name": "python2" | |
}, | |
"language_info": { | |
"codemirror_mode": { | |
"name": "ipython", | |
"version": 2 | |
}, | |
"file_extension": ".py", | |
"mimetype": "text/x-python", | |
"name": "python", | |
"nbconvert_exporter": "python", | |
"pygments_lexer": "ipython2", | |
"version": "2.7.10" | |
} | |
}, | |
"nbformat": 4, | |
"nbformat_minor": 0 | |
} |