@reiyw
Last active November 16, 2015 03:11
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"http://www.cl.ecei.tohoku.ac.jp/nlp100/"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 第2章: UNIXコマンドの基礎\n",
"\n",
"[hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt)は,日本の最高気温の記録を「都道府県」「地点」「℃」「日」のタブ区切り形式で格納したファイルである.以下の処理を行うプログラムを作成し,[hightemp.txt](http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt)を入力ファイルとして実行せよ.さらに,同様の処理をUNIXコマンドでも実行し,プログラムの実行結果を確認せよ."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"File ‘hightemp.txt’ already there; not retrieving.\r\n",
"\r\n"
]
}
],
"source": [
"!wget -nc http://www.cl.ecei.tohoku.ac.jp/nlp100/data/hightemp.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10. 行数のカウント\n",
"行数をカウントせよ.確認にはwcコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting 010.py\n"
]
}
],
"source": [
"%%file 010.py\n",
"#!/usr/bin/env python\n",
"# -*- coding: utf-8 -*-\n",
"import pandas as pd\n",
"\n",
"df = pd.read_table('hightemp.txt', header=None,\n",
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n",
"print len(df)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"diff <(python 010.py) <(grep -c '' 'hightemp.txt')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 11. タブをスペースに置換\n",
"タブ1文字につきスペース1文字に置換せよ.確認にはsedコマンド,trコマンド,もしくはexpandコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting 011.py\n"
]
}
],
"source": [
"%%file 011.py\n",
"#!/usr/bin/env python\n",
"# -*- coding: utf-8 -*-\n",
"import sys\n",
"import pandas as pd\n",
"\n",
"df = pd.read_table('hightemp.txt', header=None, dtype={'temp': str},\n",
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n",
"df.to_csv(sys.stdout, sep=' ', header=False, index=False)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"diff <(python 011.py) <(sed 's/\\t/ /g' hightemp.txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 12. 1列目をcol1.txtに,2列目をcol2.txtに保存\n",
"各行の1列目だけを抜き出したものをcol1.txtに,2列目だけを抜き出したものをcol2.txtとしてファイルに保存せよ.確認にはcutコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting 012.py\n"
]
}
],
"source": [
"%%file 012.py\n",
"#!/usr/bin/env python\n",
"# -*- coding: utf-8 -*-\n",
"import pandas as pd\n",
"\n",
"df = pd.read_table('hightemp.txt', header=None,\n",
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n",
"df.to_csv('col1.txt', columns=['pref'], header=None, index=False)\n",
"df.to_csv('col2.txt', columns=['city'], header=None, index=False)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"python 012.py && diff <(cat col1.txt) <(cut -f1 hightemp.txt) && diff <(cat col2.txt) <(cut -f2 hightemp.txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 13. col1.txtとcol2.txtをマージ\n",
"12で作ったcol1.txtとcol2.txtを結合し,元のファイルの1列目と2列目をタブ区切りで並べたテキストファイルを作成せよ.確認にはpasteコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Overwriting 013.py\n"
]
}
],
"source": [
"%%file 013.py\n",
"#!/usr/bin/env python\n",
"# -*- coding: utf-8 -*-\n",
"import sys\n",
"import pandas as pd\n",
"\n",
"col1 = pd.read_table('col1.txt', header=None, names=['pref'])\n",
"col2 = pd.read_table('col2.txt', header=None, names=['city'])\n",
"pd.concat([col1, col2], axis=1).to_csv(sys.stdout, sep='\\t',\n",
" header=None, index=False)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%%bash\n",
"diff <(python 013.py) <(paste col1.txt col2.txt)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 14. 先頭からN行を出力\n",
"自然数Nをコマンドライン引数などの手段で受け取り,入力のうち先頭のN行だけを表示せよ.確認にはheadコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pref</th>\n",
" <th>city</th>\n",
" <th>temp</th>\n",
" <th>date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>高知県</td>\n",
" <td>江川崎</td>\n",
" <td>41.0</td>\n",
" <td>2013-08-12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>埼玉県</td>\n",
" <td>熊谷</td>\n",
" <td>40.9</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>岐阜県</td>\n",
" <td>多治見</td>\n",
" <td>40.9</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>山形県</td>\n",
" <td>山形</td>\n",
" <td>40.8</td>\n",
" <td>1933-07-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>山梨県</td>\n",
" <td>甲府</td>\n",
" <td>40.7</td>\n",
" <td>2013-08-10</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pref city temp date\n",
"0 高知県 江川崎 41.0 2013-08-12\n",
"1 埼玉県 熊谷 40.9 2007-08-16\n",
"2 岐阜県 多治見 40.9 2007-08-16\n",
"3 山形県 山形 40.8 1933-07-25\n",
"4 山梨県 甲府 40.7 2013-08-10"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import pandas as pd\n",
"\n",
"df = pd.read_table('hightemp.txt', header=None,\n",
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n",
"df.head()"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\r\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\r\n",
"岐阜県\t多治見\t40.9\t2007-08-16\r\n",
"山形県\t山形\t40.8\t1933-07-25\r\n",
"山梨県\t甲府\t40.7\t2013-08-10\r\n"
]
}
],
"source": [
"!head -5 hightemp.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 15. 末尾のN行を出力\n",
"自然数Nをコマンドライン引数などの手段で受け取り,入力のうち末尾のN行だけを表示せよ.確認にはtailコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pref</th>\n",
" <th>city</th>\n",
" <th>temp</th>\n",
" <th>date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>埼玉県</td>\n",
" <td>鳩山</td>\n",
" <td>39.9</td>\n",
" <td>1997-07-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>大阪府</td>\n",
" <td>豊中</td>\n",
" <td>39.9</td>\n",
" <td>1994-08-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>山梨県</td>\n",
" <td>大月</td>\n",
" <td>39.9</td>\n",
" <td>1990-07-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>山形県</td>\n",
" <td>鶴岡</td>\n",
" <td>39.9</td>\n",
" <td>1978-08-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>愛知県</td>\n",
" <td>名古屋</td>\n",
" <td>39.9</td>\n",
" <td>1942-08-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pref city temp date\n",
"19 埼玉県 鳩山 39.9 1997-07-05\n",
"20 大阪府 豊中 39.9 1994-08-08\n",
"21 山梨県 大月 39.9 1990-07-19\n",
"22 山形県 鶴岡 39.9 1978-08-03\n",
"23 愛知県 名古屋 39.9 1942-08-02"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.tail()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"埼玉県\t鳩山\t39.9\t1997-07-05\r\n",
"大阪府\t豊中\t39.9\t1994-08-08\r\n",
"山梨県\t大月\t39.9\t1990-07-19\r\n",
"山形県\t鶴岡\t39.9\t1978-08-03\r\n",
"愛知県\t名古屋\t39.9\t1942-08-02\r\n"
]
}
],
"source": [
"!tail -5 hightemp.txt"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 16. ファイルをN分割する\n",
"自然数Nをコマンドライン引数などの手段で受け取り,入力のファイルを行単位でN分割せよ.同様の処理をsplitコマンドで実現せよ."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" pref city temp date\n",
"0 高知県 江川崎 41.0 2013-08-12\n",
"1 埼玉県 熊谷 40.9 2007-08-16\n",
"2 岐阜県 多治見 40.9 2007-08-16\n",
"3 山形県 山形 40.8 1933-07-25\n",
"4 山梨県 甲府 40.7 2013-08-10\n",
" pref city temp date\n",
"0 和歌山県 かつらぎ 40.6 1994-08-08\n",
"1 静岡県 天竜 40.6 1994-08-04\n",
"2 山梨県 勝沼 40.5 2013-08-10\n",
"3 埼玉県 越谷 40.4 2007-08-16\n",
"4 群馬県 館林 40.3 2007-08-16\n",
" pref city temp date\n",
"0 群馬県 上里見 40.3 1998-07-04\n",
"1 愛知県 愛西 40.3 1994-08-05\n",
"2 千葉県 牛久 40.2 2004-07-20\n",
"3 静岡県 佐久間 40.2 2001-07-24\n",
"4 愛媛県 宇和島 40.2 1927-07-22\n",
" pref city temp date\n",
"0 山形県 酒田 40.1 1978-08-03\n",
"1 岐阜県 美濃 40.0 2007-08-16\n",
"2 群馬県 前橋 40.0 2001-07-24\n",
"3 千葉県 茂原 39.9 2013-08-11\n",
"4 埼玉県 鳩山 39.9 1997-07-05\n",
" pref city temp date\n",
"0 大阪府 豊中 39.9 1994-08-08\n",
"1 山梨県 大月 39.9 1990-07-19\n",
"2 山形県 鶴岡 39.9 1978-08-03\n",
"3 愛知県 名古屋 39.9 1942-08-02\n"
]
}
],
"source": [
"N = 5\n",
"n_lines = -(-len(df) // N)\n",
"reader = pd.read_table('hightemp.txt', header=None, chunksize=n_lines,\n",
" names=['pref', 'city', 'temp', 'date'], parse_dates=[3])\n",
"for chunk in reader:\n",
" print chunk"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"高知県\t江川崎\t41\t2013-08-12\n",
"埼玉県\t熊谷\t40.9\t2007-08-16\n",
"岐阜県\t多治見\t40.9\t2007-08-16\n",
"山形県\t山形\t40.8\t1933-07-25\n",
"山梨県\t甲府\t40.7\t2013-08-10\n",
"和歌山県\tかつらぎ\t40.6\t1994-08-08\n",
"静岡県\t天竜\t40.6\t1994-08-04\n",
"山梨県\t勝沼\t40.5\t2013-08-10\n",
"埼玉県\t越谷\t40.4\t2007-08-16\n",
"群馬県\t館林\t40.3\t2007-08-16\n",
"群馬県\t上里見\t40.3\t1998-07-04\n",
"愛知県\t愛西\t40.3\t1994-08-05\n",
"千葉県\t牛久\t40.2\t2004-07-20\n",
"静岡県\t佐久間\t40.2\t2001-07-24\n",
"愛媛県\t宇和島\t40.2\t1927-07-22\n",
"山形県\t酒田\t40.1\t1978-08-03\n",
"岐阜県\t美濃\t40\t2007-08-16\n",
"群馬県\t前橋\t40\t2001-07-24\n",
"千葉県\t茂原\t39.9\t2013-08-11\n",
"埼玉県\t鳩山\t39.9\t1997-07-05\n",
"大阪府\t豊中\t39.9\t1994-08-08\n",
"山梨県\t大月\t39.9\t1990-07-19\n",
"山形県\t鶴岡\t39.9\t1978-08-03\n",
"愛知県\t名古屋\t39.9\t1942-08-02\n"
]
}
],
"source": [
"%%bash\n",
"split -n l/5 hightemp.txt && cat xa{a..e}"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 17. 1列目の文字列の異なり\n",
"1列目の文字列の種類(異なる文字列の集合)を求めよ.確認にはsort, uniqコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[\n",
" \"愛知県\", \n",
" \"山形県\", \n",
" \"岐阜県\", \n",
" \"千葉県\", \n",
" \"埼玉県\", \n",
" \"高知県\", \n",
" \"群馬県\", \n",
" \"山梨県\", \n",
" \"和歌山県\", \n",
" \"愛媛県\", \n",
" \"大阪府\", \n",
" \"静岡県\"\n",
"]\n"
]
}
],
"source": [
"from prettyprint.prettyprint import pp\n",
"\n",
"pp(set(df['pref']))"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"千葉県\r\n",
"和歌山県\r\n",
"埼玉県\r\n",
"大阪府\r\n",
"山形県\r\n",
"山梨県\r\n",
"岐阜県\r\n",
"愛媛県\r\n",
"愛知県\r\n",
"群馬県\r\n",
"静岡県\r\n",
"高知県\r\n"
]
}
],
"source": [
"!cut -f1 hightemp.txt | sort | uniq | sort"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 18. 各行を3コラム目の数値の降順にソート\n",
"各行を3コラム目の数値の逆順で整列せよ(注意: 各行の内容は変更せずに並び替えよ).確認にはsortコマンドを用いよ(この問題はコマンドで実行した時の結果と合わなくてもよい)."
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>pref</th>\n",
" <th>city</th>\n",
" <th>temp</th>\n",
" <th>date</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>高知県</td>\n",
" <td>江川崎</td>\n",
" <td>41.0</td>\n",
" <td>2013-08-12</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>岐阜県</td>\n",
" <td>多治見</td>\n",
" <td>40.9</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>埼玉県</td>\n",
" <td>熊谷</td>\n",
" <td>40.9</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>山形県</td>\n",
" <td>山形</td>\n",
" <td>40.8</td>\n",
" <td>1933-07-25</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>山梨県</td>\n",
" <td>甲府</td>\n",
" <td>40.7</td>\n",
" <td>2013-08-10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>和歌山県</td>\n",
" <td>かつらぎ</td>\n",
" <td>40.6</td>\n",
" <td>1994-08-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>静岡県</td>\n",
" <td>天竜</td>\n",
" <td>40.6</td>\n",
" <td>1994-08-04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>山梨県</td>\n",
" <td>勝沼</td>\n",
" <td>40.5</td>\n",
" <td>2013-08-10</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>埼玉県</td>\n",
" <td>越谷</td>\n",
" <td>40.4</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>9</th>\n",
" <td>群馬県</td>\n",
" <td>館林</td>\n",
" <td>40.3</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>10</th>\n",
" <td>群馬県</td>\n",
" <td>上里見</td>\n",
" <td>40.3</td>\n",
" <td>1998-07-04</td>\n",
" </tr>\n",
" <tr>\n",
" <th>11</th>\n",
" <td>愛知県</td>\n",
" <td>愛西</td>\n",
" <td>40.3</td>\n",
" <td>1994-08-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>14</th>\n",
" <td>愛媛県</td>\n",
" <td>宇和島</td>\n",
" <td>40.2</td>\n",
" <td>1927-07-22</td>\n",
" </tr>\n",
" <tr>\n",
" <th>12</th>\n",
" <td>千葉県</td>\n",
" <td>牛久</td>\n",
" <td>40.2</td>\n",
" <td>2004-07-20</td>\n",
" </tr>\n",
" <tr>\n",
" <th>13</th>\n",
" <td>静岡県</td>\n",
" <td>佐久間</td>\n",
" <td>40.2</td>\n",
" <td>2001-07-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>15</th>\n",
" <td>山形県</td>\n",
" <td>酒田</td>\n",
" <td>40.1</td>\n",
" <td>1978-08-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>16</th>\n",
" <td>岐阜県</td>\n",
" <td>美濃</td>\n",
" <td>40.0</td>\n",
" <td>2007-08-16</td>\n",
" </tr>\n",
" <tr>\n",
" <th>17</th>\n",
" <td>群馬県</td>\n",
" <td>前橋</td>\n",
" <td>40.0</td>\n",
" <td>2001-07-24</td>\n",
" </tr>\n",
" <tr>\n",
" <th>18</th>\n",
" <td>千葉県</td>\n",
" <td>茂原</td>\n",
" <td>39.9</td>\n",
" <td>2013-08-11</td>\n",
" </tr>\n",
" <tr>\n",
" <th>19</th>\n",
" <td>埼玉県</td>\n",
" <td>鳩山</td>\n",
" <td>39.9</td>\n",
" <td>1997-07-05</td>\n",
" </tr>\n",
" <tr>\n",
" <th>20</th>\n",
" <td>大阪府</td>\n",
" <td>豊中</td>\n",
" <td>39.9</td>\n",
" <td>1994-08-08</td>\n",
" </tr>\n",
" <tr>\n",
" <th>21</th>\n",
" <td>山梨県</td>\n",
" <td>大月</td>\n",
" <td>39.9</td>\n",
" <td>1990-07-19</td>\n",
" </tr>\n",
" <tr>\n",
" <th>22</th>\n",
" <td>山形県</td>\n",
" <td>鶴岡</td>\n",
" <td>39.9</td>\n",
" <td>1978-08-03</td>\n",
" </tr>\n",
" <tr>\n",
" <th>23</th>\n",
" <td>愛知県</td>\n",
" <td>名古屋</td>\n",
" <td>39.9</td>\n",
" <td>1942-08-02</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" pref city temp date\n",
"0 高知県 江川崎 41.0 2013-08-12\n",
"2 岐阜県 多治見 40.9 2007-08-16\n",
"1 埼玉県 熊谷 40.9 2007-08-16\n",
"3 山形県 山形 40.8 1933-07-25\n",
"4 山梨県 甲府 40.7 2013-08-10\n",
"5 和歌山県 かつらぎ 40.6 1994-08-08\n",
"6 静岡県 天竜 40.6 1994-08-04\n",
"7 山梨県 勝沼 40.5 2013-08-10\n",
"8 埼玉県 越谷 40.4 2007-08-16\n",
"9 群馬県 館林 40.3 2007-08-16\n",
"10 群馬県 上里見 40.3 1998-07-04\n",
"11 愛知県 愛西 40.3 1994-08-05\n",
"14 愛媛県 宇和島 40.2 1927-07-22\n",
"12 千葉県 牛久 40.2 2004-07-20\n",
"13 静岡県 佐久間 40.2 2001-07-24\n",
"15 山形県 酒田 40.1 1978-08-03\n",
"16 岐阜県 美濃 40.0 2007-08-16\n",
"17 群馬県 前橋 40.0 2001-07-24\n",
"18 千葉県 茂原 39.9 2013-08-11\n",
"19 埼玉県 鳩山 39.9 1997-07-05\n",
"20 大阪府 豊中 39.9 1994-08-08\n",
"21 山梨県 大月 39.9 1990-07-19\n",
"22 山形県 鶴岡 39.9 1978-08-03\n",
"23 愛知県 名古屋 39.9 1942-08-02"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df.sort_values(by='temp', ascending=False)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 19. 各行の1コラム目の文字列の出現頻度を求め,出現頻度の高い順に並べる\n",
"各行の1列目の文字列の出現頻度を求め,その高い順に並べて表示せよ.確認にはcut, uniq, sortコマンドを用いよ."
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": false
},
"outputs": [
{
"data": {
"text/plain": [
"山形県 3\n",
"群馬県 3\n",
"埼玉県 3\n",
"山梨県 3\n",
"千葉県 2\n",
"岐阜県 2\n",
"愛知県 2\n",
"静岡県 2\n",
"愛媛県 1\n",
"高知県 1\n",
"和歌山県 1\n",
"大阪府 1\n",
"Name: pref, dtype: int64"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df['pref'].value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {
"collapsed": false
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" 3 群馬県\r\n",
" 3 山梨県\r\n",
" 3 山形県\r\n",
" 3 埼玉県\r\n",
" 2 静岡県\r\n",
" 2 愛知県\r\n",
" 2 岐阜県\r\n",
" 2 千葉県\r\n",
" 1 高知県\r\n",
" 1 愛媛県\r\n",
" 1 大阪府\r\n",
" 1 和歌山県\r\n"
]
}
],
"source": [
"!cut -f1 hightemp.txt | sort | uniq -c | sort -nr"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 2",
"language": "python",
"name": "python2"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 2
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython2",
"version": "2.7.10"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
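
Problems 14 and 15 ask for N to be received as a command-line argument, while the notebook cells rely on the five-row default of `df.head()` and `df.tail()`. A minimal sketch of a command-line variant follows, assuming Python 3 and only the standard library. The names `head_lines`, `tail_lines`, and `main`, and the `--tail` flag, are hypothetical and do not appear in the original notebook.

```python
import argparse
from collections import deque
from itertools import islice


def head_lines(lines, n):
    """Return the first n items of an iterable of lines."""
    return list(islice(lines, n))


def tail_lines(lines, n):
    """Return the last n items of an iterable of lines."""
    # A bounded deque keeps only the n most recent lines while reading.
    return list(deque(lines, maxlen=n))


def main(argv):
    """Parse N (and an optional path) from argv; return the selected text."""
    parser = argparse.ArgumentParser(description="Select the first or last N lines.")
    parser.add_argument("n", type=int, help="natural number N")
    parser.add_argument("path", nargs="?", default="hightemp.txt")
    parser.add_argument("--tail", action="store_true", help="select the last N lines")
    args = parser.parse_args(argv)
    pick = tail_lines if args.tail else head_lines
    with open(args.path, encoding="utf-8") as f:
        return "".join(pick(f, args.n))
```

If this were saved as a script (say, a hypothetical `headtail.py`) ending with `sys.stdout.write(main(sys.argv[1:]))`, running `python3 headtail.py 5` could be compared against `head -n 5 hightemp.txt`, and `python3 headtail.py 5 --tail` against `tail -n 5 hightemp.txt`.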