Cartman0/02_Unixコマンドの基礎.ipynb

## 02_Unixコマンドの基礎.ipynb
{
  "cells": [
    {
      "metadata": {
        "toc": "true"
      },
      "cell_type": "markdown",
      "source": "# Table of Contents\n <p><div class=\"lev1\"><a href=\"#2章-Unixコマンドの基礎-1\"><span class=\"toc-item-num\">1&nbsp;&nbsp;</span>Chapter 2: UNIX Command Basics</a></div><div class=\"lev2\"><a href=\"#10.-行数のカウント-1.1\"><span class=\"toc-item-num\">1.1&nbsp;&nbsp;</span>10. Counting lines</a></div><div class=\"lev3\"><a href=\"#powershell-1.1.1\"><span class=\"toc-item-num\">1.1.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#11.-タブをスペースに置換-1.2\"><span class=\"toc-item-num\">1.2&nbsp;&nbsp;</span>11. Replacing tabs with spaces</a></div><div class=\"lev3\"><a href=\"#powershell-1.2.1\"><span class=\"toc-item-num\">1.2.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#12.-1列目をcol1.txtに，2列目をcol2.txtに保存-1.3\"><span class=\"toc-item-num\">1.3&nbsp;&nbsp;</span>12. Saving column 1 to col1.txt and column 2 to col2.txt</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.3.1\"><span class=\"toc-item-num\">1.3.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#13.-col1.txtとcol2.txtをマージ-1.4\"><span class=\"toc-item-num\">1.4&nbsp;&nbsp;</span>13. Merging col1.txt and col2.txt</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.4.1\"><span class=\"toc-item-num\">1.4.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#14.-先頭からN行を出力-1.5\"><span class=\"toc-item-num\">1.5&nbsp;&nbsp;</span>14. Printing the first N lines</a></div><div class=\"lev3\"><a href=\"#powershellの場合-1.5.1\"><span class=\"toc-item-num\">1.5.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#15.-末尾のN行を出力-1.6\"><span class=\"toc-item-num\">1.6&nbsp;&nbsp;</span>15. Printing the last N lines</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.6.1\"><span class=\"toc-item-num\">1.6.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#16.-ファイルをN分割する-1.7\"><span class=\"toc-item-num\">1.7&nbsp;&nbsp;</span>16. Splitting a file into N parts</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.7.1\"><span class=\"toc-item-num\">1.7.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#17.-１列目の文字列の異なり-1.8\"><span class=\"toc-item-num\">1.8&nbsp;&nbsp;</span>17. Distinct strings in column 1</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.8.1\"><span class=\"toc-item-num\">1.8.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#18.-各行を3コラム目の数値の降順にソート-1.9\"><span class=\"toc-item-num\">1.9&nbsp;&nbsp;</span>18. Sorting lines by the column 3 value in descending order</a></div><div class=\"lev3\"><a href=\"#windows-powershell-1.9.1\"><span class=\"toc-item-num\">1.9.1&nbsp;&nbsp;</span>Windows PowerShell</a></div><div class=\"lev2\"><a href=\"#19.-各行の1コラム目の文字列の出現頻度を求め，出現頻度の高い順に並べる-1.10\"><span class=\"toc-item-num\">1.10&nbsp;&nbsp;</span>19. Counting column 1 strings and sorting by frequency, highest first</a></div><div class=\"lev3\"><a href=\"#powershell-の場合-1.10.1\"><span class=\"toc-item-num\">1.10.1&nbsp;&nbsp;</span>PowerShell</a></div><div class=\"lev2\"><a href=\"#参考リンク-1.11\"><span class=\"toc-item-num\">1.11&nbsp;&nbsp;</span>References</a></div>"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "- [言語処理100本ノック (NLP 100 Exercise), Chapter 1 (Warm-up)](http://nbviewer.jupyter.org/gist/Cartman0/77c669b28f674179e459869881da7a56)"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "# Chapter 2: UNIX Command Basics"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
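      "source": "hightemp.txt is a tab-separated file that stores records of the highest temperatures in Japan, with the columns \"prefecture\", \"location\", \"℃\", and \"date\".\nWrite programs that perform the tasks below, using hightemp.txt as the input file.\nThen run the same tasks with UNIX commands and check the programs' results against them."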
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 10. Counting lines\n"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Count the number of lines. Use the wc command to check the result."
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "import sys\nsys.getdefaultencoding()",
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 1,
          "data": {
            "text/plain": "'utf-8'"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "with open('hightemp.txt', 'r', encoding='utf-8') as file:\n    print(len(file.readlines()))",
      "execution_count": 2,
      "outputs": [
        {
          "text": "24\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {
        "collapsed": true
      },
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\nGet-Content -Encoding UTF8 .\\hightemp.txt | Measure-Object -Line\n```\nIn Windows PowerShell, `cat` is an alias for `Get-Content`, so the following also works:\n```\ncat -Encoding UTF8 .\\hightemp.txt | Measure-Object -Line\n```\n"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 11. Replacing tabs with spaces"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Replace each tab character with a single space. Use the sed, tr, or expand command to check the result."
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "with open('hightemp.txt', 'r', encoding='utf-8') as file:\n    #print(file.readlines())\n    replace_space = file.read().replace('\\t', ' ')\n    #print(list(replace_space))\n    print(replace_space)",
      "execution_count": 3,
      "outputs": [
        {
          "text": "高知県 江川崎 41 2013-08-12\n埼玉県 熊谷 40.9 2007-08-16\n岐阜県 多治見 40.9 2007-08-16\n山形県 山形 40.8 1933-07-25\n山梨県 甲府 40.7 2013-08-10\n和歌山県 かつらぎ 40.6 1994-08-08\n静岡県 天竜 40.6 1994-08-04\n山梨県 勝沼 40.5 2013-08-10\n埼玉県 越谷 40.4 2007-08-16\n群馬県 館林 40.3 2007-08-16\n群馬県 上里見 40.3 1998-07-04\n愛知県 愛西 40.3 1994-08-05\n千葉県 牛久 40.2 2004-07-20\n静岡県 佐久間 40.2 2001-07-24\n愛媛県 宇和島 40.2 1927-07-22\n山形県 酒田 40.1 1978-08-03\n岐阜県 美濃 40 2007-08-16\n群馬県 前橋 40 2001-07-24\n千葉県 茂原 39.9 2013-08-11\n埼玉県 鳩山 39.9 1997-07-05\n大阪府 豊中 39.9 1994-08-08\n山梨県 大月 39.9 1990-07-19\n山形県 鶴岡 39.9 1978-08-03\n愛知県 名古屋 39.9 1942-08-02\n\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "In PowerShell, the tab character is written as `` `t `` (a backtick followed by `t`)."
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\n$file = Get-Content -Encoding UTF8 .\\hightemp.txt\n$file -replace \"`t\", \" \"\n```"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 12. Saving column 1 to col1.txt and column 2 to col2.txt"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Extract only the first column of each line and save it to col1.txt, and only the second column and save it to col2.txt. Use the cut command to check the result."
    },
    {
      "metadata": {
        "code_folding": [],
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "def cut_col_lines(file_name_in: str, cut_idx: int):\n    # Return the cut_idx-th whitespace-separated field of every line.\n    with open(file_name_in, 'r', encoding='utf-8') as file_in:\n        return [line.split()[cut_idx] for line in file_in]\n\ndef cut_col_out(file_name_in: str, cut_idx: int, file_name_out: str):\n    # Write the selected column to file_name_out, one value per line.\n    col_lines = cut_col_lines(file_name_in, cut_idx)\n    with open(file_name_out, 'w', encoding='utf-8') as file_out:\n        for line in col_lines:\n            file_out.write(line + '\\n')\n\ncut_col_out('hightemp.txt', 0, 'col1.txt')\ncut_col_out('hightemp.txt', 1, 'col2.txt')",
      "execution_count": 4,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\n$file = Get-Content -Encoding UTF8 .\\hightemp.txt\n$f = $file -replace \"`t\", \" \"\n\n$f | %{$_.split(\" \")[0]} > col1.txt\n$f | %{$_.split(\" \")[1]} > col2.txt\n```"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 13. Merging col1.txt and col2.txt"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
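      "source": "Combine the col1.txt and col2.txt created in problem 12 into a text file that places the first and second columns of the original file side by side, separated by a tab. Use the paste command to check the result."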
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "def merge(filename1: str, filename2: str, filename_out: str, separate='\\t'):\n    # Join the two files line by line with `separate`, like the paste command.\n    with open(filename1, 'r', encoding='utf-8') as file1, open(filename2, 'r', encoding='utf-8') as file2:\n        with open(filename_out, 'w', encoding='utf-8') as file_out:\n            for l1, l2 in zip(file1, file2):\n                file_out.write(l1.rstrip('\\n') + separate + l2.rstrip('\\n') + '\\n')",
      "execution_count": 5,
      "outputs": []
    },
    {
      "metadata": {
        "collapsed": true,
        "trusted": true
      },
      "cell_type": "code",
      "source": "merge('col1.txt', 'col2.txt', 'merge.txt')",
      "execution_count": 6,
      "outputs": []
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "This seems difficult in Windows PowerShell, which has no built-in equivalent of the paste command."
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 14. Printing the first N lines\n"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
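      "source": "Receive a natural number N by some means such as a command-line argument, and display only the first N lines of the input. Use the head command to check the result."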
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "import sys\nfrom itertools import islice\n\ndef head(filename, N=1):\n    # Print at most the first N lines; a file shorter than N lines is printed in full.\n    with open(filename, 'r', encoding='utf-8') as file:\n        sys.stdout.writelines(islice(file, N))",
      "execution_count": 7,
      "outputs": []
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "head('col1.txt', 26)",
      "execution_count": 8,
      "outputs": [
        {
          "text": "高知県\n埼玉県\n岐阜県\n山形県\n山梨県\n和歌山県\n静岡県\n山梨県\n埼玉県\n群馬県\n群馬県\n愛知県\n千葉県\n静岡県\n愛媛県\n山形県\n岐阜県\n群馬県\n千葉県\n埼玉県\n大阪府\n山梨県\n山形県\n愛知県\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\ncat -Encoding UTF8 .\\col1.txt -Head 25\n```"
    },
    {
      "metadata": {
        "collapsed": true
      },
      "cell_type": "markdown",
      "source": "## 15. Printing the last N lines\n"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
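      "source": "Receive a natural number N by some means such as a command-line argument, and display only the last N lines of the input. Use the tail command to check the result."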
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "import sys\ndef tail(filename, N=1):\n    with open(filename, 'r', encoding='utf-8') as file:\n        sys.stdout.write(''.join(file.readlines()[-N:]))",
      "execution_count": 9,
      "outputs": []
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "tail('col1.txt', 25)",
      "execution_count": 10,
      "outputs": [
        {
          "text": "高知県\n埼玉県\n岐阜県\n山形県\n山梨県\n和歌山県\n静岡県\n山梨県\n埼玉県\n群馬県\n群馬県\n愛知県\n千葉県\n静岡県\n愛媛県\n山形県\n岐阜県\n群馬県\n千葉県\n埼玉県\n大阪府\n山梨県\n山形県\n愛知県\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\ncat -Encoding UTF8 .\\col1.txt -Tail 2\n```"
    },
    {
      "metadata": {
        "collapsed": true
      },
      "cell_type": "markdown",
      "source": "## 16. Splitting a file into N parts"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
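      "source": "Receive a natural number N by some means such as a command-line argument, and split the input file into N parts at line boundaries. Do the same with the split command."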
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "import sys\nimport math\n\ndef split(filename: str, N=2, filename_out=None):\n    # Split the file into N chunks of ceil(len / N) lines; trailing chunks may be shorter or empty.\n    with open(filename, 'r', encoding='utf-8') as file:\n        lines = file.readlines()\n    ratio = math.ceil(len(lines) / N)  # lines per chunk\n    for i in range(N):\n        chunk = lines[i * ratio:(i + 1) * ratio]\n        sys.stdout.writelines(chunk)\n        print()  # blank line between chunks\n        # Optionally write each chunk to <filename_out><i>.txt\n        if filename_out:\n            with open(filename_out + str(i) + '.txt', 'w', encoding='utf-8') as file_out:\n                file_out.writelines(chunk)",
      "execution_count": 11,
      "outputs": []
    },
    {
      "metadata": {
        "scrolled": true,
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "split('hightemp.txt', 6)",
      "execution_count": 12,
      "outputs": [
        {
          "text": "高知県\t江川崎\t41\t2013-08-12\n埼玉県\t熊谷\t40.9\t2007-08-16\n岐阜県\t多治見\t40.9\t2007-08-16\n山形県\t山形\t40.8\t1933-07-25\n\n山梨県\t甲府\t40.7\t2013-08-10\n和歌山県\tかつらぎ\t40.6\t1994-08-08\n静岡県\t天竜\t40.6\t1994-08-04\n山梨県\t勝沼\t40.5\t2013-08-10\n\n埼玉県\t越谷\t40.4\t2007-08-16\n群馬県\t館林\t40.3\t2007-08-16\n群馬県\t上里見\t40.3\t1998-07-04\n愛知県\t愛西\t40.3\t1994-08-05\n\n千葉県\t牛久\t40.2\t2004-07-20\n静岡県\t佐久間\t40.2\t2001-07-24\n愛媛県\t宇和島\t40.2\t1927-07-22\n山形県\t酒田\t40.1\t1978-08-03\n\n岐阜県\t美濃\t40\t2007-08-16\n群馬県\t前橋\t40\t2001-07-24\n千葉県\t茂原\t39.9\t2013-08-11\n埼玉県\t鳩山\t39.9\t1997-07-05\n\n大阪府\t豊中\t39.9\t1994-08-08\n山梨県\t大月\t39.9\t1990-07-19\n山形県\t鶴岡\t39.9\t1978-08-03\n愛知県\t名古屋\t39.9\t1942-08-02\n\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
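      "source": "This takes more work in PowerShell. The snippet below uses `-ReadCount` so that `Get-Content` passes the lines down the pipeline in chunks, then writes each chunk to its own file.\n\n```\n$split_num      = 2    # number of parts\n\n$count = 0;\n$file_name = \".\\hightemp.txt\"\n$file = cat -Encoding UTF8 $file_name \n# Round the chunk size up so that at most $split_num files are produced\ncat -Encoding UTF8 $file_name  -ReadCount ([math]::Ceiling($file.count / $split_num)) | \n    ForEach-Object { \n        $count ++\n        $cfs = \"{0:D3}\" -f $count;\n        $_ > ($file_name + '_' + $cfs)\n    }\n```"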
    },
    {
      "metadata": {
        "collapsed": true
      },
      "cell_type": "markdown",
      "source": "## 17. Distinct strings in column 1"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "Find the distinct strings in the first column (the set of different strings). Use the sort and uniq commands to check the result."
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "def cut_col_set(file_name_in:str, col_idx:int):\n    lines = cut_col_lines(file_name_in, col_idx)\n    return set(lines)\n                \ncut_col_set('hightemp.txt', 0)",
      "execution_count": 13,
      "outputs": [
        {
          "output_type": "execute_result",
          "execution_count": 13,
          "data": {
            "text/plain": "{'千葉県',\n '和歌山県',\n '埼玉県',\n '大阪府',\n '山形県',\n '山梨県',\n '岐阜県',\n '愛媛県',\n '愛知県',\n '群馬県',\n '静岡県',\n '高知県'}"
          },
          "metadata": {}
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell\n"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\n$file = Get-Content -Encoding UTF8 .\\hightemp.txt\n$f = $file -replace \"`t\", \" \"\n$f | %{$_.split(\" \")[0]} | sort | Get-Unique\n```"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 18. Sorting lines by the column 3 value in descending order"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
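      "source": "Sort the lines in descending order of the numeric value in the third column (note: reorder the lines without changing their contents). Use the sort command to check the result (for this problem, the result does not have to match the command's output)."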
    },
    {
      "metadata": {
        "collapsed": true,
        "trusted": true
      },
      "cell_type": "code",
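      "source": "import sys\n\ndef sort(filename: str, col_idx=2):\n    with open(filename, 'r', encoding='utf-8') as file:\n        # Tabs are replaced with spaces for display only; the file itself is not modified.\n        file_lines = [l.replace('\\t', ' ') for l in file.readlines()]\n    # Compare the column as a number (float), not as a string.\n    sys.stdout.writelines(sorted(file_lines, key=lambda l: float(l.split()[col_idx]), reverse=True))",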
      "execution_count": 14,
      "outputs": []
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "sort('hightemp.txt', 2)",
      "execution_count": 15,
      "outputs": [
        {
          "text": "高知県 江川崎 41 2013-08-12\n埼玉県 熊谷 40.9 2007-08-16\n岐阜県 多治見 40.9 2007-08-16\n山形県 山形 40.8 1933-07-25\n山梨県 甲府 40.7 2013-08-10\n和歌山県 かつらぎ 40.6 1994-08-08\n静岡県 天竜 40.6 1994-08-04\n山梨県 勝沼 40.5 2013-08-10\n埼玉県 越谷 40.4 2007-08-16\n群馬県 館林 40.3 2007-08-16\n群馬県 上里見 40.3 1998-07-04\n愛知県 愛西 40.3 1994-08-05\n千葉県 牛久 40.2 2004-07-20\n静岡県 佐久間 40.2 2001-07-24\n愛媛県 宇和島 40.2 1927-07-22\n山形県 酒田 40.1 1978-08-03\n岐阜県 美濃 40 2007-08-16\n群馬県 前橋 40 2001-07-24\n千葉県 茂原 39.9 2013-08-11\n埼玉県 鳩山 39.9 1997-07-05\n大阪府 豊中 39.9 1994-08-08\n山梨県 大月 39.9 1990-07-19\n山形県 鶴岡 39.9 1978-08-03\n愛知県 名古屋 39.9 1942-08-02\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### Windows PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\n# Import-Csv makes this easy; cast the column to [double] so it is compared numerically\n\nImport-Csv -Encoding UTF8 -Delimiter \"`t\" -Header \"loc1\", \"loc2\", \"hight\", \"date\"  .\\hightemp.txt  | sort -Property {[double]$_.hight} -Descending\n```"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
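      "source": "On Linux, the equivalent sort is\n```\nsort -k3,3nr hightemp.txt\n```\n\nwhere `-k3,3` restricts the key to the third field, `n` compares it numerically, and `r` reverses the order."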
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## 19. Counting the frequency of column 1 strings and sorting by frequency, highest first"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
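      "source": "Find how often each string in the first column occurs, and display the strings in descending order of frequency. Use the cut, uniq, and sort commands to check the result."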
    },
    {
      "metadata": {
        "collapsed": true,
        "trusted": true
      },
      "cell_type": "code",
      "source": "import collections\n\ndef count_dict(filename_in:str, col_idx=0):\n    d = collections.defaultdict(int)\n    with open(filename_in, 'r', encoding='utf-8') as file:\n        f_lines = file.readlines()\n        for line in f_lines:\n            d[line.split()[col_idx]] += 1\n    return d\n\ndef count_sort(filename_in:str, col_idx=0, descending=True):\n    d = count_dict(filename_in, col_idx)\n    print(sorted(d.items(), key=lambda l:l[1], reverse=descending))",
      "execution_count": 16,
      "outputs": []
    },
    {
      "metadata": {
        "collapsed": false,
        "trusted": true
      },
      "cell_type": "code",
      "source": "count_dict('hightemp.txt', 0)\ncount_sort('hightemp.txt', 0)",
      "execution_count": 17,
      "outputs": [
        {
          "text": "[('山形県', 3), ('群馬県', 3), ('山梨県', 3), ('埼玉県', 3), ('静岡県', 2), ('岐阜県', 2), ('愛知県', 2), ('千葉県', 2), ('大阪府', 1), ('和歌山県', 1), ('高知県', 1), ('愛媛県', 1)]\n",
          "output_type": "stream",
          "name": "stdout"
        }
      ]
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "### PowerShell"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "```\ncat -Encoding UTF8 .\\hightemp.txt | %{$_.split()[0]} | Group-Object | sort -Property count -Descending\n```"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "## References"
    },
    {
      "metadata": {},
      "cell_type": "markdown",
      "source": "- [言語処理100本ノック with Python (Chapter 2, Part 1)](http://qiita.com/gamma1129/items/92b23219a5b9d8333dad)\n- [言語処理100本ノック with Python (Chapter 2, Part 2)](http://qiita.com/gamma1129/items/6afee2034d6028847e1a)\n- [言語処理100本ノック Chapter 2 in Python](http://qiita.com/piyo56/items/37cf702c2b5a7f5b5d72)"
    }
  ],
  "metadata": {
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3",
      "language": "python"
    },
    "language_info": {
      "nbconvert_exporter": "python",
      "name": "python",
      "codemirror_mode": {
        "version": 3,
        "name": "ipython"
      },
      "version": "3.5.1",
      "file_extension": ".py",
      "pygments_lexer": "ipython3",
      "mimetype": "text/x-python"
    },
    "toc": {
      "toc_threshold": "6",
      "toc_cell": true,
      "toc_number_sections": true,
      "toc_window_display": false
    },
    "hide_input": false,
    "gist": {
      "id": "",
      "data": {
          "description": "言語処理100本ノック Chapter 2 notes (UNIX command basics)",
        "public": true
      }
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}