
@yuji96
Created June 2, 2021 08:28
Create a CSV file of influenza case counts from the MHLW (厚労省) press releases (weekly PDF files)
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Build a CSV of influenza case counts from the tables in the weekly PDFs\n",
"\n",
"Source data: https://www.mhlw.go.jp/stf/seisakunitsuite/bunya/kenkou_iryou/kenkou/kekkaku-kansenshou01/houdou_00004.html"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Import modules"
]
},
{
"cell_type": "code",
"execution_count": 61,
"metadata": {},
"outputs": [],
"source": [
"import datetime as dt\n",
"import pathlib\n",
"import re\n",
"\n",
"from IPython.display import display\n",
"import pandas as pd\n",
"import PyPDF2\n",
"from tabula import read_pdf\n",
"from tqdm import tqdm"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define a function that extracts the table from a PDF page"
]
},
{
"cell_type": "code",
"execution_count": 169,
"metadata": {},
"outputs": [],
"source": [
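"def get_table(path, page):\n",
"    # Extract the table on `page` as one DataFrame; header=1 uses the second\n",
"    # extracted row as the column header.\n",
"    out = read_pdf(path, pages=[page], multiple_tables=False, silent=True,\n",
"                   pandas_options=dict(header=1))\n",
"    if not out:\n",
"        raise ValueError(f\"No table found on page {page}. Check {path}.\")\n",
"    # Accept only the table that has a \"報告数\" (number of reports) column.\n",
"    if \"報告数\" not in (df := out[0]).columns:\n",
"        raise ValueError(f\"Unexpected table found on page {page}. Check {path}.\")\n",
"\n",
"    # The prefecture column comes out unnamed; drop \"定点当たり\" (per sentinel site).\n",
"    df_rename = df.rename(columns={'Unnamed: 0': '都道府県'})\n",
"    df_clean = df_rename.drop('定点当たり', axis=\"columns\")\n",
"    return df_clean"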
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
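"## Get the statistics week from the PDF metadata (found by chance)\n",
"\n",
"- The metadata happened to contain an entry `{\"/Title\": \"1★プレスn週HP表紙\"}` (roughly \"press release, week n, web cover page\"). The statistics week is computed from it as 2018/1/1 + 7*(n-1) days, or 2019/1/1 + 7*(n-1) days for weeks 1-35.\n",
"`\"/CreationDate\"` lags slightly behind the statistics week, so it is not used directly. This trick will not work for other years.\n",
"- Some files were re-uploaded, so the files are sorted by creation date rather than by file name."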
]
},
{
"cell_type": "code",
"execution_count": 76,
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"PdfReadWarning: Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]\n"
]
},
{
"data": {
"text/plain": [
"[['pdf/000368224.pdf', datetime.datetime(2018, 9, 3, 0, 0)],\n",
" ['pdf/000358530.pdf', datetime.datetime(2018, 9, 10, 0, 0)],\n",
" ['pdf/000368225.pdf', datetime.datetime(2018, 9, 17, 0, 0)],\n",
" ['pdf/000362661.pdf', datetime.datetime(2018, 9, 24, 0, 0)],\n",
" ['pdf/000368228.pdf', datetime.datetime(2018, 10, 1, 0, 0)]]"
]
},
"execution_count": 76,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
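"def calc_date(info):\n",
"    # The title looks like \"1★プレス36週HP表紙\"; extract the week number n.\n",
"    title = info[\"/Title\"]\n",
"    week_n = int(re.search(r\"1★プレス([0-9]{1,2})週HP表紙\", title).group(1))\n",
"    # Weeks 36 and later fall in 2018, weeks 1-35 in 2019; week n starts (n - 1) weeks after Jan 1.\n",
"    date = dt.datetime(2018, 1, 1) if week_n >= 36 else dt.datetime(2019, 1, 1)\n",
"    return date + dt.timedelta(weeks=week_n - 1)\n",
"\n",
"\n",
"# PdfFileReader / documentInfo are the PyPDF2 1.x API (in PyPDF2 3.x use PdfReader(p).metadata).\n",
"path_and_info = []\n",
"for p in map(str, pathlib.Path('pdf').iterdir()):\n",
"    if \"000368228\" in p:\n",
"        # Manual fix for this file: its week is set to 40 by hand.\n",
"        info = PyPDF2.PdfFileReader(p).documentInfo.copy()\n",
"        info[\"/Title\"] = \"1★プレス40週HP表紙\"\n",
"        path_and_info.append([p, info])\n",
"    else:\n",
"        path_and_info.append([p, PyPDF2.PdfFileReader(p).documentInfo])\n",
"# Sort by creation date, because re-uploads make file-name order unreliable.\n",
"path_and_info = sorted(path_and_info, key=lambda pair: pair[-1][\"/CreationDate\"])\n",
"path_and_date = [[path, calc_date(info)] for path, info in path_and_info]\n",
"path_and_date[:5]"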
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run the table extraction\n",
"Depending on the week, the target table is on page 2 or page 4 (one file, 000454254, has it on page 5)."
]
},
{
"cell_type": "code",
"execution_count": 170,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
" 30%|██▉ | 11/37 [00:13<00:33, 1.30s/it]The output file is empty.\n",
"100%|██████████| 37/37 [00:58<00:00, 1.57s/it]\n"
]
}
],
"source": [
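"dfs = []\n",
"index = []\n",
"page = 2  # table page for the earlier weeks; switched to 4 once page 2 fails\n",
"for path, date in tqdm(path_and_date):\n",
"    if \"000454254\" in str(path):\n",
"        # This file has the table on page 5.\n",
"        df = get_table(path, page=5)\n",
"    else:\n",
"        try:\n",
"            df = get_table(path, page)\n",
"        except ValueError:\n",
"            # No matching table on the current page: use page 4 from here on.\n",
"            page = 4\n",
"            df = get_table(path, page)\n",
"    dfs.append(df)\n",
"    index.append(date)"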
]
},
{
"cell_type": "code",
"execution_count": 171,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"( 都道府県 報告数\n",
" 0 総 数 338\n",
" 1 北海道 -\n",
" 2 青森県 -\n",
" 3 岩手県 -\n",
" 4 宮城県 1,\n",
" datetime.datetime(2018, 9, 3, 0, 0))"
]
},
"execution_count": 171,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dfs[0].head(), index[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
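"## Clean up\n",
"- Remove thousands separators such as `1,000` before converting to numbers.\n",
"- Entries that are not numbers (e.g. `-`) become 0, and `総数` (total) is recomputed as the sum over prefectures."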
]
},
{
"cell_type": "code",
"execution_count": 190,
"metadata": {
"tags": []
},
"outputs": [],
"source": [
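"# Index each weekly table by prefecture name with spaces removed (e.g. \"総 数\" -> \"総数\").\n",
"df_reindex = [df.set_axis(df[\"都道府県\"].str.replace(\" \", \"\", regex=False), axis='index').drop(columns=[\"都道府県\"]) for df in dfs]\n",
"# One row per week; drop the PDF's total and same-period-last-year total (recomputed below).\n",
"df_concat = pd.concat(df_reindex, axis=1).T.set_axis(index, axis='index').drop(columns=[\"昨年同期(総数)\", \"総数\"])\n",
"# Strip thousands separators, coerce non-numbers such as \"-\" to NaN, then fill with 0.\n",
"df_clean = df_concat.apply(lambda series: pd.to_numeric(series.str.replace(\",\", \"\", regex=False), errors='coerce')).fillna(0).astype(int)\n",
"df_clean[\"総数\"] = df_clean.sum(axis=\"columns\")"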
]
},
{
"cell_type": "code",
"execution_count": 191,
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>都道府県</th>\n",
" <th>北海道</th>\n",
" <th>青森県</th>\n",
" <th>岩手県</th>\n",
" <th>宮城県</th>\n",
" <th>秋田県</th>\n",
" <th>山形県</th>\n",
" <th>福島県</th>\n",
" <th>茨城県</th>\n",
" <th>栃木県</th>\n",
" <th>群馬県</th>\n",
" <th>...</th>\n",
" <th>高知県</th>\n",
" <th>福岡県</th>\n",
" <th>佐賀県</th>\n",
" <th>長崎県</th>\n",
" <th>熊本県</th>\n",
" <th>大分県</th>\n",
" <th>宮崎県</th>\n",
" <th>鹿児島県</th>\n",
" <th>沖縄県</th>\n",
" <th>総数</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2018-09-03</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>11</td>\n",
" <td>10</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>22</td>\n",
" <td>20</td>\n",
" <td>4</td>\n",
" <td>7</td>\n",
" <td>19</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>39</td>\n",
" <td>338</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-09-10</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>10</td>\n",
" <td>11</td>\n",
" <td>20</td>\n",
" <td>5</td>\n",
" <td>4</td>\n",
" <td>19</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>77</td>\n",
" <td>655</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-09-17</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>4</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>4</td>\n",
" <td>22</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>8</td>\n",
" <td>19</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>14</td>\n",
" <td>4</td>\n",
" <td>6</td>\n",
" <td>11</td>\n",
" <td>147</td>\n",
" <td>668</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-09-24</th>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>6</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>18</td>\n",
" <td>14</td>\n",
" <td>8</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>3</td>\n",
" <td>32</td>\n",
" <td>2</td>\n",
" <td>8</td>\n",
" <td>28</td>\n",
" <td>7</td>\n",
" <td>4</td>\n",
" <td>23</td>\n",
" <td>214</td>\n",
" <td>795</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-01</th>\n",
" <td>4</td>\n",
" <td>5</td>\n",
" <td>7</td>\n",
" <td>23</td>\n",
" <td>1</td>\n",
" <td>5</td>\n",
" <td>18</td>\n",
" <td>9</td>\n",
" <td>7</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>22</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>18</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>5</td>\n",
" <td>162</td>\n",
" <td>848</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-08</th>\n",
" <td>17</td>\n",
" <td>6</td>\n",
" <td>2</td>\n",
" <td>29</td>\n",
" <td>0</td>\n",
" <td>11</td>\n",
" <td>6</td>\n",
" <td>11</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>31</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>28</td>\n",
" <td>4</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>79</td>\n",
" <td>617</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-15</th>\n",
" <td>81</td>\n",
" <td>16</td>\n",
" <td>6</td>\n",
" <td>11</td>\n",
" <td>0</td>\n",
" <td>26</td>\n",
" <td>4</td>\n",
" <td>14</td>\n",
" <td>2</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>14</td>\n",
" <td>0</td>\n",
" <td>4</td>\n",
" <td>38</td>\n",
" <td>5</td>\n",
" <td>0</td>\n",
" <td>8</td>\n",
" <td>128</td>\n",
" <td>955</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-22</th>\n",
" <td>96</td>\n",
" <td>10</td>\n",
" <td>10</td>\n",
" <td>6</td>\n",
" <td>0</td>\n",
" <td>12</td>\n",
" <td>3</td>\n",
" <td>17</td>\n",
" <td>0</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>51</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>36</td>\n",
" <td>3</td>\n",
" <td>0</td>\n",
" <td>10</td>\n",
" <td>122</td>\n",
" <td>959</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-10-29</th>\n",
" <td>42</td>\n",
" <td>9</td>\n",
" <td>2</td>\n",
" <td>13</td>\n",
" <td>0</td>\n",
" <td>18</td>\n",
" <td>8</td>\n",
" <td>23</td>\n",
" <td>5</td>\n",
" <td>2</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>60</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>36</td>\n",
" <td>2</td>\n",
" <td>2</td>\n",
" <td>33</td>\n",
" <td>72</td>\n",
" <td>1029</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-11-05</th>\n",
" <td>37</td>\n",
" <td>35</td>\n",
" <td>13</td>\n",
" <td>51</td>\n",
" <td>0</td>\n",
" <td>18</td>\n",
" <td>9</td>\n",
" <td>55</td>\n",
" <td>17</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>1</td>\n",
" <td>95</td>\n",
" <td>3</td>\n",
" <td>3</td>\n",
" <td>37</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>35</td>\n",
" <td>79</td>\n",
" <td>1705</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-11-12</th>\n",
" <td>80</td>\n",
" <td>30</td>\n",
" <td>9</td>\n",
" <td>51</td>\n",
" <td>0</td>\n",
" <td>13</td>\n",
" <td>5</td>\n",
" <td>46</td>\n",
" <td>14</td>\n",
" <td>4</td>\n",
" <td>...</td>\n",
" <td>2</td>\n",
" <td>97</td>\n",
" <td>8</td>\n",
" <td>10</td>\n",
" <td>38</td>\n",
" <td>5</td>\n",
" <td>8</td>\n",
" <td>57</td>\n",
" <td>61</td>\n",
" <td>1885</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-11-19</th>\n",
" <td>141</td>\n",
" <td>77</td>\n",
" <td>18</td>\n",
" <td>36</td>\n",
" <td>3</td>\n",
" <td>1</td>\n",
" <td>16</td>\n",
" <td>80</td>\n",
" <td>9</td>\n",
" <td>9</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>147</td>\n",
" <td>8</td>\n",
" <td>46</td>\n",
" <td>37</td>\n",
" <td>5</td>\n",
" <td>15</td>\n",
" <td>109</td>\n",
" <td>60</td>\n",
" <td>2572</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-11-26</th>\n",
" <td>269</td>\n",
" <td>111</td>\n",
" <td>14</td>\n",
" <td>19</td>\n",
" <td>22</td>\n",
" <td>4</td>\n",
" <td>45</td>\n",
" <td>100</td>\n",
" <td>14</td>\n",
" <td>63</td>\n",
" <td>...</td>\n",
" <td>10</td>\n",
" <td>218</td>\n",
" <td>30</td>\n",
" <td>41</td>\n",
" <td>69</td>\n",
" <td>34</td>\n",
" <td>18</td>\n",
" <td>153</td>\n",
" <td>92</td>\n",
" <td>4599</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-12-03</th>\n",
" <td>882</td>\n",
" <td>108</td>\n",
" <td>27</td>\n",
" <td>62</td>\n",
" <td>61</td>\n",
" <td>33</td>\n",
" <td>59</td>\n",
" <td>145</td>\n",
" <td>26</td>\n",
" <td>81</td>\n",
" <td>...</td>\n",
" <td>34</td>\n",
" <td>418</td>\n",
" <td>29</td>\n",
" <td>85</td>\n",
" <td>161</td>\n",
" <td>107</td>\n",
" <td>32</td>\n",
" <td>254</td>\n",
" <td>75</td>\n",
" <td>8438</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-12-10</th>\n",
" <td>2138</td>\n",
" <td>112</td>\n",
" <td>93</td>\n",
" <td>97</td>\n",
" <td>114</td>\n",
" <td>158</td>\n",
" <td>77</td>\n",
" <td>263</td>\n",
" <td>58</td>\n",
" <td>227</td>\n",
" <td>...</td>\n",
" <td>55</td>\n",
" <td>803</td>\n",
" <td>45</td>\n",
" <td>154</td>\n",
" <td>310</td>\n",
" <td>264</td>\n",
" <td>68</td>\n",
" <td>392</td>\n",
" <td>174</td>\n",
" <td>16589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-12-17</th>\n",
" <td>5059</td>\n",
" <td>132</td>\n",
" <td>171</td>\n",
" <td>278</td>\n",
" <td>202</td>\n",
" <td>253</td>\n",
" <td>169</td>\n",
" <td>675</td>\n",
" <td>177</td>\n",
" <td>462</td>\n",
" <td>...</td>\n",
" <td>236</td>\n",
" <td>1809</td>\n",
" <td>164</td>\n",
" <td>538</td>\n",
" <td>888</td>\n",
" <td>421</td>\n",
" <td>242</td>\n",
" <td>1025</td>\n",
" <td>373</td>\n",
" <td>39589</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2018-12-24</th>\n",
" <td>7152</td>\n",
" <td>220</td>\n",
" <td>310</td>\n",
" <td>327</td>\n",
" <td>415</td>\n",
" <td>273</td>\n",
" <td>365</td>\n",
" <td>1019</td>\n",
" <td>432</td>\n",
" <td>661</td>\n",
" <td>...</td>\n",
" <td>539</td>\n",
" <td>2691</td>\n",
" <td>311</td>\n",
" <td>733</td>\n",
" <td>1162</td>\n",
" <td>563</td>\n",
" <td>345</td>\n",
" <td>904</td>\n",
" <td>556</td>\n",
" <td>54517</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-01-01</th>\n",
" <td>7252</td>\n",
" <td>434</td>\n",
" <td>851</td>\n",
" <td>587</td>\n",
" <td>941</td>\n",
" <td>301</td>\n",
" <td>1123</td>\n",
" <td>2478</td>\n",
" <td>566</td>\n",
" <td>1014</td>\n",
" <td>...</td>\n",
" <td>1445</td>\n",
" <td>5015</td>\n",
" <td>599</td>\n",
" <td>1358</td>\n",
" <td>1757</td>\n",
" <td>934</td>\n",
" <td>787</td>\n",
" <td>1670</td>\n",
" <td>1655</td>\n",
" <td>78116</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-01-08</th>\n",
" <td>8319</td>\n",
" <td>809</td>\n",
" <td>1755</td>\n",
" <td>2779</td>\n",
" <td>938</td>\n",
" <td>928</td>\n",
" <td>2678</td>\n",
" <td>4729</td>\n",
" <td>2242</td>\n",
" <td>3367</td>\n",
" <td>...</td>\n",
" <td>2409</td>\n",
" <td>10271</td>\n",
" <td>1801</td>\n",
" <td>3406</td>\n",
" <td>4703</td>\n",
" <td>2197</td>\n",
" <td>2577</td>\n",
" <td>4815</td>\n",
" <td>2422</td>\n",
" <td>190527</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-01-15</th>\n",
" <td>7384</td>\n",
" <td>1568</td>\n",
" <td>2602</td>\n",
" <td>4462</td>\n",
" <td>1515</td>\n",
" <td>1583</td>\n",
" <td>4537</td>\n",
" <td>8166</td>\n",
" <td>3989</td>\n",
" <td>5180</td>\n",
" <td>...</td>\n",
" <td>3168</td>\n",
" <td>13301</td>\n",
" <td>2081</td>\n",
" <td>3876</td>\n",
" <td>4504</td>\n",
" <td>3511</td>\n",
" <td>3187</td>\n",
" <td>5223</td>\n",
" <td>3169</td>\n",
" <td>267596</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-01-22</th>\n",
" <td>8119</td>\n",
" <td>2430</td>\n",
" <td>3062</td>\n",
" <td>6632</td>\n",
" <td>2530</td>\n",
" <td>2349</td>\n",
" <td>5245</td>\n",
" <td>7426</td>\n",
" <td>5092</td>\n",
" <td>5106</td>\n",
" <td>...</td>\n",
" <td>2449</td>\n",
" <td>12414</td>\n",
" <td>1923</td>\n",
" <td>3900</td>\n",
" <td>3831</td>\n",
" <td>3796</td>\n",
" <td>3210</td>\n",
" <td>4725</td>\n",
" <td>2847</td>\n",
" <td>283388</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-01-29</th>\n",
" <td>8023</td>\n",
" <td>2041</td>\n",
" <td>2577</td>\n",
" <td>5583</td>\n",
" <td>1938</td>\n",
" <td>1960</td>\n",
" <td>4265</td>\n",
" <td>5737</td>\n",
" <td>3899</td>\n",
" <td>3936</td>\n",
" <td>...</td>\n",
" <td>1805</td>\n",
" <td>8475</td>\n",
" <td>1294</td>\n",
" <td>2742</td>\n",
" <td>2372</td>\n",
" <td>3024</td>\n",
" <td>2760</td>\n",
" <td>3633</td>\n",
" <td>2734</td>\n",
" <td>214592</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-02-05</th>\n",
" <td>5663</td>\n",
" <td>1544</td>\n",
" <td>1798</td>\n",
" <td>3133</td>\n",
" <td>1310</td>\n",
" <td>1417</td>\n",
" <td>2645</td>\n",
" <td>3158</td>\n",
" <td>2306</td>\n",
" <td>2399</td>\n",
" <td>...</td>\n",
" <td>1066</td>\n",
" <td>5406</td>\n",
" <td>691</td>\n",
" <td>1768</td>\n",
" <td>1588</td>\n",
" <td>2037</td>\n",
" <td>1790</td>\n",
" <td>2349</td>\n",
" <td>2059</td>\n",
" <td>129989</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-02-12</th>\n",
" <td>3471</td>\n",
" <td>962</td>\n",
" <td>1090</td>\n",
" <td>1472</td>\n",
" <td>765</td>\n",
" <td>788</td>\n",
" <td>1777</td>\n",
" <td>1469</td>\n",
" <td>1039</td>\n",
" <td>1221</td>\n",
" <td>...</td>\n",
" <td>500</td>\n",
" <td>2717</td>\n",
" <td>356</td>\n",
" <td>829</td>\n",
" <td>709</td>\n",
" <td>1118</td>\n",
" <td>1007</td>\n",
" <td>965</td>\n",
" <td>1348</td>\n",
" <td>61992</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-02-19</th>\n",
" <td>2322</td>\n",
" <td>786</td>\n",
" <td>808</td>\n",
" <td>1292</td>\n",
" <td>678</td>\n",
" <td>783</td>\n",
" <td>1374</td>\n",
" <td>932</td>\n",
" <td>741</td>\n",
" <td>819</td>\n",
" <td>...</td>\n",
" <td>383</td>\n",
" <td>1837</td>\n",
" <td>271</td>\n",
" <td>901</td>\n",
" <td>645</td>\n",
" <td>708</td>\n",
" <td>816</td>\n",
" <td>590</td>\n",
" <td>768</td>\n",
" <td>44601</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-02-26</th>\n",
" <td>1847</td>\n",
" <td>678</td>\n",
" <td>605</td>\n",
" <td>858</td>\n",
" <td>472</td>\n",
" <td>558</td>\n",
" <td>1032</td>\n",
" <td>530</td>\n",
" <td>408</td>\n",
" <td>660</td>\n",
" <td>...</td>\n",
" <td>187</td>\n",
" <td>1149</td>\n",
" <td>160</td>\n",
" <td>535</td>\n",
" <td>342</td>\n",
" <td>491</td>\n",
" <td>516</td>\n",
" <td>419</td>\n",
" <td>564</td>\n",
" <td>29384</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-03-05</th>\n",
" <td>1359</td>\n",
" <td>571</td>\n",
" <td>442</td>\n",
" <td>734</td>\n",
" <td>491</td>\n",
" <td>425</td>\n",
" <td>730</td>\n",
" <td>299</td>\n",
" <td>307</td>\n",
" <td>463</td>\n",
" <td>...</td>\n",
" <td>86</td>\n",
" <td>775</td>\n",
" <td>95</td>\n",
" <td>418</td>\n",
" <td>244</td>\n",
" <td>336</td>\n",
" <td>326</td>\n",
" <td>248</td>\n",
" <td>355</td>\n",
" <td>20454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-03-12</th>\n",
" <td>1022</td>\n",
" <td>726</td>\n",
" <td>350</td>\n",
" <td>558</td>\n",
" <td>443</td>\n",
" <td>300</td>\n",
" <td>489</td>\n",
" <td>143</td>\n",
" <td>170</td>\n",
" <td>291</td>\n",
" <td>...</td>\n",
" <td>51</td>\n",
" <td>608</td>\n",
" <td>140</td>\n",
" <td>346</td>\n",
" <td>198</td>\n",
" <td>248</td>\n",
" <td>264</td>\n",
" <td>178</td>\n",
" <td>337</td>\n",
" <td>14488</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-03-19</th>\n",
" <td>776</td>\n",
" <td>618</td>\n",
" <td>277</td>\n",
" <td>513</td>\n",
" <td>521</td>\n",
" <td>218</td>\n",
" <td>357</td>\n",
" <td>127</td>\n",
" <td>178</td>\n",
" <td>160</td>\n",
" <td>...</td>\n",
" <td>20</td>\n",
" <td>514</td>\n",
" <td>140</td>\n",
" <td>339</td>\n",
" <td>170</td>\n",
" <td>212</td>\n",
" <td>274</td>\n",
" <td>134</td>\n",
" <td>302</td>\n",
" <td>12320</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-03-26</th>\n",
" <td>478</td>\n",
" <td>346</td>\n",
" <td>149</td>\n",
" <td>323</td>\n",
" <td>351</td>\n",
" <td>192</td>\n",
" <td>265</td>\n",
" <td>109</td>\n",
" <td>98</td>\n",
" <td>178</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>309</td>\n",
" <td>96</td>\n",
" <td>230</td>\n",
" <td>156</td>\n",
" <td>130</td>\n",
" <td>179</td>\n",
" <td>125</td>\n",
" <td>277</td>\n",
" <td>8567</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-02</th>\n",
" <td>364</td>\n",
" <td>326</td>\n",
" <td>138</td>\n",
" <td>230</td>\n",
" <td>432</td>\n",
" <td>149</td>\n",
" <td>279</td>\n",
" <td>106</td>\n",
" <td>73</td>\n",
" <td>121</td>\n",
" <td>...</td>\n",
" <td>25</td>\n",
" <td>180</td>\n",
" <td>62</td>\n",
" <td>170</td>\n",
" <td>90</td>\n",
" <td>85</td>\n",
" <td>118</td>\n",
" <td>78</td>\n",
" <td>204</td>\n",
" <td>7227</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-09</th>\n",
" <td>325</td>\n",
" <td>277</td>\n",
" <td>175</td>\n",
" <td>290</td>\n",
" <td>383</td>\n",
" <td>151</td>\n",
" <td>290</td>\n",
" <td>148</td>\n",
" <td>139</td>\n",
" <td>126</td>\n",
" <td>...</td>\n",
" <td>20</td>\n",
" <td>320</td>\n",
" <td>86</td>\n",
" <td>177</td>\n",
" <td>103</td>\n",
" <td>83</td>\n",
" <td>107</td>\n",
" <td>101</td>\n",
" <td>231</td>\n",
" <td>8282</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-16</th>\n",
" <td>505</td>\n",
" <td>217</td>\n",
" <td>219</td>\n",
" <td>315</td>\n",
" <td>339</td>\n",
" <td>253</td>\n",
" <td>452</td>\n",
" <td>240</td>\n",
" <td>130</td>\n",
" <td>314</td>\n",
" <td>...</td>\n",
" <td>23</td>\n",
" <td>246</td>\n",
" <td>106</td>\n",
" <td>184</td>\n",
" <td>91</td>\n",
" <td>100</td>\n",
" <td>97</td>\n",
" <td>148</td>\n",
" <td>219</td>\n",
" <td>12613</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-23</th>\n",
" <td>397</td>\n",
" <td>198</td>\n",
" <td>281</td>\n",
" <td>195</td>\n",
" <td>336</td>\n",
" <td>133</td>\n",
" <td>402</td>\n",
" <td>97</td>\n",
" <td>105</td>\n",
" <td>141</td>\n",
" <td>...</td>\n",
" <td>19</td>\n",
" <td>183</td>\n",
" <td>60</td>\n",
" <td>123</td>\n",
" <td>69</td>\n",
" <td>64</td>\n",
" <td>71</td>\n",
" <td>161</td>\n",
" <td>260</td>\n",
" <td>10601</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-04-30</th>\n",
" <td>349</td>\n",
" <td>114</td>\n",
" <td>180</td>\n",
" <td>69</td>\n",
" <td>249</td>\n",
" <td>38</td>\n",
" <td>256</td>\n",
" <td>107</td>\n",
" <td>51</td>\n",
" <td>33</td>\n",
" <td>...</td>\n",
" <td>21</td>\n",
" <td>119</td>\n",
" <td>16</td>\n",
" <td>63</td>\n",
" <td>41</td>\n",
" <td>73</td>\n",
" <td>27</td>\n",
" <td>38</td>\n",
" <td>407</td>\n",
" <td>4703</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-05-07</th>\n",
" <td>145</td>\n",
" <td>71</td>\n",
" <td>123</td>\n",
" <td>89</td>\n",
" <td>97</td>\n",
" <td>29</td>\n",
" <td>143</td>\n",
" <td>73</td>\n",
" <td>31</td>\n",
" <td>43</td>\n",
" <td>...</td>\n",
" <td>23</td>\n",
" <td>58</td>\n",
" <td>30</td>\n",
" <td>23</td>\n",
" <td>24</td>\n",
" <td>34</td>\n",
" <td>15</td>\n",
" <td>39</td>\n",
" <td>267</td>\n",
" <td>3636</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2019-05-14</th>\n",
" <td>221</td>\n",
" <td>76</td>\n",
" <td>115</td>\n",
" <td>113</td>\n",
" <td>59</td>\n",
" <td>76</td>\n",
" <td>232</td>\n",
" <td>86</td>\n",
" <td>44</td>\n",
" <td>49</td>\n",
" <td>...</td>\n",
" <td>11</td>\n",
" <td>136</td>\n",
" <td>37</td>\n",
" <td>47</td>\n",
" <td>28</td>\n",
" <td>28</td>\n",
" <td>24</td>\n",
" <td>48</td>\n",
" <td>258</td>\n",
" <td>4559</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>37 rows × 48 columns</p>\n",
"</div>"
],
"text/plain": [
"都道府県 北海道 青森県 岩手県 宮城県 秋田県 山形県 福島県 茨城県 栃木県 群馬県 ... \\\n",
"2018-09-03 0 0 0 1 0 1 5 11 10 0 ... \n",
"2018-09-10 1 0 0 2 1 4 7 4 3 2 ... \n",
"2018-09-17 1 0 1 4 0 3 4 22 3 0 ... \n",
"2018-09-24 2 0 10 6 2 3 18 14 8 0 ... \n",
"2018-10-01 4 5 7 23 1 5 18 9 7 1 ... \n",
"2018-10-08 17 6 2 29 0 11 6 11 2 1 ... \n",
"2018-10-15 81 16 6 11 0 26 4 14 2 0 ... \n",
"2018-10-22 96 10 10 6 0 12 3 17 0 2 ... \n",
"2018-10-29 42 9 2 13 0 18 8 23 5 2 ... \n",
"2018-11-05 37 35 13 51 0 18 9 55 17 0 ... \n",
"2018-11-12 80 30 9 51 0 13 5 46 14 4 ... \n",
"2018-11-19 141 77 18 36 3 1 16 80 9 9 ... \n",
"2018-11-26 269 111 14 19 22 4 45 100 14 63 ... \n",
"2018-12-03 882 108 27 62 61 33 59 145 26 81 ... \n",
"2018-12-10 2138 112 93 97 114 158 77 263 58 227 ... \n",
"2018-12-17 5059 132 171 278 202 253 169 675 177 462 ... \n",
"2018-12-24 7152 220 310 327 415 273 365 1019 432 661 ... \n",
"2019-01-01 7252 434 851 587 941 301 1123 2478 566 1014 ... \n",
"2019-01-08 8319 809 1755 2779 938 928 2678 4729 2242 3367 ... \n",
"2019-01-15 7384 1568 2602 4462 1515 1583 4537 8166 3989 5180 ... \n",
"2019-01-22 8119 2430 3062 6632 2530 2349 5245 7426 5092 5106 ... \n",
"2019-01-29 8023 2041 2577 5583 1938 1960 4265 5737 3899 3936 ... \n",
"2019-02-05 5663 1544 1798 3133 1310 1417 2645 3158 2306 2399 ... \n",
"2019-02-12 3471 962 1090 1472 765 788 1777 1469 1039 1221 ... \n",
"2019-02-19 2322 786 808 1292 678 783 1374 932 741 819 ... \n",
"2019-02-26 1847 678 605 858 472 558 1032 530 408 660 ... \n",
"2019-03-05 1359 571 442 734 491 425 730 299 307 463 ... \n",
"2019-03-12 1022 726 350 558 443 300 489 143 170 291 ... \n",
"2019-03-19 776 618 277 513 521 218 357 127 178 160 ... \n",
"2019-03-26 478 346 149 323 351 192 265 109 98 178 ... \n",
"2019-04-02 364 326 138 230 432 149 279 106 73 121 ... \n",
"2019-04-09 325 277 175 290 383 151 290 148 139 126 ... \n",
"2019-04-16 505 217 219 315 339 253 452 240 130 314 ... \n",
"2019-04-23 397 198 281 195 336 133 402 97 105 141 ... \n",
"2019-04-30 349 114 180 69 249 38 256 107 51 33 ... \n",
"2019-05-07 145 71 123 89 97 29 143 73 31 43 ... \n",
"2019-05-14 221 76 115 113 59 76 232 86 44 49 ... \n",
"\n",
"都道府県 高知県 福岡県 佐賀県 長崎県 熊本県 大分県 宮崎県 鹿児島県 沖縄県 総数 \n",
"2018-09-03 8 22 20 4 7 19 0 11 39 338 \n",
"2018-09-10 10 11 20 5 4 19 3 3 77 655 \n",
"2018-09-17 8 19 0 3 14 4 6 11 147 668 \n",
"2018-09-24 3 32 2 8 28 7 4 23 214 795 \n",
"2018-10-01 1 22 0 4 18 11 0 5 162 848 \n",
"2018-10-08 1 31 2 0 28 4 1 1 79 617 \n",
"2018-10-15 0 14 0 4 38 5 0 8 128 955 \n",
"2018-10-22 0 51 0 1 36 3 0 10 122 959 \n",
"2018-10-29 0 60 1 2 36 2 2 33 72 1029 \n",
"2018-11-05 1 95 3 3 37 3 1 35 79 1705 \n",
"2018-11-12 2 97 8 10 38 5 8 57 61 1885 \n",
"2018-11-19 0 147 8 46 37 5 15 109 60 2572 \n",
"2018-11-26 10 218 30 41 69 34 18 153 92 4599 \n",
"2018-12-03 34 418 29 85 161 107 32 254 75 8438 \n",
"2018-12-10 55 803 45 154 310 264 68 392 174 16589 \n",
"2018-12-17 236 1809 164 538 888 421 242 1025 373 39589 \n",
"2018-12-24 539 2691 311 733 1162 563 345 904 556 54517 \n",
"2019-01-01 1445 5015 599 1358 1757 934 787 1670 1655 78116 \n",
"2019-01-08 2409 10271 1801 3406 4703 2197 2577 4815 2422 190527 \n",
"2019-01-15 3168 13301 2081 3876 4504 3511 3187 5223 3169 267596 \n",
"2019-01-22 2449 12414 1923 3900 3831 3796 3210 4725 2847 283388 \n",
"2019-01-29 1805 8475 1294 2742 2372 3024 2760 3633 2734 214592 \n",
"2019-02-05 1066 5406 691 1768 1588 2037 1790 2349 2059 129989 \n",
"2019-02-12 500 2717 356 829 709 1118 1007 965 1348 61992 \n",
"2019-02-19 383 1837 271 901 645 708 816 590 768 44601 \n",
"2019-02-26 187 1149 160 535 342 491 516 419 564 29384 \n",
"2019-03-05 86 775 95 418 244 336 326 248 355 20454 \n",
"2019-03-12 51 608 140 346 198 248 264 178 337 14488 \n",
"2019-03-19 20 514 140 339 170 212 274 134 302 12320 \n",
"2019-03-26 11 309 96 230 156 130 179 125 277 8567 \n",
"2019-04-02 25 180 62 170 90 85 118 78 204 7227 \n",
"2019-04-09 20 320 86 177 103 83 107 101 231 8282 \n",
"2019-04-16 23 246 106 184 91 100 97 148 219 12613 \n",
"2019-04-23 19 183 60 123 69 64 71 161 260 10601 \n",
"2019-04-30 21 119 16 63 41 73 27 38 407 4703 \n",
"2019-05-07 23 58 30 23 24 34 15 39 267 3636 \n",
"2019-05-14 11 136 37 47 28 28 24 48 258 4559 \n",
"\n",
"[37 rows x 48 columns]"
]
},
"execution_count": 191,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_clean"
]
},
{
"cell_type": "code",
"execution_count": 198,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<AxesSubplot:>"
]
},
"execution_count": 198,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_clean[\"総数\"].plot()"
]
},
{
"cell_type": "code",
"execution_count": 199,
"metadata": {},
"outputs": [],
"source": [
"df_clean.to_csv(\"2018-2019.csv\")"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "osken",
"language": "python",
"name": "osken"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.0"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
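`calc_date` above counts weeks from January 1. That matches Monday-start weeks only when January 1 is a Monday. This holds for 2018 but not for 2019: the 2019 rows in the output fall on Tuesdays, and 2018-12-24 is followed by 2019-01-01, eight days later. The following is a minimal sketch of an alternative. It assumes the press-release week numbers follow ISO 8601 week numbering, which the notebook and the source page do not confirm. `datetime.date.fromisocalendar` (Python 3.8+) returns the Monday of a given ISO week. The helper names `week_from_title` and `season_week_to_monday` are hypothetical, and the `>= 36` season split is copied from `calc_date`.

```python
import datetime as dt
import re

# Title format taken from the regex in the notebook's calc_date.
TITLE_RE = re.compile(r"1★プレス([0-9]{1,2})週HP表紙")


def week_from_title(title):
    """Return the week number in a PDF /Title such as '1★プレス40週HP表紙'."""
    match = TITLE_RE.search(title)
    if match is None:
        raise ValueError(f"unexpected /Title: {title!r}")
    return int(match.group(1))


def season_week_to_monday(week_n, season_start_year=2018, first_week_of_season=36):
    """Monday of ISO week `week_n` in the 2018/19 season (assumes ISO week numbering).

    Weeks >= first_week_of_season belong to season_start_year and the rest to the
    following year, mirroring the `week_n >= 36` split in calc_date.
    """
    year = season_start_year if week_n >= first_week_of_season else season_start_year + 1
    return dt.date.fromisocalendar(year, week_n, 1)  # weekday 1 = Monday


print(season_week_to_monday(week_from_title("1★プレス36週HP表紙")))  # 2018-09-03
print(season_week_to_monday(1))  # 2018-12-31, where calc_date gives 2019-01-01
```

Both approaches agree for the 2018 weeks. Whether the 2019 rows should be shifted one day earlier depends on how the MHLW reports define their weeks, so check the press release before switching.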
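The notebook sorts files by the raw `/CreationDate` string. That orders them correctly only as long as every file uses the same layout and UTC offset. The following is a minimal sketch that parses the PDF date format into timezone-aware datetimes, so the sort compares actual instants. The format is `D:YYYYMMDDHHmmSS`, with trailing fields optional, followed by an optional `Z` or `+HH'mm'` / `-HH'mm'` offset. `parse_pdf_date` is a hypothetical helper, and treating a date with no offset as UTC is an assumption of this sketch.

```python
import datetime as dt
import re

# D:YYYY[MM[DD[HH[mm[SS]]]]] followed by an optional Z or +HH'mm' / -HH'mm' offset.
PDF_DATE_RE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:([+\-Z])(?:(\d{2})'?(?:(\d{2})'?)?)?)?"
)


def parse_pdf_date(value):
    """Parse a PDF date string such as "D:20180905123456+09'00'" into an aware datetime."""
    match = PDF_DATE_RE.match(value)
    if match is None:
        raise ValueError(f"not a PDF date string: {value!r}")
    year, month, day, hour, minute, second, sign, tz_h, tz_m = match.groups()
    if sign in ("+", "-"):
        offset = dt.timedelta(hours=int(tz_h or 0), minutes=int(tz_m or 0))
        tz = dt.timezone(offset if sign == "+" else -offset)
    else:
        # "Z" means UTC; a missing offset is also treated as UTC (an assumption).
        tz = dt.timezone.utc
    return dt.datetime(int(year), int(month or 1), int(day or 1),
                       int(hour or 0), int(minute or 0), int(second or 0), tzinfo=tz)
```

In the notebook it would replace the sort key, e.g. `key=lambda pair: parse_pdf_date(pair[-1]["/CreationDate"])`.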
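The extraction loop keeps a `page` variable that switches from 2 to 4 the first time page 2 fails, and it hard-codes page 5 for `000454254`. The following is a minimal sketch of a more explicit alternative: try candidate pages in order and keep the first one whose table passes the checks in `get_table`. `first_valid_page` and `fake_extract` are hypothetical. The extractor is passed in as a function so the sketch runs without PDFs or tabula. Trying 2, 4, 5 in order reproduces the notebook's choice for `000454254` only if pages 2 and 4 of that file fail the `報告数` check, which has not been verified here.

```python
def first_valid_page(path, extract, pages=(2, 4, 5)):
    """Return (page, table) for the first page in `pages` where extract(path, page) succeeds.

    `extract` is expected to raise ValueError when a page has no suitable table,
    as get_table in the notebook does.
    """
    errors = []
    for page in pages:
        try:
            return page, extract(path, page)
        except ValueError as exc:
            errors.append(f"page {page}: {exc}")
    raise ValueError(f"no suitable table in {path} ({'; '.join(errors)})")


# Stand-in extractor (hypothetical) that only finds a table on page 4.
def fake_extract(path, page):
    if page != 4:
        raise ValueError(f"no table on page {page}")
    return f"table from {path} p.{page}"


print(first_valid_page("pdf/example.pdf", fake_extract))  # (4, 'table from pdf/example.pdf p.4')
```

In the notebook this would read `page, df = first_valid_page(path, get_table)`. The trade-off is that page 2 is retried for every later file, costing extra `read_pdf` calls. The notebook's sticky variable avoids those calls.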
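The cleanup cell drops the PDF's own `総数` (total) row and recomputes the total as the sum over prefectures. The first week already agrees: the extracted table shows `総 数 338`, and the recomputed 2018-09-03 total is also 338. A check over every week could still catch cells that tabula split or misread. The following is a minimal standard-library sketch of that check. It assumes one week's rows are available as `(name, raw_text)` pairs with spaces already removed from the names. `parse_count` and `check_total` are hypothetical helpers. `parse_count` roughly mirrors the notebook's rule: strip thousands separators and treat anything non-numeric, such as `-`, as 0.

```python
def parse_count(cell):
    """'1,000' -> 1000; '-', '' or None -> 0 (roughly like to_numeric(errors='coerce').fillna(0))."""
    if cell is None:
        return 0
    text = str(cell).replace(",", "").strip()
    try:
        return int(text)
    except ValueError:
        return 0


def check_total(rows, total_key="総数", skip=("昨年同期(総数)",)):
    """Return (reported_total, computed_total) for one week's (name, raw_text) rows."""
    reported = None
    computed = 0
    for name, raw in rows:
        if name == total_key:
            reported = parse_count(raw)
        elif name not in skip:  # ignore the same-period-last-year total
            computed += parse_count(raw)
    if reported is None:
        raise ValueError(f"no {total_key!r} row")
    return reported, computed


# A made-up week, for illustration only.
week = [("総数", "1,002"), ("北海道", "-"), ("青森県", "1,000"), ("岩手県", "2"), ("昨年同期(総数)", "338")]
print(check_total(week))  # (1002, 1002)
```

Any week where the two numbers differ points to a PDF worth opening by hand.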