@wutali
Last active September 11, 2021 08:48
basic-data-analysis.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "basic-data-analysis.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyNVjGFEUhagHgOnEbRrlEbQ",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/wutali/a5e6b34be1d9357e0b82a064567640b9/basic-data-analysis.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_YKOMaiJVQaf"
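},
"source": [
"## Data Analysis Basics for KOSEN Students\n",
"\n",
"This article is the Day 5 entry in the Fuller [Advent Calendar 2020](https://adventar.org/calendars/5034). Day 4 was by [@furusax](https://twitter.com/furusax): \"[Re-importing an Aurora MySQL cluster after CloudFormation detected drift on it](https://furusax0621.hatenablog.com/entry/2020/12/04/000000)\".\n",
"\n",
"---\n",
"\n",
"At Fuller, we teach data analysis classes at KOSEN (Japanese colleges of technology).\n",
"\n",
"* Tomakomai KOSEN\n",
"* Hakodate KOSEN\n",
"* Nagaoka KOSEN (planned)\n",
"\n",
"Although we call it data analysis, the content is very simple. The course has three parts; by the end, students will have learned basic Python syntax, how to build a simple crawler, and how to visualize data.\n",
"\n",
"1. Using Jupyter Notebook and basic Python syntax\n",
"2. Using libraries and writing CSS selectors\n",
"3. Building a crawler and visualizing data\n",
"\n",
"I will summarize each part below."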
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "44H-RL9KWHTl"
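},
"source": [
"## Using Jupyter Notebook and Basic Python Syntax\n",
"\n",
"In this part, students learn the concept of cells in Jupyter Notebook, along with Python data types, conditionals, and loops. I briefly show the equivalent C code as the class proceeds.\n",
"\n",
"To start, I have the students write a self-introduction and the day's agenda in Markdown."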
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "g9--8KO-j-LF"
},
"source": [
"### Self-introduction\n",
"\n",
"* Name: Takahiro Fujiwara (藤原敬弘)\n",
"* Job: CTO, Fuller, Inc.\n",
"* Hobby: Craft beer\n",
"* Twitter: @wutali"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "rbDm81FEk0yJ"
},
"source": [
"### Today's agenda\n",
"\n",
"* Python data types\n",
"* Conditionals and loops in Python\n",
"* How to write functions in Python\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "4oSPtMM5Xy2x"
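},
"source": [
"Next, I teach Python data types, how to write functions, loops, and so on. Along the way, the parts marked *TODO* are exercises, which the students solve while discussing them with each other."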
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "jFwgxBxLXEo6"
},
"source": [
"### Python data types"
]
},
{
"cell_type": "code",
"metadata": {
"id": "31rPu248mHTL",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "1883cedb-03af-4918-f337-522ae92ccdb4"
},
"source": [
"# Integer (int)\n",
"\n",
"a = 1\n",
"a"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"1"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "nxCTGtmnmuE4",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "94890795-91b0-4101-f241-90e3dc3515e5"
},
"source": [
"# Floating point (float)\n",
"a = 2.3\n",
"a"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"2.3"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "R4rf4WeGnDY0",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 37
},
"outputId": "a095e4d7-0cc7-4a2c-a126-1f21e646d23e"
},
"source": [
"# String (str)\n",
"a = \"Hello Hakodate!\"\n",
"a"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'Hello Hakodate!'"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "8AKSz_9_nhqZ",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "b5cfa168-ff72-4ceb-8396-8dc6261b1418"
},
"source": [
"# Array (list)\n",
"a = [1, 2, 3, 4, 5]\n",
"a"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[1, 2, 3, 4, 5]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "0zGCDne2n14K",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "d9a7c314-34cf-4932-c013-c9eee041ff1b"
},
"source": [
"# Map (dict)\n",
"a = {\"key1\": \"value1\", \"key2\": \"value2\", \"key3\": \"value3\"}\n",
"a"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"{'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_lT8NfakpBWT"
},
"source": [
"### Conditionals and loops"
]
},
{
"cell_type": "code",
"metadata": {
"id": "4JV58Nd3pJ-y",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "279c85a0-359b-48df-8f2e-12327ef84924"
},
"source": [
"# Conditional branch (equivalent C code):\n",
"# if (true) {\n",
"#     printf(\"hello world\");\n",
"# }\n",
"\n",
"# TODO: Define a variable a and write a conditional that prints\n",
"# hello world when a is 10 or greater\n",
"a = 11\n",
"if a >= 10:\n",
"    print(\"hello world\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"hello world\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "qzX4hQzkqrdI",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "78d53ebe-ab87-4037-a6fe-40731eacacc2"
},
"source": [
"# Loop (equivalent C code):\n",
"# for (int i = 0; i < 10; i++) {\n",
"#     printf(\"%d\", i);\n",
"# }\n",
"\n",
"# TODO1: Replace range with list data and run the loop\n",
"# TODO2: Replace range with str data and run the loop\n",
"a = [1, 2, 3]\n",
"b = \"Hello World!\"\n",
"for i in b:\n",
"    print(i)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"H\n",
"e\n",
"l\n",
"l\n",
"o\n",
" \n",
"W\n",
"o\n",
"r\n",
"l\n",
"d\n",
"!\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "VutD34HVs3Xa",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "3697cfd5-29d2-458a-c94e-fcbf1996a761"
},
"source": [
"for i in range(3):\n",
"    print(i)\n",
"\n",
"for i in [0, 1, 2]:\n",
"    print(i)"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"0\n",
"1\n",
"2\n",
"0\n",
"1\n",
"2\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "1ihx56fjtI6O"
},
"source": [
"### How to write functions"
]
},
{
"cell_type": "code",
"metadata": {
"id": "21LVtuSktLkW",
"colab": {
"base_uri": "https://localhost:8080/"
},
"outputId": "8ba1ef7d-0fad-4880-882a-6f7a66a5972b"
},
"source": [
"# Equivalent C code:\n",
"# int add(int a, int b) {\n",
"#     return a + b;\n",
"# }\n",
"\n",
"# TODO1: Define a multi function that multiplies two numbers, and run it\n",
"# TODO2: Define a sum function that takes list data and returns its total,\n",
"#        and run it\n",
"def add(a, b):\n",
"    return a + b\n",
"\n",
"add(102, 2)"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"104"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
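{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough sketch, here is one possible answer to the two TODOs above. It is not the answer used in class, which this notebook does not show. `multi` multiplies two numbers, and `sum_list` adds up a list with a loop. The name `sum_list` is an assumption, chosen so that Python's built-in `sum` is not shadowed."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Hypothetical answers to TODO1 and TODO2 (illustrative only)\n",
"def multi(a, b):\n",
"    # Multiply two numbers\n",
"    return a * b\n",
"\n",
"def sum_list(values):\n",
"    # Add up every element of the list with a loop\n",
"    total = 0\n",
"    for v in values:\n",
"        total += v\n",
"    return total\n",
"\n",
"print(multi(3, 4))\n",
"print(sum_list([1, 2, 3, 4, 5]))"
],
"execution_count": null,
"outputs": []
},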
{
"cell_type": "markdown",
"metadata": {
"id": "DIZ3dYNBYL7W"
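},
"source": [
"That concludes the first day (90 minutes). I keep the basics brief and move on to writing programs that actually run. The class is taught in a live-coding style: the students copy down the code as I write it.\n",
"\n",
"I change the exercises depending on my mood that day, so the ones above are only examples."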
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C3RUa7U7XoVY"
},
"source": [
"## Using Libraries and Writing CSS Selectors\n",
"\n",
"In this part, I teach how to use the requests library and the Beautiful Soup library. The students crawl Fuller's news site and visualize the data."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "0vvexCFkDtHH"
},
"source": [
"### Today's agenda\n",
"\n",
"* Crawl Fuller's news site\n",
"* Crawl sake brewing license data\n",
"* Plot the results as graphs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "BTM0Nt7ZaLY0"
},
"source": [
"### Fetch Fuller's news site"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BeVEm6MfD6PM",
"outputId": "78223936-e0cf-40d8-9fce-803c53857619"
},
"source": [
"import requests\n",
"\n",
"res = requests.get(\"https://fuller-inc.com/news/\")\n",
"res"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<Response [200]>"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "CKALW7l9aOfk"
},
"source": [
"### Parse the HTML"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "cgGd5HveESvl",
"outputId": "065c9644-dac0-4d5a-ab6e-be5f453b4ec9"
},
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"# Name the parser explicitly so Beautiful Soup does not have to guess one\n",
"soup = BeautifulSoup(res.content, \"html.parser\")\n",
"soup"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<!DOCTYPE html>\n",
"<html lang=\"ja\">\n",
"<head>\n",
"<script>(function (w, d, s, l, i) {\n",
" w[l] = w[l] || [];\n",
" w[l].push({\n",
" 'gtm.start': new Date().getTime(), event: 'gtm.js'\n",
" });\n",
" var f = d.getElementsByTagName(s)[0],\n",
" j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : '';\n",
" j.async = true;\n",
" j.src = 'https://www.googletagmanager.com/gtm.js?id=' + i + dl + '&gtm_auth=_ZgIb2eTMTwaqb2s0D_nZw&gtm_preview=env-15&gtm_cookies_win=x';\n",
" f.parentNode.insertBefore(j, f);\n",
" })(window, document, 'script', 'dataLayer', 'GTM-NCX3698');</script>\n",
"<script src=\"/modernizr-custom.js\"></script>\n",
"<title>ニュース - フラー株式会社</title>\n",
"<meta content=\"フラーは『世界一、ヒトを惹きつける会社を創る。』というユメに挑戦し続け、アプリとデータをテーマに事業を展開するIT企業です。柏の葉キャンパスに本社を構え、新潟・韓国・東京にも拠点があります。\" name=\"description\"/>\n",
"<meta charset=\"utf-8\"/>\n",
"<meta content=\"IE=edge\" http-equiv=\"X-UA-Compatible\"/>\n",
"<meta content=\"width=device-width, initial-scale=1, viewport-fit=cover\" name=\"viewport\"/>\n",
"<meta content=\"https://fuller-inc.com/news/\" property=\"og:url\"/>\n",
"<meta content=\"ニュース - フラー株式会社\" property=\"og:title\"/>\n",
"<meta content=\"website\" property=\"og:type\"/>\n",
"<meta content=\"フラー株式会社\" property=\"og:site_name\"/>\n",
"<meta content=\"ja_JP\" property=\"og:locale\"/>\n",
"<meta content=\"フラーは『世界一、ヒトを惹きつける会社を創る。』というユメに挑戦し続け、アプリとデータをテーマに事業を展開するIT企業です。柏の葉キャンパスに本社を構え、新潟・韓国・東京にも拠点があります。\" property=\"og:description\"/>\n",
"<meta content=\"ニュース - フラー株式会社\" name=\"twitter:title\"/>\n",
"<meta content=\"フラーは『世界一、ヒトを惹きつける会社を創る。』というユメに挑戦し続け、アプリとデータをテーマに事業を展開するIT企業です。柏の葉キャンパスに本社を構え、新潟・韓国・東京にも拠点があります。\" name=\"twitter:description\"/>\n",
"<meta content=\"summary_large_image\" name=\"twitter:card\"/>\n",
"<meta content=\"@fuller_inc\" name=\"twitter:site\"/>\n",
"<meta content=\"https://fuller-inc.com/news/ogp.jpeg\" property=\"og:image\"/>\n",
"<meta content=\"1710\" property=\"og:image:width\"/>\n",
"<meta content=\"900\" property=\"og:image:height\"/>\n",
"<meta content=\"https://fuller-inc.com/news/ogp.jpeg\" name=\"twitter:image\"/>\n",
"<meta content=\"#ffffff\" name=\"theme-color\"/>\n",
"<link href=\"/apple-touch-icon.png\" rel=\"apple-touch-icon\" sizes=\"180x180\"/>\n",
"<link href=\"/favicon-32x32.png\" rel=\"icon\" sizes=\"32x32\" type=\"image/png\"/>\n",
"<link href=\"/favicon-16x16.png\" rel=\"icon\" sizes=\"16x16\" type=\"image/png\"/>\n",
"<link href=\"/manifest.json\" rel=\"manifest\"/>\n",
"<link color=\"#2a5caa\" href=\"/safari-pinned-tab.svg\" rel=\"mask-icon\"/>\n",
"<script type=\"application/ld+json\">\n",
" {\n",
" \"@context\": \"http://schema.org\",\n",
" \"@type\": \"Organization\",\n",
" \"url\": \"https://fuller-inc.com\",\n",
" \"name\": \"Fuller, Inc.\",\n",
" \"logo\": \"https://fuller-inc.com/apple-touch-icon.png\",\n",
" \"sameAs\": [\n",
" \"https://www.facebook.com/fuller.official\",\n",
" \"https://twitter.com/fuller_inc\"\n",
" ]\n",
" }\n",
" </script>\n",
"<link as=\"style\" href=\"/main.css?1605577977\" rel=\"preload\"/>\n",
"<link as=\"script\" href=\"/main.js?1605577977\" rel=\"preload\"/>\n",
"<link href=\"/main.css?1605577977\" id=\"main-style\" rel=\"stylesheet\"/>\n",
"</head>\n",
"<body>\n",
"<noscript>\n",
"<iframe height=\"0\" src=\"https://www.googletagmanager.com/ns.html?id=GTM-NCX3698&amp;gtm_auth=_ZgIb2eTMTwaqb2s0D_nZw&amp;gtm_preview=env-15&amp;gtm_cookies_win=x\" style=\"display:none;visibility:hidden\" width=\"0\"></iframe>\n",
"</noscript>\n",
"<section class=\"hero is-halfheight center-cover\" style=\"background-image: linear-gradient(to top, rgba(0, 0, 0, 0), rgba(0, 0, 0, 0.62)), url(/news/header.jpeg);\">\n",
"<div class=\"hero-head\">\n",
"<nav class=\"navbar has-change-on-scroll-white-to-black is-fixed-top\">\n",
"<div class=\"navbar-brand\">\n",
"<a class=\"navbar-item fuller-logo\" href=\"/\">\n",
"<div style=\"width: 130px; height: 25px;\">\n",
"<svg data-name=\"レイヤー 1\" id=\"レイヤー_1\" viewbox=\"0 0 155.57 30\" xmlns=\"http://www.w3.org/2000/svg\"><defs><style>.cls-1{fill:#fff;}</style></defs><title>アートボード 2</title><path class=\"cls-1\" d=\"M76.87,17c0,1.38-.43,3.89-3.25,3.89s-3.31-2.5-3.31-3.87V4.76h-4a.84.84,0,0,0-.83.85v12c0,2.33.88,7.58,8.05,7.58s8.21-6,8.21-7.52V4.76H76.87Z\"></path><path class=\"cls-1\" d=\"M120.8,23.21v1a.84.84,0,0,0,.84.83h12.24V20.86H126.5a.84.84,0,0,1-.83-.84V16.59h6.54a.83.83,0,0,0,.83-.83V12.39h-6.63a.84.84,0,0,1-.74-.84V10.21h0V9h8.21V4.76H120.8Z\"></path><path class=\"cls-1\" d=\"M91.45,4.76H86.59V24.22a.84.84,0,0,0,.84.83h11.4V20.86H91.45Z\"></path><path class=\"cls-1\" d=\"M48.18,4.76a.85.85,0,0,0-.84.84V25.05h4.87V18.51a.84.84,0,0,1,.83-.84h5.71a.84.84,0,0,0,.84-.83V13.49H52.21V9.79A.84.84,0,0,1,53,9h7.38V4.76Z\"></path><path class=\"cls-1\" d=\"M108.56,4.76h-4.87V24.22a.84.84,0,0,0,.84.83h11.41V20.86h-7.38Z\"></path><path class=\"cls-1\" d=\"M149.87,16.65a6.46,6.46,0,0,0-3.49-11.89h-7.63V24.22a.83.83,0,0,0,.84.83h4V17.67h1.65l4.58,7.38h5.73Zm-4.25-3.17h-2V9h2a2.27,2.27,0,1,1,0,4.53Z\"></path><path class=\"cls-1\" d=\"M25.09,12.88a1.65,1.65,0,0,1,1.76.17l3.22,2.76a1.08,1.08,0,0,0,1.53-.12L34,12.85a1.09,1.09,0,0,0-.11-1.53l-4-3.45a3.65,3.65,0,0,1-1-1.78l-1-5.2A1.09,1.09,0,0,0,26.63,0L23,.71A1.09,1.09,0,0,0,22.08,2l.79,4.16a1.67,1.67,0,0,1-.74,1.62l-4,2.32a2.08,2.08,0,0,1-1.88,0l-4-2.32a1.64,1.64,0,0,1-.74-1.62L12.22,2A1.09,1.09,0,0,0,11.35.71L7.67,0A1.09,1.09,0,0,0,6.4.89l-1,5.2a3.57,3.57,0,0,1-1,1.78l-4,3.45a1.08,1.08,0,0,0-.12,1.53L2.7,15.69a1.09,1.09,0,0,0,1.54.12l3.21-2.76a1.65,1.65,0,0,1,1.76-.17l4,2.33a2.09,2.09,0,0,1,.94,1.63V21.5a1.64,1.64,0,0,1-1,1.45l-4,1.4a1.09,1.09,0,0,0-.67,1.39l1.24,3.53a1.09,1.09,0,0,0,1.39.67l5-1.75a3.58,3.58,0,0,1,2.05,0l5,1.75a1.09,1.09,0,0,0,1.39-.67l1.24-3.53a1.09,1.09,0,0,0-.67-1.39l-4-1.4a1.64,1.64,0,0,1-1-1.45V16.84a2.12,2.12,0,0,1,.94-1.63Z\"></path></svg>\n",
"</div>\n",
"</a>\n",
"<span class=\"navbar-burger burger\" data-target=\"navbar-menu-hero\">\n",
"<span></span>\n",
"<span></span>\n",
"<span></span>\n",
"</span>\n",
"</div>\n",
"<div class=\"navbar-menu\" id=\"navbar-menu-hero\">\n",
"<div class=\"navbar-end\">\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/corporate/\">\n",
" 企業情報\n",
" </a>\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/business/\">\n",
" 事業内容\n",
" </a>\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/approach/\">\n",
" 取り組み\n",
" </a>\n",
"<a class=\"is-active has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/news/\">\n",
" ニュース\n",
" </a>\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/story/\">\n",
" ストーリー\n",
" </a>\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/fulife/\">\n",
" フライフ\n",
" </a>\n",
"<a class=\" has-margin-vertical-5 navbar-item is-fuller-navbar-item is-letter-spacing-2px has-text-weight-bold\" href=\"/career/\">\n",
" 採用情報\n",
" </a>\n",
"</div>\n",
"</div>\n",
"</nav>\n",
"</div>\n",
"<div class=\"hero-body\">\n",
"<div class=\"container has-text-centered\">\n",
"<h1 class=\"has-text-weight-bold has-text-white is-size-4 is-letter-spacing-4px\">\n",
" ニュース\n",
" </h1>\n",
"<h2 class=\"has-text-weight-bold has-text-white is-size-7 has-margin-top-3 is-letter-spacing-2px\">\n",
" フラーの様々なニュースをお届け。\n",
" </h2>\n",
"</div>\n",
"</div>\n",
"</section>\n",
"<section class=\"section\">\n",
"<div class=\"container\">\n",
"<div class=\"columns is-multiline is-centered\">\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/11/niigata-headquarter/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/11/niigata-headquarter/thumbnail_huc5de0a07fd31ad9bbe36ee678ed88566_819825_1200x630_resize_q75_box.jpeg\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年11月16日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー株式会社 本店移転のお知らせ\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/11/kosencaravan2020-1111/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/11/kosencaravan2020-1111/thumbnail_hueab7309064aa33b732383879961dd314_251967_600x315_resize_box_2.png\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年11月11日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー、全国の高専生を対象にしたオンラインキャリアイベント「高専キャラバン 2020 冬の陣」開催を決定!\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/10/niigata-nagaoka-kosen-prof/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/10/niigata-nagaoka-kosen-prof/thumbnail_hu0f092d082311b4b780f5b0bc91e7feca_2221845_3000x1575_resize_q75_box.jpeg\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年10月20日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー代表取締役会長 渋谷、長岡高専の客員教授に就任\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/10/team-change-announcement/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/10/team-change-announcement/thumbnail_hu8518e79cdda948a7ad9007db9ed30243_3707784_1500x844_resize_q75_box.jpeg\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年10月01日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー株式会社 会長及び社長就任のお知らせ\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/09/2020-09-29-tbs-app-ape-use-case/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/09/2020-09-29-tbs-app-ape-use-case/thumbnail_hud5519d2c3bc71c395724fcdb42b10e6a_94475_800x450_resize_box_2.png\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年09月29日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" TBSが国内No.1 アプリ分析ツール「App Ape」導入\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/09/officialclubpartner-20200921/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/09/officialclubpartner-20200921/thumbnail_hu87548c528f0a46b9981a5c6654ef04dc_131964_800x450_resize_q75_box.jpeg\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年09月21日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー、アルビレックス新潟とオフィシャルクラブパートナー契約を締結\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/09/2020-09-11-ntt-data-app-ape-use-case/\" rel=\"noopener noreferrer\" target=\"_blank\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/09/2020-09-11-ntt-data-app-ape-use-case/thumbnail_huf17ca9bbb9c2394dc528156bafb77001_88073_800x450_resize_box_2.png\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>メディア</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年09月11日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" NTTデータにおけるエンタープライズ向けモバイルアプリ開発をサポート\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/09/nva/\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/09/nva/thumbnail_hue925fc66d93cb127a07d07c9acff29ad_162726_932x493_resize_q75_box.jpeg\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>リリース</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年09月10日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" フラー代表取締役 渋谷が新潟ベンチャー協会代表理事に就任\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"column is-one-third\">\n",
"<a href=\"/news/2020/08/2020-08-11-united-app-ape-use-case/\" rel=\"noopener noreferrer\" target=\"_blank\">\n",
"<div class=\"card is-full-height\">\n",
"<div class=\"card-image\">\n",
"<figure class=\"image is-16by9\">\n",
"<img class=\"is-cover\" src=\"/news/2020/08/2020-08-11-united-app-ape-use-case/thumbnail_hudc4d7c1404173ce2e95506845634889e_114215_800x450_resize_box_2.png\"/>\n",
"</figure>\n",
"</div>\n",
"<div class=\"card-content\">\n",
"<div class=\"card-top-content\">\n",
"<div class=\"tag-for-xxx\">\n",
"<span>メディア</span>\n",
"</div>\n",
"<div class=\"is-date-text\">\n",
" 2020年08月11日\n",
" </div>\n",
"</div>\n",
"<h2 class=\"is-news-title\">\n",
" ユナイテッドが国内No.1 アプリ分析ツール「App Ape」導入\n",
" </h2>\n",
"<p class=\"has-padding-vertical-5 is-read-more\">\n",
" さらに詳しく 〉\n",
" </p>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"</div>\n",
"<div class=\"has-margin-vertical-8 is-hidden-mobile\">\n",
"<nav class=\"pagination is-rounded has-text-centered\">\n",
"<ul class=\"pagination-list is-rounded\" style=\"justify-content: center;\">\n",
"<li><a class=\"pagination-link is-news-page-nav is-current\" href=\"/news/\">1</a></li>\n",
"<li><a class=\"pagination-link is-news-page-nav\" href=\"/news/page/2/\">2</a></li>\n",
"<li><a class=\"pagination-link is-news-page-nav\" href=\"/news/page/3/\">3</a></li>\n",
"<li><p class=\"pagination-ellipsis is-news-page-nav\">…</p></li>\n",
"<li><a class=\"pagination-link is-news-page-nav\" href=\"/news/page/24/\">24</a></li>\n",
"<li>\n",
"<a class=\"pagination-next is-news-page-nav-button\" href=\"/news/page/2/\">→</a>\n",
"</li>\n",
"</ul>\n",
"</nav>\n",
"</div>\n",
"<div class=\"has-margin-vertical-8 is-hidden-tablet\">\n",
"<nav class=\"pagination is-rounded has-text-centered\">\n",
"<ul class=\"pagination-list is-rounded\" style=\"justify-content: center;\">\n",
"<li><a class=\"pagination-link is-news-page-nav is-current\" href=\"/news/\">1</a></li>\n",
"<li><a class=\"pagination-link is-news-page-nav\" href=\"/news/page/2/\">2</a></li>\n",
"<li><a class=\"pagination-link is-news-page-nav\" href=\"/news/page/3/\">3</a></li>\n",
"<li>\n",
"<a class=\"pagination-next is-news-page-nav-button\" href=\"/news/page/2/\">→</a>\n",
"</li>\n",
"</ul>\n",
"</nav>\n",
"</div>\n",
"</div>\n",
"</div>\n",
"</section>\n",
"<a class=\"career-banner hero is-medium is-primary has-bg-img\" href=\"/career/\">\n",
"<div class=\"hero-body\">\n",
"<div class=\"container has-text-white\">\n",
"<h1 class=\"is-title has-text-weight-bold has-text-white has-padding-top-9 is-size-3 is-size-5-mobile is-letter-spacing-5px\">モバイルの未来を共に創る仲間、募集中。</h1>\n",
"<h2 class=\"has-padding-top-4 has-text-weight-bold has-padding-bottom-10 is-size-5 is-size-6-mobile\">採用情報へ 〉</h2>\n",
"</div>\n",
"</div>\n",
"</a>\n",
"<footer class=\"footer fullerblue-gradation-bg\">\n",
"<div class=\"container\">\n",
"<div class=\"content has-text-centered\">\n",
"<div class=\"has-padding-top-5 has-padding-bottom-10\">\n",
"<div class=\"columns is-centered\">\n",
"<div class=\"column is-te\">\n",
"<a href=\"/\">\n",
"<img src=\"/img/logo.svg\" style=\"max-width: 200px;\"/>\n",
"</a>\n",
"</div>\n",
"</div>\n",
"</div>\n",
"<div class=\"columns is-multiline is-centered\">\n",
"<div class=\"column is-narrow\">\n",
"<a class=\"button is-rounded is-lang-link\" href=\"https://en.fuller-inc.com/\"> English </a>\n",
"</div>\n",
"<div class=\"column is-narrow\">\n",
"<a class=\"button is-rounded is-lang-link\" href=\"https://ko.fuller-inc.com/\"> 한국어 </a>\n",
"</div>\n",
"</div>\n",
" \n",
" <div class=\"columns is-multiline is-centered\">\n",
"<div class=\"column is-narrow\">\n",
"<a class=\"has-text-white\" href=\"/docs/privacy-policy/\"> プライバシーポリシー </a>\n",
"</div>\n",
"<div class=\"column is-narrow\">\n",
"<a class=\"has-text-white\" href=\"/docs/terms/\"> サイト利用規約 </a>\n",
"</div>\n",
"<div class=\"column is-narrow\">\n",
"<a class=\"has-text-white\" href=\"/contact-us/\"> お問い合わせ </a>\n",
"</div>\n",
"</div>\n",
"<div class=\"has-margin-vertical-7 align-items-horizontal-centered\">\n",
"<a class=\"has-margin-horizontal-5\" href=\"https://www.facebook.com/fuller.official/\">\n",
"<img class=\"image is-32x32\" src=\"/img/facebook.svg\"/>\n",
"</a>\n",
"<a class=\"has-margin-horizontal-5\" href=\"https://www.instagram.com/fulife_official/\">\n",
"<img class=\"image is-32x32\" src=\"/img/instagram.svg\"/>\n",
"</a>\n",
"<a class=\"has-margin-horizontal-5\" href=\"https://www.youtube.com/channel/UCzqj8Da2DVcqWdPvpc5_tqg\">\n",
"<img class=\"image is-32x32\" src=\"/img/you-tube.svg\"/>\n",
"</a>\n",
"<a class=\"has-margin-horizontal-5\" href=\"https://twitter.com/fuller_inc\">\n",
"<img class=\"image is-32x32\" src=\"/img/twitter.svg\"/>\n",
"</a>\n",
"</div>\n",
"<div>\n",
"<p class=\"has-text-white\">Copyright © 2018 Fuller, Inc.</p>\n",
"</div>\n",
"</div>\n",
"</div>\n",
"</footer>\n",
"<script src=\"/main.js?1605577982\"></script>\n",
"</body>\n",
"</html>"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "sKAcUCS0aSGg"
},
"source": [
"### About CSS selectors"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "KMtWlKr7EqqO",
"outputId": "cf3b1d57-4cdb-4953-a939-09e26d587af7"
},
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"practise_soup = BeautifulSoup(\"\"\"\n",
"<html>\n",
" <head>\n",
" <title>Hello</title>\n",
" </head>\n",
"<body>\n",
" <h1>This is the title</h1>\n",
" <div>\n",
" <div class=\"news\">\n",
" <h1>This is news 1</h1>\n",
" <p>This is the description</p>\n",
" </div>\n",
" <div class=\"news\">\n",
" <h1>This is news 2</h1>\n",
" <p>This is the description</p>\n",
" </div>\n",
" <div class=\"news\">\n",
" <h1>This is news 3</h1>\n",
" <p>This is the description</p>\n",
" </div>\n",
" </div>\n",
"</body>\n",
"</html>\n",
"\"\"\", \"html.parser\")\n",
"\n",
"# Extract the title inside head\n",
"print(practise_soup.select_one(\"head title\").text)\n",
"\n",
"# Extract the elements that have the news class\n",
"print(practise_soup.select(\".news\"))\n",
"\n",
"# TODO1: Write a CSS selector that extracts only the h1 inside the news class\n",
"print(practise_soup.select(\".news h1\"))\n",
"\n",
"# TODO2: Loop over the result of TODO1 and print only the title text\n",
"for h1 in practise_soup.select(\".news h1\"):\n",
"    print(h1.text)\n",
"\n",
"# Expected output (the following is printed)\n",
"# This is news 1\n",
"# This is news 2\n",
"# This is news 3"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Hello\n",
"[<div class=\"news\">\n",
"<h1>This is news 1</h1>\n",
"<p>This is the description</p>\n",
"</div>, <div class=\"news\">\n",
"<h1>This is news 2</h1>\n",
"<p>This is the description</p>\n",
"</div>, <div class=\"news\">\n",
"<h1>This is news 3</h1>\n",
"<p>This is the description</p>\n",
"</div>]\n",
"[<h1>This is news 1</h1>, <h1>This is news 2</h1>, <h1>This is news 3</h1>]\n",
"This is news 1\n",
"This is news 2\n",
"This is news 3\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "5sM07MY5aYDo"
},
"source": [
"### Try fetching the data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uYslXT1_I0Lu",
"outputId": "cbae4830-b4ca-43bd-b2d8-7b77e29a0303"
},
"source": [
"news_data = []\n",
"for content in soup.select(\".card-content\"):\n",
"    # Get the date\n",
"    date = content.select_one(\".is-date-text\").text.strip()\n",
"    # TODO: Add code that prints the title\n",
"    title = content.select_one(\"h2\").text.strip()\n",
"    # Store the extracted data\n",
"    news_data.append({\n",
"        \"date\": date,\n",
"        \"title\": title,\n",
"    })\n",
"news_data"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[{'date': '2020年11月16日', 'title': 'フラー株式会社 本店移転のお知らせ'},\n",
" {'date': '2020年11月11日',\n",
" 'title': 'フラー、全国の高専生を対象にしたオンラインキャリアイベント「高専キャラバン 2020 冬の陣」開催を決定!'},\n",
" {'date': '2020年10月20日', 'title': 'フラー代表取締役会長 渋谷、長岡高専の客員教授に就任'},\n",
" {'date': '2020年10月01日', 'title': 'フラー株式会社 会長及び社長就任のお知らせ'},\n",
" {'date': '2020年09月29日', 'title': 'TBSが国内No.1 アプリ分析ツール「App Ape」導入'},\n",
" {'date': '2020年09月21日', 'title': 'フラー、アルビレックス新潟とオフィシャルクラブパートナー契約を締結'},\n",
" {'date': '2020年09月11日', 'title': 'NTTデータにおけるエンタープライズ向けモバイルアプリ開発をサポート'},\n",
" {'date': '2020年09月10日', 'title': 'フラー代表取締役 渋谷が新潟ベンチャー協会代表理事に就任'},\n",
" {'date': '2020年08月11日', 'title': 'ユナイテッドが国内No.1 アプリ分析ツール「App Ape」導入'}]"
]
},
"metadata": {
"tags": []
},
"execution_count": 26
}
]
},
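{
"cell_type": "markdown",
"metadata": {},
"source": [
"The loop above assumes every card contains both a date and a title. As a minimal sketch of a more defensive variant: `select_one` returns `None` when nothing matches, so calling `.text` on a missing element would raise an `AttributeError`. The helper name `extract_news` and the inline sample HTML are hypothetical, used only so the cell runs without network access."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"def extract_news(soup):\n",
"    # Collect date/title pairs, skipping cards that lack either element\n",
"    rows = []\n",
"    for content in soup.select(\".card-content\"):\n",
"        date_el = content.select_one(\".is-date-text\")\n",
"        title_el = content.select_one(\"h2\")\n",
"        if date_el is None or title_el is None:\n",
"            continue\n",
"        rows.append({\n",
"            \"date\": date_el.text.strip(),\n",
"            \"title\": title_el.text.strip(),\n",
"        })\n",
"    return rows\n",
"\n",
"# Hypothetical sample: the second card has no date element, so it is skipped\n",
"sample = BeautifulSoup(\n",
"    '<div class=\"card-content\"><div class=\"is-date-text\">2020年11月16日</div><h2>A</h2></div>'\n",
"    '<div class=\"card-content\"><h2>B</h2></div>',\n",
"    \"html.parser\",\n",
")\n",
"extract_news(sample)"
],
"execution_count": null,
"outputs": []
},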
{
"cell_type": "markdown",
"metadata": {
"id": "MrVi-qNCaaTo"
},
"source": [
"### Try transforming the data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 507
},
"id": "VbKKjjESKte6",
"outputId": "590737e1-855a-4d8c-abb0-5f9d829f2938"
},
"source": [
"import pandas as pd\n",
"\n",
"# Sample input: d = \"2020年11月10日\"\n",
"# Sample output: \"2020年11月\"\n",
"def parse_month(d):\n",
"    # TODO: Write code that strips the day\n",
"    parts = d.split(\"月\")\n",
"    return parts[0] + \"月\"  # alternative: d[:-3], valid while the day is always two digits\n",
"\n",
"print(parse_month(\"2020年11月10日\"))\n",
"\n",
"# TODO: Add code that converts news_data to a DataFrame\n",
"df = pd.DataFrame(news_data)\n",
"df[\"month\"] = df[\"date\"].apply(parse_month)\n",
"df"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"2020年11月\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>date</th>\n",
" <th>title</th>\n",
" <th>month</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>2020年11月16日</td>\n",
" <td>フラー株式会社 本店移転のお知らせ</td>\n",
" <td>2020年11月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>2020年11月11日</td>\n",
" <td>フラー、全国の高専生を対象にしたオンラインキャリアイベント「高専キャラバン 2020 冬の陣...</td>\n",
" <td>2020年11月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>2020年10月20日</td>\n",
" <td>フラー代表取締役会長 渋谷、長岡高専の客員教授に就任</td>\n",
" <td>2020年10月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>2020年10月01日</td>\n",
" <td>フラー株式会社 会長及び社長就任のお知らせ</td>\n",
" <td>2020年10月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>2020年09月29日</td>\n",
" <td>TBSが国内No.1 アプリ分析ツール「App Ape」導入</td>\n",
" <td>2020年09月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>5</th>\n",
" <td>2020年09月21日</td>\n",
" <td>フラー、アルビレックス新潟とオフィシャルクラブパートナー契約を締結</td>\n",
" <td>2020年09月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>6</th>\n",
" <td>2020年09月11日</td>\n",
" <td>NTTデータにおけるエンタープライズ向けモバイルアプリ開発をサポート</td>\n",
" <td>2020年09月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>7</th>\n",
" <td>2020年09月10日</td>\n",
" <td>フラー代表取締役 渋谷が新潟ベンチャー協会代表理事に就任</td>\n",
" <td>2020年09月</td>\n",
" </tr>\n",
" <tr>\n",
" <th>8</th>\n",
" <td>2020年08月11日</td>\n",
" <td>ユナイテッドが国内No.1 アプリ分析ツール「App Ape」導入</td>\n",
" <td>2020年08月</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" date title month\n",
"0 2020年11月16日 フラー株式会社 本店移転のお知らせ 2020年11月\n",
"1 2020年11月11日 フラー、全国の高専生を対象にしたオンラインキャリアイベント「高専キャラバン 2020 冬の陣... 2020年11月\n",
"2 2020年10月20日 フラー代表取締役会長 渋谷、長岡高専の客員教授に就任 2020年10月\n",
"3 2020年10月01日 フラー株式会社 会長及び社長就任のお知らせ 2020年10月\n",
"4 2020年09月29日 TBSが国内No.1 アプリ分析ツール「App Ape」導入 2020年09月\n",
"5 2020年09月21日 フラー、アルビレックス新潟とオフィシャルクラブパートナー契約を締結 2020年09月\n",
"6 2020年09月11日 NTTデータにおけるエンタープライズ向けモバイルアプリ開発をサポート 2020年09月\n",
"7 2020年09月10日 フラー代表取締役 渋谷が新潟ベンチャー協会代表理事に就任 2020年09月\n",
"8 2020年08月11日 ユナイテッドが国内No.1 アプリ分析ツール「App Ape」導入 2020年08月"
]
},
"metadata": {
"tags": []
},
"execution_count": 33
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "POOpfWhUae_e"
},
"source": [
"### Aggregating the data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 206
},
"id": "tqxIZSOFOuRh",
"outputId": "c50d32a3-a325-4daa-d41b-934531de13a0"
},
"source": [
"df_count = df.groupby(\"month\").count()\n",
"df_count = df_count[[\"date\"]].rename(columns={\"date\": \"count\"})\n",
"df_count"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>count</th>\n",
" </tr>\n",
" <tr>\n",
" <th>month</th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>2020年08月</th>\n",
" <td>1</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020年09月</th>\n",
" <td>4</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020年10月</th>\n",
" <td>2</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2020年11月</th>\n",
" <td>2</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" count\n",
"month \n",
"2020年08月 1\n",
"2020年09月 4\n",
"2020年10月 2\n",
"2020年11月 2"
]
},
"metadata": {
"tags": []
},
"execution_count": 40
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 516
},
"id": "03xzZCSuPkPT",
"outputId": "9b84f7f8-5c84-4d06-a309-755644ea0219"
},
"source": [
"!pip install japanize-matplotlib\n",
"import japanize_matplotlib\n",
"df_count.plot.bar(y=\"count\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "stream",
"text": [
"Requirement already satisfied: japanize-matplotlib in /usr/local/lib/python3.6/dist-packages (1.1.3)\n",
"Requirement already satisfied: matplotlib in /usr/local/lib/python3.6/dist-packages (from japanize-matplotlib) (3.2.2)\n",
"Requirement already satisfied: numpy>=1.11 in /usr/local/lib/python3.6/dist-packages (from matplotlib->japanize-matplotlib) (1.18.5)\n",
"Requirement already satisfied: python-dateutil>=2.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->japanize-matplotlib) (2.8.1)\n",
"Requirement already satisfied: kiwisolver>=1.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->japanize-matplotlib) (1.3.1)\n",
"Requirement already satisfied: cycler>=0.10 in /usr/local/lib/python3.6/dist-packages (from matplotlib->japanize-matplotlib) (0.10.0)\n",
"Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /usr/local/lib/python3.6/dist-packages (from matplotlib->japanize-matplotlib) (2.4.7)\n",
"Requirement already satisfied: six>=1.5 in /usr/local/lib/python3.6/dist-packages (from python-dateutil>=2.1->matplotlib->japanize-matplotlib) (1.15.0)\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x7f83d12c3fd0>"
]
},
"metadata": {
"tags": []
},
"execution_count": 44
},
{
"output_type": "display_data",
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "OncGaprgZE6x"
},
"source": [
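"That wraps up day 2. Depending on how the class is progressing, I skip some parts along the way, so we don't always get as far as the visualization.\n",
"\n",
"By the way, I keep track of class progress through Slack, as shown below. Because the class is held entirely online, I can't see the students' reactions, so I ask them to add a Slack reaction when they finish a step. That tells me how many students are done, which is quite handy.\n",
"\n",
"![image.png]()\n"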
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_kNq0_RUZWiM"
},
"source": [
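"## Building a crawler and visualizing the data\n",
"\n",
"This time I'll write up the advanced version of the class. The class I normally teach is a simpler version; I use this one for intensive courses and other situations where many of the students want to learn Python.\n",
"\n",
"I've been into craft beer this year, so the dataset here is liquor manufacturing licenses, but I change the data I fetch to match whatever I'm interested in at the time.\n",
"\n",
"For data visualization guidelines, I tell students to read Material Design's Data Visualization guide, because it is easy to follow.\n",
"\n",
"https://material.io/design/communication/data-visualization.html"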
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "J3T7_5r_SrxG"
},
"source": [
"### Today's agenda\n",
"\n",
"1. Crawl the list of Excel file URLs using the requests and BeautifulSoup libraries\n",
"2. Load each Excel URL directly into a pandas DataFrame\n",
"3. Clean up and normalize the data\n",
"4. Slice the data from various angles and plot it as graphs"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "_AXX4p-wTiZE"
},
"source": [
"### Crawling the list of Excel URLs"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MDY6a87kThZB",
"outputId": "23a13f97-e687-47fa-9cbb-b072de44e1de"
},
"source": [
"import requests\n",
"\n",
"res = requests.get(\"https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/zenkoku.htm\")\n",
"res"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<Response [200]>"
]
},
"metadata": {
"tags": []
},
"execution_count": 2
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "r-oE50fdT9yc",
"outputId": "eab66197-1676-4e9d-ed14-603efdb4216f"
},
"source": [
"from bs4 import BeautifulSoup\n",
"\n",
"soup = BeautifulSoup(res.content, \"html.parser\")  # name the parser explicitly to avoid bs4's parser-guessing warning\n",
"soup"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<!DOCTYPE html>\n",
"<html lang=\"ja\">\n",
"<head>\n",
"<meta content=\"text/html; charset=utf-8\" http-equiv=\"Content-Type\"/>\n",
"<meta content=\"IE=edge\" http-equiv=\"X-UA-Compatible\"/>\n",
"<meta content=\"width=device-width, initial-scale=1\" name=\"viewport\"/>\n",
"<title>全国分 | 酒類の免許|国税庁</title>\n",
"<link href=\"/template/css/bootstrap.min.css\" rel=\"stylesheet\"/>\n",
"<!--[if lt IE 9]>\n",
"\t\t\t<script src=\"/template/js/html5shiv.min.js\"></script>\n",
"\t\t\t<script src=\"/template/js/respond.min.js\"></script>\n",
"\t\t<![endif]-->\n",
"<script src=\"/template/js/jquery.min.js\"></script>\n",
"<script src=\"/template/js/bootstrap.min.js\"></script>\n",
"<link href=\"/template/css/common.css\" rel=\"stylesheet\"/>\n",
"<link href=\"/template/css/import.css\" rel=\"stylesheet\"/>\n",
"<script src=\"/template/js/custom.js\"></script>\n",
"<script src=\"/template/js/include.js\"></script>\n",
"</head>\n",
"<body>\n",
"<noscript>すべての機能をご利用いただくにはJavascriptを有効にしてください。</noscript>\n",
"<div class=\"container\">\n",
"<div id=\"header\"></div>\n",
"<div class=\"wrap\">\n",
"<div class=\"clearfix\">\n",
"<div class=\"left-content contents\">\n",
"<div id=\"contents\">\n",
"<div class=\"imp-cnt\" id=\"bodyArea\">\n",
"<p class=\"skip\"></p>\n",
"<ol class=\"breadcrumb\"><li><a href=\"/\">ホーム</a></li><li><a href=\"/taxes/index.htm\">税の情報・手続・用紙</a></li><li><a href=\"/taxes/sake/index.htm\">お酒に関する情報</a></li><li><a href=\"/taxes/sake/menkyo/mokuji.htm\">酒類の免許</a></li><li class=\"active\">全国分</li></ol>\n",
"<div class=\"page-header\" id=\"page-top\"><h1>酒類等製造免許の新規取得者名等一覧</h1></div>\n",
"<p> 酒類等製造免許の新規取得者名等については、平成26年より、品目別に全国分をまとめたものを掲載します。</p>\n",
"<div class=\"table-responsive\"><table class=\"table table-bordered tbl_kohyo3\">\n",
"<colgroup>\n",
"<col class=\"\" span=\"1\"/>\n",
"<col class=\"\" span=\"7\"/>\n",
"</colgroup>\n",
"<tbody><tr>\n",
"<th scope=\"col\" width=\"20%\">品目</th>\n",
"<th scope=\"col\" width=\"11%\">平成26年</th>\n",
"<th scope=\"col\" width=\"11%\">平成27年</th>\n",
"<th scope=\"col\" width=\"11%\">平成28年</th>\n",
"<th scope=\"col\" width=\"11%\">平成29年</th>\n",
"<th scope=\"col\" width=\"11%\">平成30年</th>\n",
"<th scope=\"col\" width=\"11%\">令和元年<br/></th>\n",
"<th scope=\"col\" width=\"11%\">令和2年<br/>\n",
" (8月分まで)</th>\n",
"</tr>\n",
"<tr>\n",
"<td>果実酒</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/15.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/01.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/08.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/22.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/01.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/01.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/></a><br/></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/01.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/></a><br/></td>\n",
"</tr>\n",
"<tr>\n",
"<td>ビール</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/16.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/02.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/09.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/23.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/02.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/02.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/02.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"<tr>\n",
"<td>発泡酒</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/17.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/03.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/10.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/24.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/03.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/03.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/03.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"<tr>\n",
"<td>ウイスキー</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/18.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/04.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/11.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/25.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/04.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/04.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/04.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"<tr>\n",
"<td>その他の醸造酒</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/19.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/05.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/12.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/26.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/05.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/05.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/05.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"<tr>\n",
"<td>リキュール</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/20.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/06.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/13.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/27.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/06.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/06.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/06.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"<tr>\n",
"<td>上記以外</td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/21.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/07.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/14.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/28.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/07.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/07.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"<td class=\"center\"><a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/07.htm\"><img alt=\"詳細ページへのリンク\" src=\"/template/img/template/html_icon.png\"/><br/></a></td>\n",
"</tr>\n",
"</tbody></table></div><br/>\n",
"<p class=\"marginLeft2em\" style=\"margin-top:-2em;\">(注)氏名又は名称等については、常用漢字による表示をする場合があります。</p>\n",
"<br/>\n",
"<p class=\"marginLeft2em\">全ての品目の一覧はこちらをご覧ください。<br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/h26.xlsx\">平成26年Excelファイル/45KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/h27.xlsx\">平成27年Excelファイル/53KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/h28.xlsx\">平成28年Excelファイル/43KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/h29.xlsx\">平成29年Excelファイル/42KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/h30/12/h30.xlsx\">平成30年Excelファイル/45KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/r01/10/r01.xlsx\">令和元年Excelファイル/49KB</a><br/>\n",
"<a href=\"/taxes/sake/menkyo/shinki/seizo/02/r02/02/r02.xlsx\">令和2年(8月分まで)Excelファイル/45KB</a>\n",
"</p>\n",
"</div>\n",
"</div><p class=\"page-top-link\"><a href=\"#page-top\">このページの先頭へ</a></p>\n",
"</div>\n",
"<!--/left-menu-->\n",
"<div class=\"right-menu\">\n",
"<div class=\"panel panel-default sidenavi bmg20\">\n",
"<div class=\"panel-heading\">\n",
"<h2 class=\"panel-title\"><a href=\"/taxes/index.htm\">税の情報・手続・用紙</a></h2>\n",
"</div>\n",
"<div class=\"panel-group\" id=\"navi-Accordion\">\n",
"<h3 class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/shiraberu/index.htm\">税について調べる</a><a class=\"navi_btn collapsed\" data-parent=\"#sidenaviAccordion\" data-toggle=\"collapse\" href=\"#sidenaviAccordionCollapse1\"><img alt=\"メニューを開く\" src=\"/template/img/template/navi_down.png\"/></a></h3><div class=\"panel-collapse collapse\" id=\"sidenaviAccordionCollapse1\"><ul><li><a href=\"/taxes/shiraberu/shinkoku/kakutei.htm\">所得税(確定申告書等作成コーナー)</a></li><li><a href=\"/taxes/shiraberu/taxanswer/index2.htm\">タックスアンサー(よくある税の質問)</a></li><li><a href=\"/taxes/shiraberu/sodan/index.htm\">税の相談</a></li><li><a href=\"/taxes/shiraberu/zeimokubetsu/index.htm\">税目別情報</a></li><li><a href=\"http://www.rosenka.nta.go.jp/\">路線価図・評価倍率表</a></li><li><a href=\"/taxes/shiraberu/saigai/index.htm\">災害関連情報</a></li><li><a href=\"/taxes/shiraberu/kokusai/index.htm\">国際税務関係情報</a></li><li><a href=\"/taxes/shiraberu/shirabekata/info.htm\">税についての上手な調べ方</a></li></ul></div><h3 class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/tetsuzuki/index.htm\">申告手続・用紙</a><a class=\"navi_btn collapsed\" data-parent=\"#sidenaviAccordion\" data-toggle=\"collapse\" href=\"#sidenaviAccordionCollapse2\"><img alt=\"メニューを開く\" src=\"/template/img/template/navi_down.png\"/></a></h3><div class=\"panel-collapse collapse\" id=\"sidenaviAccordionCollapse2\"><ul><li><a href=\"/taxes/tetsuzuki/shinsei/index.htm\">申告・申請・届出等、用紙(手続の案内・様式)</a></li><li><a href=\"/taxes/tetsuzuki/mynumberinfo/index.htm\">社会保障・税番号制度(マイナンバー)</a></li></ul></div><h3 class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/nozei/index.htm\">納税・納税証明書手続</a><a class=\"navi_btn collapsed\" data-parent=\"#sidenaviAccordion\" data-toggle=\"collapse\" href=\"#sidenaviAccordionCollapse3\"><img alt=\"メニューを開く\" src=\"/template/img/template/navi_down.png\"/></a></h3><div class=\"panel-collapse collapse\" id=\"sidenaviAccordionCollapse3\"><ul><li><a href=\"/taxes/nozei/nofu/01.htm\">国税の納付手続</a></li><li><a 
href=\"/taxes/nozei/nozei-shomei/01.htm\">納税証明書</a></li><li><a href=\"/taxes/nozei/enno-butsuno/01.htm\">延納・物納申請等</a></li><li><a href=\"/taxes/nozei/entaizei/keisan/entai.htm\">延滞税</a></li></ul></div><h3 class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/zeirishi/index.htm\">税理士に関する情報</a><a class=\"navi_btn collapsed\" data-parent=\"#sidenaviAccordion\" data-toggle=\"collapse\" href=\"#sidenaviAccordionCollapse4\"><img alt=\"メニューを開く\" src=\"/template/img/template/navi_down.png\"/></a></h3><div class=\"panel-collapse collapse\" id=\"sidenaviAccordionCollapse4\"><ul><li><a href=\"/taxes/zeirishi/index.htm#news\">新着情報</a></li><li><a href=\"/taxes/zeirishi/zeirishiseido/seido.htm\">税理士制度</a></li><li><a href=\"/taxes/zeirishi/zeirishishiken/zeirishi.htm\">税理士試験</a></li><li><a href=\"/taxes/zeirishi/chokai/chokai.htm\">税理士に関する懲戒処分等</a></li><li><a href=\"/taxes/zeirishi/search/search.htm\">税理士をお探しの方</a></li><li><a href=\"/taxes/zeirishi/zeirishiseido/qa.htm\">税理士関係法令等・Q&A</a></li></ul></div><h3 class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/sake/index.htm\">お酒に関する情報</a><a class=\"navi_btn\" data-parent=\"#sidenaviAccordion\" data-toggle=\"collapse\" href=\"#sidenaviAccordionCollapse5\"><img alt=\"メニューを閉じる\" src=\"/template/img/template/navi_up.png\"/></a></h3><div class=\"panel-collapse collapse in\" id=\"sidenaviAccordionCollapse5\"><ul><li><a href=\"/taxes/sake/index.htm#syuzei\">酒税関係及び各種施策</a></li><li><a href=\"/taxes/sake/index.htm#ippan\">一般的な酒税の取扱い</a></li><li><a href=\"/taxes/sake/index.htm#senmonsyuzei\">専門的な酒税の取扱い</a></li><li><a href=\"/taxes/sake/index.htm#senmonsodan\">専門的な酒税の相談</a></li><li><a href=\"/taxes/sake/index.htm#shinkoku\">酒税の申告・納付や届出等</a></li><li><a href=\"/taxes/sake/index.htm#denwa\">電話相談センター・税務署の案内</a></li><li><a href=\"/taxes/sake/index.htm#shingikai\">審議会等</a></li><li><a href=\"/taxes/sake/index.htm#tokei\">統計情報・各種資料</a></li><li><a href=\"/taxes/sake/index.htm#qa\">お酒に関するQ&A(よくある質問)</a></li></ul></div><h3 
class=\"sidenavi-title\"><a class=\"navi_title\" href=\"/taxes/kids/index.htm\">税の学習コーナー</a></h3>\n",
"</div>\n",
"</div>\n",
"</div>\n",
"<!--/right-menu-->\n",
"</div>\n",
"<!--/clearfix-->\n",
"</div>\n",
"<!--/wrap-->\n",
"<footer>\n",
"<h2>サイトマップ(コンテンツ一覧)</h2>\n",
"<ol class=\"breadcrumb footer\"><li><a href=\"/\">ホーム</a></li><li><a href=\"/taxes/index.htm\">税の情報・手続・用紙</a></li><li><a href=\"/taxes/sake/index.htm\">お酒に関する情報</a></li><li><a href=\"/taxes/sake/menkyo/mokuji.htm\">酒類の免許</a></li><li class=\"active\">全国分</li></ol>\n",
"<div id=\"footer\"></div>\n",
"</footer>\n",
"</div>\n",
"<!--/container-->\n",
"</body>\n",
"</html>"
]
},
"metadata": {
"tags": []
},
"execution_count": 3
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "NqF8r2otUR7b",
"outputId": "d114eb67-fe6c-4933-d9ec-69a285a71ef9"
},
"source": [
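"# Exercise (TODO in class): collect the Excel file URLs with a CSS selector\n",
"url_list = []\n",
"# Alternative selector: for a in soup.select(\".marginLeft2em a\"):\n",
"for a in soup.select(\"a[href$='.xlsx']\"):\n",
"    # The hrefs are site-relative paths, so prepend the host\n",
"    url_list.append(\"https://www.nta.go.jp\" + a.get(\"href\"))\n",
"url_list"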
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"['https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/h26.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/h27.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/h28.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/h29.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/h30/12/h30.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/r01/10/r01.xlsx',\n",
" 'https://www.nta.go.jp/taxes/sake/menkyo/shinki/seizo/02/r02/02/r02.xlsx']"
]
},
"metadata": {
"tags": []
},
"execution_count": 4
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "epdG1I16YjCX",
"outputId": "23aab629-ff96-4ad1-9248-3bc4d79987ca"
},
"source": [
"import pandas as pd\n",
"\n",
"dataset = []\n",
"for url in url_list:\n",
"    df = pd.read_excel(url)\n",
"    dataset.append(df)\n",
"dataset"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[ 酒類等製造免許の新規取得者名等一覧(平成26年分) ... Unnamed: 9\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成26年1月1日から平成26年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 審査項目\n",
" .. ... ... ...\n",
" 223 岩手 ... M1\n",
" 224 山梨 ... M1\n",
" 225 大阪 ... M4\n",
" 226 長野 ... M1\n",
" 227 長野 ... M1\n",
" \n",
" [228 rows x 10 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(平成27年分) ... Unnamed: 9\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成27年1月1日から平成27年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 審査項目\n",
" .. ... ... ...\n",
" 261 島根 ... M1\n",
" 262 島根 ... M1\n",
" 263 北海道 ... M1\n",
" 264 神奈川 ... M1\n",
" 265 東京 ... M1\n",
" \n",
" [266 rows x 10 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(平成28年分) ... Unnamed: 9\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成28年1月1日から平成28年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 審査項目\n",
" .. ... ... ...\n",
" 220 長野 ... M1\n",
" 221 徳島 ... M1\n",
" 222 和歌山 ... M4\n",
" 223 大阪 ... M1\n",
" 224 兵庫 ... M1\n",
" \n",
" [225 rows x 10 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(平成29年分) ... Unnamed: 8\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成29年1月1日から平成29年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 処理区分\n",
" .. ... ... ...\n",
" 231 佐賀 ... 新規\n",
" 232 香川 ... 新規\n",
" 233 滋賀 ... 新規\n",
" 234 滋賀 ... 新規\n",
" 235 滋賀 ... 新規\n",
" \n",
" [236 rows x 9 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(平成30年分) ... Unnamed: 8\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成30年1月1日から平成30年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 処理区分\n",
" .. ... ... ...\n",
" 289 長野 ... 新規\n",
" 290 京都 ... 新規\n",
" 291 栃木 ... 新規\n",
" 292 鹿児島 ... 新規\n",
" 293 福島 ... 移転\n",
" \n",
" [294 rows x 9 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(令和元年分) ... Unnamed: 8\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   平成31年1月1日から令和元年12月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 処理区分\n",
" .. ... ... ...\n",
" 321 熊本 ... 新規\n",
" 322 新潟 ... 新規\n",
" 323 東京 ... 新規\n",
" 324 岐阜 ... 新規\n",
" 325 山形 ... 新規\n",
" \n",
" [326 rows x 9 columns],\n",
" 酒類等製造免許の新規取得者名等一覧(令和2年分) ... Unnamed: 8\n",
" 0 NaN ... NaN\n",
" 1 NaN ... NaN\n",
" 2   令和2年1月1日から令和2年8月31日までの酒類等製造免許の取得者等は次のとおりです。 ... NaN\n",
" 3 NaN ... NaN\n",
" 4 都道府県名 ... 処理区分\n",
" .. ... ... ...\n",
" 290 新潟 ... 新規\n",
" 291 東京 ... 新規\n",
" 292 山梨 ... 新規\n",
" 293 北海道 ... 新規\n",
" 294 北海道 ... 新規\n",
" \n",
" [295 rows x 9 columns]]"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ebxrOgC-b3Cx"
},
"source": [
"### Cleaning and normalizing the data"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 133
},
"id": "j1l39r1lZ3B_",
"outputId": "325625fd-934a-40ea-de71-8a44e651e2c7"
},
"source": [
"df = dataset[0]\n",
"df[4:5]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>酒類等製造免許の新規取得者名等一覧(平成26年分)</th>\n",
" <th>Unnamed: 1</th>\n",
" <th>Unnamed: 2</th>\n",
" <th>Unnamed: 3</th>\n",
" <th>Unnamed: 4</th>\n",
" <th>Unnamed: 5</th>\n",
" <th>Unnamed: 6</th>\n",
" <th>Unnamed: 7</th>\n",
" <th>Unnamed: 8</th>\n",
" <th>Unnamed: 9</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>都道府県名</td>\n",
" <td>税務署名</td>\n",
" <td>免許等年月日</td>\n",
" <td>申請等年月日</td>\n",
" <td>製造者氏名又は名称</td>\n",
" <td>製造場所在地</td>\n",
" <td>免許等区分</td>\n",
" <td>品目</td>\n",
" <td>処理区分</td>\n",
" <td>審査項目</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 酒類等製造免許の新規取得者名等一覧(平成26年分) Unnamed: 1 ... Unnamed: 8 Unnamed: 9\n",
"4 都道府県名 税務署名 ... 処理区分 審査項目\n",
"\n",
"[1 rows x 10 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 133
},
"id": "iv9ndLwGcoY9",
"outputId": "a8cca43e-9c61-4ac3-8f63-ec44baefff0e"
},
"source": [
"df = dataset[3]\n",
"df[4:5]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>酒類等製造免許の新規取得者名等一覧(平成29年分)</th>\n",
" <th>Unnamed: 1</th>\n",
" <th>Unnamed: 2</th>\n",
" <th>Unnamed: 3</th>\n",
" <th>Unnamed: 4</th>\n",
" <th>Unnamed: 5</th>\n",
" <th>Unnamed: 6</th>\n",
" <th>Unnamed: 7</th>\n",
" <th>Unnamed: 8</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>都道府県名</td>\n",
" <td>税務署名</td>\n",
" <td>免許等年月日</td>\n",
" <td>申請等年月日</td>\n",
" <td>製造者氏名又は名称</td>\n",
" <td>製造場所在地</td>\n",
" <td>免許等区分</td>\n",
" <td>品目</td>\n",
" <td>処理区分</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 酒類等製造免許の新規取得者名等一覧(平成29年分) Unnamed: 1 ... Unnamed: 7 Unnamed: 8\n",
"4 都道府県名 税務署名 ... 品目 処理区分\n",
"\n",
"[1 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 7
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "S0mqioPDdShO",
"outputId": "e264d5a2-d5d5-4bfe-ef29-0ec5be1b8dc4"
},
"source": [
"df.columns"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"Index(['酒類等製造免許の新規取得者名等一覧(平成29年分)', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3',\n",
" 'Unnamed: 4', 'Unnamed: 5', 'Unnamed: 6', 'Unnamed: 7', 'Unnamed: 8'],\n",
" dtype='object')"
]
},
"metadata": {
"tags": []
},
"execution_count": 8
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "uOE5zKlsdEVC",
"outputId": "ade1af27-9dbc-4d46-9ee8-bf55e731b596"
},
"source": [
"clean_dataset = []\n",
"for df in dataset:\n",
" if \"Unnamed: 9\" in df:\n",
" df = df.drop(columns=[\"Unnamed: 9\"])\n",
" df = df[5:]\n",
" df.columns = [\n",
" \"都道府県名\",\n",
" \"税務署名\",\n",
" \"免許等年月日\",\n",
" \"申請等年月日\",\n",
" \"製造者氏名又は名称\",\n",
" \"製造場所在地\",\n",
" \"免許等区分\",\n",
" \"品目\",\n",
" \"処理区分\"\n",
" ]\n",
" clean_dataset.append(df)\n",
"clean_dataset"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"[ 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 岩手 一関 2014-01-01 00:00:00 ... 酒類 単式蒸留しょうちゅう 新規\n",
" 6 愛知 名古屋中村 2014-01-08 00:00:00 ... 酒類 ビール 新規\n",
" 7 愛知 名古屋中村 2014-01-08 00:00:00 ... 酒類 発泡酒 新規\n",
" 8 福岡 八女 2014-01-08 00:00:00 ... 試験免許 その他の醸造酒 新規\n",
" 9 熊本 山鹿 2014-01-27 00:00:00 ... 酒類 単式蒸留しょうちゅう 法人成り等\n",
" .. ... ... ... ... ... ... ...\n",
" 223 岩手 花巻 2014-12-15 00:00:00 ... もろみ NaN 新規\n",
" 224 山梨 甲府 2014-12-17 00:00:00 ... もろみ NaN 新規\n",
" 225 大阪 堺 2014-12-19 00:00:00 ... 酒類 清酒 移転\n",
" 226 長野 伊那 2014-12-24 00:00:00 ... 試験免許 清酒 新規\n",
" 227 長野 伊那 2014-12-24 00:00:00 ... 試験免許 その他の醸造酒 新規\n",
" \n",
" [223 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 北海道 函館 42005 ... 酒類 果実酒 法人成り等\n",
" 6 熊本 熊本東 42005 ... 酒類 清酒 法人成り等\n",
" 7 熊本 熊本東 42005 ... 酒類 単式蒸留しょうちゅう 法人成り等\n",
" 8 熊本 熊本東 42005 ... 酒類 リキュール 新規\n",
" 9 北海道 深川 42013 ... 酒類 果実酒 新規\n",
" .. ... ... ... ... ... ... ...\n",
" 261 島根 益田 2015-12-22 00:00:00 ... 酒類 果実酒 新規\n",
" 262 島根 浜田 2015-12-22 00:00:00 ... 酒類 発泡酒 新規\n",
" 263 北海道 函館 42360 ... 試験\\n免許 清酒 新規\n",
" 264 神奈川 横浜中 42362 ... 試験\\n免許 スピリッツ 新規\n",
" 265 東京 神田 42363 ... 酒類 その他の醸造酒 新規\n",
" \n",
" [261 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 埼玉 川口 2016-01-01 00:00:00 ... 酒類 発泡酒 新規\n",
" 6 岡山 瀬戸 2016-01-01 00:00:00 ... 酒類 果実酒 法人成り等\n",
" 7 岡山 瀬戸 2016-01-01 00:00:00 ... 酒類 甘味果実酒 法人成り等\n",
" 8 岡山 瀬戸 2016-01-01 00:00:00 ... 酒類 ブランデー 法人成り等\n",
" 9 岡山 瀬戸 2016-01-01 00:00:00 ... 酒類 スピリッツ 法人成り等\n",
" .. ... ... ... ... ... ... ...\n",
" 220 長野 飯田 42724 ... 酒類 果実酒 新規\n",
" 221 徳島 徳島 42724 ... 酒母 NaN 新規\n",
" 222 和歌山 和歌山 42726 ... 酒類 発泡酒 移転\n",
" 223 大阪 富田林 42726 ... 試験免許 果実酒 新規\n",
" 224 兵庫 明石 42730 ... 酒類 発泡酒 新規\n",
" \n",
" [220 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 申請等年月日 ... 製造場所在地 免許等区分 品目 処理区分\n",
" 5 三重 上野 42736 42551 ... 名張市緑が丘東182番地 酒類 その他の醸造酒 新規\n",
" 6 長崎 島原 42736 42641 ... 南島原市布津町丙271番地4 酒類 その他の醸造酒 新規\n",
" 7 岩手 一関 42741 42580 ... 西磐井郡平泉町長島字砂子沢172番地6 酒類 果実酒 新規\n",
" 8 高知 須崎 42747 42620 ... 高岡郡檮原町松原400番地1 酒類 その他の醸造酒 新規\n",
" 9 京都 下京 42751 42657 ... 京都市南区西九条高畠町25番地1 酒類 ビール 新規\n",
" .. ... ... ... ... ... ... ... ... ...\n",
" 231 佐賀 伊万里 43089 43041 ... 西松浦郡有田町戸矢乙340番地28 試験免許 発泡酒 新規\n",
" 232 香川 丸亀 43090 42846 ... 丸亀市北平山町二丁目21番地41 酒類 発泡酒 新規\n",
" 233 滋賀 大津 43095 43034 ... 大津市瀬田大江町1番地5 龍谷大学瀬田学舎9号館 試験免許 清酒 新規\n",
" 234 滋賀 大津 43095 43034 ... 大津市瀬田大江町1番地5 龍谷大学瀬田学舎9号館 試験免許 ビール 新規\n",
" 235 滋賀 大津 43095 43034 ... 大津市瀬田大江町1番地5 龍谷大学瀬田学舎9号館 試験免許 果実酒 新規\n",
" \n",
" [231 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 鳥取 倉吉 平成30年1月1日 ... 酒類 果実酒 法人成り等\n",
" 6 鳥取 倉吉 平成30年1月1日 ... 酒類 甘味果実酒 法人成り等\n",
" 7 埼玉 朝霞 平成30年1月11日 ... 酒類 発泡酒 新規\n",
" 8 静岡 浜松西 平成30年1月11日 ... 酒類 発泡酒 新規\n",
" 9 静岡 掛川 平成30年1月11日 ... 酒類 発泡酒 新規\n",
" .. ... ... ... ... ... ... ...\n",
" 289 長野 信濃中野 平成30年12月20日 ... 酒類 その他の醸造酒 新規\n",
" 290 京都 宮津 平成30年12月20日 ... 酒類 その他の醸造酒 新規\n",
" 291 栃木 大田原 平成30年12月21日 ... 酒類 果実酒 新規\n",
" 292 鹿児島 種子島 平成30年12月21日 ... もろみ NaN 新規\n",
" 293 福島 相馬 平成30年12月25日 ... 酒類 その他の醸造酒 移転\n",
" \n",
" [289 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 熊本 人吉 平成31年1月1日 ... 酒類 単式蒸留焼酎 法人成り等\n",
" 6 東京 麹町 平成31年1月1日 ... 酒類 ビール 法人成り等\n",
" 7 長野 上田 平成31年1月1日 ... 酒類 ウイスキー 新規\n",
" 8 長野 佐久 平成31年1月1日 ... 酒類 発泡酒 新規\n",
" 9 東京 麻布 平成31年1月1日 ... 酒類 発泡酒 新規\n",
" .. ... ... ... ... ... ... ...\n",
" 321 熊本 熊本東 令和元年12月19日 ... 酒類 スピリッツ 新規\n",
" 322 新潟 新津 令和元年12月20日 ... 試験免許 清酒 新規\n",
" 323 東京 蒲田 令和元年12月20日 ... 酒類 リキュール 新規\n",
" 324 岐阜 関 令和元年12月20日 ... 酒類 リキュール 新規\n",
" 325 山形 山形 令和元年12月23日 ... 酒類 発泡酒 新規\n",
" \n",
" [321 rows x 9 columns],\n",
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
" 5 北海道 札幌中 令和2年1月1日 ... 酒類 発泡酒 法人成り等\n",
" 6 北海道 札幌中 令和2年1月1日 ... 酒類 ビール 法人成り等\n",
" 7 青森 五所川原 令和2年1月1日 ... 酒類 清酒 移転\n",
" 8 青森 五所川原 令和2年1月1日 ... 酒類 雑酒 移転\n",
" 9 青森 五所川原 令和2年1月1日 ... 酒類 スピリッツ 移転\n",
" .. ... ... ... ... ... ... ...\n",
" 290 新潟 十日町 令和2年8月31日 ... 酒類 発泡酒 新規\n",
" 291 東京 神田 令和2年8月31日 ... 酒類 発泡酒 新規\n",
" 292 山梨 甲府 令和2年8月31日 ... 酒類 単式蒸留焼酎 新規\n",
" 293 北海道 札幌南 令和2年8月31日 ... 酒類 スピリッツ 新規\n",
" 294 北海道 札幌南 令和2年8月31日 ... 酒類 リキュール 新規\n",
" \n",
" [290 rows x 9 columns]]"
]
},
"metadata": {
"tags": []
},
"execution_count": 9
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 962
},
"id": "EF_lMI0lephj",
"outputId": "50275461-5395-4783-f084-45b6132e00ac"
},
"source": [
"df_all = pd.concat(clean_dataset, ignore_index=True)\n",
"df_all"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>都道府県名</th>\n",
" <th>税務署名</th>\n",
" <th>免許等年月日</th>\n",
" <th>申請等年月日</th>\n",
" <th>製造者氏名又は名称</th>\n",
" <th>製造場所在地</th>\n",
" <th>免許等区分</th>\n",
" <th>品目</th>\n",
" <th>処理区分</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>岩手</td>\n",
" <td>一関</td>\n",
" <td>2014-01-01 00:00:00</td>\n",
" <td>2013-08-26 00:00:00</td>\n",
" <td>岩手銘醸株式会社</td>\n",
" <td>一関市千厩町千厩字北方134番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留しょうちゅう</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08 00:00:00</td>\n",
" <td>2013-06-21 00:00:00</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>ビール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08 00:00:00</td>\n",
" <td>2013-06-21 00:00:00</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>福岡</td>\n",
" <td>八女</td>\n",
" <td>2014-01-08 00:00:00</td>\n",
" <td>2013-12-06 00:00:00</td>\n",
" <td>西吉田酒造株式会社</td>\n",
" <td>筑後市大字和泉612番地</td>\n",
" <td>試験免許</td>\n",
" <td>その他の醸造酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>熊本</td>\n",
" <td>山鹿</td>\n",
" <td>2014-01-27 00:00:00</td>\n",
" <td>2013-09-12 00:00:00</td>\n",
" <td>株式会社VinEx山鹿\\nVinEx山鹿</td>\n",
" <td>山鹿市鹿央町合里980番地1</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留しょうちゅう</td>\n",
" <td>法人成り等</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1830</th>\n",
" <td>新潟</td>\n",
" <td>十日町</td>\n",
" <td>令和2年8月31日</td>\n",
" <td>令和2年2月25日</td>\n",
" <td>法人番号3110001034090\\n株式会社 醸燻酒類研究所</td>\n",
" <td>十日町市本町5丁目55番地8</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1831</th>\n",
" <td>東京</td>\n",
" <td>神田</td>\n",
" <td>令和2年8月31日</td>\n",
" <td>令和元年12月16日</td>\n",
" <td>法人番号7010401007149\\n株式会社魚金</td>\n",
" <td>千代田区神田錦町2丁目2番1号 KANDA SQUARE1階店舗9区画</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1832</th>\n",
" <td>山梨</td>\n",
" <td>甲府</td>\n",
" <td>令和2年8月31日</td>\n",
" <td>令和2年6月26日</td>\n",
" <td>法人番号1090001011219\\n山梨銘醸株式会社</td>\n",
" <td>北杜市白州町台ヶ原2283番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留焼酎</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1833</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>令和2年8月31日</td>\n",
" <td>令和2年4月9日</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>スピリッツ</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1834</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>令和2年8月31日</td>\n",
" <td>令和2年4月9日</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>リキュール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1835 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
"0 岩手 一関 2014-01-01 00:00:00 ... 酒類 単式蒸留しょうちゅう 新規\n",
"1 愛知 名古屋中村 2014-01-08 00:00:00 ... 酒類 ビール 新規\n",
"2 愛知 名古屋中村 2014-01-08 00:00:00 ... 酒類 発泡酒 新規\n",
"3 福岡 八女 2014-01-08 00:00:00 ... 試験免許 その他の醸造酒 新規\n",
"4 熊本 山鹿 2014-01-27 00:00:00 ... 酒類 単式蒸留しょうちゅう 法人成り等\n",
"... ... ... ... ... ... ... ...\n",
"1830 新潟 十日町 令和2年8月31日 ... 酒類 発泡酒 新規\n",
"1831 東京 神田 令和2年8月31日 ... 酒類 発泡酒 新規\n",
"1832 山梨 甲府 令和2年8月31日 ... 酒類 単式蒸留焼酎 新規\n",
"1833 北海道 札幌南 令和2年8月31日 ... 酒類 スピリッツ 新規\n",
"1834 北海道 札幌南 令和2年8月31日 ... 酒類 リキュール 新規\n",
"\n",
"[1835 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 10
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 962
},
"id": "nfjrsfvgjjy6",
"outputId": "bc01d501-2da3-40b3-e7a7-d5b2fb6fc43b"
},
"source": [
"import datetime\n",
"\n",
"def parse_date(d):\n",
" if type(d) is int:\n",
" if d < 60:\n",
" # 1900-03-01より前の場合\n",
" days = d - 1\n",
" else:\n",
" # 1900-03-01以降の場合\n",
" days = d - 2\n",
" return pd.Timestamp(pd.to_datetime('1900/01/01') + datetime.timedelta(days=days))\n",
" elif type(d) is datetime.datetime:\n",
" return pd.Timestamp(d)\n",
" elif type(d) is str:\n",
" d = d.strip()\n",
" year, others = d.split(\"年\")\n",
" month, day = others.split(\"月\")\n",
" day = day.replace(\"日\", \"\")\n",
" if year.startswith(\"平成元\"):\n",
" year = 1989\n",
" elif year.startswith(\"平成\"):\n",
" year = int(year[2:]) + 1988\n",
" elif year.startswith(\"令和元\"):\n",
" year = 2019 \n",
" elif year.startswith(\"令和\"):\n",
" year = int(year[2:]) + 2018\n",
" else:\n",
" raise Exception(\"Invalid year format\")\n",
" return pd.Timestamp(year, int(month), int(day))\n",
" else:\n",
" raise Exception(\"Invalid data format\")\n",
"\n",
"df_clean = df_all.copy()\n",
"df_clean = df_clean.dropna()\n",
"df_clean[\"免許等年月日\"] = df_clean[\"免許等年月日\"].apply(parse_date)\n",
"df_clean[\"申請等年月日\"] = df_clean[\"申請等年月日\"].apply(parse_date)\n",
"df_clean"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>都道府県名</th>\n",
" <th>税務署名</th>\n",
" <th>免許等年月日</th>\n",
" <th>申請等年月日</th>\n",
" <th>製造者氏名又は名称</th>\n",
" <th>製造場所在地</th>\n",
" <th>免許等区分</th>\n",
" <th>品目</th>\n",
" <th>処理区分</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>岩手</td>\n",
" <td>一関</td>\n",
" <td>2014-01-01</td>\n",
" <td>2013-08-26</td>\n",
" <td>岩手銘醸株式会社</td>\n",
" <td>一関市千厩町千厩字北方134番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留しょうちゅう</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-06-21</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>ビール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-06-21</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>福岡</td>\n",
" <td>八女</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-12-06</td>\n",
" <td>西吉田酒造株式会社</td>\n",
" <td>筑後市大字和泉612番地</td>\n",
" <td>試験免許</td>\n",
" <td>その他の醸造酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>熊本</td>\n",
" <td>山鹿</td>\n",
" <td>2014-01-27</td>\n",
" <td>2013-09-12</td>\n",
" <td>株式会社VinEx山鹿\\nVinEx山鹿</td>\n",
" <td>山鹿市鹿央町合里980番地1</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留しょうちゅう</td>\n",
" <td>法人成り等</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1830</th>\n",
" <td>新潟</td>\n",
" <td>十日町</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-02-25</td>\n",
" <td>法人番号3110001034090\\n株式会社 醸燻酒類研究所</td>\n",
" <td>十日町市本町5丁目55番地8</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1831</th>\n",
" <td>東京</td>\n",
" <td>神田</td>\n",
" <td>2020-08-31</td>\n",
" <td>2019-12-16</td>\n",
" <td>法人番号7010401007149\\n株式会社魚金</td>\n",
" <td>千代田区神田錦町2丁目2番1号 KANDA SQUARE1階店舗9区画</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1832</th>\n",
" <td>山梨</td>\n",
" <td>甲府</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-06-26</td>\n",
" <td>法人番号1090001011219\\n山梨銘醸株式会社</td>\n",
" <td>北杜市白州町台ヶ原2283番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留焼酎</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1833</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-04-09</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>スピリッツ</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1834</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-04-09</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>リキュール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1722 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
"0 岩手 一関 2014-01-01 ... 酒類 単式蒸留しょうちゅう 新規\n",
"1 愛知 名古屋中村 2014-01-08 ... 酒類 ビール 新規\n",
"2 愛知 名古屋中村 2014-01-08 ... 酒類 発泡酒 新規\n",
"3 福岡 八女 2014-01-08 ... 試験免許 その他の醸造酒 新規\n",
"4 熊本 山鹿 2014-01-27 ... 酒類 単式蒸留しょうちゅう 法人成り等\n",
"... ... ... ... ... ... ... ...\n",
"1830 新潟 十日町 2020-08-31 ... 酒類 発泡酒 新規\n",
"1831 東京 神田 2020-08-31 ... 酒類 発泡酒 新規\n",
"1832 山梨 甲府 2020-08-31 ... 酒類 単式蒸留焼酎 新規\n",
"1833 北海道 札幌南 2020-08-31 ... 酒類 スピリッツ 新規\n",
"1834 北海道 札幌南 2020-08-31 ... 酒類 リキュール 新規\n",
"\n",
"[1722 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 11
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "wjx9MTAZXlHU",
"outputId": "ae1dc7ba-0e4c-4b1b-ea47-6fa222f2184d"
},
"source": [
"df_clean[\"都道府県名\"].unique()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['岩手', '愛知', '福岡', '熊本', '愛媛', '兵庫', '山梨', '東京', '栃木', '京都', '鳥取',\n",
" '宮崎', '新潟', '長崎', '和歌山', '大阪', '山口', '徳島', '三重', '鹿児島', '山形', '青森',\n",
" '岐阜', '神奈川', '静岡', '沖縄', '茨城', '北海道', '埼玉', '奈良', '長野', '広島', '千葉',\n",
" '秋田', '群馬', '福島', '香川', '福井', '高知', '宮城', '石川', '富山', '滋賀', '大分',\n",
" '岡山', '島根', '佐賀', '札幌', '宮﨑'], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 12
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "6PDTEfUKknAm",
"outputId": "b9b80bf3-894f-422d-c4f4-e46cc3a17ea3"
},
"source": [
"df_clean[\"免許等区分\"].unique()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['酒類', '試験免許', '試験\\n免許', ' 試験免許', '試験免許 '], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 13
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "R0vkv_-XmV9h",
"outputId": "1c333304-5d91-415b-de81-bbddd0aa0d74"
},
"source": [
"df_clean[\"品目\"].unique()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['単式蒸留しょうちゅう', 'ビール', '発泡酒', 'その他の醸造酒', 'リキュール', 'みりん', '果実酒',\n",
" 'スピリッツ', '清酒', '合成清酒', '連続式蒸留しょうちゅう', '粉末酒', '原料用アルコール', '雑酒',\n",
" '甘味果実酒', 'ブランデー', 'ウイスキー', '連続式蒸留焼酎', '単式蒸留焼酎', 'スピリッツ・リキュール',\n",
" 'その他の醸\\n造酒', '原料用アル\\nコール', '単式蒸留\\n焼酎'], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 14
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 168
},
"id": "0uCidOpaoFoF",
"outputId": "7a976fc2-1bd6-4248-857f-f13d31d86b58"
},
"source": [
"df_clean[df_clean[\"品目\"] == \"スピリッツ・リキュール\"]"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>都道府県名</th>\n",
" <th>税務署名</th>\n",
" <th>免許等年月日</th>\n",
" <th>申請等年月日</th>\n",
" <th>製造者氏名又は名称</th>\n",
" <th>製造場所在地</th>\n",
" <th>免許等区分</th>\n",
" <th>品目</th>\n",
" <th>処理区分</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1634</th>\n",
" <td>滋賀</td>\n",
" <td>草津</td>\n",
" <td>2020-04-01</td>\n",
" <td>2019-12-10</td>\n",
" <td>法人番号4011001007970\\n株式会社コカ・コーラ東京研究開発センター\\n株式会社コ...</td>\n",
" <td>守山市阿村町49番地</td>\n",
" <td>試験免許</td>\n",
" <td>スピリッツ・リキュール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 都道府県名 税務署名 免許等年月日 申請等年月日 ... 製造場所在地 免許等区分 品目 処理区分\n",
"1634 滋賀 草津 2020-04-01 2019-12-10 ... 守山市阿村町49番地 試験免許 スピリッツ・リキュール 新規\n",
"\n",
"[1 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 15
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "tlvuBS_9mflw",
"outputId": "1da1537e-70d4-4a45-b91d-aa589f5aa16c"
},
"source": [
"df_clean[\"処理区分\"].unique()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['新規', '法人成り等', '相続', '移転', '条件解除', '条件緩和', '新規\\u3000'],\n",
" dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 16
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 945
},
"id": "eHOtRZ3AnKJQ",
"outputId": "2dff69e7-dc32-4429-e254-f03df5b14d35"
},
"source": [
"def cleanse_string(d):\n",
" d = d.replace(\"\\n\", \"\")\n",
" d = d.strip() \n",
" return d\n",
"\n",
"def normalize_category(d):\n",
" d = d.replace(\"しょうちゅう\", \"焼酎\")\n",
" d = d.replace(\"スピリッツ・リキュール\", \"スピリッツ\")\n",
" return d\n",
"\n",
"def normalize_prefecture(d):\n",
" if d == \"札幌\":\n",
" return \"北海道\"\n",
" elif d == \"宮﨑\":\n",
" return \"宮崎\"\n",
" else:\n",
" return d\n",
"\n",
"df_normalize = df_clean.copy()\n",
"df_normalize[\"免許等区分\"] = df_normalize[\"免許等区分\"].apply(cleanse_string)\n",
"df_normalize[\"品目\"] = df_normalize[\"品目\"].apply(normalize_category)\n",
"df_normalize[\"処理区分\"] = df_normalize[\"処理区分\"].apply(cleanse_string)\n",
"df_normalize[\"都道府県名\"] = df_normalize[\"都道府県名\"].apply(cleanse_string).apply(normalize_prefecture)\n",
"df_normalize"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>都道府県名</th>\n",
" <th>税務署名</th>\n",
" <th>免許等年月日</th>\n",
" <th>申請等年月日</th>\n",
" <th>製造者氏名又は名称</th>\n",
" <th>製造場所在地</th>\n",
" <th>免許等区分</th>\n",
" <th>品目</th>\n",
" <th>処理区分</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>岩手</td>\n",
" <td>一関</td>\n",
" <td>2014-01-01</td>\n",
" <td>2013-08-26</td>\n",
" <td>岩手銘醸株式会社</td>\n",
" <td>一関市千厩町千厩字北方134番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留焼酎</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-06-21</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>ビール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>愛知</td>\n",
" <td>名古屋中村</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-06-21</td>\n",
" <td>株式会社ワイマーケット\\nY.MARKET BREWING</td>\n",
" <td>名古屋市中村区名駅四丁目1705番地</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>福岡</td>\n",
" <td>八女</td>\n",
" <td>2014-01-08</td>\n",
" <td>2013-12-06</td>\n",
" <td>西吉田酒造株式会社</td>\n",
" <td>筑後市大字和泉612番地</td>\n",
" <td>試験免許</td>\n",
" <td>その他の醸造酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>熊本</td>\n",
" <td>山鹿</td>\n",
" <td>2014-01-27</td>\n",
" <td>2013-09-12</td>\n",
" <td>株式会社VinEx山鹿\\nVinEx山鹿</td>\n",
" <td>山鹿市鹿央町合里980番地1</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留焼酎</td>\n",
" <td>法人成り等</td>\n",
" </tr>\n",
" <tr>\n",
" <th>...</th>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" <td>...</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1830</th>\n",
" <td>新潟</td>\n",
" <td>十日町</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-02-25</td>\n",
" <td>法人番号3110001034090\\n株式会社 醸燻酒類研究所</td>\n",
" <td>十日町市本町5丁目55番地8</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1831</th>\n",
" <td>東京</td>\n",
" <td>神田</td>\n",
" <td>2020-08-31</td>\n",
" <td>2019-12-16</td>\n",
" <td>法人番号7010401007149\\n株式会社魚金</td>\n",
" <td>千代田区神田錦町2丁目2番1号 KANDA SQUARE1階店舗9区画</td>\n",
" <td>酒類</td>\n",
" <td>発泡酒</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1832</th>\n",
" <td>山梨</td>\n",
" <td>甲府</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-06-26</td>\n",
" <td>法人番号1090001011219\\n山梨銘醸株式会社</td>\n",
" <td>北杜市白州町台ヶ原2283番地</td>\n",
" <td>酒類</td>\n",
" <td>単式蒸留焼酎</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1833</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-04-09</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>スピリッツ</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1834</th>\n",
" <td>北海道</td>\n",
" <td>札幌南</td>\n",
" <td>2020-08-31</td>\n",
" <td>2020-04-09</td>\n",
" <td>法人番号9430001021794\\n北海道コカ・コーラボトリング株式会社\\n北海道コカ・コ...</td>\n",
" <td>札幌市清田区清田一条一丁目2番1号</td>\n",
" <td>酒類</td>\n",
" <td>リキュール</td>\n",
" <td>新規</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>1722 rows × 9 columns</p>\n",
"</div>"
],
"text/plain": [
" 都道府県名 税務署名 免許等年月日 ... 免許等区分 品目 処理区分\n",
"0 岩手 一関 2014-01-01 ... 酒類 単式蒸留焼酎 新規\n",
"1 愛知 名古屋中村 2014-01-08 ... 酒類 ビール 新規\n",
"2 愛知 名古屋中村 2014-01-08 ... 酒類 発泡酒 新規\n",
"3 福岡 八女 2014-01-08 ... 試験免許 その他の醸造酒 新規\n",
"4 熊本 山鹿 2014-01-27 ... 酒類 単式蒸留焼酎 法人成り等\n",
"... ... ... ... ... ... ... ...\n",
"1830 新潟 十日町 2020-08-31 ... 酒類 発泡酒 新規\n",
"1831 東京 神田 2020-08-31 ... 酒類 発泡酒 新規\n",
"1832 山梨 甲府 2020-08-31 ... 酒類 単式蒸留焼酎 新規\n",
"1833 北海道 札幌南 2020-08-31 ... 酒類 スピリッツ 新規\n",
"1834 北海道 札幌南 2020-08-31 ... 酒類 リキュール 新規\n",
"\n",
"[1722 rows x 9 columns]"
]
},
"metadata": {
"tags": []
},
"execution_count": 22
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "3HdE6Q2KsVM9",
"outputId": "511005b3-95ee-4b17-8a57-2b757106d852"
},
"source": [
"df_normalize[\"都道府県名\"].unique()"
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array(['岩手', '愛知', '福岡', '熊本', '愛媛', '兵庫', '山梨', '東京', '栃木', '京都', '鳥取',\n",
" '宮崎', '新潟', '長崎', '和歌山', '大阪', '山口', '徳島', '三重', '鹿児島', '山形', '青森',\n",
" '岐阜', '神奈川', '静岡', '沖縄', '茨城', '北海道', '埼玉', '奈良', '長野', '広島', '千葉',\n",
" '秋田', '群馬', '福島', '香川', '福井', '高知', '宮城', '石川', '富山', '滋賀', '大分',\n",
" '岡山', '島根', '佐賀'], dtype=object)"
]
},
"metadata": {
"tags": []
},
"execution_count": 25
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "oCw5T5rYp2OP"
},
"source": [
"### 様々な角度でデータを眺める\n",
"\n",
"酒造免許取得社数\n",
"\n",
"* 観点\n",
" * 品目別\n",
" * 処理区分別\n",
" * 免許区分別\n",
" * 比較\n",
"* 分析\n",
" * 新規取得が多い品目はどれか\n",
" * 新規取得が多い年は何年か\n",
" * 申請が多い月は何月か\n",
" * 申請にどれぐらいかかっているのか\n",
" * どのエリアが取得社数が多いか"
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 542
},
"id": "htriwueiWpW8",
"outputId": "4e57ec23-4c24-40b0-9aba-062cec8ea894"
},
"source": [
"import plotly.express as px\n",
"\n",
"df_license = df_normalize.copy()\n",
"df_license = df_license.groupby([\"都道府県名\"]).count()\n",
"df_license = df_license[[\"免許等年月日\"]].rename(columns={\"免許等年月日\": \"件数\"})\n",
"df_license = df_license.sort_values(\"件数\", ascending=False)\n",
"df_license = df_license.reset_index()\n",
"px.bar(df_license, x=\"都道府県名\", y=\"件数\")"
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<html>\n",
"<head><meta charset=\"utf-8\" /></head>\n",
"<body>\n",
" <div>\n",
" <script src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_SVG\"></script><script type=\"text/javascript\">if (window.MathJax) {MathJax.Hub.Config({SVG: {font: \"STIX-Web\"}});}</script>\n",
" <script type=\"text/javascript\">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>\n",
" <script src=\"https://cdn.plot.ly/plotly-latest.min.js\"></script> \n",
" <div id=\"8b4c9303-e3cf-4c0e-ab0d-bd83cee2ad7a\" class=\"plotly-graph-div\" style=\"height:525px; width:100%;\"></div>\n",
" <script type=\"text/javascript\">\n",
" \n",
" window.PLOTLYENV=window.PLOTLYENV || {};\n",
" \n",
" if (document.getElementById(\"8b4c9303-e3cf-4c0e-ab0d-bd83cee2ad7a\")) {\n",
" Plotly.newPlot(\n",
" '8b4c9303-e3cf-4c0e-ab0d-bd83cee2ad7a',\n",
" [{\"alignmentgroup\": \"True\", \"hoverlabel\": {\"namelength\": 0}, \"hovertemplate\": \"\\u90fd\\u9053\\u5e9c\\u770c\\u540d=%{x}<br>\\u4ef6\\u6570=%{y}\", \"legendgroup\": \"\", \"marker\": {\"color\": \"#636efa\"}, \"name\": \"\", \"offsetgroup\": \"\", \"orientation\": \"v\", \"showlegend\": false, \"textposition\": \"auto\", \"type\": \"bar\", \"x\": [\"\\u6771\\u4eac\", \"\\u9577\\u91ce\", \"\\u795e\\u5948\\u5ddd\", \"\\u5927\\u962a\", \"\\u4eac\\u90fd\", \"\\u5317\\u6d77\\u9053\", \"\\u65b0\\u6f5f\", \"\\u798f\\u5ca1\", \"\\u6c96\\u7e04\", \"\\u9e7f\\u5150\\u5cf6\", \"\\u9759\\u5ca1\", \"\\u5175\\u5eab\", \"\\u5343\\u8449\", \"\\u798f\\u5cf6\", \"\\u5c71\\u68a8\", \"\\u5c90\\u961c\", \"\\u611b\\u77e5\", \"\\u5ca9\\u624b\", \"\\u9752\\u68ee\", \"\\u5bae\\u57ce\", \"\\u57fc\\u7389\", \"\\u5c71\\u5f62\", \"\\u8328\\u57ce\", \"\\u5e83\\u5cf6\", \"\\u5bae\\u5d0e\", \"\\u718a\\u672c\", \"\\u548c\\u6b4c\\u5c71\", \"\\u9ce5\\u53d6\", \"\\u9ad8\\u77e5\", \"\\u5fb3\\u5cf6\", \"\\u4e09\\u91cd\", \"\\u6803\\u6728\", \"\\u77f3\\u5ddd\", \"\\u5c71\\u53e3\", \"\\u5cf6\\u6839\", \"\\u5948\\u826f\", \"\\u5ca1\\u5c71\", \"\\u79cb\\u7530\", \"\\u6ecb\\u8cc0\", \"\\u9577\\u5d0e\", \"\\u9999\\u5ddd\", \"\\u7fa4\\u99ac\", \"\\u5bcc\\u5c71\", \"\\u5927\\u5206\", \"\\u798f\\u4e95\", \"\\u611b\\u5a9b\", \"\\u4f50\\u8cc0\"], \"xaxis\": \"x\", \"y\": [133, 95, 83, 83, 76, 70, 61, 55, 54, 50, 49, 47, 42, 42, 41, 40, 40, 39, 38, 36, 36, 36, 34, 34, 32, 29, 27, 25, 24, 23, 22, 22, 21, 20, 19, 19, 18, 14, 12, 12, 12, 11, 11, 11, 10, 10, 4], \"yaxis\": \"y\"}],\n",
" {\"barmode\": \"relative\", \"legend\": {\"tracegroupgap\": 0}, \"margin\": {\"t\": 60}, \"template\": {\"data\": {\"bar\": [{\"error_x\": {\"color\": \"#2a3f5f\"}, \"error_y\": {\"color\": \"#2a3f5f\"}, \"marker\": {\"line\": {\"color\": \"#E5ECF6\", \"width\": 0.5}}, \"type\": \"bar\"}], \"barpolar\": [{\"marker\": {\"line\": {\"color\": \"#E5ECF6\", \"width\": 0.5}}, \"type\": \"barpolar\"}], \"carpet\": [{\"aaxis\": {\"endlinecolor\": \"#2a3f5f\", \"gridcolor\": \"white\", \"linecolor\": \"white\", \"minorgridcolor\": \"white\", \"startlinecolor\": \"#2a3f5f\"}, \"baxis\": {\"endlinecolor\": \"#2a3f5f\", \"gridcolor\": \"white\", \"linecolor\": \"white\", \"minorgridcolor\": \"white\", \"startlinecolor\": \"#2a3f5f\"}, \"type\": \"carpet\"}], \"choropleth\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"choropleth\"}], \"contour\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"contour\"}], \"contourcarpet\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"contourcarpet\"}], \"heatmap\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"heatmap\"}], \"heatmapgl\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, 
\"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"heatmapgl\"}], \"histogram\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"histogram\"}], \"histogram2d\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"histogram2d\"}], \"histogram2dcontour\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"histogram2dcontour\"}], \"mesh3d\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"mesh3d\"}], \"parcoords\": [{\"line\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"parcoords\"}], \"pie\": [{\"automargin\": true, \"type\": \"pie\"}], \"scatter\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatter\"}], \"scatter3d\": [{\"line\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatter3d\"}], \"scattercarpet\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattercarpet\"}], \"scattergeo\": [{\"marker\": {\"colorbar\": 
{\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattergeo\"}], \"scattergl\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattergl\"}], \"scattermapbox\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattermapbox\"}], \"scatterpolar\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterpolar\"}], \"scatterpolargl\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterpolargl\"}], \"scatterternary\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterternary\"}], \"surface\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"surface\"}], \"table\": [{\"cells\": {\"fill\": {\"color\": \"#EBF0F8\"}, \"line\": {\"color\": \"white\"}}, \"header\": {\"fill\": {\"color\": \"#C8D4E3\"}, \"line\": {\"color\": \"white\"}}, \"type\": \"table\"}]}, \"layout\": {\"annotationdefaults\": {\"arrowcolor\": \"#2a3f5f\", \"arrowhead\": 0, \"arrowwidth\": 1}, \"coloraxis\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"colorscale\": {\"diverging\": [[0, \"#8e0152\"], [0.1, \"#c51b7d\"], [0.2, \"#de77ae\"], [0.3, \"#f1b6da\"], [0.4, \"#fde0ef\"], [0.5, \"#f7f7f7\"], [0.6, \"#e6f5d0\"], [0.7, \"#b8e186\"], [0.8, \"#7fbc41\"], [0.9, \"#4d9221\"], [1, \"#276419\"]], \"sequential\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, 
\"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"sequentialminus\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]]}, \"colorway\": [\"#636efa\", \"#EF553B\", \"#00cc96\", \"#ab63fa\", \"#FFA15A\", \"#19d3f3\", \"#FF6692\", \"#B6E880\", \"#FF97FF\", \"#FECB52\"], \"font\": {\"color\": \"#2a3f5f\"}, \"geo\": {\"bgcolor\": \"white\", \"lakecolor\": \"white\", \"landcolor\": \"#E5ECF6\", \"showlakes\": true, \"showland\": true, \"subunitcolor\": \"white\"}, \"hoverlabel\": {\"align\": \"left\"}, \"hovermode\": \"closest\", \"mapbox\": {\"style\": \"light\"}, \"paper_bgcolor\": \"white\", \"plot_bgcolor\": \"#E5ECF6\", \"polar\": {\"angularaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"bgcolor\": \"#E5ECF6\", \"radialaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}}, \"scene\": {\"xaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}, \"yaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}, \"zaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}}, \"shapedefaults\": {\"line\": {\"color\": \"#2a3f5f\"}}, \"ternary\": {\"aaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"baxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"bgcolor\": \"#E5ECF6\", \"caxis\": 
{\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}}, \"title\": {\"x\": 0.05}, \"xaxis\": {\"automargin\": true, \"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\", \"title\": {\"standoff\": 15}, \"zerolinecolor\": \"white\", \"zerolinewidth\": 2}, \"yaxis\": {\"automargin\": true, \"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\", \"title\": {\"standoff\": 15}, \"zerolinecolor\": \"white\", \"zerolinewidth\": 2}}}, \"xaxis\": {\"anchor\": \"y\", \"domain\": [0.0, 1.0], \"title\": {\"text\": \"\\u90fd\\u9053\\u5e9c\\u770c\\u540d\"}}, \"yaxis\": {\"anchor\": \"x\", \"domain\": [0.0, 1.0], \"title\": {\"text\": \"\\u4ef6\\u6570\"}}},\n",
" {\"responsive\": true}\n",
" ).then(function(){\n",
" \n",
"var gd = document.getElementById('8b4c9303-e3cf-4c0e-ab0d-bd83cee2ad7a');\n",
"var x = new MutationObserver(function (mutations, observer) {{\n",
" var display = window.getComputedStyle(gd).display;\n",
" if (!display || display === 'none') {{\n",
" console.log([gd, 'removed!']);\n",
" Plotly.purge(gd);\n",
" observer.disconnect();\n",
" }}\n",
"}});\n",
"\n",
"// Listen for the removal of the full notebook cells\n",
"var notebookContainer = gd.closest('#notebook-container');\n",
"if (notebookContainer) {{\n",
" x.observe(notebookContainer, {childList: true});\n",
"}}\n",
"\n",
"// Listen for the clearing of the current output cell\n",
"var outputEl = gd.closest('.output');\n",
"if (outputEl) {{\n",
" x.observe(outputEl, {childList: true});\n",
"}}\n",
"\n",
" })\n",
" };\n",
" \n",
" </script>\n",
" </div>\n",
"</body>\n",
"</html>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 300
},
"id": "CxJj4d8GMXha",
"outputId": "f3e10c82-bac3-484c-8cbf-b9357bcb23fd"
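},
"source": [
"# Work on a copy so that df_normalize itself is left unchanged\n",
"df_license = df_normalize.copy()\n",
"# 取得日数: days elapsed from the application date (申請等年月日) to the license date (免許等年月日)\n",
"df_license[\"取得日数\"] = (df_license[\"免許等年月日\"] - df_license[\"申請等年月日\"]).dt.days\n",
"# Summary statistics; a negative minimum would indicate inconsistent dates in the source data\n",
"df_license.describe()"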
],
"execution_count": null,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>取得日数</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>1722.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>106.144599</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>271.564423</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>-10769.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>63.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>95.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>150.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>493.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" 取得日数\n",
"count 1722.000000\n",
"mean 106.144599\n",
"std 271.564423\n",
"min -10769.000000\n",
"25% 63.000000\n",
"50% 95.000000\n",
"75% 150.000000\n",
"max 493.000000"
]
},
"metadata": {
"tags": []
},
"execution_count": 32
}
]
},
{
"cell_type": "code",
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
"height": 542
},
"id": "ysr0ztMyPIkF",
"outputId": "1ed40862-d881-435f-9a4f-4fc3cc25d1c2"
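},
"source": [
"import plotly.express as px\n",
"\n",
"# Keep only rows with a plausible 取得日数 (more than 0 and at most 365 days),\n",
"# then draw a box plot; points=\"all\" also plots every individual observation\n",
"px.box(df_license[\n",
"    (df_license[\"取得日数\"] <= 365) &\n",
"    (df_license[\"取得日数\"] > 0)\n",
"], y=\"取得日数\", points=\"all\")"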
],
"execution_count": null,
"outputs": [
{
"output_type": "display_data",
"data": {
"text/html": [
"<html>\n",
"<head><meta charset=\"utf-8\" /></head>\n",
"<body>\n",
" <div>\n",
" <script src=\"https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.5/MathJax.js?config=TeX-AMS-MML_SVG\"></script><script type=\"text/javascript\">if (window.MathJax) {MathJax.Hub.Config({SVG: {font: \"STIX-Web\"}});}</script>\n",
" <script type=\"text/javascript\">window.PlotlyConfig = {MathJaxConfig: 'local'};</script>\n",
" <script src=\"https://cdn.plot.ly/plotly-latest.min.js\"></script> \n",
" <div id=\"f709506e-5f98-4ecf-9cbc-1d4306d7042a\" class=\"plotly-graph-div\" style=\"height:525px; width:100%;\"></div>\n",
" <script type=\"text/javascript\">\n",
" \n",
" window.PLOTLYENV=window.PLOTLYENV || {};\n",
" \n",
" if (document.getElementById(\"f709506e-5f98-4ecf-9cbc-1d4306d7042a\")) {\n",
" Plotly.newPlot(\n",
" 'f709506e-5f98-4ecf-9cbc-1d4306d7042a',\n",
" [{\"alignmentgroup\": \"True\", \"boxpoints\": \"all\", \"hoverlabel\": {\"namelength\": 0}, \"hovertemplate\": \"\\u53d6\\u5f97\\u65e5\\u6570=%{y}\", \"legendgroup\": \"\", \"marker\": {\"color\": \"#636efa\"}, \"name\": \"\", \"notched\": false, \"offsetgroup\": \"\", \"orientation\": \"v\", \"showlegend\": false, \"type\": \"box\", \"x0\": \" \", \"xaxis\": \"x\", \"y\": [128, 201, 201, 33, 137, 155, 72, 89, 154, 120, 177, 94, 39, 188, 65, 116, 98, 98, 67, 67, 55, 42, 48, 209, 196, 55, 166, 8, 167, 116, 7, 7, 7, 63, 63, 90, 90, 90, 90, 90, 90, 77, 40, 160, 157, 106, 79, 104, 51, 51, 30, 30, 66, 19, 19, 110, 179, 127, 36, 68, 175, 57, 174, 120, 42, 42, 42, 125, 232, 98, 41, 45, 45, 30, 30, 34, 176, 125, 125, 188, 31, 46, 26, 186, 79, 79, 79, 66, 57, 48, 140, 81, 80, 156, 142, 147, 66, 87, 162, 96, 96, 96, 96, 96, 96, 96, 121, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 76, 70, 58, 58, 66, 66, 66, 66, 66, 66, 66, 66, 58, 58, 208, 170, 15, 15, 15, 15, 15, 15, 60, 42, 37, 27, 125, 62, 91, 169, 71, 70, 70, 43, 176, 81, 120, 120, 142, 52, 158, 145, 56, 56, 56, 56, 56, 56, 56, 223, 115, 178, 75, 75, 93, 83, 83, 83, 157, 48, 151, 45, 45, 45, 45, 45, 45, 81, 122, 116, 87, 71, 153, 137, 58, 190, 142, 142, 190, 154, 159, 107, 224, 146, 52, 128, 56, 110, 56, 56, 76, 76, 130, 197, 161, 126, 121, 121, 121, 121, 121, 121, 110, 82, 62, 62, 54, 50, 40, 65, 54, 39, 155, 111, 121, 192, 199, 60, 81, 81, 58, 52, 144, 111, 63, 63, 63, 63, 63, 63, 118, 103, 108, 43, 92, 92, 182, 182, 182, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 67, 67, 67, 67, 211, 91, 59, 125, 47, 115, 21, 103, 152, 112, 133, 42, 42, 28, 63, 144, 147, 43, 53, 140, 76, 17, 124, 282, 38, 149, 126, 126, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 63, 7, 151, 34, 134, 92, 92, 70, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 111, 57, 57, 147, 241, 60, 45, 109, 109, 74, 153, 111, 62, 62, 
62, 62, 63, 146, 182, 165, 97, 80, 23, 100, 265, 265, 160, 33, 33, 33, 178, 56, 186, 148, 148, 106, 71, 71, 186, 108, 207, 21, 128, 58, 64, 64, 193, 55, 165, 165, 271, 118, 281, 150, 111, 74, 35, 136, 136, 116, 153, 147, 126, 159, 301, 37, 266, 141, 56, 59, 219, 226, 130, 130, 130, 130, 130, 130, 70, 70, 70, 70, 70, 70, 70, 70, 70, 109, 128, 24, 153, 165, 83, 77, 54, 192, 181, 189, 147, 81, 149, 195, 69, 26, 21, 58, 58, 58, 58, 70, 77, 161, 113, 345, 88, 49, 204, 133, 100, 267, 126, 126, 126, 84, 84, 77, 64, 63, 63, 44, 74, 60, 125, 18, 18, 18, 18, 18, 18, 185, 115, 130, 63, 60, 21, 79, 361, 151, 71, 24, 133, 100, 72, 171, 118, 85, 76, 161, 16, 91, 142, 122, 94, 94, 297, 109, 92, 119, 28, 167, 138, 14, 118, 193, 193, 151, 143, 96, 56, 180, 111, 178, 131, 56, 50, 184, 104, 55, 93, 55, 250, 183, 80, 80, 80, 80, 80, 171, 143, 182, 92, 356, 49, 222, 36, 212, 187, 159, 120, 110, 197, 98, 64, 64, 64, 64, 64, 64, 86, 86, 86, 86, 86, 86, 86, 86, 86, 86, 81, 174, 76, 176, 162, 56, 118, 75, 75, 75, 75, 75, 75, 189, 161, 161, 85, 85, 34, 34, 34, 175, 211, 74, 74, 133, 131, 131, 123, 123, 102, 102, 72, 72, 153, 55, 166, 119, 64, 50, 50, 123, 123, 149, 70, 160, 185, 95, 161, 127, 94, 336, 55, 115, 115, 168, 8, 15, 145, 57, 57, 146, 61, 87, 73, 26, 26, 206, 148, 148, 148, 148, 148, 154, 159, 192, 245, 101, 59, 44, 131, 64, 64, 61, 46, 153, 153, 339, 138, 174, 161, 77, 124, 89, 250, 168, 168, 55, 170, 28, 57, 57, 57, 57, 57, 57, 57, 174, 106, 106, 52, 150, 185, 182, 154, 125, 100, 204, 72, 309, 133, 23, 23, 23, 23, 23, 23, 23, 23, 23, 45, 71, 213, 127, 37, 52, 52, 122, 122, 107, 203, 142, 132, 122, 170, 162, 67, 358, 157, 101, 73, 62, 62, 62, 19, 121, 65, 44, 166, 198, 126, 140, 149, 49, 154, 57, 57, 57, 110, 110, 110, 110, 110, 110, 172, 218, 157, 84, 79, 117, 88, 150, 58, 144, 110, 36, 167, 146, 87, 112, 43, 93, 245, 74, 98, 117, 72, 25, 169, 107, 149, 144, 35, 35, 35, 35, 35, 170, 205, 234, 269, 58, 261, 261, 166, 212, 120, 120, 65, 65, 168, 42, 42, 253, 64, 169, 16, 96, 45, 
221, 54, 211, 83, 53, 138, 126, 32, 32, 32, 160, 174, 163, 86, 276, 101, 93, 45, 46, 193, 97, 54, 38, 161, 150, 192, 192, 325, 84, 84, 168, 182, 48, 244, 61, 61, 61, 122, 81, 205, 213, 192, 164, 182, 182, 96, 86, 108, 335, 130, 61, 41, 282, 119, 189, 189, 50, 143, 321, 217, 38, 170, 108, 160, 237, 57, 168, 112, 84, 126, 56, 36, 112, 236, 145, 158, 151, 151, 92, 180, 128, 89, 155, 203, 56, 56, 56, 138, 113, 183, 215, 44, 119, 69, 164, 119, 158, 178, 149, 161, 152, 140, 90, 105, 140, 104, 139, 153, 147, 230, 230, 174, 137, 175, 189, 185, 151, 146, 159, 153, 140, 125, 116, 152, 137, 67, 162, 153, 131, 180, 167, 141, 155, 118, 127, 156, 149, 119, 150, 122, 121, 151, 205, 112, 112, 112, 82, 51, 156, 46, 120, 57, 58, 132, 46, 63, 48, 154, 118, 64, 64, 72, 146, 146, 146, 77, 117, 107, 84, 200, 91, 40, 119, 170, 175, 332, 109, 55, 18, 91, 76, 76, 76, 76, 76, 304, 304, 105, 69, 131, 131, 49, 138, 12, 104, 134, 90, 191, 115, 115, 115, 115, 115, 73, 73, 75, 73, 72, 115, 124, 62, 62, 128, 79, 43, 42, 112, 112, 96, 97, 103, 282, 56, 161, 242, 50, 167, 177, 24, 38, 38, 38, 38, 38, 38, 152, 124, 121, 141, 82, 82, 82, 188, 127, 210, 163, 51, 169, 27, 139, 140, 76, 177, 217, 20, 209, 108, 70, 58, 87, 87, 121, 36, 36, 70, 230, 310, 83, 168, 49, 183, 146, 98, 82, 70, 70, 91, 91, 78, 126, 228, 228, 126, 101, 182, 202, 228, 286, 339, 140, 229, 282, 181, 100, 138, 138, 277, 162, 110, 62, 57, 57, 53, 189, 162, 60, 65, 170, 148, 150, 161, 63, 76, 70, 82, 82, 203, 183, 70, 257, 139, 37, 188, 254, 143, 267, 146, 69, 69, 212, 236, 92, 49, 43, 138, 165, 56, 73, 265, 64, 75, 162, 262, 65, 151, 22, 77, 126, 51, 131, 131, 69, 49, 79, 79, 57, 160, 66, 66, 120, 148, 283, 142, 163, 226, 344, 344, 232, 226, 127, 105, 105, 105, 105, 143, 122, 151, 80, 105, 126, 126, 105, 42, 105, 56, 105, 56, 42, 42, 105, 56, 105, 105, 56, 42, 56, 105, 56, 105, 56, 300, 291, 106, 106, 245, 124, 50, 264, 56, 78, 134, 36, 30, 30, 58, 44, 210, 167, 69, 188, 162, 254, 69, 69, 69, 69, 69, 69, 69, 69, 97, 164, 91, 104, 
104, 154, 61, 322, 175, 175, 175, 175, 175, 175, 175, 175, 89, 50, 140, 140, 118, 118, 69, 66, 66, 126, 228, 64, 64, 64, 78, 283, 15, 141, 64, 158, 147, 56, 149, 148, 203, 228, 98, 107, 114, 143, 93, 84, 252, 28, 28, 50, 70, 137, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 75, 55, 49, 133, 190, 84, 84, 57, 132, 57, 136, 153, 255, 88, 88, 88, 88, 88, 174, 46, 62, 84, 32, 32, 32, 102, 203, 235, 235, 155, 43, 115, 115, 157, 60, 60, 66, 137, 148, 150, 94, 156, 151, 63, 168, 84, 68, 119, 163, 163, 163, 87, 226, 276, 299, 141, 84, 84, 198, 206, 235, 130, 130, 144, 102, 102, 102, 102, 102, 102, 81, 58, 181, 112, 150, 77, 56, 21, 217, 217, 178, 45, 65, 62, 62, 80, 80, 43, 159, 233, 87, 87, 58, 73, 105, 182, 148, 200, 51, 39, 119, 119, 178, 152, 152, 49, 49, 49, 49, 49, 49, 92, 42, 114, 84, 140, 155, 217, 75, 75, 75, 127, 177, 47, 47, 351, 122, 59, 177, 91, 123, 124, 124, 124, 124, 124, 103, 319, 131, 196, 70, 128, 60, 60, 60, 160, 80, 70, 85, 51, 193, 39, 89, 92, 113, 240, 240, 15, 15, 15, 240, 122, 122, 117, 93, 219, 95, 7, 264, 244, 187, 64, 64, 44, 203, 72, 17, 37, 47, 47, 112, 117, 275, 23, 246, 143, 47, 99, 99, 113, 147, 98, 98, 98, 98, 98, 201, 83, 218, 178, 269, 50, 167, 24, 24, 24, 58, 179, 179, 158, 158, 158, 179, 70, 70, 14, 4, 6, 134, 213, 134, 60, 35, 11, 19, 19, 11, 5, 18, 12, 5, 3, 7, 11, 123, 88, 88, 154, 4, 159, 214, 214, 214, 126, 11, 79, 266, 97, 121, 154, 132, 55, 26, 6, 223, 35, 70, 70, 70, 5, 32, 78, 52, 10, 48, 181, 68, 80, 323, 181, 22, 53, 57, 90, 153, 7, 7, 7, 7, 7, 7, 83, 33, 83, 129, 125, 125, 125, 125, 125, 44, 147, 70, 70, 117, 117, 117, 117, 112, 55, 16, 223, 112, 55, 43, 111, 117, 138, 70, 112, 117, 112, 36, 61, 83, 30, 254, 280, 291, 238, 56, 238, 291, 238, 291, 71, 46, 46, 82, 174, 308, 34, 73, 61, 112, 61, 26, 221, 95, 95, 95, 115, 95, 95, 95, 95, 61, 81, 188, 54, 210, 210, 162, 56, 91, 80, 64, 64, 64, 85, 44, 196, 49, 315, 176, 108, 108, 49, 254, 79, 56, 169, 135, 128, 188, 259, 66, 144, 144], 
\"y0\": \" \", \"yaxis\": \"y\"}],\n",
" {\"boxmode\": \"group\", \"legend\": {\"tracegroupgap\": 0}, \"margin\": {\"t\": 60}, \"template\": {\"data\": {\"bar\": [{\"error_x\": {\"color\": \"#2a3f5f\"}, \"error_y\": {\"color\": \"#2a3f5f\"}, \"marker\": {\"line\": {\"color\": \"#E5ECF6\", \"width\": 0.5}}, \"type\": \"bar\"}], \"barpolar\": [{\"marker\": {\"line\": {\"color\": \"#E5ECF6\", \"width\": 0.5}}, \"type\": \"barpolar\"}], \"carpet\": [{\"aaxis\": {\"endlinecolor\": \"#2a3f5f\", \"gridcolor\": \"white\", \"linecolor\": \"white\", \"minorgridcolor\": \"white\", \"startlinecolor\": \"#2a3f5f\"}, \"baxis\": {\"endlinecolor\": \"#2a3f5f\", \"gridcolor\": \"white\", \"linecolor\": \"white\", \"minorgridcolor\": \"white\", \"startlinecolor\": \"#2a3f5f\"}, \"type\": \"carpet\"}], \"choropleth\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"choropleth\"}], \"contour\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"contour\"}], \"contourcarpet\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"contourcarpet\"}], \"heatmap\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"heatmap\"}], \"heatmapgl\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, 
\"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"heatmapgl\"}], \"histogram\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"histogram\"}], \"histogram2d\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"histogram2d\"}], \"histogram2dcontour\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"histogram2dcontour\"}], \"mesh3d\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"type\": \"mesh3d\"}], \"parcoords\": [{\"line\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"parcoords\"}], \"pie\": [{\"automargin\": true, \"type\": \"pie\"}], \"scatter\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatter\"}], \"scatter3d\": [{\"line\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatter3d\"}], \"scattercarpet\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattercarpet\"}], \"scattergeo\": [{\"marker\": {\"colorbar\": 
{\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattergeo\"}], \"scattergl\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattergl\"}], \"scattermapbox\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scattermapbox\"}], \"scatterpolar\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterpolar\"}], \"scatterpolargl\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterpolargl\"}], \"scatterternary\": [{\"marker\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"type\": \"scatterternary\"}], \"surface\": [{\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}, \"colorscale\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"type\": \"surface\"}], \"table\": [{\"cells\": {\"fill\": {\"color\": \"#EBF0F8\"}, \"line\": {\"color\": \"white\"}}, \"header\": {\"fill\": {\"color\": \"#C8D4E3\"}, \"line\": {\"color\": \"white\"}}, \"type\": \"table\"}]}, \"layout\": {\"annotationdefaults\": {\"arrowcolor\": \"#2a3f5f\", \"arrowhead\": 0, \"arrowwidth\": 1}, \"coloraxis\": {\"colorbar\": {\"outlinewidth\": 0, \"ticks\": \"\"}}, \"colorscale\": {\"diverging\": [[0, \"#8e0152\"], [0.1, \"#c51b7d\"], [0.2, \"#de77ae\"], [0.3, \"#f1b6da\"], [0.4, \"#fde0ef\"], [0.5, \"#f7f7f7\"], [0.6, \"#e6f5d0\"], [0.7, \"#b8e186\"], [0.8, \"#7fbc41\"], [0.9, \"#4d9221\"], [1, \"#276419\"]], \"sequential\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, 
\"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]], \"sequentialminus\": [[0.0, \"#0d0887\"], [0.1111111111111111, \"#46039f\"], [0.2222222222222222, \"#7201a8\"], [0.3333333333333333, \"#9c179e\"], [0.4444444444444444, \"#bd3786\"], [0.5555555555555556, \"#d8576b\"], [0.6666666666666666, \"#ed7953\"], [0.7777777777777778, \"#fb9f3a\"], [0.8888888888888888, \"#fdca26\"], [1.0, \"#f0f921\"]]}, \"colorway\": [\"#636efa\", \"#EF553B\", \"#00cc96\", \"#ab63fa\", \"#FFA15A\", \"#19d3f3\", \"#FF6692\", \"#B6E880\", \"#FF97FF\", \"#FECB52\"], \"font\": {\"color\": \"#2a3f5f\"}, \"geo\": {\"bgcolor\": \"white\", \"lakecolor\": \"white\", \"landcolor\": \"#E5ECF6\", \"showlakes\": true, \"showland\": true, \"subunitcolor\": \"white\"}, \"hoverlabel\": {\"align\": \"left\"}, \"hovermode\": \"closest\", \"mapbox\": {\"style\": \"light\"}, \"paper_bgcolor\": \"white\", \"plot_bgcolor\": \"#E5ECF6\", \"polar\": {\"angularaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"bgcolor\": \"#E5ECF6\", \"radialaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}}, \"scene\": {\"xaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}, \"yaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}, \"zaxis\": {\"backgroundcolor\": \"#E5ECF6\", \"gridcolor\": \"white\", \"gridwidth\": 2, \"linecolor\": \"white\", \"showbackground\": true, \"ticks\": \"\", \"zerolinecolor\": \"white\"}}, \"shapedefaults\": {\"line\": {\"color\": \"#2a3f5f\"}}, \"ternary\": {\"aaxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"baxis\": {\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}, \"bgcolor\": \"#E5ECF6\", \"caxis\": 
{\"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\"}}, \"title\": {\"x\": 0.05}, \"xaxis\": {\"automargin\": true, \"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\", \"title\": {\"standoff\": 15}, \"zerolinecolor\": \"white\", \"zerolinewidth\": 2}, \"yaxis\": {\"automargin\": true, \"gridcolor\": \"white\", \"linecolor\": \"white\", \"ticks\": \"\", \"title\": {\"standoff\": 15}, \"zerolinecolor\": \"white\", \"zerolinewidth\": 2}}}, \"xaxis\": {\"anchor\": \"y\", \"domain\": [0.0, 1.0]}, \"yaxis\": {\"anchor\": \"x\", \"domain\": [0.0, 1.0], \"title\": {\"text\": \"\\u53d6\\u5f97\\u65e5\\u6570\"}}},\n",
" {\"responsive\": true}\n",
" ).then(function(){\n",
" \n",
"var gd = document.getElementById('f709506e-5f98-4ecf-9cbc-1d4306d7042a');\n",
"var x = new MutationObserver(function (mutations, observer) {{\n",
" var display = window.getComputedStyle(gd).display;\n",
" if (!display || display === 'none') {{\n",
" console.log([gd, 'removed!']);\n",
" Plotly.purge(gd);\n",
" observer.disconnect();\n",
" }}\n",
"}});\n",
"\n",
"// Listen for the removal of the full notebook cells\n",
"var notebookContainer = gd.closest('#notebook-container');\n",
"if (notebookContainer) {{\n",
" x.observe(notebookContainer, {childList: true});\n",
"}}\n",
"\n",
"// Listen for the clearing of the current output cell\n",
"var outputEl = gd.closest('.output');\n",
"if (outputEl) {{\n",
" x.observe(outputEl, {childList: true});\n",
"}}\n",
"\n",
" })\n",
" };\n",
" \n",
" </script>\n",
" </div>\n",
"</body>\n",
"</html>"
]
},
"metadata": {
"tags": []
}
}
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "ejF77fDjcFp9"
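},
"source": [
"## Summary\n",
"\n",
"At a KOSEN (college of technology), I expect a fair amount of time already goes into programming fundamentals, so I leave that part to the regular curriculum and aim here for a class that is practical and, above all, fun. Once students can write a crawler, they could, for example, crawl their school's news and turn it into a bot, and I designed the course hoping that would make programming a little more enjoyable.\n",
"\n",
"I also update the content little by little. Last year we only fetched data from Fuller's news site, but this year we got as far as visualizing external data.\n",
"\n",
"Going forward, I hope to keep upgrading the class in line with technology trends and my own interests."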
]
}
]
}