@shimizukawa
Last active September 26, 2018 13:32
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# ビール銘柄と醸造所の分析\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"\n",
"import pandas as pd"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ビールデータ読み込み"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>abv</th>\n",
" <th>ibu</th>\n",
" <th>name</th>\n",
" <th>style</th>\n",
" <th>brewery_id</th>\n",
" <th>ounces</th>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>0.050</td>\n",
" <td>NaN</td>\n",
" <td>Pub Beer</td>\n",
" <td>American Pale Lager</td>\n",
" <td>408</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2265</th>\n",
" <td>0.066</td>\n",
" <td>NaN</td>\n",
" <td>Devil's Cup</td>\n",
" <td>American Pale Ale (APA)</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2264</th>\n",
" <td>0.071</td>\n",
" <td>NaN</td>\n",
" <td>Rise of the Phoenix</td>\n",
" <td>American IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2263</th>\n",
" <td>0.090</td>\n",
" <td>NaN</td>\n",
" <td>Sinister</td>\n",
" <td>American Double / Imperial IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2262</th>\n",
" <td>0.075</td>\n",
" <td>NaN</td>\n",
" <td>Sex and Candy</td>\n",
" <td>American IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" abv ibu name style \\\n",
"id \n",
"1436 0.050 NaN Pub Beer American Pale Lager \n",
"2265 0.066 NaN Devil's Cup American Pale Ale (APA) \n",
"2264 0.071 NaN Rise of the Phoenix American IPA \n",
"2263 0.090 NaN Sinister American Double / Imperial IPA \n",
"2262 0.075 NaN Sex and Candy American IPA \n",
"\n",
" brewery_id ounces \n",
"id \n",
"1436 408 12.0 \n",
"2265 177 12.0 \n",
"2264 177 12.0 \n",
"2263 177 12.0 \n",
"2262 177 12.0 "
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_beers = pd.read_csv('beers.csv', index_col='id')\n",
"df_beers.drop(df_beers.columns[0], inplace=True, axis=1) # 行を削除するため、axis=1\n",
"df_beers.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* abv = Alcohol By Volume = アルコール度数, [0, 1]\n",
"* ibu = IInternational Bitterness Units = 国際苦味単位\n",
"* name = ビールの名前\n",
"* style = ビールの種類\n",
"* brewery_id = breweries.csvのid列\n",
"* ounces = オンス(1オンスは30ml)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 数値データの概要を把握"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>abv</th>\n",
" <th>ibu</th>\n",
" <th>brewery_id</th>\n",
" <th>ounces</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>count</th>\n",
" <td>2348.000000</td>\n",
" <td>1405.000000</td>\n",
" <td>2410.000000</td>\n",
" <td>2410.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>mean</th>\n",
" <td>0.059773</td>\n",
" <td>42.713167</td>\n",
" <td>231.749793</td>\n",
" <td>13.592241</td>\n",
" </tr>\n",
" <tr>\n",
" <th>std</th>\n",
" <td>0.013542</td>\n",
" <td>25.954066</td>\n",
" <td>157.685604</td>\n",
" <td>2.352204</td>\n",
" </tr>\n",
" <tr>\n",
" <th>min</th>\n",
" <td>0.001000</td>\n",
" <td>4.000000</td>\n",
" <td>0.000000</td>\n",
" <td>8.400000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>25%</th>\n",
" <td>0.050000</td>\n",
" <td>21.000000</td>\n",
" <td>93.000000</td>\n",
" <td>12.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>50%</th>\n",
" <td>0.056000</td>\n",
" <td>35.000000</td>\n",
" <td>205.000000</td>\n",
" <td>12.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>75%</th>\n",
" <td>0.067000</td>\n",
" <td>64.000000</td>\n",
" <td>366.000000</td>\n",
" <td>16.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>max</th>\n",
" <td>0.128000</td>\n",
" <td>138.000000</td>\n",
" <td>557.000000</td>\n",
" <td>32.000000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" abv ibu brewery_id ounces\n",
"count 2348.000000 1405.000000 2410.000000 2410.000000\n",
"mean 0.059773 42.713167 231.749793 13.592241\n",
"std 0.013542 25.954066 157.685604 2.352204\n",
"min 0.001000 4.000000 0.000000 8.400000\n",
"25% 0.050000 21.000000 93.000000 12.000000\n",
"50% 0.056000 35.000000 205.000000 12.000000\n",
"75% 0.067000 64.000000 366.000000 16.000000\n",
"max 0.128000 138.000000 557.000000 32.000000"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_beers.describe()"
]
},
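{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `count` row above differs between columns (2348 for `abv`, 1405 for `ibu`, 2410 for the rest), which suggests missing values. A minimal sketch for confirming that, assuming `df_beers` is loaded as in the cells above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch, not part of the original run: count missing values per column\n",
"df_beers.isna().sum()"
]
},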
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### styleの種類数を把握"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"99"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_beers['style'].nunique()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### styleごとの件数を把握"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"American IPA 424\n",
"American Pale Ale (APA) 245\n",
"American Amber / Red Ale 133\n",
"American Blonde Ale 108\n",
"American Double / Imperial IPA 105\n",
"Name: style, dtype: int64"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_beers['style'].value_counts().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## データの特徴を掴む\n",
"\n",
"そのために、プロットしてみる"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 各データのばらつきを見る"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x119f9bf98>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x10d722278>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x10d74e550>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x119fc87b8>]],\n",
" dtype=object)"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/shimizukawa/bp/python-training-20180926-nri/.venv/lib/python3.7/site-packages/matplotlib/font_manager.py:1238: UserWarning: findfont: Font family ['IPAexGothic'] not found. Falling back to DejaVu Sans.\n",
" (prop.get_family(), self.defaultFamily[fontext]))\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 4 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_beers.hist()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* abvは0.5付近に集中していることがわかる\n",
"* ibuは30くらいが多くて、100を超えるのはごく僅か\n",
"* オンス(量)はほとんどのビールが2種類の量に集中している"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 量の種類でカウント"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"12.0 1525\n",
"16.0 841\n",
"24.0 22\n",
"19.2 15\n",
"32.0 5\n",
"16.9 1\n",
"8.4 1\n",
"Name: ounces, dtype: int64"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_beers['ounces'].value_counts()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* 6割が12オンス(約350ml)"
]
},
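{
"cell_type": "markdown",
"metadata": {},
"source": [
"The share of each size can also be computed by pandas rather than by hand. A small sketch, assuming `df_beers` from the cells above:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: normalize=True returns fractions of the total instead of raw counts\n",
"df_beers['ounces'].value_counts(normalize=True)"
]
},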
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### ABVとIBUの関係をプロット"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x11a103c50>"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"df_beers.plot(kind='scatter', x='abv', y='ibu')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* 苦さ(高いIBU)を求めると、アルコール度数は高くなってしまう\n"
]
},
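{
"cell_type": "markdown",
"metadata": {},
"source": [
"The scatter plot only gives a visual impression. As a rough numerical check, one could compute the correlation between the two columns. This is a sketch assuming `df_beers` from above, and a correlation says nothing about causation:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: Pearson correlation between abv and ibu; missing values are excluded pairwise\n",
"df_beers[['abv', 'ibu']].corr()"
]
},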
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### いろんなグラフを一気に見る"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[<matplotlib.axes._subplots.AxesSubplot object at 0x11a103cc0>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11a413f28>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11a4424a8>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x11a469a20>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11c64bf98>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11c67a550>],\n",
" [<matplotlib.axes._subplots.AxesSubplot object at 0x11c6a1ac8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11c6d50b8>,\n",
" <matplotlib.axes._subplots.AxesSubplot object at 0x11c6d50f0>]],\n",
" dtype=object)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"/Users/shimizukawa/bp/python-training-20180926-nri/.venv/lib/python3.7/site-packages/matplotlib/font_manager.py:1238: UserWarning: findfont: Font family ['IPAexGothic'] not found. Falling back to DejaVu Sans.\n",
" (prop.get_family(), self.defaultFamily[fontext]))\n"
]
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 9 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"from pandas.plotting import scatter_matrix\n",
"scatter_matrix(df_beers[['abv', 'ibu', 'ounces']])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### styleごとのabv, ibu"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 平均を計算"
]
},
{
"cell_type": "code",
"execution_count": 29,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>abv</th>\n",
" <th>ibu</th>\n",
" </tr>\n",
" <tr>\n",
" <th>style</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>Abbey Single Ale</th>\n",
" <td>0.049000</td>\n",
" <td>22.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>Altbier</th>\n",
" <td>0.054385</td>\n",
" <td>34.125000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>American Adjunct Lager</th>\n",
" <td>0.048722</td>\n",
" <td>11.000000</td>\n",
" </tr>\n",
" <tr>\n",
" <th>American Amber / Red Ale</th>\n",
" <td>0.057456</td>\n",
" <td>36.298701</td>\n",
" </tr>\n",
" <tr>\n",
" <th>American Amber / Red Lager</th>\n",
" <td>0.049464</td>\n",
" <td>23.250000</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" abv ibu\n",
"style \n",
"Abbey Single Ale 0.049000 22.000000\n",
"Altbier 0.054385 34.125000\n",
"American Adjunct Lager 0.048722 11.000000\n",
"American Amber / Red Ale 0.057456 36.298701\n",
"American Amber / Red Lager 0.049464 23.250000"
]
},
"execution_count": 29,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"g = df_beers.groupby(by='style')\n",
"g[['abv', 'ibu']].mean().head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### 散布図をプロット"
]
},
{
"cell_type": "code",
"execution_count": 32,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"<matplotlib.axes._subplots.AxesSubplot at 0x11ca39e80>"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
},
{
"data": {
"image/png": "\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"g[['abv', 'ibu']].mean().plot(kind='scatter', x='abv', y='ibu')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* やっぱり、苦さ(高いIBU)を求めると、アルコール度数は高くなってしまう\n",
"* アルコール度数が高いビールが常に苦いとは限らない"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## styleを特徴として処理できるようにダミー化する\n",
"\n",
"* ダミー化 = One Hot Encoding (OHE).\n",
"* 文字のままだと扱いづらいので、0/1で表現できるように展開したデータのこと"
]
},
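{
"cell_type": "markdown",
"metadata": {},
"source": [
"A toy illustration of the idea, using a made-up three-row Series rather than the beer data. Recent pandas versions return boolean columns by default, so `dtype=int` is passed here on the assumption that 0/1 output is wanted:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical toy data, only to show the shape of the result\n",
"toy = pd.Series(['IPA', 'Lager', 'IPA'], name='style')\n",
"# One indicator column per distinct value: style_IPA and style_Lager\n",
"pd.get_dummies(toy, prefix='style', dtype=int)"
]
},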
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>style_Abbey Single Ale</th>\n",
" <th>style_Altbier</th>\n",
" <th>style_American Adjunct Lager</th>\n",
" <th>style_American Amber / Red Ale</th>\n",
" <th>style_American Amber / Red Lager</th>\n",
" <th>style_American Barleywine</th>\n",
" <th>style_American Black Ale</th>\n",
" <th>style_American Blonde Ale</th>\n",
" <th>style_American Brown Ale</th>\n",
" <th>style_American Dark Wheat Ale</th>\n",
" <th>...</th>\n",
" <th>style_Schwarzbier</th>\n",
" <th>style_Scotch Ale / Wee Heavy</th>\n",
" <th>style_Scottish Ale</th>\n",
" <th>style_Shandy</th>\n",
" <th>style_Smoked Beer</th>\n",
" <th>style_Tripel</th>\n",
" <th>style_Vienna Lager</th>\n",
" <th>style_Wheat Ale</th>\n",
" <th>style_Winter Warmer</th>\n",
" <th>style_Witbier</th>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2265</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2264</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2263</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2262</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 99 columns</p>\n",
"</div>"
],
"text/plain": [
" style_Abbey Single Ale style_Altbier style_American Adjunct Lager \\\n",
"id \n",
"1436 0 0 0 \n",
"2265 0 0 0 \n",
"2264 0 0 0 \n",
"2263 0 0 0 \n",
"2262 0 0 0 \n",
"\n",
" style_American Amber / Red Ale style_American Amber / Red Lager \\\n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
" style_American Barleywine style_American Black Ale \\\n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
" style_American Blonde Ale style_American Brown Ale \\\n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
" style_American Dark Wheat Ale ... style_Schwarzbier \\\n",
"id ... \n",
"1436 0 ... 0 \n",
"2265 0 ... 0 \n",
"2264 0 ... 0 \n",
"2263 0 ... 0 \n",
"2262 0 ... 0 \n",
"\n",
" style_Scotch Ale / Wee Heavy style_Scottish Ale style_Shandy \\\n",
"id \n",
"1436 0 0 0 \n",
"2265 0 0 0 \n",
"2264 0 0 0 \n",
"2263 0 0 0 \n",
"2262 0 0 0 \n",
"\n",
" style_Smoked Beer style_Tripel style_Vienna Lager style_Wheat Ale \\\n",
"id \n",
"1436 0 0 0 0 \n",
"2265 0 0 0 0 \n",
"2264 0 0 0 0 \n",
"2263 0 0 0 0 \n",
"2262 0 0 0 0 \n",
"\n",
" style_Winter Warmer style_Witbier \n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
"[5 rows x 99 columns]"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dummies = pd.get_dummies(df_beers['style'], prefix='style')\n",
"dummies.head()"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>abv</th>\n",
" <th>ibu</th>\n",
" <th>name</th>\n",
" <th>style_Abbey Single Ale</th>\n",
" <th>style_Altbier</th>\n",
" <th>style_American Adjunct Lager</th>\n",
" <th>style_American Amber / Red Ale</th>\n",
" <th>style_American Amber / Red Lager</th>\n",
" <th>style_American Barleywine</th>\n",
" <th>style_American Black Ale</th>\n",
" <th>...</th>\n",
" <th>style_Schwarzbier</th>\n",
" <th>style_Scotch Ale / Wee Heavy</th>\n",
" <th>style_Scottish Ale</th>\n",
" <th>style_Shandy</th>\n",
" <th>style_Smoked Beer</th>\n",
" <th>style_Tripel</th>\n",
" <th>style_Vienna Lager</th>\n",
" <th>style_Wheat Ale</th>\n",
" <th>style_Winter Warmer</th>\n",
" <th>style_Witbier</th>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>0.050</td>\n",
" <td>NaN</td>\n",
" <td>Pub Beer</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2265</th>\n",
" <td>0.066</td>\n",
" <td>NaN</td>\n",
" <td>Devil's Cup</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2264</th>\n",
" <td>0.071</td>\n",
" <td>NaN</td>\n",
" <td>Rise of the Phoenix</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2263</th>\n",
" <td>0.090</td>\n",
" <td>NaN</td>\n",
" <td>Sinister</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2262</th>\n",
" <td>0.075</td>\n",
" <td>NaN</td>\n",
" <td>Sex and Candy</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 102 columns</p>\n",
"</div>"
],
"text/plain": [
" abv ibu name style_Abbey Single Ale style_Altbier \\\n",
"id \n",
"1436 0.050 NaN Pub Beer 0 0 \n",
"2265 0.066 NaN Devil's Cup 0 0 \n",
"2264 0.071 NaN Rise of the Phoenix 0 0 \n",
"2263 0.090 NaN Sinister 0 0 \n",
"2262 0.075 NaN Sex and Candy 0 0 \n",
"\n",
" style_American Adjunct Lager style_American Amber / Red Ale \\\n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
" style_American Amber / Red Lager style_American Barleywine \\\n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
" style_American Black Ale ... style_Schwarzbier \\\n",
"id ... \n",
"1436 0 ... 0 \n",
"2265 0 ... 0 \n",
"2264 0 ... 0 \n",
"2263 0 ... 0 \n",
"2262 0 ... 0 \n",
"\n",
" style_Scotch Ale / Wee Heavy style_Scottish Ale style_Shandy \\\n",
"id \n",
"1436 0 0 0 \n",
"2265 0 0 0 \n",
"2264 0 0 0 \n",
"2263 0 0 0 \n",
"2262 0 0 0 \n",
"\n",
" style_Smoked Beer style_Tripel style_Vienna Lager style_Wheat Ale \\\n",
"id \n",
"1436 0 0 0 0 \n",
"2265 0 0 0 0 \n",
"2264 0 0 0 0 \n",
"2263 0 0 0 0 \n",
"2262 0 0 0 0 \n",
"\n",
" style_Winter Warmer style_Witbier \n",
"id \n",
"1436 0 0 \n",
"2265 0 0 \n",
"2264 0 0 \n",
"2263 0 0 \n",
"2262 0 0 \n",
"\n",
"[5 rows x 102 columns]"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 元のbeersのデータにくっつける。横方向で連結するため、axis=1\n",
"df_beers2 = pd.concat([df_beers[['abv', 'ibu', 'name']] ,dummies], axis=1)\n",
"df_beers2.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"ここで、このデータを使ってなにか面白いことはできないか?"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 醸造所データ読み込み"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>name</th>\n",
" <th>city</th>\n",
" <th>state</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NorthGate Brewing</td>\n",
" <td>Minneapolis</td>\n",
" <td>MN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>Against the Grain Brewery</td>\n",
" <td>Louisville</td>\n",
" <td>KY</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>Jack's Abby Craft Lagers</td>\n",
" <td>Framingham</td>\n",
" <td>MA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>Mike Hess Brewing Company</td>\n",
" <td>San Diego</td>\n",
" <td>CA</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>Fort Point Beer Company</td>\n",
" <td>San Francisco</td>\n",
" <td>CA</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" name city state\n",
"0 NorthGate Brewing Minneapolis MN\n",
"1 Against the Grain Brewery Louisville KY\n",
"2 Jack's Abby Craft Lagers Framingham MA\n",
"3 Mike Hess Brewing Company San Diego CA\n",
"4 Fort Point Beer Company San Francisco CA"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df_breweries = pd.read_csv('breweries.csv', index_col=0)\n",
"df_breweries.head()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## ビールと醸造所を結合"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>abv</th>\n",
" <th>ibu</th>\n",
" <th>name</th>\n",
" <th>style</th>\n",
" <th>brewery_id</th>\n",
" <th>ounces</th>\n",
" <th>name_brewery</th>\n",
" <th>city</th>\n",
" <th>state</th>\n",
" </tr>\n",
" <tr>\n",
" <th>id</th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>1436</th>\n",
" <td>0.050</td>\n",
" <td>NaN</td>\n",
" <td>Pub Beer</td>\n",
" <td>American Pale Lager</td>\n",
" <td>408</td>\n",
" <td>12.0</td>\n",
" <td>10 Barrel Brewing Company</td>\n",
" <td>Bend</td>\n",
" <td>OR</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2265</th>\n",
" <td>0.066</td>\n",
" <td>NaN</td>\n",
" <td>Devil's Cup</td>\n",
" <td>American Pale Ale (APA)</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" <td>18th Street Brewery</td>\n",
" <td>Gary</td>\n",
" <td>IN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2264</th>\n",
" <td>0.071</td>\n",
" <td>NaN</td>\n",
" <td>Rise of the Phoenix</td>\n",
" <td>American IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" <td>18th Street Brewery</td>\n",
" <td>Gary</td>\n",
" <td>IN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2263</th>\n",
" <td>0.090</td>\n",
" <td>NaN</td>\n",
" <td>Sinister</td>\n",
" <td>American Double / Imperial IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" <td>18th Street Brewery</td>\n",
" <td>Gary</td>\n",
" <td>IN</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2262</th>\n",
" <td>0.075</td>\n",
" <td>NaN</td>\n",
" <td>Sex and Candy</td>\n",
" <td>American IPA</td>\n",
" <td>177</td>\n",
" <td>12.0</td>\n",
" <td>18th Street Brewery</td>\n",
" <td>Gary</td>\n",
" <td>IN</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" abv ibu name style \\\n",
"id \n",
"1436 0.050 NaN Pub Beer American Pale Lager \n",
"2265 0.066 NaN Devil's Cup American Pale Ale (APA) \n",
"2264 0.071 NaN Rise of the Phoenix American IPA \n",
"2263 0.090 NaN Sinister American Double / Imperial IPA \n",
"2262 0.075 NaN Sex and Candy American IPA \n",
"\n",
" brewery_id ounces name_brewery city state \n",
"id \n",
"1436 408 12.0 10 Barrel Brewing Company Bend OR \n",
"2265 177 12.0 18th Street Brewery Gary IN \n",
"2264 177 12.0 18th Street Brewery Gary IN \n",
"2263 177 12.0 18th Street Brewery Gary IN \n",
"2262 177 12.0 18th Street Brewery Gary IN "
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"df = df_beers.merge(\n",
" df_breweries, \n",
" left_on='brewery_id', # 左側はこの列を結合に使う\n",
" right_index=True, # 右側はindexを結合に使う\n",
" how='inner', # 左側のデータを全て活かして、右のデータを乗せる\n",
" suffixes=('', '_brewery') # name列がかぶってるので、brewery側にsuffixを付ける\n",
")\n",
"df.head()"
]
},
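{
"cell_type": "markdown",
"metadata": {},
"source": [
"With `how='inner'`, any beer whose `brewery_id` has no match in `df_breweries` would be dropped without warning. A sketch for checking this, assuming the two frames above: repeat the join as a left join with `indicator=True` and count the rows per `_merge` category. The name `check` is introduced here only for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: a left join keeps every beer; _merge marks each row as 'both' or 'left_only'\n",
"check = df_beers.merge(\n",
"    df_breweries,\n",
"    left_on='brewery_id',\n",
"    right_index=True,\n",
"    how='left',\n",
"    suffixes=('', '_brewery'),\n",
"    indicator=True,\n",
")\n",
"# Any 'left_only' rows are beers that the inner join would have dropped\n",
"check['_merge'].value_counts()"
]
},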
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 醸造所の特徴を見る"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"bg = df.groupby(by='name_brewery')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"うーん、醸造所データ無くても良いような気がしてきた..."
]
},
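{
"cell_type": "markdown",
"metadata": {},
"source": [
"One possible use of the grouping, as a sketch only: count the beers per brewery and average their `abv`. Grouping by `name_brewery` treats breweries that share a name as a single group, which is an assumption to keep in mind."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: number of non-null abv values and mean abv per brewery name, largest first\n",
"bg['abv'].agg(['count', 'mean']).sort_values('count', ascending=False).head()"
]
},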
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.0"
}
},
"nbformat": 4,
"nbformat_minor": 2
}