@akiniwa
Last active December 20, 2015 19:29
{
"metadata": {
"name": "titanic_01_k"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"train = pd.read_csv('train.csv')"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"type(train)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 5,
"text": [
"pandas.core.frame.DataFrame"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`train` is a DataFrame object."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"train.columns\n",
"type(train.Age)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 6,
"text": [
"pandas.core.series.Series"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each column of `train` can be accessed by its header name.\n",
"Incidentally, `train.Age` is a Series object: it looks like a list, but it comes with handy methods for statistical processing."
]
},
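{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a small aside, the sketch below shows the two usual ways to reach a column. The `sample` DataFrame and its values are made up for illustration; they are not taken from train.csv. Attribute access such as `sample.Age` only works when the column name is a valid Python identifier, so bracket access is the safer general form."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import pandas as pd\n",
"\n",
"# Hypothetical sample data, not the real Titanic file\n",
"sample = pd.DataFrame({'Age': [22.0, 38.0, 26.0], 'Fare': [7.25, 71.28, 7.92]})\n",
"\n",
"# Attribute access and bracket access return the same Series\n",
"by_attribute = sample.Age\n",
"by_bracket = sample['Age']\n",
"print(by_attribute.equals(by_bracket))\n",
"\n",
"# A Series carries statistical methods that a plain list lacks\n",
"print(by_bracket.mean())"
],
"language": "python",
"metadata": {},
"outputs": []
},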
{
"cell_type": "code",
"collapsed": false,
"input": [
"train.Age.head()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 7,
"text": [
"0 22\n",
"1 38\n",
"2 26\n",
"3 35\n",
"4 35\n",
"Name: Age, dtype: float64"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The handy `head` from R is available here too."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print(train.describe())"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
" PassengerId Survived Pclass Age SibSp \\\n",
"count 891.000000 891.000000 891.000000 714.000000 891.000000 \n",
"mean 446.000000 0.383838 2.308642 29.699118 0.523008 \n",
"std 257.353842 0.486592 0.836071 14.526497 1.102743 \n",
"min 1.000000 0.000000 1.000000 0.420000 0.000000 \n",
"25% 223.500000 0.000000 2.000000 20.125000 0.000000 \n",
"50% 446.000000 0.000000 3.000000 28.000000 0.000000 \n",
"75% 668.500000 1.000000 3.000000 38.000000 1.000000 \n",
"max 891.000000 1.000000 3.000000 80.000000 8.000000 \n",
"\n",
" Parch Fare \n",
"count 891.000000 891.000000 \n",
"mean 0.381594 32.204208 \n",
"std 0.806057 49.693429 \n",
"min 0.000000 0.000000 \n",
"25% 0.000000 7.910400 \n",
"50% 0.000000 14.454200 \n",
"75% 0.000000 31.000000 \n",
"max 6.000000 512.329200 \n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This works much like `summary()` in R."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"train.Age[0:10]"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 12,
"text": [
"0 22\n",
"1 38\n",
"2 26\n",
"3 35\n",
"4 35\n",
"5 NaN\n",
"6 54\n",
"7 2\n",
"8 27\n",
"9 14\n",
"Name: Age, dtype: float64"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`train.Age[5]` is NaN."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"train.Age[0:10].isnull()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 14,
"text": [
"0 False\n",
"1 False\n",
"2 False\n",
"3 False\n",
"4 False\n",
"5 True\n",
"6 False\n",
"7 False\n",
"8 False\n",
"9 False\n",
"Name: Age, dtype: bool"
]
}
],
"prompt_number": 14
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`isnull()` returns True for each missing entry."
]
},
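{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because True counts as 1, summing the result of `isnull()` gives the number of missing values. A minimal sketch on a made-up Series (the values are illustrative, not from train.csv):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical ages with two missing entries\n",
"ages = pd.Series([22.0, np.nan, 26.0, np.nan, 35.0])\n",
"\n",
"# isnull() gives a boolean Series; sum() counts the True values\n",
"print(ages.isnull().sum())\n",
"\n",
"# Boolean indexing keeps only the rows where the value is missing\n",
"print(ages[ages.isnull()])"
],
"language": "python",
"metadata": {},
"outputs": []
},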
{
"cell_type": "code",
"collapsed": false,
"input": [
"train.Age = train.Age.fillna(train.Age.mean())\n",
"train.Age"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 39,
"text": [
"0 22.000000\n",
"1 38.000000\n",
"2 26.000000\n",
"3 35.000000\n",
"4 35.000000\n",
"5 29.699118\n",
"6 54.000000\n",
"7 2.000000\n",
"8 27.000000\n",
"9 14.000000\n",
"10 4.000000\n",
"11 58.000000\n",
"12 20.000000\n",
"13 39.000000\n",
"14 14.000000\n",
"...\n",
"876 20.000000\n",
"877 19.000000\n",
"878 29.699118\n",
"879 56.000000\n",
"880 25.000000\n",
"881 33.000000\n",
"882 22.000000\n",
"883 28.000000\n",
"884 25.000000\n",
"885 39.000000\n",
"886 27.000000\n",
"887 19.000000\n",
"888 29.699118\n",
"889 26.000000\n",
"890 32.000000\n",
"Name: Age, Length: 891, dtype: float64"
]
}
],
"prompt_number": 39
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"`fillna(<value to fill NaN with>)` is handy!"
]
}
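,
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mean is only one possible fill value. As a hedged sketch, the cell below fills a made-up Series (not the real data) with its median instead, which is less sensitive to outliers. Whether that suits the Titanic ages is an assumption to check, not a conclusion of this notebook."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical ages; 80.0 acts as an outlier\n",
"ages = pd.Series([22.0, np.nan, 26.0, 80.0, np.nan])\n",
"\n",
"# mean() and median() both skip NaN by default\n",
"filled_mean = ages.fillna(ages.mean())\n",
"filled_median = ages.fillna(ages.median())\n",
"\n",
"print(filled_mean)\n",
"print(filled_median)\n",
"\n",
"# fillna returns a new Series; the original still contains NaN\n",
"print(ages.isnull().sum())"
],
"language": "python",
"metadata": {},
"outputs": []
}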
],
"metadata": {}
}
]
}