@carlosdelfino
Last active August 2, 2018 21:30
handson-ml-scikit-tensorflow/housing.ipynb
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Baixando arquivo de dados para estudos \n\no script abaixo faz o download dos dados em formato CSV, estão compactados, mas o script descompacta já na pasta correta."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import os\nimport tarfile\nfrom six.moves import urllib\n\nDOWNLOAD_ROOT = \"https://raw.githubusercontent.com/ageron/handson-ml/master/\"\nHOUSING_PATH = \"datasets/housing\"\nHOUSING_URL = DOWNLOAD_ROOT + HOUSING_PATH + \"/housing.tgz\"\n\ndef fetch_housing_data(housing_url=HOUSING_URL, housing_path=HOUSING_PATH):\n \n if not os.path.isdir(housing_path):\n os.makedirs(housing_path)\n \n tgz_path = os.path.join(housing_path, \"housing.tgz\")\n urllib.request.urlretrieve(housing_url, tgz_path)\n \n housing_tgz = tarfile.open(tgz_path)\n housing_tgz.extractall(path=housing_path)\n housing_tgz.close()",
"execution_count": 1,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Agora a função `fetch_houseing_data()` deve ser chamada, para que o download e extração seja feita."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "fetch_housing_data()\n",
"execution_count": 2,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Em seguida crimaos a função que carrega os dados do arquivo csv etraido."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import pandas as pd\n\ndef load_housing_data(housing_path=HOUSING_PATH):\n csv_path = os.path.join(housing_path, \"housing.csv\")\n return pd.read_csv(csv_path)",
"execution_count": 4,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "A título de teste podemos ver o contedudo do arquivo executando o seguinte script:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing = load_housing_data()\nhousing.head()",
"execution_count": 5,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>longitude</th>\n <th>latitude</th>\n <th>housing_median_age</th>\n <th>total_rooms</th>\n <th>total_bedrooms</th>\n <th>population</th>\n <th>households</th>\n <th>median_income</th>\n <th>median_house_value</th>\n <th>ocean_proximity</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>0</th>\n <td>-122.23</td>\n <td>37.88</td>\n <td>41.0</td>\n <td>880.0</td>\n <td>129.0</td>\n <td>322.0</td>\n <td>126.0</td>\n <td>8.3252</td>\n <td>452600.0</td>\n <td>NEAR BAY</td>\n </tr>\n <tr>\n <th>1</th>\n <td>-122.22</td>\n <td>37.86</td>\n <td>21.0</td>\n <td>7099.0</td>\n <td>1106.0</td>\n <td>2401.0</td>\n <td>1138.0</td>\n <td>8.3014</td>\n <td>358500.0</td>\n <td>NEAR BAY</td>\n </tr>\n <tr>\n <th>2</th>\n <td>-122.24</td>\n <td>37.85</td>\n <td>52.0</td>\n <td>1467.0</td>\n <td>190.0</td>\n <td>496.0</td>\n <td>177.0</td>\n <td>7.2574</td>\n <td>352100.0</td>\n <td>NEAR BAY</td>\n </tr>\n <tr>\n <th>3</th>\n <td>-122.25</td>\n <td>37.85</td>\n <td>52.0</td>\n <td>1274.0</td>\n <td>235.0</td>\n <td>558.0</td>\n <td>219.0</td>\n <td>5.6431</td>\n <td>341300.0</td>\n <td>NEAR BAY</td>\n </tr>\n <tr>\n <th>4</th>\n <td>-122.25</td>\n <td>37.85</td>\n <td>52.0</td>\n <td>1627.0</td>\n <td>280.0</td>\n <td>565.0</td>\n <td>259.0</td>\n <td>3.8462</td>\n <td>342200.0</td>\n <td>NEAR BAY</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " longitude latitude housing_median_age total_rooms total_bedrooms \\\n0 -122.23 37.88 41.0 880.0 129.0 \n1 -122.22 37.86 21.0 7099.0 1106.0 \n2 -122.24 37.85 52.0 1467.0 190.0 \n3 -122.25 37.85 52.0 1274.0 235.0 \n4 -122.25 37.85 52.0 1627.0 280.0 \n\n population households median_income median_house_value ocean_proximity \n0 322.0 126.0 8.3252 452600.0 NEAR BAY \n1 2401.0 1138.0 8.3014 358500.0 NEAR BAY \n2 496.0 177.0 7.2574 352100.0 NEAR BAY \n3 558.0 219.0 5.6431 341300.0 NEAR BAY \n4 565.0 259.0 3.8462 342200.0 NEAR BAY "
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing.info()",
"execution_count": 6,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 20640 entries, 0 to 20639\nData columns (total 10 columns):\nlongitude 20640 non-null float64\nlatitude 20640 non-null float64\nhousing_median_age 20640 non-null float64\ntotal_rooms 20640 non-null float64\ntotal_bedrooms 20433 non-null float64\npopulation 20640 non-null float64\nhouseholds 20640 non-null float64\nmedian_income 20640 non-null float64\nmedian_house_value 20640 non-null float64\nocean_proximity 20640 non-null object\ndtypes: float64(9), object(1)\nmemory usage: 1.6+ MB\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing['ocean_proximity'].value_counts()",
"execution_count": 8,
"outputs": [
{
"data": {
"text/plain": "<1H OCEAN 9136\nINLAND 6551\nNEAR OCEAN 2658\nNEAR BAY 2290\nISLAND 5\nName: ocean_proximity, dtype: int64"
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing.describe()",
"execution_count": 9,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>longitude</th>\n <th>latitude</th>\n <th>housing_median_age</th>\n <th>total_rooms</th>\n <th>total_bedrooms</th>\n <th>population</th>\n <th>households</th>\n <th>median_income</th>\n <th>median_house_value</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>count</th>\n <td>20640.000000</td>\n <td>20640.000000</td>\n <td>20640.000000</td>\n <td>20640.000000</td>\n <td>20433.000000</td>\n <td>20640.000000</td>\n <td>20640.000000</td>\n <td>20640.000000</td>\n <td>20640.000000</td>\n </tr>\n <tr>\n <th>mean</th>\n <td>-119.569704</td>\n <td>35.631861</td>\n <td>28.639486</td>\n <td>2635.763081</td>\n <td>537.870553</td>\n <td>1425.476744</td>\n <td>499.539680</td>\n <td>3.870671</td>\n <td>206855.816909</td>\n </tr>\n <tr>\n <th>std</th>\n <td>2.003532</td>\n <td>2.135952</td>\n <td>12.585558</td>\n <td>2181.615252</td>\n <td>421.385070</td>\n <td>1132.462122</td>\n <td>382.329753</td>\n <td>1.899822</td>\n <td>115395.615874</td>\n </tr>\n <tr>\n <th>min</th>\n <td>-124.350000</td>\n <td>32.540000</td>\n <td>1.000000</td>\n <td>2.000000</td>\n <td>1.000000</td>\n <td>3.000000</td>\n <td>1.000000</td>\n <td>0.499900</td>\n <td>14999.000000</td>\n </tr>\n <tr>\n <th>25%</th>\n <td>-121.800000</td>\n <td>33.930000</td>\n <td>18.000000</td>\n <td>1447.750000</td>\n <td>296.000000</td>\n <td>787.000000</td>\n <td>280.000000</td>\n <td>2.563400</td>\n <td>119600.000000</td>\n </tr>\n <tr>\n <th>50%</th>\n <td>-118.490000</td>\n <td>34.260000</td>\n <td>29.000000</td>\n <td>2127.000000</td>\n <td>435.000000</td>\n <td>1166.000000</td>\n <td>409.000000</td>\n <td>3.534800</td>\n <td>179700.000000</td>\n </tr>\n <tr>\n <th>75%</th>\n <td>-118.010000</td>\n 
<td>37.710000</td>\n <td>37.000000</td>\n <td>3148.000000</td>\n <td>647.000000</td>\n <td>1725.000000</td>\n <td>605.000000</td>\n <td>4.743250</td>\n <td>264725.000000</td>\n </tr>\n <tr>\n <th>max</th>\n <td>-114.310000</td>\n <td>41.950000</td>\n <td>52.000000</td>\n <td>39320.000000</td>\n <td>6445.000000</td>\n <td>35682.000000</td>\n <td>6082.000000</td>\n <td>15.000100</td>\n <td>500001.000000</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " longitude latitude housing_median_age total_rooms \\\ncount 20640.000000 20640.000000 20640.000000 20640.000000 \nmean -119.569704 35.631861 28.639486 2635.763081 \nstd 2.003532 2.135952 12.585558 2181.615252 \nmin -124.350000 32.540000 1.000000 2.000000 \n25% -121.800000 33.930000 18.000000 1447.750000 \n50% -118.490000 34.260000 29.000000 2127.000000 \n75% -118.010000 37.710000 37.000000 3148.000000 \nmax -114.310000 41.950000 52.000000 39320.000000 \n\n total_bedrooms population households median_income \\\ncount 20433.000000 20640.000000 20640.000000 20640.000000 \nmean 537.870553 1425.476744 499.539680 3.870671 \nstd 421.385070 1132.462122 382.329753 1.899822 \nmin 1.000000 3.000000 1.000000 0.499900 \n25% 296.000000 787.000000 280.000000 2.563400 \n50% 435.000000 1166.000000 409.000000 3.534800 \n75% 647.000000 1725.000000 605.000000 4.743250 \nmax 6445.000000 35682.000000 6082.000000 15.000100 \n\n median_house_value \ncount 20640.000000 \nmean 206855.816909 \nstd 115395.615874 \nmin 14999.000000 \n25% 119600.000000 \n50% 179700.000000 \n75% 264725.000000 \nmax 500001.000000 "
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "%matplotlib inline\nimport matplotlib.pyplot as plt\nhousing.hist(bins=50, figsize=(20,15))\nplt.show()",
"execution_count": 11,
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": "<Figure size 1440x1080 with 9 Axes>"
},
"metadata": {},
"output_type": "display_data"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import numpy as np\n\ndef split_train_test(data, test_ratio):\n np.random.seed(47)\n shuffled_indices = np.random.permutation(len(data))\n test_set_size = int(len(data) * test_ratio)\n test_indices = shuffled_indices[:test_set_size]\n train_indices = shuffled_indices[test_set_size:]\n return data.iloc[train_indices], data.iloc[test_indices]",
"execution_count": 17,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "train_set, test_set = split_train_test(housing, 0.2)\nprint(len(train_set), \"train +\", len(test_set), \"test\")",
"execution_count": 22,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "16512 train + 4128 test\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "import hashlib\n\ndef test_set_check(identifier, test_ratio, hash):\n return hash(np.int64(identifier)).digest()[-1] < 256 * test_ratio\n\ndef split_train_test_by_id(data, test_ratio, id_column, hash=hashlib.md5):\n ids = data[id_column]\n in_test_set = ids.apply(lambda id_: test_set_check(id_, test_ratio, hash))\n return data.loc[~in_test_set], data.loc[in_test_set]",
"execution_count": 24,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing_with_id = housing.reset_index() # adds an `index` column\ntrain_set, test_set = split_train_test_by_id(housing_with_id, 0.2, \"index\")",
"execution_count": 25,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "train_set, test_set = split_train_test(housing_with_id, 0.2)\nprint(len(train_set), \"train +\", len(test_set), \"test\")",
"execution_count": 29,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "16512 train + 4128 test\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing_with_id[\"id\"] = housing[\"longitude\"] * 1000 + housing[\"latitude\"]\ntrain_set, test_set = split_train_test_by_id(housing_with_id, 0.2, \"id\")",
"execution_count": 28,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "train_set, test_set = split_train_test(housing_with_id, 0.2)\nprint(len(train_set), \"train +\", len(test_set), \"test\")",
"execution_count": 30,
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": "16512 train + 4128 test\n"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from sklearn.model_selection import train_test_split\ntrain_set, test_set = train_test_split(housing, test_size=0.2, random_state=42)",
"execution_count": 31,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing[\"income_cat\"] = np.ceil(housing[\"median_income\"] / 1.5)\nhousing[\"income_cat\"].where(housing[\"income_cat\"] < 5, 5.0, inplace=True)",
"execution_count": 32,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from sklearn.model_selection import StratifiedShuffleSplit\n\nsplit = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=42)\n\nfor train_index, test_index in split.split(housing, housing[\"income_cat\"]):\n strat_train_set = housing.loc[train_index]\n strat_test_set = housing.loc[test_index]",
"execution_count": 33,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing[\"income_cat\"].value_counts() / len(housing)",
"execution_count": 34,
"outputs": [
{
"data": {
"text/plain": "3.0 0.350581\n2.0 0.318847\n4.0 0.176308\n5.0 0.114438\n1.0 0.039826\nName: income_cat, dtype: float64"
},
"execution_count": 34,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "housing[\"income_cat\"].value_counts()",
"execution_count": 35,
"outputs": [
{
"data": {
"text/plain": "3.0 7236\n2.0 6581\n4.0 3639\n5.0 2362\n1.0 822\nName: income_cat, dtype: int64"
},
"execution_count": 35,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "for set in (strat_train_set, strat_test_set):\n set.drop([\"income_cat\"], axis=1, inplace=True)",
"execution_count": 38,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "strat_train_set.describe()",
"execution_count": 39,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>longitude</th>\n <th>latitude</th>\n <th>housing_median_age</th>\n <th>total_rooms</th>\n <th>total_bedrooms</th>\n <th>population</th>\n <th>households</th>\n <th>median_income</th>\n <th>median_house_value</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>count</th>\n <td>16512.000000</td>\n <td>16512.000000</td>\n <td>16512.000000</td>\n <td>16512.000000</td>\n <td>16354.000000</td>\n <td>16512.000000</td>\n <td>16512.000000</td>\n <td>16512.000000</td>\n <td>16512.000000</td>\n </tr>\n <tr>\n <th>mean</th>\n <td>-119.575834</td>\n <td>35.639577</td>\n <td>28.653101</td>\n <td>2622.728319</td>\n <td>534.973890</td>\n <td>1419.790819</td>\n <td>497.060380</td>\n <td>3.875589</td>\n <td>206990.920724</td>\n </tr>\n <tr>\n <th>std</th>\n <td>2.001860</td>\n <td>2.138058</td>\n <td>12.574726</td>\n <td>2138.458419</td>\n <td>412.699041</td>\n <td>1115.686241</td>\n <td>375.720845</td>\n <td>1.904950</td>\n <td>115703.014830</td>\n </tr>\n <tr>\n <th>min</th>\n <td>-124.350000</td>\n <td>32.540000</td>\n <td>1.000000</td>\n <td>6.000000</td>\n <td>2.000000</td>\n <td>3.000000</td>\n <td>2.000000</td>\n <td>0.499900</td>\n <td>14999.000000</td>\n </tr>\n <tr>\n <th>25%</th>\n <td>-121.800000</td>\n <td>33.940000</td>\n <td>18.000000</td>\n <td>1443.000000</td>\n <td>295.000000</td>\n <td>784.000000</td>\n <td>279.000000</td>\n <td>2.566775</td>\n <td>119800.000000</td>\n </tr>\n <tr>\n <th>50%</th>\n <td>-118.510000</td>\n <td>34.260000</td>\n <td>29.000000</td>\n <td>2119.500000</td>\n <td>433.000000</td>\n <td>1164.000000</td>\n <td>408.000000</td>\n <td>3.540900</td>\n <td>179500.000000</td>\n </tr>\n <tr>\n <th>75%</th>\n <td>-118.010000</td>\n 
<td>37.720000</td>\n <td>37.000000</td>\n <td>3141.000000</td>\n <td>644.000000</td>\n <td>1719.250000</td>\n <td>602.000000</td>\n <td>4.744475</td>\n <td>263900.000000</td>\n </tr>\n <tr>\n <th>max</th>\n <td>-114.310000</td>\n <td>41.950000</td>\n <td>52.000000</td>\n <td>39320.000000</td>\n <td>6210.000000</td>\n <td>35682.000000</td>\n <td>5358.000000</td>\n <td>15.000100</td>\n <td>500001.000000</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " longitude latitude housing_median_age total_rooms \\\ncount 16512.000000 16512.000000 16512.000000 16512.000000 \nmean -119.575834 35.639577 28.653101 2622.728319 \nstd 2.001860 2.138058 12.574726 2138.458419 \nmin -124.350000 32.540000 1.000000 6.000000 \n25% -121.800000 33.940000 18.000000 1443.000000 \n50% -118.510000 34.260000 29.000000 2119.500000 \n75% -118.010000 37.720000 37.000000 3141.000000 \nmax -114.310000 41.950000 52.000000 39320.000000 \n\n total_bedrooms population households median_income \\\ncount 16354.000000 16512.000000 16512.000000 16512.000000 \nmean 534.973890 1419.790819 497.060380 3.875589 \nstd 412.699041 1115.686241 375.720845 1.904950 \nmin 2.000000 3.000000 2.000000 0.499900 \n25% 295.000000 784.000000 279.000000 2.566775 \n50% 433.000000 1164.000000 408.000000 3.540900 \n75% 644.000000 1719.250000 602.000000 4.744475 \nmax 6210.000000 35682.000000 5358.000000 15.000100 \n\n median_house_value \ncount 16512.000000 \nmean 206990.920724 \nstd 115703.014830 \nmin 14999.000000 \n25% 119800.000000 \n50% 179500.000000 \n75% 263900.000000 \nmax 500001.000000 "
},
"execution_count": 39,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "strat_test_set.describe()",
"execution_count": 40,
"outputs": [
{
"data": {
"text/html": "<div>\n<style scoped>\n .dataframe tbody tr th:only-of-type {\n vertical-align: middle;\n }\n\n .dataframe tbody tr th {\n vertical-align: top;\n }\n\n .dataframe thead th {\n text-align: right;\n }\n</style>\n<table border=\"1\" class=\"dataframe\">\n <thead>\n <tr style=\"text-align: right;\">\n <th></th>\n <th>longitude</th>\n <th>latitude</th>\n <th>housing_median_age</th>\n <th>total_rooms</th>\n <th>total_bedrooms</th>\n <th>population</th>\n <th>households</th>\n <th>median_income</th>\n <th>median_house_value</th>\n </tr>\n </thead>\n <tbody>\n <tr>\n <th>count</th>\n <td>4128.000000</td>\n <td>4128.000000</td>\n <td>4128.000000</td>\n <td>4128.000000</td>\n <td>4079.000000</td>\n <td>4128.000000</td>\n <td>4128.00000</td>\n <td>4128.000000</td>\n <td>4128.000000</td>\n </tr>\n <tr>\n <th>mean</th>\n <td>-119.545187</td>\n <td>35.600998</td>\n <td>28.585029</td>\n <td>2687.902132</td>\n <td>549.484187</td>\n <td>1448.220446</td>\n <td>509.45688</td>\n <td>3.850998</td>\n <td>206315.401647</td>\n </tr>\n <tr>\n <th>std</th>\n <td>2.010260</td>\n <td>2.127489</td>\n <td>12.630172</td>\n <td>2345.868226</td>\n <td>454.414696</td>\n <td>1197.088364</td>\n <td>407.59254</td>\n <td>1.879270</td>\n <td>114170.048854</td>\n </tr>\n <tr>\n <th>min</th>\n <td>-124.180000</td>\n <td>32.550000</td>\n <td>1.000000</td>\n <td>2.000000</td>\n <td>1.000000</td>\n <td>5.000000</td>\n <td>1.00000</td>\n <td>0.499900</td>\n <td>14999.000000</td>\n </tr>\n <tr>\n <th>25%</th>\n <td>-121.780000</td>\n <td>33.920000</td>\n <td>18.000000</td>\n <td>1474.000000</td>\n <td>301.000000</td>\n <td>805.750000</td>\n <td>283.00000</td>\n <td>2.543000</td>\n <td>118975.000000</td>\n </tr>\n <tr>\n <th>50%</th>\n <td>-118.455000</td>\n <td>34.220000</td>\n <td>28.000000</td>\n <td>2158.500000</td>\n <td>441.000000</td>\n <td>1172.000000</td>\n <td>416.00000</td>\n <td>3.514750</td>\n <td>181300.000000</td>\n </tr>\n <tr>\n <th>75%</th>\n <td>-117.980000</td>\n 
<td>37.690000</td>\n <td>37.000000</td>\n <td>3171.000000</td>\n <td>653.000000</td>\n <td>1754.000000</td>\n <td>613.00000</td>\n <td>4.739700</td>\n <td>269025.000000</td>\n </tr>\n <tr>\n <th>max</th>\n <td>-114.560000</td>\n <td>41.950000</td>\n <td>52.000000</td>\n <td>32627.000000</td>\n <td>6445.000000</td>\n <td>28566.000000</td>\n <td>6082.00000</td>\n <td>15.000100</td>\n <td>500001.000000</td>\n </tr>\n </tbody>\n</table>\n</div>",
"text/plain": " longitude latitude housing_median_age total_rooms \\\ncount 4128.000000 4128.000000 4128.000000 4128.000000 \nmean -119.545187 35.600998 28.585029 2687.902132 \nstd 2.010260 2.127489 12.630172 2345.868226 \nmin -124.180000 32.550000 1.000000 2.000000 \n25% -121.780000 33.920000 18.000000 1474.000000 \n50% -118.455000 34.220000 28.000000 2158.500000 \n75% -117.980000 37.690000 37.000000 3171.000000 \nmax -114.560000 41.950000 52.000000 32627.000000 \n\n total_bedrooms population households median_income \\\ncount 4079.000000 4128.000000 4128.00000 4128.000000 \nmean 549.484187 1448.220446 509.45688 3.850998 \nstd 454.414696 1197.088364 407.59254 1.879270 \nmin 1.000000 5.000000 1.00000 0.499900 \n25% 301.000000 805.750000 283.00000 2.543000 \n50% 441.000000 1172.000000 416.00000 3.514750 \n75% 653.000000 1754.000000 613.00000 4.739700 \nmax 6445.000000 28566.000000 6082.00000 15.000100 \n\n median_house_value \ncount 4128.000000 \nmean 206315.401647 \nstd 114170.048854 \nmin 14999.000000 \n25% 118975.000000 \n50% 181300.000000 \n75% 269025.000000 \nmax 500001.000000 "
},
"execution_count": 40,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "from math import ceil\nceil(22.01)",
"execution_count": 47,
"outputs": [
{
"data": {
"text/plain": "23"
},
"execution_count": 47,
"metadata": {},
"output_type": "execute_result"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "",
"execution_count": null,
"outputs": []
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python 3",
"language": "python"
},
"language_info": {
"name": "python",
"version": "3.6.6",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"toc": {
"nav_menu": {},
"number_sections": true,
"sideBar": true,
"skip_h1_title": false,
"base_numbering": 1,
"title_cell": "Table of Contents",
"title_sidebar": "Contents",
"toc_cell": false,
"toc_position": {},
"toc_section_display": true,
"toc_window_display": false
},
"gist": {
"id": "",
"data": {
"description": "handson-ml-scikit-tensorflow/housing.ipynb",
"public": true
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}