title | author | date |
---|---|---|
Loading scikit-learn's Boston Housing Dataset | Damian Mingle | 04/30/2018 |
# Load library
from sklearn import datasets
The Boston housing dataset contains 506 observations on housing prices for Boston suburbs and has 13 features. The target variable, medv, is the median home value. The variables, in column order, are:
crim: per capita crime rate by town.
zn: proportion of residential land zoned for lots over 25,000 sq.ft.
indus: proportion of non-retail business acres per town.
chas: Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
nox: nitrogen oxides concentration (parts per 10 million).
rm: average number of rooms per dwelling.
age: proportion of owner-occupied units built prior to 1940.
dis: weighted mean of distances to five Boston employment centres.
rad: index of accessibility to radial highways.
tax: full-value property-tax rate per $10,000.
ptratio: pupil-teacher ratio by town.
black: 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town.
lstat: lower status of the population (percent).
medv: median value of owner-occupied homes in $1000s (the target).
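# Load dataset (load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this call needs an older release)
boston = datasets.load_boston()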
# Create feature matrix
features = boston.data
# Create target vector
target = boston.target
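If `load_boston` is not available in your installed scikit-learn, the same feature matrix and target vector can be rebuilt from the raw data file. The following is a minimal sketch, not scikit-learn's own loader. It assumes the file is still served at the URL below and that it has 22 header lines followed by 14 whitespace-separated numbers per observation (13 features, then medv), split over two lines. The names `parse_boston_text` and `fetch_boston` are hypothetical helpers introduced for illustration.

```python
import urllib.request

import numpy as np

# Assumed location and layout of the original data file.
DATA_URL = "http://lib.stat.cmu.edu/datasets/boston"
HEADER_LINES = 22            # descriptive text before the numbers start
VALUES_PER_OBSERVATION = 14  # 13 features followed by the target (medv)


def parse_boston_text(text, header_lines=HEADER_LINES):
    """Split raw file contents into a feature matrix and a target vector."""
    # Drop the header, then treat everything else as one stream of numbers,
    # since each observation is wrapped across two physical lines.
    body = text.splitlines()[header_lines:]
    tokens = " ".join(body).split()
    values = np.array([float(token) for token in tokens])
    values = values.reshape(-1, VALUES_PER_OBSERVATION)
    return values[:, :13], values[:, 13]


def fetch_boston(url=DATA_URL):
    """Download the raw file and parse it (requires network access)."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_boston_text(text)
```

Calling `features, target = fetch_boston()` should then give arrays that play the same role as `boston.data` and `boston.target` above, provided the file layout matches these assumptions.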
# View the feature values for the first observation
features[0]
array([6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00,
       6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02,
       4.980e+00])
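The raw array is hard to read without column labels. As a small sketch, the values of the first observation can be paired with the feature names. The list below is typed out by hand and is assumed to match the order of `boston.feature_names` (scikit-learn uses upper-case names, with `B` for the 1000(Bk - 0.63)^2 variable); the values are copied from the output above.

```python
# Feature names in column order (assumed to match boston.feature_names).
feature_names = [
    "CRIM", "ZN", "INDUS", "CHAS", "NOX", "RM", "AGE",
    "DIS", "RAD", "TAX", "PTRATIO", "B", "LSTAT",
]

# Values of features[0], copied from the printed array.
first_observation = [
    6.320e-03, 1.800e+01, 2.310e+00, 0.000e+00, 5.380e-01, 6.575e+00,
    6.520e+01, 4.090e+00, 1.000e+00, 2.960e+02, 1.530e+01, 3.969e+02,
    4.980e+00,
]

# Pair each name with its value so individual features can be looked up.
labeled = dict(zip(feature_names, first_observation))

# Print one "name: value" line per feature.
for name, value in labeled.items():
    print(f"{name:>8}: {value}")
```

With the loaded dataset in hand, the same pairing could be written as `dict(zip(boston.feature_names, features[0]))`.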
Harrison, D. and Rubinfeld, D.L. (1978). Hedonic prices and the demand for clean air. Journal of Environmental Economics and Management, 5, 81–102.
Belsley, D.A., Kuh, E. and Welsch, R.E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. New York: Wiley.