The medv variable is the target variable. The dataset is available on kaggle.com, and a copy is also available on GitHub in case the download fails.
This is a clean table with information on housing prices in the Boston suburbs. Each row represents a town, with summary values for a number of variables plus the median housing price there.
The Boston data frame has 506 rows and 14 columns. This data frame contains the following columns:
- crim - per capita crime rate by town.
- zn - proportion of residential land zoned for lots over 25,000 sq.ft.
- indus - proportion of non-retail business acres per town.
- chas - Charles River dummy variable (= 1 if tract bounds river; 0 otherwise).
- nox - nitrogen oxides concentration (parts per 10 million).
- rm - average number of rooms per dwelling.
- age - proportion of owner-occupied units built prior to 1940.
- dis - weighted mean of distances to five Boston employment centres.
- rad - index of accessibility to radial highways.
- tax - full-value property-tax rate per $10,000.
- ptratio - pupil-teacher ratio by town.
- black - 1000(Bk - 0.63)^2, where Bk is the proportion of blacks by town.
- lstat - lower status of the population (percent).
- medv - median value of owner-occupied homes in $1000s.
The columns fall into the following data types:
- Most of the fields are quantitative.
- rad (index of accessibility to radial highways) is ordinal.
- chas (Charles River dummy variable) is categorical.
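The column types noted above can be made explicit when loading the table. The following is a minimal sketch using only the Python standard library. It assumes the CSV header uses the lowercase column names listed above, which may not match every copy of the dataset. The names COLUMN_KINDS, load_rows and SAMPLE_CSV are hypothetical, and the sample rows are made-up placeholders, not values from the real data.

```python
import csv
import io

# Measurement level of each column, following the notes above.
COLUMN_KINDS = {
    "crim": "quantitative",
    "zn": "quantitative",
    "indus": "quantitative",
    "chas": "categorical",
    "nox": "quantitative",
    "rm": "quantitative",
    "age": "quantitative",
    "dis": "quantitative",
    "rad": "ordinal",
    "tax": "quantitative",
    "ptratio": "quantitative",
    "black": "quantitative",
    "lstat": "quantitative",
    "medv": "quantitative",
}


def load_rows(fileobj):
    """Read CSV rows, converting each field according to COLUMN_KINDS."""
    rows = []
    for record in csv.DictReader(fileobj):
        row = {}
        for name, kind in COLUMN_KINDS.items():
            value = float(record[name])
            # Ordinal and categorical columns hold whole-number codes.
            row[name] = value if kind == "quantitative" else int(value)
        rows.append(row)
    return rows


# Placeholder sample: made-up values, NOT taken from the real dataset.
SAMPLE_CSV = """crim,zn,indus,chas,nox,rm,age,dis,rad,tax,ptratio,black,lstat,medv
0.1,10,5.0,0,0.5,6.0,60.0,4.0,1,300,15.0,390.0,5.0,25.0
0.2,0,8.0,1,0.6,5.5,80.0,3.0,4,400,18.0,380.0,12.0,20.0
"""

rows = load_rows(io.StringIO(SAMPLE_CSV))
print(len(rows), "rows,", len(rows[0]), "columns")
print("chas values:", sorted({row["chas"] for row in rows}))
```

With a downloaded copy, the same function could be called on an open file handle, for example `open("boston.csv", newline="")`, where the file name is an assumption. On the full table, the row count should come out as 506.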
The following questions guide the analysis:
- Q1: What are the relationships between the median housing price and the other variables in the table?
- Q2: How can those variables be visualized effectively if they correlate with the housing price? Do housing price and pupil-teacher ratio show any pattern, given that towns with good schools tend to be expensive? Does the “age” variable (the proportion of houses built prior to 1940) have any impact on housing price or on other variables, such as the number of rooms per dwelling or the lower-status percentage of the population?
- Q3: Does the table have geolocation information for each town, and how could that help visualize the dataset?
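As a possible first pass at Q1, each column can be ranked by the strength of its linear correlation with medv. The sketch below computes the Pearson coefficient by hand so that it needs only the standard library. The function names and the demo rows are hypothetical, and the demo values are made up for illustration rather than drawn from the dataset. Pearson correlation is only a rough guide for the ordinal rad and categorical chas columns.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for a constant column
    return cov / math.sqrt(var_x * var_y)


def correlations_with_target(rows, target="medv"):
    """Return (column, r) pairs sorted by descending |r| against the target."""
    ys = [row[target] for row in rows]
    result = []
    for name in rows[0]:
        if name == target:
            continue
        xs = [row[name] for row in rows]
        result.append((name, pearson(xs, ys)))
    # Undefined correlations (NaN) are sorted to the end.
    result.sort(key=lambda pair: 0.0 if math.isnan(pair[1]) else abs(pair[1]),
                reverse=True)
    return result


# Made-up demo rows (NOT real data), using a few of the column names above.
demo_rows = [
    {"rm": 5.0, "lstat": 20.0, "ptratio": 20.0, "medv": 15.0},
    {"rm": 6.0, "lstat": 12.0, "ptratio": 18.0, "medv": 22.0},
    {"rm": 7.0, "lstat": 6.0, "ptratio": 17.0, "medv": 30.0},
    {"rm": 8.0, "lstat": 3.0, "ptratio": 14.0, "medv": 45.0},
]

for name, r in correlations_with_target(demo_rows):
    print(f"{name:8s} r = {r:+.3f}")
```

Columns near the top of this ranking are natural candidates for the scatter plots asked about in Q2. The sign of r shows whether a variable rises or falls with the median price.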