@qbeer
Last active September 27, 2021 17:57
HW_4_raw.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "HW_4_raw.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPSztI7UgdIOpzGyciCUmm+",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
},
"language_info": {
"name": "python"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/qbeer/6bcdfa258286bdb92f370a6146260795/hw_4.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "L_NT2cWGTvgo"
},
"source": [
"# Linear regression\n",
"\n",
"### 1. Load the provided .npy files. You can load them with numpy.\n",
"\n",
"* each file contains one vector (X and y, respectively)\n",
"* visualize X vs y on a scatter plot\n",
"* fit a y=w_0 + w_1⋅X + w_2⋅X^2 linear regression using `sklearn`\n",
"\n",
"### 2. Using different features\n",
"\n",
"* plot the residuals (the difference between the prediction and the actual y) vs the original y\n",
"* a pattern that does not look like random noise suggests a non-linear connection between the features and y\n",
"* someone told us that the connection between X and y is y=A⋅X+B⋅cos^3(X)+C⋅X^2+D\n",
"  * using sklearn's linear regression, estimate A, B, C, D!\n",
"* plot the residuals again! is it better now?\n",
"\n",
"### 3. Methods other than sklearn for linear regression\n",
"\n",
"* using the statsmodels package perform the same linear regression as in 2.) (hint: use statsmodels.api.OLS)\n",
"* is the result the same? if not, guess why! (did you forget to add the constant term?)\n",
"* try to get the same results with statsmodels as with sklearn!\n",
"* using the analytic solution formula shown during the lecture, calculate the coefficients (A, B, C, D). are they the same as with the two previous methods?\n",
"\n",
"### 4.\n",
"\n",
"* load the [real_estate](https://gist.github.com/qbeer/f356d7144543cbb09c9792c34b8ad722) data to a pandas dataframe\n",
"* drop the ID column and the geographic location columns\n",
"* fit a linear regression model to predict the unit price using sklearn\n",
"\n",
"### 5.\n",
"\n",
"* briefly interpret the coefficients and their meaning in your own words\n",
"* plot the residuals for the predictions. if you had to decide only on this information, which house would you buy?\n",
"\n",
"---\n",
"\n",
"## Hints:\n",
"\n",
"* In total you can get 10 points for fully completing all tasks.\n",
"* Decorate your notebook with questions, explanations, etc.; make it self-contained and understandable!\n",
"* Comment your code when necessary\n",
"* Write functions for repetitive tasks!\n",
"* Use the pandas package for data loading and handling\n",
"* Use matplotlib and seaborn for plotting or bokeh and plotly for interactive investigation\n",
"* Use the scikit-learn package for almost everything\n",
"* Use for loops only if it is really necessary!\n",
"* Code sharing is not allowed between students! Sharing code will result in zero points.\n",
"* Using code found on the web is OK, but make its source clear!"
]
}
]
}
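
Task 3 in the notebook above asks for the coefficients (A, B, C, D) from an analytic formula. The following is a minimal sketch, assuming the lecture formula is the usual least-squares normal equation w = (ΦᵀΦ)⁻¹Φᵀy, where Φ has the columns X, cos^3(X), X^2 and a constant 1. The helper names `design_matrix` and `fit_normal_equation`, the synthetic data, and the "true" coefficients are all hypothetical stand-ins; the real X and y come from the provided .npy files.

```python
import numpy as np


def design_matrix(x):
    """Build the feature columns X, cos^3(X), X^2 and a constant 1 (for D)."""
    x = np.asarray(x, dtype=float).ravel()
    return np.column_stack([x, np.cos(x) ** 3, x ** 2, np.ones_like(x)])


def fit_normal_equation(x, y):
    """Solve (Phi^T Phi) w = Phi^T y for w = (A, B, C, D)."""
    phi = design_matrix(x)
    y = np.asarray(y, dtype=float).ravel()
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(phi.T @ phi, phi.T @ y)


# Synthetic stand-in data (assumption: NOT the homework data).
rng = np.random.default_rng(0)
x_demo = rng.uniform(-3.0, 3.0, size=200)
true_w = np.array([1.5, -2.0, 0.5, 3.0])  # hypothetical A, B, C, D
y_demo = design_matrix(x_demo) @ true_w + rng.normal(scale=0.1, size=x_demo.size)

w_hat = fit_normal_equation(x_demo, y_demo)

# Residuals as defined in task 2: prediction minus actual y.
residuals = design_matrix(x_demo) @ w_hat - y_demo
```

Because the constant column is part of the design matrix here, the last entry of `w_hat` plays the role of the intercept D. When comparing against `sklearn` or `statsmodels`, the intercept has to be handled the same way in each method (fitted once, not twice and not omitted) for the numbers to agree.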