{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "L_NT2cWGTvgo"
},
"source": [
"# Linear regression\n",
"\n",
"### 1. Load the provided .npy files. You can load them with numpy.\n",
"\n",
"* each file contains one vector: X and y, respectively\n",
"* visualize X vs y on a scatter plot\n",
"* fit a y = w_0 + w_1⋅X + w_2⋅X^2 linear regression using `sklearn`\n",
"\n",
"### 2. Using different features\n", | |
"\n", | |
"* plot the residuals (the difference between the prediction and the actual y ) vs the original y \n", | |
"* a non-random-noise like pattern suggests non-linear connection between the features and the predictions\n", | |
"* someone told us that the connection between X and y is y=A⋅X+B⋅cos^3(X)+C⋅X^2+D \n", | |
" * using sklearn's linear regression estimate A,B,C,D !\n", | |
"* plot the residuals again! is it better now?\n", | |
"\n", | |
"### 3. Other methdods than sklearn for linear regression\n", | |
"\n", | |
"* using the statsmodels package perform the same linear regression as in 2.) (hint: use statsmodels.api.OLS)\n", | |
"* is the result the same? if not guess, why? (did you not forget to add the constant term?)\n", | |
"* try to get the same results with statsmodels as with sklearn!\n", | |
"* using the analytic solution formula shown during the lecture, calculate the coefficients (A, B, C, D). are they the same compared to the two previous methods?\n", | |
"\n", | |
"### 4.\n", | |
"\n", | |
"* load the [real_estate](https://gist.github.com/qbeer/f356d7144543cbb09c9792c34b8ad722) data to a pandas dataframe\n", | |
"drop the ID column and the geographic location columns\n", | |
"fit a linear regression model to predict the unit price using sklearn\n", | |
"* interpret the coefficients and their meaning shortly with your own words\n", | |
"* plot the residuals for the predictions. if you had to decide only on this information, which house would you buy?\n", | |
"\n", | |
"### 5.\n", | |
"* Using the same dataset from task 4) compute the parameters of the multivariate regression model via gradient descent.\n", | |
"* Compare the calculated parameters with the ones obtained in task 4) via sklearn. Is there any difference? If so give your explanation.\n", | |
"\n", | |
"Hint: you can use a function to calculate the loss and a function to perform the gradient descent to learn the parameters. Example:\n", | |
"\n", | |
"```python\n", | |
"def comp_cost(X, y, theta):\n", | |
" \"\"\"Compute cost given X, y and parameters theta.\"\"\"\n", | |
" .\n", | |
" .\n", | |
" .\n", | |
" return J\n", | |
"```\n", | |
"\n", | |
"```python\n", | |
"def grad_descent(X, y, theta, alpha, num_iters):\n", | |
" \"\"\"Perform gradient descent\"\"\"\n", | |
" .\n", | |
" .\n", | |
" . \n", | |
" return J_history, theta\n", | |
"```\n", | |
"\n", | |
"---\n", | |
"\n", | |
"## Hints:\n", | |
"\n", | |
"* On total you can get 10 points for fully completing all tasks.\n", | |
"* Decorate your notebook with, questions, explanation etc, make it self contained and understandable!\n", | |
"* Comments you code when necessary\n", | |
"* Write functions for repetitive tasks!\n", | |
"* Use the pandas package for data loading and handling\n", | |
"* Use matplotlib and seaborn for plotting or bokeh and plotly for interactive investigation\n", | |
"* Use the scikit learn package for almost everything\n", | |
"* Use for loops only if it is really necessary!\n", | |
"* Code sharing is not allowed between student! Sharing code will result in zero points.\n", | |
"* If you use code found on web, it is OK, but, make its source clear!" | |
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"colab": {
"authorship_tag": "ABX9TyO+SGQqiJxE7tqyElTiYeKt",
"collapsed_sections": [],
"include_colab_link": true,
"name": "HW_4.ipynb",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.10"
}
},
"nbformat": 4,
"nbformat_minor": 1
}