Last active September 27, 2021 17:57
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "HW_4_raw.ipynb",
"provenance": [],
"collapsed_sections": [],
"authorship_tag": "ABX9TyPSztI7UgdIOpzGyciCUmm+",
"include_colab_link": true
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
"language_info": {
"name": "python"
"cells": [
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
"source": [
"<a href=\"\" target=\"_parent\"><img src=\"\" alt=\"Open In Colab\"/></a>"
"cell_type": "markdown",
"metadata": {
"id": "L_NT2cWGTvgo"
"source": [
"# Linear regression\n",
"### 1. Load the provided .npy files. You can load it with numpy.\n",
"* each file contains one vector, X and y\n",
"* visualize X vs y on a scatter plot\n",
"* fit an y=w_0 + w_1⋅X + w_2⋅X^2 linear regression using `sklearn`\n",
"### 2. Using different features\n",
"* plot the residuals (the difference between the prediction and the actual y ) vs the original y \n",
"* a non-random-noise like pattern suggests non-linear connection between the features and the predictions\n",
"* someone told us that the connection between X and y is y=A⋅X+B⋅cos^3(X)+C⋅X^2+D \n",
" * using sklearn's linear regression estimate A,B,C,D !\n",
"* plot the residuals again! is it better now?\n",
"### 3. Other methdods than sklearn for linear regression\n",
"* using the statsmodels package perform the same linear regression as in 2.) (hint: use statsmodels.api.OLS)\n",
"* is the result the same? if not guess, why? (did you not forget to add the constant term?)\n",
"* try to get the same results with statsmodels as with sklearn!\n",
"* using the analytic solution formula shown during the lecture, calculate the coefficients (A, B, C, D). are they the same compared to the two previous methods?\n",
"### 4.\n",
"* load the [real_estate]( data to a pandas dataframe\n",
"drop the ID column and the geographic location columns\n",
"fit a linear regression model to predict the unit price using sklearn\n",
"### 5.\n",
"* interpret the coefficients and their meaning shortly with your own words\n",
"* plot the residuals for the predictions. if you had to decide only on this information, which house would you buy?\n",
"## Hints:\n",
"* On total you can get 10 points for fully completing all tasks.\n",
"* Decorate your notebook with, questions, explanation etc, make it self contained and understandable!\n",
"* Comments you code when necessary\n",
"* Write functions for repetitive tasks!\n",
"* Use the pandas package for data loading and handling\n",
"* Use matplotlib and seaborn for plotting or bokeh and plotly for interactive investigation\n",
"* Use the scikit learn package for almost everything\n",
"* Use for loops only if it is really necessary!\n",
"* Code sharing is not allowed between student! Sharing code will result in zero points.\n",
"* If you use code found on web, it is OK, but, make its source clear!"