JoshuaC3/inequality-constraints-in-linear-regression.ipynb

## inequality-constraints-in-linear-regression.ipynb
{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "inequality constraints in linear regression.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/gist/avidale/6668635c318aceebe0142de013a4cf77/inequality-constraints-in-linear-regression.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "oVlDkAHybHpj",
        "colab_type": "text"
      },
      "source": [
        "We want to fit a linear model $\\hat{Y}=\\alpha + X\\beta$ with a linear constrant on $\\beta$: $A\\beta \\leq B$, where $A$ is a matrix, $B$ is a vector, and vector inequality is element-wize. \n",
        "\n",
        "We do it by coordinate descent, recalculating constraints for every coefficient on each step"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "vjl-v4PIazpI",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 71
        },
        "outputId": "93b0ec26-f658-4639-ad94-d29d33845ba2"
      },
      "source": [
        "from sklearn.linear_model.base import LinearModel\n",
        "from sklearn.base import RegressorMixin\n",
        "from sklearn.utils import check_X_y\n",
        "import numpy as np\n",
        "\n",
        "class ConstrainedLinearRegression(LinearModel, RegressorMixin):\n",
        "\n",
        "    def __init__(self, A, B, fit_intercept=True, normalize=False, copy_X=True, tol=1e-15, lr=1.0):\n",
        "        self.A = A\n",
        "        self.B = B\n",
        "        self.fit_intercept = fit_intercept\n",
        "        self.normalize = normalize\n",
        "        self.copy_X = copy_X\n",
        "        self.tol = tol\n",
        "        self.lr = lr\n",
        "\n",
        "    def fit(self, X, y, initial_beta=None):\n",
        "        X, y = check_X_y(\n",
        "            X, y, \n",
        "            accept_sparse=['csr', 'csc', 'coo'], \n",
        "            y_numeric=True, multi_output=False\n",
        "        )\n",
        "        X, y, X_offset, y_offset, X_scale = self._preprocess_data(\n",
        "            X, y, \n",
        "            fit_intercept=self.fit_intercept, \n",
        "            normalize=self.normalize, \n",
        "            copy=self.copy_X\n",
        "        )\n",
        "        if initial_beta is not None:\n",
        "            # providing initial_beta may be useful, \n",
        "            # if initial solution does not respect the constraints. \n",
        "            beta = initial_beta\n",
        "        else:\n",
        "            beta = np.zeros(X.shape[1]).astype(float)\n",
        "        prev_beta = beta + 1\n",
        "        hessian = np.dot(X.transpose(), X)\n",
        "        while not (np.abs(prev_beta - beta)<self.tol).all():\n",
        "            prev_beta = beta.copy()\n",
        "            for i in range(len(beta)):\n",
        "                grad = np.dot(np.dot(X,beta) - y, X)\n",
        "                max_coef = np.inf\n",
        "                min_coef = -np.inf\n",
        "                for a_row, b_value in zip(self.A, self.B):\n",
        "                    if a_row[i] == 0:\n",
        "                        continue\n",
        "                    zero_out = beta.copy()\n",
        "                    zero_out[i] = 0\n",
        "                    bound = (b_value - np.dot(zero_out, a_row)) / a_row[i] \n",
        "                    if a_row[i] > 0:\n",
        "                        max_coef = np.minimum(max_coef, bound)\n",
        "                    elif a_row[i] < 0:\n",
        "                        min_coef = np.maximum(min_coef, bound)\n",
        "                assert min_coef <= max_coef, \"the constraints are inconsistent\"\n",
        "                beta[i] = np.minimum(\n",
        "                    max_coef, \n",
        "                    np.maximum(\n",
        "                        min_coef,\n",
        "                        beta[i] - (grad[i] / hessian[i,i]) * self.lr\n",
        "                    )\n",
        "                )\n",
        "        self.coef_ = beta\n",
        "        self._set_intercept(X_offset, y_offset, X_scale)\n",
        "        return self  "
      ],
      "execution_count": 1,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.linear_model.base module is  deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.\n",
            "  warnings.warn(message, FutureWarning)\n"
          ],
          "name": "stderr"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "E2uDGUElguDR",
        "colab_type": "text"
      },
      "source": [
        "For example, we can constrain our model that \n",
        "$$1 \\beta_1 + 1 \\beta_2 + ... + 1 \\beta_13 \\leq -7$$\n",
        "and\n",
        "$$-1 \\beta_1 -1 \\beta_2 - ... - 1 \\beta_13 \\leq 11 $$\n",
        "which is a fancy way of saying that the sum of coefficients should be between -11 and -7.\n",
        "\n",
        "And voila! Our model respects this constraint. It is your bro."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "i1LGD8agc65t",
        "colab_type": "code",
        "outputId": "8ef33477-3746-492f-f2dd-9e0ee92d621b",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 119
        }
      },
      "source": [
        "from sklearn.datasets import load_boston\n",
        "from sklearn.linear_model import LinearRegression\n",
        "X, y = load_boston(return_X_y=True)\n",
        "model = ConstrainedLinearRegression(\n",
        "    A=np.concatenate([np.ones((1, 13)), -np.ones((1, 13))]),\n",
        "    B=np.array([-7, 11])\n",
        ")\n",
        "model.fit(X, y)\n",
        "print(model.intercept_)\n",
        "print(model.coef_)\n",
        "print(model.coef_.sum())"
      ],
      "execution_count": 2,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "34.18314035303051\n",
            "[-1.06490237e-01  4.66708281e-02  8.69665528e-03  2.66974122e+00\n",
            " -1.48959727e+01  3.86142746e+00 -2.07481327e-03 -1.43212060e+00\n",
            "  2.98392331e-01 -1.25258469e-02 -9.19338378e-01  9.47601379e-03\n",
            " -5.25881977e-01]\n",
            "-11.0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "3Yvw8BdIgu_J",
        "colab_type": "text"
      },
      "source": [
        "And this is the baseline: unconstrained linear regression. It does not respect the constraint. It is not your bro. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "CdAPpPrPc-a_",
        "colab_type": "code",
        "outputId": "0ab1335b-b3d8-49db-bc48-c6ce619772e9",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 119
        }
      },
      "source": [
        "model = LinearRegression()\n",
        "model.fit(X, y)\n",
        "print(model.intercept_)\n",
        "print(model.coef_)\n",
        "print(model.coef_.sum())"
      ],
      "execution_count": 3,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "36.459488385090125\n",
            "[-1.08011358e-01  4.64204584e-02  2.05586264e-02  2.68673382e+00\n",
            " -1.77666112e+01  3.80986521e+00  6.92224640e-04 -1.47556685e+00\n",
            "  3.06049479e-01 -1.23345939e-02 -9.52747232e-01  9.31168327e-03\n",
            " -5.24758378e-01]\n",
            "-13.9603981374292\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "MWATYySJCvlq",
        "colab_type": "text"
      },
      "source": [
        "### The impact of \"learning rate\"\n",
        "\n",
        "For some inputs, coordinate descent can stuck in a suboptimal corner. To prevent it, we can decrease the learning rate."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "OdgtL5wShmZ2",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "X = np.array([[ 6.76395100e-03, 1.05127870e-02, 1.38131920e-02, 8.91216900e-03],\n",
        "[ 3.11930700e-03, -1.18833000e-04, -5.56112100e-03,-1.40301670e-02],\n",
        "[ 1.22898000e-02, 1.48354070e-02, 1.76495830e-02,1.46773540e-02],\n",
        "[ 8.20155400e-03, 1.03513650e-02, 9.93534000e-03,1.05970970e-02],\n",
        "[ 5.84342900e-03, 5.10907300e-03, 7.89939000e-03,1.37747080e-02],\n",
        "[-1.82564300e-03, -1.21698960e-02, -2.41190460e-02,-2.72551080e-02],\n",
        "[ 2.49304000e-05, -3.81050000e-03, -7.14523700e-03,-1.21222690e-02],\n",
        "[ 2.20000000e-02, 3.61578590e-02, 5.27356990e-02,6.43562460e-02],\n",
        "[ 1.04000000e-02, 1.26722980e-02, 1.50119400e-02,2.13071580e-02],\n",
        "[ 1.13000000e-02, 1.55012380e-02, 2.93593550e-02,5.01564280e-02],\n",
        "[ 4.31893300e-03, 8.90650800e-03, 1.62781490e-02,2.58799160e-02],\n",
        "[-4.10587000e-05, -3.94980900e-03, -5.08411500e-03,-9.01910500e-03],\n",
        "[-2.98000000e-04, -5.12142700e-03, -1.46469070e-02,-2.54150210e-02],\n",
        "[ 3.04000000e-03, -3.39805700e-03, -9.74336200e-03,-1.52240690e-02],\n",
        "[ 9.17000000e-03, 1.30487420e-02, 2.33631520e-02,2.75515380e-02],\n",
        "[ 1.67858300e-03, -6.15091800e-03, -1.59148770e-02,-2.34790290e-02],\n",
        "[ 7.63690800e-03, 8.28053100e-03, 5.80860500e-03,7.27193000e-04],\n",
        "[ 1.28940800e-02, -2.54326700e-03, -1.46813370e-02,-2.31787980e-02],\n",
        "[ 1.77329260e-02, 2.27723310e-02, 3.30301960e-02,4.45734430e-02],\n",
        "[ 1.02051920e-02, 1.15040150e-02, 9.76096600e-03,4.02489100e-03],\n",
        "[ 6.66632800e-03, 1.91852410e-02, 2.52234550e-02,3.31616260e-02],\n",
        "[ 4.08310700e-03, 4.35669700e-03, -6.53384000e-04,-7.84317600e-03],\n",
        "[ 7.79128400e-03, 9.49587000e-03, 1.26496340e-02,1.46050810e-02],\n",
        "[ 8.28355200e-03, 7.66034500e-03, 9.65023700e-03,6.15344600e-03],\n",
        "[ 8.79511300e-03, 1.07991700e-02, 8.79172400e-03,4.14202900e-03],\n",
        "[ 9.26750300e-03, 1.95014440e-02, 2.70481780e-02,2.18226380e-02],\n",
        "[ 5.34313000e-04, -4.78709000e-04, -4.36123200e-03,-1.08921810e-02],\n",
        "[ 7.35792400e-03, 1.13625030e-02, 1.09614140e-02,8.92494200e-03],\n",
        "[ 4.28727900e-03, 1.61191700e-03, 2.38737700e-03,7.29546800e-03],\n",
        "[ 5.54352700e-03, 3.69901700e-03, -4.00165900e-03,-5.07037900e-03],\n",
        "[ 4.55277800e-03, 9.19459000e-03, 8.14547700e-03,-2.54201800e-03],\n",
        "[ 5.42147000e-03, 1.34941860e-02, 1.90196850e-02,2.05033210e-02],\n",
        "[ 9.42743600e-03, 1.94949930e-02, 1.10851060e-02,8.66207300e-03],\n",
        "[ 7.85459900e-03, 1.68533700e-03, 8.06887000e-04,-2.86475900e-03],\n",
        "[-5.04015600e-03, -7.08023830e-02, -1.09417595e-01,-1.16886912e-01],\n",
        "[ 4.09176120e-02, 7.53775410e-02, 5.31236080e-02,1.80014280e-02]])\n"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YpYy0TPUB-rz",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "y = np.array([ 0.00981752, -0.00945781, 0.01503509, 0.01026975, 0.01106711,\n",
        "-0.02300287, -0.00972671, 0.05655607, 0.01859124, 0.03931067,\n",
        "0.02070886, -0.00702471, -0.01954932, -0.01169865, 0.02418464,\n",
        "-0.01865414, 0.00299641, -0.0170793 , 0.0386599 , 0.00638496,\n",
        "0.02901831, -0.00440659, 0.01331681, 0.00749591, 0.00638459,\n",
        "0.02274299, -0.00743834, 0.00976891, 0.00508093, -0.00349541,\n",
        "0.00221714, 0.01862392, 0.01185967, -0.00043719, -0.09747239,\n",
        "0.03918195])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "GnnzfxOiFACV",
        "colab_type": "text"
      },
      "source": [
        "The constraints below enforce that the sum of all coefficients is between 0 and 1, and each individual coefficient is between 0 and 1 as well. \n",
        "\n",
        "We run the optimization with different learning rates, and see that with learning rate of 1 the optimization gets stuck in a suboptimal corner. However, with learning rate of 1e-4 it converges to a much better solution (and that further decreasing lr yields a negligible improvement).\n",
        "\n",
        "\n",
        "\n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZSR5oIhLCChP",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 136
        },
        "outputId": "d9172978-6a1b-4f50-f02f-54e739bd194e"
      },
      "source": [
        "A = np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])\n",
        "B = np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))\n",
        "\n",
        "for lr in [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:\n",
        "    model = ConstrainedLinearRegression(\n",
        "        A=A,\n",
        "        B=B,\n",
        "        lr=lr,\n",
        "    )\n",
        "    model.fit(X, y)\n",
        "    print(lr, ((model.predict(X) - y)**2).sum()**0.5, model.coef_, model.coef_.sum())"
      ],
      "execution_count": 32,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "1 0.12359713317487613 [1. 0. 0. 0.] 1.0\n",
            "0.1 0.06453175639689608 [0.44664155 0.22728538 0.18483899 0.14123408] 1.0\n",
            "0.01 0.057848955909340485 [0.4026343  0.22291547 0.19637926 0.17807097] 1.0\n",
            "0.001 0.0571191533538363 [0.39817337 0.22207123 0.19708779 0.1826676 ] 1.0\n",
            "0.0001 0.05706354120505105 [0.39777213 0.22207764 0.19722112 0.18292911] 1.0\n",
            "1e-05 0.05705461221555 [0.39771485 0.22206381 0.19725026 0.18297108] 1.0\n",
            "1e-06 0.05705397400955906 [0.3977102  0.22206391 0.19725191 0.18297398] 1.0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "BIL-Kqh0HMMh",
        "colab_type": "text"
      },
      "source": [
        "Unconstrained linear regression gives a solution with much much better predictive power. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "KUOnF0bzDlVL",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "outputId": "2a839f4f-1239-4dfb-992a-04d13ef03c41"
      },
      "source": [
        "model = LinearRegression()\n",
        "model.fit(X, y)\n",
        "print(((model.predict(X) - y)**2).sum()**0.5, model.coef_, model.coef_.sum())\n",
        "old_coef = model.coef_"
      ],
      "execution_count": 36,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "0.0046915108138479355 [0.18638468 0.2053277  0.07690013 0.64233345] 1.1109459580593493\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bvn2O29RH0_W",
        "colab_type": "text"
      },
      "source": [
        "We can use the solution of the unconstrained model (after normalization) as a seed for the constrained model. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "1Hx7W7j8HLgo",
        "colab_type": "code",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 34
        },
        "outputId": "de2f7b2d-4899-4edc-9f13-b07317d4a502"
      },
      "source": [
        "model = ConstrainedLinearRegression(\n",
        "    A=A,\n",
        "    B=B,\n",
        "    lr=1e-4,\n",
        ")\n",
        "model.fit(X, y, initial_beta=old_coef / sum(old_coef))\n",
        "print(((model.predict(X) - y)**2).sum()**0.5, model.coef_, model.coef_.sum())"
      ],
      "execution_count": 39,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "0.01568685482946043 [0.16777115 0.1848224  0.06922041 0.57818605] 1.0\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "znP6Ubp1Hki1",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}
	{
	"nbformat": 4,
	"nbformat_minor": 0,
	"metadata": {
	"colab": {
	"name": "inequality constraints in linear regression.ipynb",
	"provenance": [],
	"collapsed_sections": [],
	"include_colab_link": true
	},
	"kernelspec": {
	"name": "python3",
	"display_name": "Python 3"
	}
	},
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "view-in-github",
	"colab_type": "text"
	},
	"source": [
	"<a href=\"https://colab.research.google.com/gist/avidale/6668635c318aceebe0142de013a4cf77/inequality-constraints-in-linear-regression.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "oVlDkAHybHpj",
	"colab_type": "text"
	},
	"source": [
	"We want to fit a linear model $\\hat{Y}=\\alpha + X\\beta$ with a linear constrant on $\\beta$: $A\\beta \\leq B$, where $A$ is a matrix, $B$ is a vector, and vector inequality is element-wize. \n",
	"\n",
	"We do it by coordinate descent, recalculating constraints for every coefficient on each step"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "vjl-v4PIazpI",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 71
	},
	"outputId": "93b0ec26-f658-4639-ad94-d29d33845ba2"
	},
	"source": [
	"from sklearn.linear_model.base import LinearModel\n",
	"from sklearn.base import RegressorMixin\n",
	"from sklearn.utils import check_X_y\n",
	"import numpy as np\n",
	"\n",
	"class ConstrainedLinearRegression(LinearModel, RegressorMixin):\n",
	"\n",
	" def __init__(self, A, B, fit_intercept=True, normalize=False, copy_X=True, tol=1e-15, lr=1.0):\n",
	" self.A = A\n",
	" self.B = B\n",
	" self.fit_intercept = fit_intercept\n",
	" self.normalize = normalize\n",
	" self.copy_X = copy_X\n",
	" self.tol = tol\n",
	" self.lr = lr\n",
	"\n",
	" def fit(self, X, y, initial_beta=None):\n",
	" X, y = check_X_y(\n",
	" X, y, \n",
	" accept_sparse=['csr', 'csc', 'coo'], \n",
	" y_numeric=True, multi_output=False\n",
	" )\n",
	" X, y, X_offset, y_offset, X_scale = self._preprocess_data(\n",
	" X, y, \n",
	" fit_intercept=self.fit_intercept, \n",
	" normalize=self.normalize, \n",
	" copy=self.copy_X\n",
	" )\n",
	" if initial_beta is not None:\n",
	" # providing initial_beta may be useful, \n",
	" # if initial solution does not respect the constraints. \n",
	" beta = initial_beta\n",
	" else:\n",
	" beta = np.zeros(X.shape[1]).astype(float)\n",
	" prev_beta = beta + 1\n",
	" hessian = np.dot(X.transpose(), X)\n",
	" while not (np.abs(prev_beta - beta)<self.tol).all():\n",
	" prev_beta = beta.copy()\n",
	" for i in range(len(beta)):\n",
	" grad = np.dot(np.dot(X,beta) - y, X)\n",
	" max_coef = np.inf\n",
	" min_coef = -np.inf\n",
	" for a_row, b_value in zip(self.A, self.B):\n",
	" if a_row[i] == 0:\n",
	" continue\n",
	" zero_out = beta.copy()\n",
	" zero_out[i] = 0\n",
	" bound = (b_value - np.dot(zero_out, a_row)) / a_row[i] \n",
	" if a_row[i] > 0:\n",
	" max_coef = np.minimum(max_coef, bound)\n",
	" elif a_row[i] < 0:\n",
	" min_coef = np.maximum(min_coef, bound)\n",
	" assert min_coef <= max_coef, \"the constraints are inconsistent\"\n",
	" beta[i] = np.minimum(\n",
	" max_coef, \n",
	" np.maximum(\n",
	" min_coef,\n",
	" beta[i] - (grad[i] / hessian[i,i]) * self.lr\n",
	" )\n",
	" )\n",
	" self.coef_ = beta\n",
	" self._set_intercept(X_offset, y_offset, X_scale)\n",
	" return self "
	],
	"execution_count": 1,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"/usr/local/lib/python3.6/dist-packages/sklearn/utils/deprecation.py:144: FutureWarning: The sklearn.linear_model.base module is deprecated in version 0.22 and will be removed in version 0.24. The corresponding classes / functions should instead be imported from sklearn.linear_model. Anything that cannot be imported from sklearn.linear_model is now part of the private API.\n",
	" warnings.warn(message, FutureWarning)\n"
	],
	"name": "stderr"
	}
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "E2uDGUElguDR",
	"colab_type": "text"
	},
	"source": [
	"For example, we can constrain our model that \n",
	"$$1 \\beta_1 + 1 \\beta_2 + ... + 1 \\beta_13 \\leq -7$$\n",
	"and\n",
	"$$-1 \\beta_1 -1 \\beta_2 - ... - 1 \\beta_13 \\leq 11 $$\n",
	"which is a fancy way of saying that the sum of coefficients should be between -11 and -7.\n",
	"\n",
	"And voila! Our model respects this constraint. It is your bro."
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "i1LGD8agc65t",
	"colab_type": "code",
	"outputId": "8ef33477-3746-492f-f2dd-9e0ee92d621b",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 119
	}
	},
	"source": [
	"from sklearn.datasets import load_boston\n",
	"from sklearn.linear_model import LinearRegression\n",
	"X, y = load_boston(return_X_y=True)\n",
	"model = ConstrainedLinearRegression(\n",
	" A=np.concatenate([np.ones((1, 13)), -np.ones((1, 13))]),\n",
	" B=np.array([-7, 11])\n",
	")\n",
	"model.fit(X, y)\n",
	"print(model.intercept_)\n",
	"print(model.coef_)\n",
	"print(model.coef_.sum())"
	],
	"execution_count": 2,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"34.18314035303051\n",
	"[-1.06490237e-01 4.66708281e-02 8.69665528e-03 2.66974122e+00\n",
	" -1.48959727e+01 3.86142746e+00 -2.07481327e-03 -1.43212060e+00\n",
	" 2.98392331e-01 -1.25258469e-02 -9.19338378e-01 9.47601379e-03\n",
	" -5.25881977e-01]\n",
	"-11.0\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "3Yvw8BdIgu_J",
	"colab_type": "text"
	},
	"source": [
	"And this is the baseline: unconstrained linear regression. It does not respect the constraint. It is not your bro. "
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "CdAPpPrPc-a_",
	"colab_type": "code",
	"outputId": "0ab1335b-b3d8-49db-bc48-c6ce619772e9",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 119
	}
	},
	"source": [
	"model = LinearRegression()\n",
	"model.fit(X, y)\n",
	"print(model.intercept_)\n",
	"print(model.coef_)\n",
	"print(model.coef_.sum())"
	],
	"execution_count": 3,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"36.459488385090125\n",
	"[-1.08011358e-01 4.64204584e-02 2.05586264e-02 2.68673382e+00\n",
	" -1.77666112e+01 3.80986521e+00 6.92224640e-04 -1.47556685e+00\n",
	" 3.06049479e-01 -1.23345939e-02 -9.52747232e-01 9.31168327e-03\n",
	" -5.24758378e-01]\n",
	"-13.9603981374292\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "MWATYySJCvlq",
	"colab_type": "text"
	},
	"source": [
	"### The impact of \"learning rate\"\n",
	"\n",
	"For some inputs, coordinate descent can stuck in a suboptimal corner. To prevent it, we can decrease the learning rate."
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "OdgtL5wShmZ2",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"X = np.array([[ 6.76395100e-03, 1.05127870e-02, 1.38131920e-02, 8.91216900e-03],\n",
	"[ 3.11930700e-03, -1.18833000e-04, -5.56112100e-03,-1.40301670e-02],\n",
	"[ 1.22898000e-02, 1.48354070e-02, 1.76495830e-02,1.46773540e-02],\n",
	"[ 8.20155400e-03, 1.03513650e-02, 9.93534000e-03,1.05970970e-02],\n",
	"[ 5.84342900e-03, 5.10907300e-03, 7.89939000e-03,1.37747080e-02],\n",
	"[-1.82564300e-03, -1.21698960e-02, -2.41190460e-02,-2.72551080e-02],\n",
	"[ 2.49304000e-05, -3.81050000e-03, -7.14523700e-03,-1.21222690e-02],\n",
	"[ 2.20000000e-02, 3.61578590e-02, 5.27356990e-02,6.43562460e-02],\n",
	"[ 1.04000000e-02, 1.26722980e-02, 1.50119400e-02,2.13071580e-02],\n",
	"[ 1.13000000e-02, 1.55012380e-02, 2.93593550e-02,5.01564280e-02],\n",
	"[ 4.31893300e-03, 8.90650800e-03, 1.62781490e-02,2.58799160e-02],\n",
	"[-4.10587000e-05, -3.94980900e-03, -5.08411500e-03,-9.01910500e-03],\n",
	"[-2.98000000e-04, -5.12142700e-03, -1.46469070e-02,-2.54150210e-02],\n",
	"[ 3.04000000e-03, -3.39805700e-03, -9.74336200e-03,-1.52240690e-02],\n",
	"[ 9.17000000e-03, 1.30487420e-02, 2.33631520e-02,2.75515380e-02],\n",
	"[ 1.67858300e-03, -6.15091800e-03, -1.59148770e-02,-2.34790290e-02],\n",
	"[ 7.63690800e-03, 8.28053100e-03, 5.80860500e-03,7.27193000e-04],\n",
	"[ 1.28940800e-02, -2.54326700e-03, -1.46813370e-02,-2.31787980e-02],\n",
	"[ 1.77329260e-02, 2.27723310e-02, 3.30301960e-02,4.45734430e-02],\n",
	"[ 1.02051920e-02, 1.15040150e-02, 9.76096600e-03,4.02489100e-03],\n",
	"[ 6.66632800e-03, 1.91852410e-02, 2.52234550e-02,3.31616260e-02],\n",
	"[ 4.08310700e-03, 4.35669700e-03, -6.53384000e-04,-7.84317600e-03],\n",
	"[ 7.79128400e-03, 9.49587000e-03, 1.26496340e-02,1.46050810e-02],\n",
	"[ 8.28355200e-03, 7.66034500e-03, 9.65023700e-03,6.15344600e-03],\n",
	"[ 8.79511300e-03, 1.07991700e-02, 8.79172400e-03,4.14202900e-03],\n",
	"[ 9.26750300e-03, 1.95014440e-02, 2.70481780e-02,2.18226380e-02],\n",
	"[ 5.34313000e-04, -4.78709000e-04, -4.36123200e-03,-1.08921810e-02],\n",
	"[ 7.35792400e-03, 1.13625030e-02, 1.09614140e-02,8.92494200e-03],\n",
	"[ 4.28727900e-03, 1.61191700e-03, 2.38737700e-03,7.29546800e-03],\n",
	"[ 5.54352700e-03, 3.69901700e-03, -4.00165900e-03,-5.07037900e-03],\n",
	"[ 4.55277800e-03, 9.19459000e-03, 8.14547700e-03,-2.54201800e-03],\n",
	"[ 5.42147000e-03, 1.34941860e-02, 1.90196850e-02,2.05033210e-02],\n",
	"[ 9.42743600e-03, 1.94949930e-02, 1.10851060e-02,8.66207300e-03],\n",
	"[ 7.85459900e-03, 1.68533700e-03, 8.06887000e-04,-2.86475900e-03],\n",
	"[-5.04015600e-03, -7.08023830e-02, -1.09417595e-01,-1.16886912e-01],\n",
	"[ 4.09176120e-02, 7.53775410e-02, 5.31236080e-02,1.80014280e-02]])\n"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "YpYy0TPUB-rz",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	"y = np.array([ 0.00981752, -0.00945781, 0.01503509, 0.01026975, 0.01106711,\n",
	"-0.02300287, -0.00972671, 0.05655607, 0.01859124, 0.03931067,\n",
	"0.02070886, -0.00702471, -0.01954932, -0.01169865, 0.02418464,\n",
	"-0.01865414, 0.00299641, -0.0170793 , 0.0386599 , 0.00638496,\n",
	"0.02901831, -0.00440659, 0.01331681, 0.00749591, 0.00638459,\n",
	"0.02274299, -0.00743834, 0.00976891, 0.00508093, -0.00349541,\n",
	"0.00221714, 0.01862392, 0.01185967, -0.00043719, -0.09747239,\n",
	"0.03918195])"
	],
	"execution_count": 0,
	"outputs": []
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "GnnzfxOiFACV",
	"colab_type": "text"
	},
	"source": [
	"The constraints below enforce that the sum of all coefficients is between 0 and 1, and each individual coefficient is between 0 and 1 as well. \n",
	"\n",
	"We run the optimization with different learning rates, and see that with learning rate of 1 the optimization gets stuck in a suboptimal corner. However, with learning rate of 1e-4 it converges to a much better solution (and that further decreasing lr yields a negligible improvement).\n",
	"\n",
	"\n",
	"\n"
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "ZSR5oIhLCChP",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 136
	},
	"outputId": "d9172978-6a1b-4f50-f02f-54e739bd194e"
	},
	"source": [
	"A = np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])\n",
	"B = np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))\n",
	"\n",
	"for lr in [1, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6]:\n",
	" model = ConstrainedLinearRegression(\n",
	" A=A,\n",
	" B=B,\n",
	" lr=lr,\n",
	" )\n",
	" model.fit(X, y)\n",
	" print(lr, ((model.predict(X) - y)2).sum()0.5, model.coef_, model.coef_.sum())"
	],
	"execution_count": 32,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"1 0.12359713317487613 [1. 0. 0. 0.] 1.0\n",
	"0.1 0.06453175639689608 [0.44664155 0.22728538 0.18483899 0.14123408] 1.0\n",
	"0.01 0.057848955909340485 [0.4026343 0.22291547 0.19637926 0.17807097] 1.0\n",
	"0.001 0.0571191533538363 [0.39817337 0.22207123 0.19708779 0.1826676 ] 1.0\n",
	"0.0001 0.05706354120505105 [0.39777213 0.22207764 0.19722112 0.18292911] 1.0\n",
	"1e-05 0.05705461221555 [0.39771485 0.22206381 0.19725026 0.18297108] 1.0\n",
	"1e-06 0.05705397400955906 [0.3977102 0.22206391 0.19725191 0.18297398] 1.0\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "BIL-Kqh0HMMh",
	"colab_type": "text"
	},
	"source": [
	"Unconstrained linear regression gives a solution with much much better predictive power. "
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "KUOnF0bzDlVL",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 34
	},
	"outputId": "2a839f4f-1239-4dfb-992a-04d13ef03c41"
	},
	"source": [
	"model = LinearRegression()\n",
	"model.fit(X, y)\n",
	"print(((model.predict(X) - y)2).sum()0.5, model.coef_, model.coef_.sum())\n",
	"old_coef = model.coef_"
	],
	"execution_count": 36,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"0.0046915108138479355 [0.18638468 0.2053277 0.07690013 0.64233345] 1.1109459580593493\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"id": "bvn2O29RH0_W",
	"colab_type": "text"
	},
	"source": [
	"We can use the solution of the unconstrained model (after normalization) as a seed for the constrained model. "
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "1Hx7W7j8HLgo",
	"colab_type": "code",
	"colab": {
	"base_uri": "https://localhost:8080/",
	"height": 34
	},
	"outputId": "de2f7b2d-4899-4edc-9f13-b07317d4a502"
	},
	"source": [
	"model = ConstrainedLinearRegression(\n",
	" A=A,\n",
	" B=B,\n",
	" lr=1e-4,\n",
	")\n",
	"model.fit(X, y, initial_beta=old_coef / sum(old_coef))\n",
	"print(((model.predict(X) - y)2).sum()0.5, model.coef_, model.coef_.sum())"
	],
	"execution_count": 39,
	"outputs": [
	{
	"output_type": "stream",
	"text": [
	"0.01568685482946043 [0.16777115 0.1848224 0.06922041 0.57818605] 1.0\n"
	],
	"name": "stdout"
	}
	]
	},
	{
	"cell_type": "code",
	"metadata": {
	"id": "znP6Ubp1Hki1",
	"colab_type": "code",
	"colab": {}
	},
	"source": [
	""
	],
	"execution_count": 0,
	"outputs": []
	}
	]
	}