inequality constraints in linear regression.ipynb
@jfrasco

jfrasco commented May 12, 2020

This is brilliant, but I can't get it to work when I combine individual constraints with the aggregate constraint, i.e. I've added individual constraints that all coefficients must be between 0 and 1, as well as their sum.

A=np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])
B=np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))

@avidale
Author

avidale commented May 12, 2020

@jfrasco can you give a specific example?
With the Boston dataset, your constraints work fine:

from sklearn.datasets import load_boston
import numpy as np

# ConstrainedLinearRegression is the class defined in the notebook above.
X, y = load_boston(return_X_y=True)
# A @ coef <= B encodes: each coefficient in [0, 1], and their sum in [0, 1].
A = np.concatenate([-np.identity(X.shape[1]), np.identity(X.shape[1]), np.ones((1, X.shape[1])), -np.ones((1, X.shape[1]))])
B = np.concatenate((np.zeros(X.shape[1]), np.ones(X.shape[1]), np.array([1, 0])))

model = ConstrainedLinearRegression(
    A=A,
    B=B,
)
model.fit(X, y)
print(model.intercept_)
print(model.coef_)
print(model.coef_.sum())

produces

20.858240975103065
[0.         0.14213999 0.         0.85786001 0.         0.
 0.         0.         0.         0.         0.         0.
 0.        ]
1.0

@antipisa

antipisa commented May 29, 2020

@avidale Does this work with generalized linear models like Lasso and Ridge?

@avidale
Author

avidale commented May 29, 2020

Yes, it does. For Ridge the adaptation is particularly easy, because Ridge uses exactly the same math as OLS; you only need to change a couple of lines of code to add the L2 penalty.
With Lasso, however, the loss function is no longer quadratic, so the adaptation will be messier, but it is still possible.
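
To make the Ridge case concrete, here is a minimal sketch of a single coordinate-descent update with an L2 penalty added (the names X, y, beta, alpha and the function itself are illustrative, not the notebook's actual code):

import numpy as np

def ridge_coordinate_step(X, y, beta, i, alpha=1.0, learning_rate=1.0):
    # Ridge adds alpha to the diagonal of the quadratic term and alpha * beta[i]
    # to the gradient; the rest of the coordinate-descent update is unchanged.
    hessian_ii = (X.T @ X)[i, i] + alpha
    grad_i = (X.T @ (X @ beta - y))[i] + alpha * beta[i]
    beta = beta.copy()
    beta[i] -= learning_rate * grad_i / hessian_ii
    return beta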

@jfrasco

jfrasco commented May 29, 2020

I can definitely confirm that it works on the Boston data as I just tried it, but I can't get it to work on my data. Not sure how to share my data, but I essentially have 4 columns of 36 months of monthly returns for my X, and one column of 36 monthly returns for my y.

@jfrasco

jfrasco commented May 29, 2020

I get the following from a normal LinearRegression with no constraints:
-0.0004900024100277615
[0.18638477 0.2053274 0.07690057 0.64233321]
1.1109459483489401

Using the above, I get this:
-0.002232462269444445
[1. 0. 0. 0.]
1.0

Which is clearly wrong!

@jfrasco

jfrasco commented May 29, 2020

Here's my X:
array([[ 6.76395100e-03, 1.05127870e-02, 1.38131920e-02,
8.91216900e-03],
[ 3.11930700e-03, -1.18833000e-04, -5.56112100e-03,
-1.40301670e-02],
[ 1.22898000e-02, 1.48354070e-02, 1.76495830e-02,
1.46773540e-02],
[ 8.20155400e-03, 1.03513650e-02, 9.93534000e-03,
1.05970970e-02],
[ 5.84342900e-03, 5.10907300e-03, 7.89939000e-03,
1.37747080e-02],
[-1.82564300e-03, -1.21698960e-02, -2.41190460e-02,
-2.72551080e-02],
[ 2.49304000e-05, -3.81050000e-03, -7.14523700e-03,
-1.21222690e-02],
[ 2.20000000e-02, 3.61578590e-02, 5.27356990e-02,
6.43562460e-02],
[ 1.04000000e-02, 1.26722980e-02, 1.50119400e-02,
2.13071580e-02],
[ 1.13000000e-02, 1.55012380e-02, 2.93593550e-02,
5.01564280e-02],
[ 4.31893300e-03, 8.90650800e-03, 1.62781490e-02,
2.58799160e-02],
[-4.10587000e-05, -3.94980900e-03, -5.08411500e-03,
-9.01910500e-03],
[-2.98000000e-04, -5.12142700e-03, -1.46469070e-02,
-2.54150210e-02],
[ 3.04000000e-03, -3.39805700e-03, -9.74336200e-03,
-1.52240690e-02],
[ 9.17000000e-03, 1.30487420e-02, 2.33631520e-02,
2.75515380e-02],
[ 1.67858300e-03, -6.15091800e-03, -1.59148770e-02,
-2.34790290e-02],
[ 7.63690800e-03, 8.28053100e-03, 5.80860500e-03,
7.27193000e-04],
[ 1.28940800e-02, -2.54326700e-03, -1.46813370e-02,
-2.31787980e-02],
[ 1.77329260e-02, 2.27723310e-02, 3.30301960e-02,
4.45734430e-02],
[ 1.02051920e-02, 1.15040150e-02, 9.76096600e-03,
4.02489100e-03],
[ 6.66632800e-03, 1.91852410e-02, 2.52234550e-02,
3.31616260e-02],
[ 4.08310700e-03, 4.35669700e-03, -6.53384000e-04,
-7.84317600e-03],
[ 7.79128400e-03, 9.49587000e-03, 1.26496340e-02,
1.46050810e-02],
[ 8.28355200e-03, 7.66034500e-03, 9.65023700e-03,
6.15344600e-03],
[ 8.79511300e-03, 1.07991700e-02, 8.79172400e-03,
4.14202900e-03],
[ 9.26750300e-03, 1.95014440e-02, 2.70481780e-02,
2.18226380e-02],
[ 5.34313000e-04, -4.78709000e-04, -4.36123200e-03,
-1.08921810e-02],
[ 7.35792400e-03, 1.13625030e-02, 1.09614140e-02,
8.92494200e-03],
[ 4.28727900e-03, 1.61191700e-03, 2.38737700e-03,
7.29546800e-03],
[ 5.54352700e-03, 3.69901700e-03, -4.00165900e-03,
-5.07037900e-03],
[ 4.55277800e-03, 9.19459000e-03, 8.14547700e-03,
-2.54201800e-03],
[ 5.42147000e-03, 1.34941860e-02, 1.90196850e-02,
2.05033210e-02],
[ 9.42743600e-03, 1.94949930e-02, 1.10851060e-02,
8.66207300e-03],
[ 7.85459900e-03, 1.68533700e-03, 8.06887000e-04,
-2.86475900e-03],
[-5.04015600e-03, -7.08023830e-02, -1.09417595e-01,
-1.16886912e-01],
[ 4.09176120e-02, 7.53775410e-02, 5.31236080e-02,
1.80014280e-02]])

@jfrasco

jfrasco commented May 29, 2020

Here's my y:
array([ 0.00981752, -0.00945781, 0.01503509, 0.01026975, 0.01106711,
-0.02300287, -0.00972671, 0.05655607, 0.01859124, 0.03931067,
0.02070886, -0.00702471, -0.01954932, -0.01169865, 0.02418464,
-0.01865414, 0.00299641, -0.0170793 , 0.0386599 , 0.00638496,
0.02901831, -0.00440659, 0.01331681, 0.00749591, 0.00638459,
0.02274299, -0.00743834, 0.00976891, 0.00508093, -0.00349541,
0.00221714, 0.01862392, 0.01185967, -0.00043719, -0.09747239,
0.03918195])

@antipisa

This is a really nice application for constraining the sums of coefficients, but can you do it together with individual beta constraints? For instance enforcing a bound on the sum as well as an upper bound on each beta.

@avidale
Author

avidale commented May 29, 2020

@jfrasco yes, it seems that your data is a case where coordinate descent performs poorly: the optimization algorithm is only allowed to move along the coordinate axes, so it gets stuck in a sharp corner of the restricted area.

One way of mitigating (though not completely solving) this problem is to decrease the "learning rate": instead of subtracting grad[i] / hessian[i,i] from beta[i], subtract only a fraction of it on each step.

In deep learning a learning rate of about 1e-4 is often a good choice, and it works well for your particular dataset too, producing the coefficient vector [0.39777213 0.22207764 0.19722112 0.18292911].

I have updated the code to make the learning rate a tunable hyperparameter.

This solution is still far from optimal, so you may want to use a more sophisticated algorithm, such as L-BFGS-B (implemented, e.g., in scipy).
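
A rough sketch of that alternative using scipy.optimize.minimize (the function and its defaults are illustrative; note that L-BFGS-B handles only per-coefficient bounds, so an aggregate constraint such as a bound on the sum of coefficients would need a method that accepts general constraints, e.g. SLSQP):

import numpy as np
from scipy.optimize import minimize

def box_constrained_ols(X, y, lower=0.0, upper=1.0):
    # Least squares with per-coefficient bounds, solved by L-BFGS-B.
    def loss(beta):
        residual = X @ beta - y
        return residual @ residual
    def grad(beta):
        return 2 * X.T @ (X @ beta - y)
    result = minimize(
        loss,
        x0=np.zeros(X.shape[1]),
        jac=grad,
        method="L-BFGS-B",
        bounds=[(lower, upper)] * X.shape[1],
    )
    return result.x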

@avidale
Author

avidale commented May 29, 2020

@antipisa Yes, the second example (created by @jfrasco), which I added to my notebook a couple of minutes ago, does exactly this: it constrains each individual coefficient as well as their sum.

@jfrasco

jfrasco commented May 30, 2020

@avidale Thanks for the quick response. I will try it out on other data I have to see how it works, but this should have been a simple case, because my y (the South African All Bond Index) is actually made up of the Xs (the individual components of the ALBI, representing the different duration buckets).

I will investigate your suggestion above, but is L-BFGS-B an alternative to what your method solves? That is exactly what I am looking for.

@avidale
Author

avidale commented May 30, 2020

@jfrasco yes, exactly, L-BFGS-B is a more sophisticated alternative to my method.

@avidale
Author

avidale commented Jun 1, 2020

@davendw49 please look at the examples above

@davendw49

@davendw49 please look at the examples above

Oops, I missed that part.

@jasmeet1996

Hello,

I have used your code for my dataset and it works; thanks for sharing it. I still need a bit of help/guidance, though. The sklearn LinearRegression also lets you use sample weights (sample_weight). How can I add sample weights to this code, so that I can use both specific constraints on each coefficient and specific weights for each sample? Thank you.

@neerjachandra05

Hi, how do I put both constraints together: the minimum beta value should be 1, and the sum of all the betas should be less than or equal to a particular number?

@jfrasco

jfrasco commented Sep 30, 2020

Hi, how do I put both constraints together: the minimum beta value should be 1, and the sum of all the betas should be less than or equal to a particular number?

See my example above. The -np.identity block gives each beta its own row in A (a -1 in that coefficient's position), and you then put the corresponding negated minimum value in B (so a minimum of 1 becomes -1 in B).
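
As a rough sketch (n_features and the cap S below are placeholders for your own problem), the two conditions together would look like this:

import numpy as np

n_features = 4   # placeholder: number of coefficients
S = 10.0         # placeholder: upper bound on the sum of the coefficients

# A @ coef <= B encodes both conditions:
A = np.concatenate([
    -np.identity(n_features),   # -coef_i <= -1, i.e. each coef_i >= 1
    np.ones((1, n_features)),   # sum(coef) <= S
])
B = np.concatenate([
    -np.ones(n_features),       # right-hand sides for the lower bounds
    np.array([S]),              # right-hand side for the sum constraint
])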

@neerjachandra05

It's not working.
I am putting the two conditions together:

  1. minimum beta value of 1
  2. sum of all betas <= a number

@jfrasco

jfrasco commented Sep 30, 2020

Please paste your code.

@neerjachandra05

Can I also make the intercept positive? If yes, please let me know how. I am using the same code, just setting min_coef = 1.

@jpatria

jpatria commented Nov 3, 2020

@avidale how does the coordinate descent handle the constraints? For example, if I used an interior-point solver, it would create a barrier function for the inequalities and then use a Newton line search.

EDIT: And what do you mean by "recalculating constraints for every coefficient on each step"?

@ywu-stats

Thank you very much for sharing. This is what I'm looking for. Is there a way to constrain just a specific subset of features and leave everything else as is?

@handelay

Brilliant solution and well explained. I'm late to the party here, but how would one modify this to use weighted least squares with constraints?
