
@tristanwietsma
Last active October 9, 2023 18:55
Access glmnet through RPy2
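import numpy as np
import rpy2.robjects as ro
import rpy2.robjects.numpy2ri as n2r
n2r.activate()  # auto-convert numpy arrays <-> R vectors/matrices
r = ro.r
r.library('glmnet')
# input files (for this example) need a header row and NO index column
X = np.loadtxt('./x.csv', dtype=float, delimiter=',', skiprows=1)
y = np.loadtxt('./y.csv', dtype=int, delimiter=',', skiprows=1)
# use a factor response; build it from an explicit R integer vector
y = ro.FactorVector(ro.IntVector(y.tolist()))
# 3-fold cross-validated logistic lasso
trained_model = r['cv.glmnet'](X, y, nfolds=3, family="binomial")
lambda_ = np.asanyarray(trained_model.rx2('lambda'))  # lambda path, largest first
cvm_ = np.asanyarray(trained_model.rx2('cvm'))        # mean CV error per lambda
cvsd_ = np.asanyarray(trained_model.rx2('cvsd'))      # estimated std. error of cvm
lambda_min = np.asanyarray(trained_model.rx2('lambda.min'))[0]
min_cvm = cvm_[np.argwhere(lambda_ == lambda_min)[0][0]]
# lambdas whose CV error is within 0.1*cvsd of the minimum;
# idx[0] is the largest such lambda, i.e. the sparsest acceptable model
idx = np.argwhere(cvm_ < min_cvm + 0.1*cvsd_)
fit = trained_model.rx2('glmnet.fit')
# beta is stored as a sparse matrix (features x lambdas); densify it
beta = np.asarray(r['as.matrix'](fit.rx2('beta')))
# features with a non-zero coefficient (either sign) at the chosen lambda
relvars = np.argwhere(np.abs(beta[:, idx[0][0]]) > 1e-5)
print(relvars.transpose()[0])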
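The lambda-selection and variable-extraction steps above can be checked without R. A minimal sketch in plain NumPy, assuming `lambda_`, `cvm_` and `cvsd_` are 1-D arrays in glmnet's order (decreasing lambda) and `beta` is a dense (n_features, n_lambda) matrix like the one `as.matrix(fit$beta)` returns for a binomial fit. The helper name `select_lambda_and_vars` and the numbers below are hypothetical. It keeps the gist's `0.1 * cvsd` tolerance (cv.glmnet's own `lambda.1se` instead uses one full standard error), but compares with `<=` so the minimum always qualifies, and uses the absolute coefficient so negatively associated features are kept.

```python
import numpy as np


def select_lambda_and_vars(lambda_, cvm_, cvsd_, beta, tol=0.1, eps=1e-5):
    """Return (column index, lambda, relevant feature indices) for the largest
    lambda whose CV error is within tol * cvsd of the minimum CV error."""
    lambda_ = np.asarray(lambda_, dtype=float)
    cvm_ = np.asarray(cvm_, dtype=float)
    cvsd_ = np.asarray(cvsd_, dtype=float)
    beta = np.asarray(beta, dtype=float)

    # Minimum mean CV error (the error at lambda.min).
    min_cvm = cvm_.min()

    # Candidate lambdas: CV error within tol * cvsd of the minimum. Uses <=
    # (the gist uses <) so the minimum itself always qualifies.
    candidates = np.flatnonzero(cvm_ <= min_cvm + tol * cvsd_)

    # lambda_ is in decreasing order, so the first candidate is the most
    # heavily regularized (sparsest) acceptable model.
    k = int(candidates[0])

    # Features with a non-zero coefficient of either sign at that lambda.
    relvars = np.flatnonzero(np.abs(beta[:, k]) > eps)
    return k, float(lambda_[k]), relvars


# Hypothetical CV results for a 4-lambda path and 3 features.
lambda_ = [1.0, 0.5, 0.25, 0.125]
cvm_ = [0.70, 0.52, 0.50, 0.51]
cvsd_ = [0.30, 0.30, 0.30, 0.30]
beta = [[0.0, 0.0, 0.1, 0.2],
        [0.0, -0.4, -0.6, -0.7],
        [0.0, 0.3, 0.5, 0.6]]

k, lam, relvars = select_lambda_and_vars(lambda_, cvm_, cvsd_, beta)
print(k, lam, relvars.tolist())  # 1 0.5 [1, 2]
```

A sign-sensitive test such as `beta > 1e-5` would miss feature 1 here, whose coefficient is -0.4.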
@abhishek-ghose

abhishek-ghose commented Feb 22, 2019

@tristanwietsma
I just found this and, as an rpy2 and glmnet noob, thank you!
I am also trying to pass a weight vector to glmnet (my problem is multinomial classification with class imbalance). Do you know what Python object it should be passed in as?

I have tried trained_model = r['glmnet'](X, y, family="multinomial", dfmax=1, weights=np.asarray([1] * np.shape(X)[0])), but it doesn't work and gives me this error: rpy2.rinterface.RRuntimeError: Error in y * weights : non-conformable arrays. The error makes me wonder whether the weights parameter can even be specified in the multinomial case; why is it multiplying by y?

EDIT: dfmax was just something I was trying out and isn't integral to the problem here. I wanted to quickly build a model with one variable (come to think of it, I'm not sure whether dfmax or pmax enforces that).

@abhishek-ghose

I finally figured this out, so I'm leaving a note here for others. The right datatype is FloatVector. The weight vector can be cast to it; for example, here is a vector of 1s with one entry per data point in X: rpy2.robjects.FloatVector([1.0] * numpy.shape(X)[0])
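Since the question was about class imbalance, here is a minimal sketch of one common choice of per-observation weights: inverse class frequency, scaled so the weights average to 1. This weighting scheme and the helper name `inverse_frequency_weights` are assumptions for illustration; glmnet does not compute them for you. The result is a plain list of Python floats, so it can be wrapped with FloatVector as described above.

```python
import numpy as np


def inverse_frequency_weights(y):
    """Hypothetical helper: one weight per observation, inversely proportional
    to the frequency of its class, normalized so the mean weight is 1."""
    y = np.asarray(y)
    classes, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    # Each class gets total weight n / n_classes, split evenly across its members.
    w = len(y) / (len(classes) * counts[inverse])
    return w.astype(float).tolist()


# Hypothetical imbalanced labels: six 0s, three 1s, one 2.
labels = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
w = inverse_frequency_weights(labels)
print([round(v, 3) for v in w])
```

The list would then be passed as weights=rpy2.robjects.FloatVector(w); its length must equal the number of rows of X.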
