Logistic prediction
from numpy import e, log, reshape, transpose, zeros

def sigmoid(X):
    '''Compute the sigmoid function.'''
    den = 1.0 + e ** (-1.0 * X)
    d = 1.0 / den
    return d

def compute_cost(theta, X, y):
    '''Compute the cost given predicted and actual values.'''
    m = X.shape[0]  # number of training examples
    theta = reshape(theta, (len(theta), 1))
    #y = reshape(y, (len(y), 1))
    J = (1. / m) * (-transpose(y).dot(log(sigmoid( - transpose(1 - y).dot(log(1 - sigmoid(
    grad = transpose((1. / m) * transpose(sigmoid( - y).dot(X))
    # optimize.fmin expects a single value, so we cannot return grad
    return J[0][0]  #, grad

def compute_grad(theta, X, y):
    theta.shape = (1, 3)
    m = X.shape[0]  # number of training examples
    grad = zeros(3)
    h = sigmoid(
    delta = h - y
    for i in range(grad.size):
        sumdelta = transpose(delta).dot(X[:, i])
        grad[i] = (1.0 / m) * sumdelta * -1  # note the extra -1, which negates the gradient
    theta.shape = (3,)
    return grad

Here are some slightly simplified versions. I made grad a bit more vectorized, and I also took the negatives out of the cost function and gradient.

def sigmoid(X):
    return 1 / (1 + numpy.exp(- X))

def cost(theta, X, y):
    p_1 = sigmoid(, theta)) # predicted probability of label 1
    log_l = (-y)*numpy.log(p_1) - (1-y)*numpy.log(1-p_1) # negative log-likelihood vector

    return log_l.mean()

def grad(theta, X, y):
    p_1 = sigmoid(, theta))
    error = p_1 - y # difference between label and prediction
    grad =, X) / y.size # gradient vector

    return grad

Here's how I ran them:

import scipy.optimize as opt

theta = 0.1 * numpy.random.randn(3)
# prefix an extra column of ones to the feature matrix (for the intercept term)
X_1 = numpy.append(numpy.ones((X.shape[0], 1)), X, axis=1)

theta_1 = opt.fmin_bfgs(cost, theta, fprime=grad, args=(X_1, y))

Some initial values of theta cause it to fail to converge. Just run it again.


Hi @waylonflinn, I solved the problem by updating the cost function.


Hi all,

Why does the gradient have to be scaled by y.size?
Thank you!

graffaner commented Nov 7, 2016 edited

@cipri-tom It's scaled by y.size to compute the average, since is summed over all the training examples, I think.

vinipachecov commented Feb 16, 2017 edited

I think your code is not converging to the minimum; in my tests it does not converge at all. Check the tutorial code I'll post below, and I think it will make sense. Your full code (not this one) converges to the initial parameter vector, which is (0, 0, 0). Try using fmin_tnc instead of fmin_bfgs.
Btw, excellent work.
