Hi,
There seems to be a flaw in the cost function
cost = (1/2*m) * np.sum(np.square(predictions-y))
Shouldn't it be
cost = 1/(2*m) * np.sum(np.square(predictions-y))
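A quick check to make the precedence point concrete (the variables here are toy stand-ins for the gist's `predictions` and `y`, not the original data): in Python, `(1/2*m)` parses as `(1/2) * m`, so the sum of squared errors gets multiplied by `m/2` instead of divided by `2m`.

```python
import numpy as np

# Toy stand-ins for the gist's variables (assumed shapes, not the original data)
m = 100
y = np.random.randn(m, 1)
predictions = y + 0.1 * np.random.randn(m, 1)

# As written: (1/2*m) evaluates to m/2, so the sum is scaled UP by m/2
cost_as_written = (1/2*m) * np.sum(np.square(predictions - y))

# Intended: 1/(2*m) divides the sum of squared errors by 2m
cost_intended = 1/(2*m) * np.sum(np.square(predictions - y))

# The two differ by a constant factor of m**2
print(cost_as_written / cost_intended)  # ~= 10000 for m = 100
```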
Nice walkthrough
I was just about to make the same observation as sivi299 regarding the cost function.
In this case, since m is fixed from iteration to iteration when doing the gradient descent, I don't think it matters when it comes to optimizing the theta variable. As written, it's proportional to the mean-squared error, but it should optimize towards the same theta all the same.
The relative magnitudes of the cost-history curves differ between gradient_descent and minibatch_gradient_descent because cal_cost is called with different batch sizes, but since each algorithm uses the same number of points from iteration to iteration internally, it should be OK. A sketch of why is below.
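A minimal sketch of that point, assuming (as in the gist) that the update step computes the gradient directly and cal_cost is only used to fill the cost history, so the constant in cal_cost rescales the reported curve but never touches theta:

```python
import numpy as np

def cal_cost(theta, X, y):
    # Intended scaling is 1/(2m); with (1/2*m) the curve is just scaled
    # by a constant factor of m**2, but theta is never updated here.
    m = len(y)
    predictions = X.dot(theta)
    return 1/(2*m) * np.sum(np.square(predictions - y))

def gradient_descent(X, y, theta, learning_rate=0.01, iterations=100):
    # Rough sketch of a batch gradient-descent loop: the theta update uses
    # the gradient expression directly, so the constant inside cal_cost only
    # affects the numbers stored in cost_history, not the path of theta.
    m = len(y)
    cost_history = np.zeros(iterations)
    for it in range(iterations):
        predictions = X.dot(theta)
        theta = theta - (1/m) * learning_rate * X.T.dot(predictions - y)
        cost_history[it] = cal_cost(theta, X, y)
    return theta, cost_history
```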
Hi, this is fantastic material; thanks so much.
I think there is a typo in equation (8). Shouldn't the X's subindex be j? Meaning X_j instead of X_0?
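For reference, this is the per-parameter update I assume equation (8) is meant to show, written in the standard form with the j subscript on x:

```latex
\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta\!\left(x^{(i)}\right) - y^{(i)} \right) x_j^{(i)}
```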
Regards,
I need your help: f(x, y) = x^2 + y^2
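If the question is how to apply the same gradient-descent idea to f(x, y) = x^2 + y^2, here is a minimal sketch (my own example, not part of the gist): the gradient is (2x, 2y), and repeatedly stepping against it drives (x, y) toward the minimum at the origin.

```python
# Gradient descent on f(x, y) = x**2 + y**2 (illustrative example only).
def gradient_descent_2d(x, y, learning_rate=0.1, iterations=100):
    for _ in range(iterations):
        grad_x, grad_y = 2 * x, 2 * y   # partial derivatives of f
        x -= learning_rate * grad_x
        y -= learning_rate * grad_y
    return x, y

# Starting from (3, -4), the iterates approach the minimizer (0, 0).
print(gradient_descent_2d(3.0, -4.0))
```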