RMSE (Root Mean Squared Error) can be defined as the square root of the mean of the squared errors: square root of { (1/n) * summation over all i of [Yi - (W^T.Xi)]^2 }. We square the difference between the predicted value and the real value before summing so that positive and negative errors don't cancel each other out, which would otherwise leave us with an error much smaller than the actual error. We then find W by differentiating this loss function w.r.t. W and equating the gradient to zero.
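Setting the gradient of the squared-error loss to zero yields the normal equations, (X^T.X)W = X^T.Y, which can be solved for W directly. A minimal sketch in numpy, using synthetic data (the sample sizes and noise level here are illustrative assumptions):

```python
import numpy as np

# Synthetic data: 100 samples, 3 features, generated from a known weight
# vector plus a little noise (values chosen purely for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + rng.normal(scale=0.1, size=100)

# Differentiating the squared-error loss w.r.t. W and equating to zero
# gives the normal equations: (X^T X) W = X^T y.
w = np.linalg.solve(X.T @ X, X.T @ y)

# RMSE of the fitted model: sqrt of the mean squared residual.
rmse = np.sqrt(np.mean((y - X @ w) ** 2))
```

Because the loss is quadratic in W, this closed-form solution is the global minimum; no iterative optimization is needed.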
However, it is interesting to note that we don't necessarily have to square the difference to get rid of the negative values. The same could be achieved by taking the absolute value of the difference between the predicted and real values. This is known as Mean Absolute Error, or MAE: (1/n) * summation over all i of abs[Yi - (W^T.Xi)].
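To make the two metrics concrete, here is a small comparison on a handful of hand-picked predictions (the values are illustrative assumptions, not from the text):

```python
import numpy as np

# Hypothetical true values and model predictions.
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

# MAE: mean of absolute differences.
mae = np.mean(np.abs(y_true - y_pred))    # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5

# RMSE: square root of the mean of squared differences.
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
```

Note that RMSE comes out larger than MAE here: squaring weights the larger residual (1.0) more heavily, which is the key behavioral difference between the two metrics.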
One could ask why we use RMSE when MAE is less computationally expensive.
The link below beautifully explains why we do that. Do give it a read.