# Newton's method in TensorFlow
# 'Vanilla' Newton's method is intended for the case where the loss function being optimized is convex.
# A one-layer linear network with no activation has a convex error surface.
# If the activation function is monotonic, the error surface associated with a single-layer model is also convex.
# In other cases, the Hessian will have negative eigenvalues at saddle points and in other non-convex regions of the surface.
# To fix that, you can try different methods. One approach is to eigendecompose H and flip the sign of its negative
# eigenvalues (i.e. use |H|), so that the update "pushes out" of the saddle in those directions, as described in the paper
# "Identifying and attacking the saddle point problem in high-dimensional non-convex optimization"
# (https://papers.nips.cc/paper/5486-identifying-and-attacking-the-saddle-point-problem-in-high-dimensional-non-convex-optimization.pdf).
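#
# --- Illustrative sketch (not part of the original gist) ---
# A minimal saddle-free Newton step for a small, flat parameter vector, assuming TensorFlow 2.x
# eager mode and a user-supplied scalar `loss_fn(w)` (both names are hypothetical). The Hessian is
# eigendecomposed and its negative eigenvalues replaced by their absolute values (|H|), so the
# update moves away from saddle points instead of toward them.
import tensorflow as tf


def saddle_free_newton_step(w, loss_fn, damping=1e-4):
    """One Newton step using |H| in place of H (saddle-free Newton sketch)."""
    # Nested tapes: the inner tape gives the gradient, the outer tape its Jacobian (the Hessian).
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            loss = loss_fn(w)
        grad = inner.gradient(loss, w)      # shape [n]
    hess = outer.jacobian(grad, w)          # shape [n, n]

    # Eigendecomposition of the (symmetric) Hessian.
    eigvals, eigvecs = tf.linalg.eigh(hess)
    # |H| = V |Lambda| V^T; small damping keeps the inverse well conditioned near zero eigenvalues.
    abs_eigvals = tf.abs(eigvals) + damping
    h_abs_inv = eigvecs @ tf.linalg.diag(1.0 / abs_eigvals) @ tf.transpose(eigvecs)

    # Newton update: w <- w - |H|^{-1} g
    w.assign_sub(tf.linalg.matvec(h_abs_inv, grad))
    return loss


# Example usage on a tiny non-convex toy loss (hypothetical, for illustration only):
#   w = tf.Variable([1.0, -2.0])
#   toy_loss = lambda w: w[0] ** 2 - w[1] ** 2 + 0.1 * tf.reduce_sum(w ** 4)
#   for _ in range(20):
#       saddle_free_newton_step(w, toy_loss)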