
fperez/SimpleNeuralNets.ipynb

Last active May 28, 2019
Notes for "Why does deep and cheap learning work so well?" (arXiv:1608.08225v1 [cond-mat.dis-nn]) by Lin and Tegmark.

hwlin76 commented Sep 10, 2016

 Thanks for the interest in our paper: you are right about the factor of 2! It is correct in equation (11) but incorrect in the figure. We are correcting it in the next draft and will acknowledge you appropriately!

fperez commented Sep 11, 2016

 Thanks for pitching in, @hwlin76! Lovely paper: I had been thinking about these questions for a while in my "spare time", and I am finding it very interesting. One question: in eq. (11) your prefactor is still $1/(8\sigma_2)$, so I'm a bit confused by your comment. I thought the value in the figure came from the same 1/8 in (11). Am I misunderstanding? BTW, I added a bit of error analysis to the notebook above; it's nicely informative (and consistent with the analytical estimates, up to numerical precision).

hwlin76 commented Sep 11, 2016

 Glad to hear that the paper has resonated with you! Yes, you are right that the 1/8 in the figure comes from equation (11), but we have defined $\sigma_2$ somewhat confusingly as $\sigma''/2$ (see equation (10)). We'll likely change this quirky notation in the next iteration of the paper. Your error plots are very interesting. It would be cool to see how the error scales when approximating complicated multivariate polynomials, in addition to just the multiplication gate.
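A quick Taylor-series sketch of where the 8 comes from. It assumes (our reading, not a quote from the paper) that the gate adds $\sigma$ evaluated at $\pm\lambda(u+v)$ and subtracts it at $\pm\lambda(u-v)$, with $\sigma_2 = \sigma''(0)/2$ as described above:

$$
\begin{aligned}
\sigma(x) &= \sigma_0 + \sigma_1 x + \sigma_2 x^2 + \mathcal{O}(x^3), \qquad \sigma_2 \equiv \tfrac{1}{2}\sigma''(0),\\
\sigma(x) + \sigma(-x) &= 2\sigma_0 + 2\sigma_2 x^2 + \mathcal{O}(x^4),\\
\sigma(\lambda(u+v)) + \sigma(-\lambda(u+v)) - \sigma(\lambda(u-v)) - \sigma(-\lambda(u-v)) &= 2\sigma_2\lambda^2\left[(u+v)^2 - (u-v)^2\right] + \mathcal{O}(\lambda^4)\\
&= 8\sigma_2\lambda^2\,uv + \mathcal{O}(\lambda^4).
\end{aligned}
$$

Dividing by $8\sigma_2\lambda^2$ recovers $uv$ up to $\mathcal{O}(\lambda^2)$ corrections. Written in terms of $\sigma''(0)$, the denominator is $4\lambda^2\sigma''(0)$, so reading $\sigma_2$ as $\sigma''(0)$ in the $1/(8\sigma_2)$ factor is off by exactly a factor of 2.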

fperez commented Sep 12, 2016

 Ah @hwlin76, of course! I just didn't pay enough attention to eq. (10) and fell into the same trap you probably did: once I accepted it as correct (yup, Taylor series, fine, move along), I simply identified $\sigma_2$ with $\sigma''$ in my head, and the trap was closed. I'd be happy to look at the numerical error for multivariate polynomials, but in a couple of days; I'm currently traveling. BTW, if you don't mind, I'd like to use this little exchange to illustrate two things: (1) your very interesting results/paper (which I'm not done digesting), and (2) the value of open work with Jupyter, GitHub, etc. for this kind of exchange, which is even more lightweight and direct than an exchange of arXiv preprints (and there's code to work reproducibly off actual implementations). On Tuesday I'm giving the colloquium in the CS department at CU Boulder, but I know some of the folks from my former physics dept will be in attendance. Would you be OK if I mention it?

fperez commented Sep 12, 2016

 P.S. Small typo: the paragraph right after eq. (11) says that the approximation is exact as $\lambda \to \infty$. It should be "as $\lambda \to 0$."
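To illustrate the $\lambda \to 0$ limit numerically, here is a minimal sketch (not the notebook's code, which isn't rendered on this page) of a four-neuron multiplication gate. It assumes softplus as the nonlinearity and symmetric input weights $\pm\lambda(u \pm v)$; the helper names `softplus` and `mult_gate` are hypothetical. Softplus is convenient because $\sigma''(0) = 1/4 \neq 0$, so $\sigma_2 = 1/8$; the logistic sigmoid has $\sigma''(0) = 0$ and would not work here without shifting its input away from 0.

```python
import math

def softplus(x: float) -> float:
    """Numerically stable softplus, sigma(x) = ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# For softplus, sigma''(0) = 1/4, so sigma_2 = sigma''(0)/2 = 1/8.
SIGMA_2 = 1.0 / 8.0

def mult_gate(u: float, v: float, lam: float, sigma=softplus, sigma_2: float = SIGMA_2) -> float:
    """Approximate u*v with four hidden neurons at inputs +-lam*(u+v) and +-lam*(u-v)."""
    a, b = lam * (u + v), lam * (u - v)
    numerator = sigma(a) + sigma(-a) - sigma(b) - sigma(-b)
    # The quadratic Taylor terms give 8 * sigma_2 * lam**2 * u * v; divide it out.
    return numerator / (8.0 * lam**2 * sigma_2)

u, v = 1.5, -2.0
for lam in [1.0, 0.3, 0.1, 0.03, 0.01]:
    approx = mult_gate(u, v, lam)
    print(f"lambda={lam:<5} approx={approx:+.8f} error={approx - u * v:+.2e}")
```

For softplus, expanding one order further gives an error of about $\lambda^2 uv(u^2+v^2)/12$, so shrinking $\lambda$ by 10× should shrink the error by roughly 100×, until floating-point cancellation in the numerator dominates at very small $\lambda$.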

fperez commented Sep 13, 2016

 BTW @hwlin76, for the multivariate case, are you thinking of representing it as nested networks, e.g. $g(u,v,w) = uvw$ as $g = f(u, f(v,w))$ where $f(u,v) = uv$, or do you have an explicit construction similar to the paper's for the 3-variable case, with a 3-n-1 network? I imagine this should be possible/easy, but I haven't yet tried figuring out what the $n \times 3$ matrix would be, nor what $\mu$ would become. And if that's the case, do you have the inductive result for the p-term multivariate product? Before I dive into the multivariate analysis, it would be good to know which way you're thinking of it; if you have these constructive results ready, we could formulate the implementation that way directly.
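One way to sketch the nested option from the question above, $g = f(u, f(v,w))$, is to chain two-input gates. This is a hypothetical illustration, not the paper's construction: it assumes a softplus-based four-neuron gate (redefined so the block runs on its own), and `mult_gate`, `nested_product`, and `tree_product` are made-up names. For $p$ factors the chain uses $p-1$ gates in sequence; pairing factors in a balanced tree uses the same number of gates but only $\lceil \log_2 p \rceil$ gate layers.

```python
import math
from functools import reduce
from typing import Sequence

def softplus(x: float) -> float:
    """Numerically stable softplus, ln(1 + e^x)."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

SIGMA_2 = 1.0 / 8.0  # sigma''(0)/2 for softplus

def mult_gate(u: float, v: float, lam: float = 1e-2) -> float:
    """Two-input, four-neuron approximation of u*v (assumed layout)."""
    a, b = lam * (u + v), lam * (u - v)
    num = softplus(a) + softplus(-a) - softplus(b) - softplus(-b)
    return num / (8.0 * lam**2 * SIGMA_2)

def nested_product(factors: Sequence[float], lam: float = 1e-2) -> float:
    """Chain gates right to left: f(x1, f(x2, ... f(x_{p-1}, x_p)))."""
    return reduce(lambda acc, x: mult_gate(x, acc, lam), reversed(factors[:-1]), factors[-1])

def tree_product(factors: Sequence[float], lam: float = 1e-2) -> float:
    """Multiply by splitting the factors in half recursively (balanced tree of gates)."""
    if len(factors) == 1:
        return factors[0]
    mid = len(factors) // 2
    return mult_gate(tree_product(factors[:mid], lam), tree_product(factors[mid:], lam), lam)

u, v, w = 0.8, -1.2, 1.5
print(nested_product([u, v, w]), u * v * w)
print(tree_product([u, v, w, 0.5]), u * v * w * 0.5)
```

Each gate's error grows with the size of its inputs, and errors from inner gates feed into outer ones, so the chained and tree layouts may accumulate error differently even with the same gate count.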