Many theories exist for very wide and infinite-width networks, building on the Neural Tangent Kernel, but it is unclear whether such theories can explain the behavior of real-world models.
We collect three theories on the convergence, conditioning, and generalization of deep networks analyzed under the Polyak-Łojasiewicz condition, and perform experiments measuring the relevant quantities during the optimization of realistic models to test these theories.
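For reference, the standard form of the Polyak-Łojasiewicz condition on a loss \(f\) with infimum \(f^*\) states that, for some \(\mu > 0\) and all parameters \(\theta\),

```latex
% Polyak-Lojasiewicz (PL) condition: the gradient norm lower-bounds
% the suboptimality gap, which yields linear convergence of gradient
% descent without requiring convexity.
\frac{1}{2}\,\|\nabla f(\theta)\|^2 \;\ge\; \mu \left( f(\theta) - f^* \right)
```

Under this inequality, gradient descent with a suitably small step size converges linearly to the global infimum even for non-convex losses, which is why it is a natural lens for analyzing over-parameterized networks.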
For any questions or comments, please write to the authors.