Submitted by GraciousReformer t3_118pof6 in MachineLearning
"Deep learning is the only thing that currently works at scale; it's the only class of algorithms that is able to discover arbitrary functions in a reasonable amount of time."
https://www.youtube.com/watch?v=p-OYPRhqRCg
I know of the universal approximation theorem. But is there any mathematical formulation of this statement?
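For reference, here is one common single-hidden-layer statement of the universal approximation theorem. It assumes σ is continuous and not a polynomial, and that K ⊂ ℝⁿ is compact:

```latex
\forall f \in C(K),\ \forall \varepsilon > 0,\ \exists N \in \mathbb{N},\ \alpha_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n :
\quad \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} \alpha_i \, \sigma\big(w_i^\top x + b_i\big) \Big| < \varepsilon
```

The theorem only asserts existence. It bounds neither N nor the effort needed to find the weights, so on its own it does not address the "reasonable amount of time" part of the quote.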
activatedgeek t1_j9jvj8h wrote
For generalization (performing well beyond the training data), there are at least two dimensions: flexibility and inductive biases.
Flexibility ensures that many functions “can” be approximated in principle. That’s the universal approximation theorem. It is a descriptive result: it does not prescribe how to find that function. It is also not unique to DL. Deep random forests, Fourier bases, polynomial bases, and Gaussian processes are all universal function approximators (with some extra technical details).
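Here is a minimal sketch showing that universal approximation is not specific to neural networks. It fits a truncated Fourier basis by least squares. numpy is assumed, and the target |x| on [-π, π] and the values of k are arbitrary illustrative choices. The maximum error on a test grid shrinks as the basis grows, but nothing in the basis reflects any structure of the problem. Approximation power alone says nothing about which function a model will prefer given limited data.

```python
import numpy as np

def fourier_features(x: np.ndarray, k: int) -> np.ndarray:
    """Design matrix with columns [1, cos(jx), sin(jx)] for j = 1..k."""
    cols = [np.ones_like(x)]
    for j in range(1, k + 1):
        cols.append(np.cos(j * x))
        cols.append(np.sin(j * x))
    return np.stack(cols, axis=1)

def fit_and_error(f, k: int, n_train: int = 400, n_test: int = 2000) -> float:
    """Least-squares fit of a degree-k Fourier series to f on [-pi, pi];
    returns the maximum absolute error on a denser test grid."""
    x_train = np.linspace(-np.pi, np.pi, n_train)
    x_test = np.linspace(-np.pi, np.pi, n_test)
    coef, *_ = np.linalg.lstsq(fourier_features(x_train, k), f(x_train), rcond=None)
    pred = fourier_features(x_test, k) @ coef
    return float(np.max(np.abs(pred - f(x_test))))

target = np.abs  # continuous, but not differentiable at 0

for k in (1, 4, 16, 64):
    print(f"k={k:3d}  max error={fit_and_error(target, k):.4f}")
```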
What is unique to DL is that its inductive biases somehow match the structure of some complex problems, including vision and language, and this is what makes these models generalize well. Inductive bias is a loosely defined term, so here are some examples with references:
CNNs have an inductive bias toward translation-equivariant functions (only approximately, because pooling layers break exact equivariance). https://arxiv.org/abs/1806.01261
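The minimal numpy sketch below illustrates the CNN point. It assumes 1D signals, circular padding, and stride-2 max pooling, all chosen for illustration. Convolution commutes with shifts exactly. Strided pooling does not, which is why a full CNN is only approximately translation equivariant.

```python
import numpy as np

def circular_conv1d(x, kernel):
    """Cross-correlation with wrap-around padding (output length == input length),
    i.e. a 1D conv layer with circular padding and no bias."""
    n = len(x)
    return np.array([sum(kernel[j] * x[(i + j) % n] for j in range(len(kernel)))
                     for i in range(n)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling with stride == size (assumes len(x) % size == 0)."""
    return x.reshape(-1, size).max(axis=1)

def is_cyclic_shift(a, b):
    """True if b equals np.roll(a, s) for some integer s."""
    return any(np.allclose(np.roll(a, s), b) for s in range(len(a)))

rng = np.random.default_rng(0)
x = rng.normal(size=16)
kernel = rng.normal(size=3)

# Convolution: shifting the input by one step shifts the output by one step.
conv_equivariant = np.allclose(circular_conv1d(np.roll(x, 1), kernel),
                               np.roll(circular_conv1d(x, kernel), 1))

# Stride-2 pooling: a one-step shift of the feature map changes which values
# get pooled together, so the result is not a shifted copy of the original.
feature_map = np.array([1.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
pool_equivariant = is_cyclic_shift(max_pool1d(feature_map),
                                   max_pool1d(np.roll(feature_map, 1)))

print("conv equivariant:", conv_equivariant)  # True
print("pool equivariant:", pool_equivariant)  # False
```

In this example, pooling the original map gives [2, 0, 0, 0], while pooling the shifted map gives [1, 2, 0, 0]. That result is not a shift of the former.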
Graph neural networks provide a relational inductive bias. https://arxiv.org/abs/1806.01261
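One concrete reading of "relational inductive bias" is that a message-passing layer sees a graph only as nodes plus edges. Relabelling the nodes can therefore only relabel the outputs (permutation equivariance). The sum-aggregation layer below is a hypothetical minimal example with random weights, not the architecture from the cited paper.

```python
import numpy as np

def message_passing_layer(adj, h, w_self, w_neigh):
    """One sum-aggregation message-passing layer:
    h_i' = ReLU(W_self h_i + sum over neighbours j of W_neigh h_j).
    The same weights are shared by every node and edge."""
    return np.maximum(0.0, h @ w_self + adj @ h @ w_neigh)

rng = np.random.default_rng(0)
n_nodes, d_in, d_out = 5, 3, 4

# Random undirected graph without self-loops.
adj = (rng.random((n_nodes, n_nodes)) < 0.4).astype(float)
adj = np.triu(adj, 1)
adj = adj + adj.T

h = rng.normal(size=(n_nodes, d_in))
w_self = rng.normal(size=(d_in, d_out))
w_neigh = rng.normal(size=(d_in, d_out))

# Relabel the nodes with a random permutation matrix P.
P = np.eye(n_nodes)[rng.permutation(n_nodes)]

out = message_passing_layer(adj, h, w_self, w_neigh)
out_permuted = message_passing_layer(P @ adj @ P.T, P @ h, w_self, w_neigh)

# Relabelling the input nodes only permutes the rows of the output.
print(np.allclose(out_permuted, P @ out))  # True
```

This holds for any weights because (P A Pᵀ)(P H) = P A H and ReLU acts elementwise. The structure of the layer guarantees the symmetry, and training does not need to discover it.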
Neural networks overall prefer simpler solutions, embodying Occam’s razor, another inductive bias. This argument is made theoretically using Kolmogorov complexity. https://arxiv.org/abs/1805.08522
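Below is a rough sketch in the spirit of the experiments in that paper. It samples random parameters for a small ReLU MLP, reads off the Boolean function it computes on all 2⁵ = 32 inputs, and tallies how often each function appears. The width, depth, N(0, 1) initialization, and sample count are illustrative assumptions. A uniform prior over the 2³² possible functions would almost never repeat in a few thousand draws, so frequent repeats indicate a strongly non-uniform prior over functions. The claim that the favoured functions are simple (measured with computable proxies for Kolmogorov complexity) comes from the paper and is not demonstrated by this sketch.

```python
import itertools
from collections import Counter

import numpy as np

N_BITS = 5        # Boolean inputs of length 5 -> 2**5 = 32 input points
WIDTH = 20        # hidden width (illustrative choice)
N_SAMPLES = 2000  # number of random parameter draws

# All 32 input points, one per row.
inputs = np.array(list(itertools.product([0.0, 1.0], repeat=N_BITS)))

def random_mlp_function(rng):
    """Sample Gaussian weights/biases for a 2-hidden-layer ReLU MLP and return
    the Boolean function it computes, as a string of 32 output bits."""
    w1, b1 = rng.normal(size=(N_BITS, WIDTH)), rng.normal(size=WIDTH)
    w2, b2 = rng.normal(size=(WIDTH, WIDTH)), rng.normal(size=WIDTH)
    w3, b3 = rng.normal(size=WIDTH), rng.normal()
    h = np.maximum(0.0, inputs @ w1 + b1)
    h = np.maximum(0.0, h @ w2 + b2)
    out = h @ w3 + b3
    return "".join("1" if v > 0 else "0" for v in out)

rng = np.random.default_rng(0)
counts = Counter(random_mlp_function(rng) for _ in range(N_SAMPLES))

print(f"{len(counts)} distinct functions in {N_SAMPLES} draws (out of 2**32 possible)")
for fn, c in counts.most_common(5):
    print(fn, c)
```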