Submitted by sidney_lumet t3_105syyz in MachineLearning
IntelArtiGen t1_j3cmdvl wrote
Any alternative that could solve the same problems would probably require a similar architecture: a lot of parameters, deep connections.
There are many alternatives to deep learning for some specific tasks. But I'm not sure that anything able to outperform the current way we do deep learning on the usual DL tasks will be something totally different (non-deep, few parameters, etc.).
The future of ML for the tasks we currently do with deep learning is probably just another kind of deep learning. Perhaps without backpropagation, perhaps with a totally different way of doing computations, but still deep and highly parametric.
sidney_lumet OP t1_j3covo8 wrote
Like the forward-forward algorithm Geoffrey Hinton proposed at NeurIPS 2022?
IntelArtiGen t1_j3cq3bi wrote
That's an innovative training algorithm on usual architectures. We could think of innovative training algorithms on innovative architectures.
jloverich t1_j3cpmoi wrote
Unfortunately this is basically a different type of layer-by-layer training, which doesn't perform better than end-to-end training in any case I'm aware of. It also seems very similar to stacking, which can be done with any type of model.
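To make the "layer-by-layer training" point concrete, here is a minimal sketch of the forward-forward-style idea, an illustration of the general recipe rather than Hinton's exact algorithm: each layer maximizes a local "goodness" (sum of squared activations) on positive data and minimizes it on negative data, with no error signal backpropagated between layers.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(0.0, x)

class FFLayer:
    """One layer trained on a purely local objective (no backprop across layers)."""
    def __init__(self, n_in, n_out, lr=0.03, threshold=2.0):
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), (n_in, n_out))
        self.lr, self.threshold = lr, threshold

    def _normalize(self, x):
        # Length-normalize so only the *direction* of the input carries
        # information forward (keeps goodness from trivially leaking ahead).
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + 1e-8)

    def forward(self, x):
        return relu(self._normalize(x) @ self.W)

    def train_step(self, x_pos, x_neg):
        # Push goodness above the threshold for positive data (sign=+1)
        # and below it for negative data (sign=-1), via a logistic loss.
        for x, sign in ((x_pos, +1.0), (x_neg, -1.0)):
            xn = self._normalize(x)
            h = relu(xn @ self.W)
            goodness = (h ** 2).sum(axis=1)
            p = 1.0 / (1.0 + np.exp(-sign * (goodness - self.threshold)))
            grad_h = (sign * (1.0 - p))[:, None] * 2.0 * h  # d(goodness)/dh = 2h
            self.W += self.lr * xn.T @ grad_h / len(x)

# Toy "positive" and "negative" data: two Gaussian blobs.
x_pos = rng.normal(+1.0, 0.5, (256, 8))
x_neg = rng.normal(-1.0, 0.5, (256, 8))

layer = FFLayer(8, 16)
for _ in range(200):
    layer.train_step(x_pos, x_neg)

g_pos = (layer.forward(x_pos) ** 2).sum(axis=1).mean()
g_neg = (layer.forward(x_neg) ** 2).sum(axis=1).mean()
```

Because each layer's objective is local, layers can in principle be trained one after another, which is exactly why it resembles stacking.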
yldedly t1_j3dn5mb wrote
> Any alternative which would be able to solve the same problems would probably require a similar architecture: lot of parameters, deep connections.
If handwritten character recognition (and generation) counts as one such problem, then here is a model that solves it with a handful of parameters: https://www.cs.cmu.edu/~rsalakhu/papers/LakeEtAl2015Science.pdf
IntelArtiGen t1_j3dpy8q wrote
Well, it doesn't really count, because you can also "solve" these tasks with SVMs / random forests, etc. MNIST, OCR and other tasks with very small images are no longer great benchmarks for comparing an arbitrary algorithm against a deep learning algorithm.
I was thinking more of getting 90% top-1 on ImageNet, or generating 512x512 images from text, or learning on billions of texts to answer questions. You either need tons of parameters to solve these, or an unbelievable amount of compression. And even DL algorithms that do compression need a lot of parameters. You would need an even more powerful way to compress information; perhaps it's possible, but it has yet to be invented.
yldedly t1_j3ds4h2 wrote
Imo there's no reason why we can't have much smaller models that do well on these tasks, but I admit it's just a hypothesis at this point. Specifically for images, an inverse graphics approach wouldn't require nearly as many parameters: http://sunw.csail.mit.edu/2015/papers/75_Kulkarni_SUNw.pdf
IntelArtiGen t1_j3dvbjr wrote
>Imo there's no reason why we can't have much smaller models
It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 states, you can't use a single binary value (0/1); you need two (00/01/10/11).
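That counting argument generalizes: the minimum number of binary values needed to distinguish n states is ceil(log2(n)). A two-line sketch:

```python
import math

def min_params_binary(n_states: int) -> int:
    """Minimum number of binary values needed to distinguish n_states."""
    return math.ceil(math.log2(n_states))

two_bits = min_params_binary(4)  # the 00/01/10/11 case above
```

So whatever the architecture, a model that has to distinguish exponentially many input configurations needs capacity that grows at least logarithmically with that count.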
A large image of the real world contains a lot of information and detail, which can be hard to process and compress. We can compress it, of course; that's what current DL algorithms and compression software do, but they have limits, otherwise they lose too much information.
Usual models are far from perfectly optimized, but when you try to optimize them too aggressively you can quickly lose accuracy. Under 1,000,000 parameters it's hard to build anything that could compete with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing current limits.
yldedly t1_j3dwdv6 wrote
I agree, of course: you can't compress past some hard limit, even with lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL on that handwriting problem above to the solution constructed by human experts).
IntelArtiGen t1_j3dyhfy wrote
It's true that DL algorithms are usually unoptimized on this point by default, because modelers don't really care about optimizing the number of parameters.
For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (and is harder to train). But when you try to further optimize algorithms that were already optimized for their number of parameters, you quickly hit these limits. You would need models even more efficient than these DL models, which are already optimized with respect to their parameter count.
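As a back-of-the-envelope illustration of where those millions come from (this is not the actual ResNet-50 or EfficientNet code, just standard conv-layer arithmetic): a single convolution layer holds k*k*c_in*c_out weights plus c_out biases, so a few wide layers already add up to millions of parameters.

```python
def conv_params(k: int, c_in: int, c_out: int, bias: bool = True) -> int:
    """Parameter count of one k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

# One 3x3 conv mapping 512 -> 512 channels already holds ~2.36M parameters,
# so a deep stack of such layers reaches tens of millions very quickly.
single_wide_conv = conv_params(3, 512, 512)
```

Parameter-efficient designs like EfficientNet attack exactly this product, e.g. with depthwise-separable convolutions that avoid the full k*k*c_in*c_out term.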
A DL model could probably solve this handwriting problem with a very low number of parameters if you built it specifically with that goal in mind.