Viewing a single comment thread. View all comments

IntelArtiGen t1_j3dpy8q wrote

Well it doesn't really count, because you can also "solve" these tasks with SVMs, random forests, etc. MNIST, OCR and other tasks with very small images aren't great benchmarks anymore for comparing an arbitrary algorithm against a deep learning algorithm.
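
To make that concrete, here's a minimal sketch (assuming scikit-learn, and using its built-in 8x8 digits dataset as a small stand-in for MNIST) where a plain RBF SVM typically lands around 98-99% test accuracy with no deep learning at all:

```python
# Classical SVM on a small-image digit task -- no neural network involved.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)  # 8x8 grayscale digits, 10 classes
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = SVC(kernel="rbf", gamma=0.001)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))  # typically ~0.98-0.99
```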

I was thinking more of getting 90% top-1 accuracy on ImageNet, or generating 512x512 images from text, or learning from billions of texts to answer questions. You either need tons of parameters to solve these or an unbelievable amount of compression. And even the DL algorithms that do compression need a lot of parameters. You would need an even better way to compress information; perhaps it's possible, but it has yet to be invented.

1

yldedly t1_j3ds4h2 wrote

Imo there's no reason why we can't have much smaller models that do well on these tasks, but I admit it's just a hypothesis at this point. Specifically for images, an inverse graphics approach wouldn't require nearly as many parameters: http://sunw.csail.mit.edu/2015/papers/75_Kulkarni_SUNw.pdf

2

IntelArtiGen t1_j3dvbjr wrote

>Imo there's no reason why we can't have much smaller models

It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 states, you can't use a single binary value 0/1; you need two: 00/01/10/11.
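
Just to make that counting argument concrete (a tiny illustration, nothing DL-specific):

```python
# Minimum number of bits needed to distinguish N states is ceil(log2(N)).
import math

for n_states in [2, 4, 10, 1000]:
    print(n_states, "states need at least", math.ceil(math.log2(n_states)), "bits")
# 4 states -> 2 bits (00/01/10/11), as above
```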

A large image of the real world contains a lot of information and detail, which can be hard to process and compress. We can compress it of course, that's what current DL algorithms and compression software do, but they have limits, otherwise they lose too much information.

Usual models are far from perfectly optimized, but when you try to optimize them too much you quickly lose accuracy. Under 1,000,000 parameters it's hard to have anything that can compete with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing current limits.

2

yldedly t1_j3dwdv6 wrote

I agree of course, you can't compress beyond some hard limit, even with lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL on that handwriting problem above to the solution constructed by human experts).

2

IntelArtiGen t1_j3dyhfy wrote

It's true that by default DL algorithms are very unoptimized on this point, because modelers usually don't really care about minimizing the number of parameters.

For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (and is harder to train). But when you try to further optimize algorithms that were already optimized for their number of parameters, you quickly run into these limits. You would need models even more efficient than these DL models, which are already optimized with regard to their parameter count.
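
If you want to check those counts yourself, here's a quick sketch assuming torchvision is installed (the exact numbers depend on whether you count the classifier head):

```python
# Parameter counts for the two models mentioned above.
import torchvision.models as models

def n_params(model):
    return sum(p.numel() for p in model.parameters())

print("ResNet-50:      ", n_params(models.resnet50()))        # ~25.6M total (~23M without the 1000-class head)
print("EfficientNet-B0:", n_params(models.efficientnet_b0())) # ~5.3M
```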

A DL model could probably solve this handwriting problem with a very low number of parameters if you built it specifically with that goal in mind.
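
Something like this hypothetical tiny convnet, for instance (PyTorch; the layer sizes are purely illustrative, not a tested model):

```python
# A deliberately small convnet for a character-recognition-style task.
import torch.nn as nn

tiny_net = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(8, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 7 * 7, 26),  # e.g. 28x28 input, 26 character classes
)
print(sum(p.numel() for p in tiny_net.parameters()))  # ~22k parameters
```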

2