Submitted by sidney_lumet t3_105syyz in MachineLearning
IntelArtiGen t1_j3dvbjr wrote
Reply to comment by yldedly in [Discussion] Is there any alternative of deep learning ? by sidney_lumet
>Imo there's no reason why we can't have much smaller models
It depends on how much smaller they would be. There are limits to how much you can compress information. If you need to represent 4 distinct states, you can't do it with one binary value 0/1; you need two: 00/01/10/11.
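The point about states can be made precise: distinguishing N states requires at least ceil(log2(N)) binary values, and no encoding can beat that floor. A minimal sketch (the function name `bits_needed` is just illustrative):

```python
import math

def bits_needed(num_states: int) -> int:
    """Minimum number of binary values required to distinguish num_states states."""
    return math.ceil(math.log2(num_states))

print(bits_needed(4))  # 2 bits: 00, 01, 10, 11
print(bits_needed(5))  # 3 bits: one extra state forces a whole extra bit
```

The same logic is why lossless compression of a model (or an image) has a hard floor: once the representation matches the entropy of the data, any further shrinking must discard information.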
A large image of the real world contains a lot of information and detail, which can be hard to process and compress. We can compress it of course — that's what current DL algorithms and compression software do — but they have limits; push past them and they lose too much information.
Usual models are far from being perfectly optimized, but when you try to optimize them too aggressively you quickly lose accuracy. Under 1,000,000 parameters it's hard to build anything that competes with more standard DL models on the tasks I've described... at least for now. Perhaps people will have great ideas, but it would require really pushing current limits.
yldedly t1_j3dwdv6 wrote
I agree of course, you can't compress more than some hard limit, even in lossy compression. I just think DL finds very poor compression schemes compared to what's possible (compare DL for that handwriting problem above to the solution constructed by human experts).
IntelArtiGen t1_j3dyhfy wrote
It's true that DL models are typically unoptimized on this point, because modelers usually don't care much about minimizing the number of parameters.
For example, ResNet-50 uses 23 million parameters, which is much more than EfficientNet-B0, which uses 5 million parameters and has better accuracy (though it's harder to train). But when you try to further shrink models that were already optimized for parameter count, you quickly hit these limits. You would need models even more efficient than the DL models that are already optimized in this regard.
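Most of those millions of parameters sit in the convolutional layers, and the count is easy to work out by hand: a standard conv layer has out_channels × in_channels × k × k weights plus a bias per output channel. A small sketch (pure arithmetic, not an actual ResNet/EfficientNet count — those architectures also use tricks like depthwise convolutions that change the formula):

```python
def conv2d_params(in_ch: int, out_ch: int, k: int, bias: bool = True) -> int:
    """Parameter count of a standard 2D convolution layer."""
    return out_ch * in_ch * k * k + (out_ch if bias else 0)

# ResNet-style stem: 7x7 conv, 3 -> 64 channels
print(conv2d_params(3, 64, 7))      # 9,472 parameters

# Deeper 3x3 conv, 256 -> 256 channels: channel width dominates the budget
print(conv2d_params(256, 256, 3))   # 590,080 parameters
```

Stacking a few wide 3x3 layers already blows past the 1,000,000-parameter mark mentioned above, which is why parameter-efficient architectures replace them with cheaper factorized or depthwise variants.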
A DL model could probably solve this handwriting problem with a very low number of parameters if you build it specifically with this goal in mind.