Submitted by 4bedoe t3_yas9k0 in MachineLearning
seba07 t1_itd3x25 wrote
A few random things I learned in practice: "Because it worked" is a valid answer to why you chose certain parameters or algorithms. Older architectures like ResNet are still state of the art for certain scenarios. Execution time is crucial, so we often take the smallest models available.
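A minimal sketch of the kind of latency check this comes down to, assuming PyTorch and torchvision are available (the model choices, input size, and run count are illustrative, not from the thread):

```python
# Rough CPU latency comparison: smaller model vs. larger model.
# Assumes torchvision >= 0.13 (weights=None API); numbers are illustrative.
import time
import torch
from torchvision import models

def time_model(model, runs=50):
    model.eval()
    x = torch.randn(1, 3, 224, 224)
    with torch.no_grad():
        for _ in range(5):              # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(runs):
            model(x)
    return (time.perf_counter() - start) / runs

small = models.resnet18(weights=None)
large = models.resnet50(weights=None)
print(f"resnet18: {time_model(small) * 1e3:.1f} ms/inference")
print(f"resnet50: {time_model(large) * 1e3:.1f} ms/inference")
```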
Apprehensive-Grade81 t1_itgoofb wrote
This is a great answer. The simplest approach is often the best in practice.
Furrealpony t1_itdbrky wrote
I am still shocked that DenseNets are not the standard in the industry. Also, with a cross stage partial design you split the gradient flow and allow for a much easier training process. Is it the complexity of implementation that's holding them back, I wonder?
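For reference, a minimal sketch of what a cross stage partial (CSP) style block looks like, assuming PyTorch; layer counts and channel sizes are illustrative, not a faithful CSPNet reproduction:

```python
# CSP-style block: split channels, send one half through dense layers,
# keep the other half as a shortcut, then merge. Sizes are illustrative.
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_ch, growth):
        super().__init__()
        self.conv = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, growth, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        # Dense connectivity: concatenate new features onto the input.
        return torch.cat([x, self.conv(x)], dim=1)

class CSPDenseBlock(nn.Module):
    def __init__(self, channels, growth=32, num_layers=4):
        super().__init__()
        half = channels // 2
        layers, ch = [], half
        for _ in range(num_layers):
            layers.append(DenseLayer(ch, growth))
            ch += growth
        self.dense = nn.Sequential(*layers)
        self.transition = nn.Conv2d(ch + half, channels, kernel_size=1, bias=False)

    def forward(self, x):
        # Split the channels: one part bypasses the dense layers entirely,
        # so the gradient has both a short and a dense path.
        shortcut, dense_in = torch.chunk(x, 2, dim=1)
        out = self.dense(dense_in)
        return self.transition(torch.cat([shortcut, out], dim=1))

# Example: a 64-channel feature map keeps its shape through the block.
block = CSPDenseBlock(64)
print(block(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 32, 32])
```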
Red-Portal t1_itliz8y wrote
Anybody who has tried to run DenseNet knows that it requires an absurd amount of memory in comparison to ResNets.
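A rough way to see this for yourself, assuming a CUDA GPU and torchvision (batch size and resolution are illustrative); for clean numbers, run each model in a fresh process so earlier allocations don't inflate the peak:

```python
# Rough peak training-memory comparison: the DenseNet concatenations keep
# many activations alive through the backward pass.
import torch
from torchvision import models

def peak_training_memory(model, batch=16):
    model = model.cuda()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch, 3, 224, 224, device="cuda")
    model(x).sum().backward()                      # forward + backward
    return torch.cuda.max_memory_allocated() / 2**20  # MiB

print(f"resnet50:    {peak_training_memory(models.resnet50(weights=None)):.0f} MiB")
print(f"densenet121: {peak_training_memory(models.densenet121(weights=None)):.0f} MiB")
```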