jackfaker t1_iqzixrs wrote on October 4, 2022 at 6:34 AM

A common theme in these topics is that people observe the status quo design decisions, such as linear layers connected by ReLU, and then try to rationalize them after the fact with relatively hand-wavy mathematical justifications, citing things such as the universal approximation theorem, which are not particularly relevant.

The reality is that this field is heavily driven by empirical results, and I would be highly skeptical of anyone saying that "xyz is clearly the best way to do it".