ZestyData t1_iqwjiua wrote on October 3, 2022 at 4:52 PM

Layered linear nodes can model non-linear behaviours.
Computational complexity. It's more efficient to use the aforementioned layers of linears than it is to use non-linear functions directly.

MrFlufypants t1_iqx6bhc wrote on October 3, 2022 at 7:16 PM

The activation functions are key. A linear combination of linear combinations is itself just a linear combination, so 10 layers would equate to a single layer, which is only capable of so much. The activation functions destroy the linearity, though, and are the key ingredient there.
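
A minimal sketch of that collapse, in plain Python with no libraries. The weights `W1`, `b1`, `W2`, `b2`, the input `x`, and the helper names are all made up for illustration; ReLU stands in for "an activation function" and is not something the thread specifies.

```python
def matvec(W, x):
    # Matrix-vector product: one dot product per row of W.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product via the columns of B.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def linear(W, b, x):
    # One layer without an activation: W x + b.
    return [v + bi for v, bi in zip(matvec(W, x), b)]

def relu(v):
    # Elementwise max(0, v), used here as an example non-linearity.
    return [max(0.0, vi) for vi in v]

# Hypothetical 2x2 weights and biases, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.0]]
b1 = [0.0, -1.0]
W2 = [[2.0, 1.0], [-1.0, 3.0]]
b2 = [1.0, 0.0]

# Fold the two layers into one: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = linear(W2, b2, b1)

x = [1.0, 2.0]
two_layer = linear(W2, b2, linear(W1, b1, x))        # stacked, no activation
collapsed = linear(W, b, x)                          # single equivalent layer
with_relu = linear(W2, b2, relu(linear(W1, b1, x)))  # ReLU between the layers

print(two_layer, collapsed, with_relu)
```

For this input the stacked and collapsed versions agree, while the ReLU version gives a different result, because the first hidden unit goes negative and gets clipped to zero.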

jms4607 t1_iqxdhw3 wrote on October 3, 2022 at 8:02 PM

Without activation functions an MLP would just be y = sum(m•x) + b
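
Written out for two layers, the algebra behind that formula looks roughly like this. The symbols $W_1, b_1, W_2, b_2$ are assumed names for the layer weights and biases, not notation from the thread:

```latex
\begin{aligned}
y &= W_2\,(W_1 x + b_1) + b_2 \\
  &= (W_2 W_1)\,x + (W_2 b_1 + b_2) \\
  &= W x + b, \qquad W = W_2 W_1,\quad b = W_2 b_1 + b_2 .
\end{aligned}
```

Repeating the same step layer by layer reduces any number of activation-free layers to a single $Wx + b$.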

ZestyData t1_iqybakm wrote on October 4, 2022 at 12:01 AM

Lol thank you! Totally dropped the ball by missing that crucial element out.