Submitted by cthorrez t3_xsq40j in MachineLearning
seba07 t1_iqltogp wrote
I think the real answer for many ML problems is "because it works". Why are we using ReLU (= max(x, 0)) instead of sigmoid or tanh as layer activations nowadays? Math would discourage this since the derivative at 0 is not defined, but it's fast and it works.
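For reference, a minimal sketch of the activations being compared (assuming NumPy; the function names here are just illustrative, not any particular library's API):

```python
import numpy as np

def relu(x):
    # ReLU: max(x, 0). Cheap to compute, but the derivative is undefined
    # exactly at x = 0; frameworks just pick a subgradient (e.g. 0) there.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Sigmoid: smooth everywhere, but saturates and needs an exp().
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Tanh: also smooth, also saturates for large |x|.
    return np.tanh(x)

x = np.linspace(-3, 3, 7)   # [-3, -2, -1, 0, 1, 2, 3]
print(relu(x))              # [0. 0. 0. 0. 1. 2. 3.]
print(sigmoid(x))
print(tanh(x))
```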
jesuslop t1_iqmbdpg wrote
Genuine interest: how frequently would you say new projects/libraries use ReLU activations nowadays (as opposed to other activations)?
EDIT: reformulated
cthorrez OP t1_iqmvcze wrote
Exactly, lots of people use GELU now. (A more expensive version which uses the Gaussian distribution...)
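A rough sketch of the exact GELU next to ReLU, assuming plain Python (the helper names are hypothetical, not a specific library's API):

```python
from math import erf, sqrt

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    # It smoothly weights the input instead of hard-thresholding at 0.
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

def relu(x):
    # Hard threshold at 0 for comparison.
    return max(x, 0.0)

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, relu(v), round(gelu(v), 4))
```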