Submitted by cthorrez t3_xsq40j in MachineLearning
seba07 t1_iqltogp wrote
I think the real answer for many ML problems is "because it works". Why are we using ReLU (= max(x, 0)) instead of sigmoid or tanh as layer activations nowadays? Math would discourage this since the derivative at 0 is not defined, but it's fast and it works.
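For reference, a minimal sketch of the activations being compared (assuming NumPy; the function names here are just illustrative, not any particular library's API):

```python
import numpy as np

def relu(x):
    # ReLU: max(x, 0). Cheap to compute, but the derivative is undefined
    # exactly at x = 0; frameworks just pick a subgradient (e.g. 0) there.
    return np.maximum(x, 0.0)

def sigmoid(x):
    # Sigmoid: smooth everywhere, but saturates and needs an exp().
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Tanh: also smooth, also saturates for large |x|.
    return np.tanh(x)

x = np.linspace(-3, 3, 7)   # [-3, -2, -1, 0, 1, 2, 3]
print(relu(x))              # [0. 0. 0. 0. 1. 2. 3.]
print(sigmoid(x))
print(tanh(x))
```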
jesuslop t1_iqmbdpg wrote
Genuine interest: how frequently would you say new projects/libraries use ReLU activations nowadays (as opposed to other activations)?
EDIT: reformulated
cthorrez OP t1_iqmvcze wrote
Exactly, lots of people use GELU now. (A more expensive version which uses the Gaussian distribution...)
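A rough sketch of the exact GELU next to ReLU, assuming plain Python (the helper names are hypothetical, not a specific library's API):

```python
from math import erf, sqrt

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF.
    # It smoothly weights the input instead of hard-thresholding at 0.
    return x * 0.5 * (1.0 + erf(x / sqrt(2.0)))

def relu(x):
    # Hard threshold at 0 for comparison.
    return max(x, 0.0)

for v in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(v, relu(v), round(gelu(v), 4))
```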