
mocny-chlapik t1_iqlt8jo wrote

It's about the speed of computation, not the complexity of the definition. If you need to calculate the function a million or even a billion times for each sample, it makes sense to optimize it.

22

cthorrez OP t1_iqmv564 wrote

I'm not really convinced by this. I'd bet the logistic sigmoid is a little bit faster, but I highly doubt the difference between a logistic sigmoid and a Gaussian-CDF final activation could even be detected when training a transformer model. The other layers are the main cost.
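For anyone who wants to check the per-call cost themselves, here's a minimal pure-Python sketch (function names are mine) timing the logistic sigmoid against the standard normal CDF, the "Gaussian sigmoid" being discussed. Real training code would use vectorized framework ops, so treat this only as a rough sanity check of the relative cost, not as representative of transformer training.

```python
import math
import timeit

def logistic(x):
    # Logistic sigmoid: 1 / (1 + e^{-x})
    return 1.0 / (1.0 + math.exp(-x))

def gaussian_cdf(x):
    # "Gaussian sigmoid": standard normal CDF, computed via erf
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# A grid of inputs in [-3, 3)
xs = [i / 1000.0 - 3.0 for i in range(6000)]

t_logistic = timeit.timeit(lambda: [logistic(x) for x in xs], number=100)
t_gauss = timeit.timeit(lambda: [gaussian_cdf(x) for x in xs], number=100)
print(f"logistic: {t_logistic:.3f}s  gaussian_cdf: {t_gauss:.3f}s")
```

Both functions map 0 to 0.5 and saturate at 0 and 1, which is why either works as a final activation for a probability output; any timing gap here applies to a single elementwise op, which is tiny next to the matmuls in the other layers.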

Also, people do all sorts of things that increase cost. A good example is GELU vs ReLU: GELU adds Gaussian calculations to every layer, and people still use it.
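To make the GELU-vs-ReLU point concrete, here's a small sketch (my own function names). Exact GELU is x * Phi(x) with Phi the standard normal CDF, so it runs a Gaussian computation at every activation; ReLU is a single comparison. Implementations often substitute a tanh approximation, which is itself evidence that people accept extra per-element cost for the smoother activation.

```python
import math

def relu(x):
    # ReLU: just a max with zero
    return max(x, 0.0)

def gelu(x):
    # Exact GELU: x * Phi(x), where Phi is the standard normal CDF
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x):
    # Tanh approximation of GELU commonly used in practice
    return 0.5 * x * (1.0 + math.tanh(
        math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

So every GELU layer is already paying for erf (or a tanh surrogate) on every element, which supports the argument that one Gaussian CDF in the final activation wouldn't be the bottleneck.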

−1