Submitted by cthorrez t3_xsq40j in MachineLearning
mocny-chlapik t1_iqlt8jo wrote
Reply to comment by cthorrez in [D] Why is the machine learning community obsessed with the logistic distribution? by cthorrez
It's about the speed of computation, not about the complexity of the definition. If you need to calculate the function a million or even a billion times for each sample, it makes sense to optimize it.
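For anyone who wants to check the speed claim on their own machine, here is a rough sketch that times a logistic sigmoid against a Gaussian CDF ("Gaussian sigmoid") in plain Python. The function names and the `time_activation` helper are made up for illustration, and scalar `math` calls are only a stand-in: the numbers will not match vectorized GPU kernels, so treat the result as a rough indication at best.

```python
import math
import timeit


def logistic_sigmoid(x: float) -> float:
    """Logistic CDF, written in a form that avoids overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gaussian_sigmoid(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def time_activation(fn, x: float = 0.7, n: int = 1_000_000) -> float:
    """Hypothetical helper: total seconds for n scalar calls of fn(x)."""
    return timeit.timeit(lambda: fn(x), number=n)


if __name__ == "__main__":
    for fn in (logistic_sigmoid, gaussian_sigmoid):
        print(f"{fn.__name__}: {time_activation(fn):.3f} s per 1e6 calls")
```

Both functions map 0 to 0.5 and saturate at 0 and 1, so they are interchangeable as a final activation as far as range goes; only the cost and the tail behaviour differ.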
cthorrez OP t1_iqmv564 wrote
I'm not really convinced by this. I bet the logistic sigmoid is a little bit faster, but I highly doubt the difference between a logistic sigmoid and a Gaussian sigmoid as the final activation could even be detected when training a transformer model. The other layers are the main cost.
Also, people run all sorts of experiments that increase cost. A good example is GELU vs. ReLU: GELU adds Gaussian calculations to every layer, and people still use it.
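To make the GELU vs. ReLU comparison concrete, here is a minimal sketch of both activations using the exact (erf-based) GELU definition, x times the standard normal CDF. The function names are just for illustration, and real frameworks often use a tanh approximation instead, so this is not meant to mirror any particular library's implementation.

```python
import math


def relu(x: float) -> float:
    """ReLU: a single comparison, no transcendental functions."""
    return max(0.0, x)


def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), where Phi is the standard normal CDF."""
    phi = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return x * phi


if __name__ == "__main__":
    for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"x={x:+.1f}  relu={relu(x):.4f}  gelu={gelu(x):.4f}")
```

The point is that GELU needs one Gaussian CDF evaluation per activation in every layer, where ReLU needs only a comparison.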