Submitted by cthorrez t3_xsq40j in MachineLearning
its_ean t1_iqlr342 wrote
Hyperbolic tangent is convenient for backpropagation since its derivative is 1 - tanh²(x), which can be computed directly from the forward-pass output.
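A minimal sketch of that identity, using only the Python standard library. The helper names `tanh_grad_from_output` and `numeric_grad` are made up for illustration; the check compares the closed-form derivative against a central finite difference at an arbitrarily chosen point.

```python
import math

def tanh_grad_from_output(y):
    """Derivative of tanh written in terms of its output y = tanh(x)."""
    return 1.0 - y * y

def numeric_grad(f, x, eps=1e-6):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

x = 0.7                      # arbitrary test point (assumption)
y = math.tanh(x)             # value already available from the forward pass
analytic = tanh_grad_from_output(y)
numeric = numeric_grad(math.tanh, x)

print(f"analytic={analytic:.8f} numeric={numeric:.8f}")
```

The point is that the backward pass needs no extra evaluation of tanh: the stored activation `y` is enough.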
cthorrez OP t1_iqlrf1v wrote
I'm not necessarily saying it should be replaced in every layer but I think it would at least make sense to investigate other options for final probability generation. tanh is definitely good for intermediate layer activation.
chatterbox272 t1_iqm72tk wrote
Tanh is not a particularly good intermediate activation function at all. It's too linear around zero and it saturates at both ends.
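A rough numerical illustration of both complaints, assuming nothing beyond the standard library. The sample points are arbitrary and `tanh_grad` is a hypothetical helper name: near zero, tanh(x) is close to x itself, and far from zero the gradient shrinks toward zero.

```python
import math

def tanh_grad(x):
    """Gradient of tanh at x, via the identity 1 - tanh(x)^2."""
    return 1.0 - math.tanh(x) ** 2

# Small inputs: tanh(x) is nearly the identity (close to linear).
# Large inputs: the output pins near +/-1 and the gradient vanishes.
for x in (0.01, 0.1, 3.0, 6.0, -6.0):
    print(f"x={x:>5}: tanh={math.tanh(x):+.6f} grad={tanh_grad(x):.6f}")
```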
cthorrez OP t1_iqnk270 wrote
Well, it's an even worse final output activation for binary classification because the outputs range from -1 to 1, not 0 to 1.
I've never seen it used as anything but an internal activation.
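To make the range mismatch concrete, here is a small sketch (the function names `sigmoid` and `rescaled_tanh` are illustrative, not from any library). Raw tanh outputs can be negative, so they cannot be read as probabilities. Shifting and scaling tanh into (0, 1) gives (tanh(x) + 1) / 2, which is algebraically the logistic sigmoid evaluated at 2x.

```python
import math

def sigmoid(x):
    """Logistic sigmoid, with outputs in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rescaled_tanh(x):
    """tanh shifted and scaled from (-1, 1) into (0, 1)."""
    return (math.tanh(x) + 1.0) / 2.0

for x in (-5.0, -1.0, 0.0, 1.0, 5.0):
    raw = math.tanh(x)            # may be negative: not a valid probability
    p = rescaled_tanh(x)          # lies in (0, 1)
    print(f"x={x:+.1f} tanh={raw:+.4f} rescaled={p:.4f} sigmoid(2x)={sigmoid(2 * x):.4f}")
```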