its_ean t1_iqlr342 wrote on October 1, 2022 at 8:46 AM

Hyperbolic tangent is convenient for backpropagation since its derivative is 1 - tanh², so the gradient can be computed directly from the forward-pass output.
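For anyone who wants to check the identity, here is a minimal sketch that compares 1 - tanh² against a central finite difference. The helper names (tanh_grad_from_output, numeric_grad) and the test point x = 0.7 are just illustrative choices, not from any library.

```python
import math

def tanh_grad_from_output(y):
    """Derivative of tanh written in terms of its own output y = tanh(x)."""
    return 1.0 - y * y

def numeric_grad(f, x, h=1e-6):
    """Central finite-difference estimate of f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

x = 0.7                                   # arbitrary test point (assumption)
y = math.tanh(x)                          # value already available from the forward pass
analytic = tanh_grad_from_output(y)       # reuses y, no extra tanh call needed
numeric = numeric_grad(math.tanh, x)      # independent check

print(analytic, numeric)
```

The two printed values should agree to several decimal places, which is the point: the backward pass only needs the stored output y.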

cthorrez OP t1_iqlrf1v wrote on October 1, 2022 at 8:51 AM

I'm not necessarily saying it should be replaced in every layer, but I think it would at least make sense to investigate other options for final probability generation. tanh is definitely good for intermediate layer activation.

chatterbox272 t1_iqm72tk wrote on October 1, 2022 at 12:20 PM

Tanh is not a particularly good intermediate activation function at all. It's too linear around zero and it saturates at both ends.
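Both complaints are easy to see numerically. The sketch below is only an illustration under my own choice of sample points (0.1 and ±5); tanh_grad is a hypothetical helper name.

```python
import math

def tanh_grad(x):
    """Gradient of tanh at x, using the identity 1 - tanh(x)^2."""
    y = math.tanh(x)
    return 1.0 - y * y

# Near zero tanh is almost the identity function, so it adds little nonlinearity.
near_zero_error = abs(math.tanh(0.1) - 0.1)

# Far from zero the gradient collapses toward 0 on both sides (saturation).
grad_at_zero = tanh_grad(0.0)
grad_at_five = tanh_grad(5.0)
grad_at_minus_five = tanh_grad(-5.0)

print(near_zero_error, grad_at_zero, grad_at_five, grad_at_minus_five)
```

The gradient is 1 at the origin and drops by several orders of magnitude at |x| = 5, which is the saturation being described.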

cthorrez OP t1_iqnk270 wrote on October 1, 2022 at 6:30 PM

Well, it's an even worse final output activation for binary classification, because its outputs range from -1 to 1, not 0 to 1.

I've never seen it used as anything but an internal activation.
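On the range point: shifting and scaling tanh into (0, 1) just gives back the logistic sigmoid, via the identity sigmoid(x) = (1 + tanh(x/2)) / 2. A rough sketch of that check, where rescaled_tanh and the sample logits are names and values I made up for illustration:

```python
import math

def sigmoid(x):
    """Standard logistic sigmoid, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def rescaled_tanh(x):
    """tanh shifted and scaled from (-1, 1) into (0, 1)."""
    return 0.5 * (math.tanh(0.5 * x) + 1.0)

logits = [-4.0, -1.0, 0.0, 1.0, 4.0]      # arbitrary sample inputs (assumption)
max_gap = max(abs(sigmoid(z) - rescaled_tanh(z)) for z in logits)

print(max_gap)
```

The gap is at floating-point noise level, so a rescaled tanh output layer would not be a genuinely different option from the sigmoid.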