Comments


Repulsive_Tart3669 t1_j97p211 wrote

I believe a common approach is to use a linear activation function for regression problems, unless the target variable has semantics that suggest some other non-linearity (sigmoid, tanh, etc.). Also consider rescaling your targets instead of trying to match the desired output range with activation functions.
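A minimal sketch of the target-rescaling idea (the function name and interface are my own, not from the comment): min-max scale the targets into the range you want, so a plain linear output layer can fit them.

```python
def rescale_targets(y, lo=0.0, hi=1.0):
    """Min-max rescale targets into [lo, hi] so a linear output layer can fit them."""
    y_min, y_max = min(y), max(y)
    return [(v - y_min) / (y_max - y_min) * (hi - lo) + lo for v in y]

print(rescale_targets([0.0, 5.0, 10.0]))  # [0.0, 0.5, 1.0]
```

At prediction time you would invert the same transform to get back to the original units.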

From your description (I might be wrong though), it seems like the 0 output is a special case. In that case you might want to use a binary classifier to split input samples into two classes first. For class 0 the output is 0; for class 1 you use another model (a regressor) that outputs the prediction.
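The two-stage scheme can be sketched like this (the toy classifier and regressor below are hypothetical stand-ins for trained models):

```python
def two_stage_predict(x, classifier, regressor):
    """Hurdle-style prediction: the classifier decides zero vs. nonzero,
    and the regressor predicts a value only for the nonzero class."""
    if classifier(x) == 0:   # class 0: the output is exactly 0
        return 0.0
    return regressor(x)      # class 1: defer to the regression model

# toy stand-ins for trained models (assumed, for illustration only)
is_nonzero = lambda x: 1 if x > 0.5 else 0
predict_value = lambda x: 2.0 * x

print(two_stage_predict(0.2, is_nonzero, predict_value))  # 0.0
print(two_stage_predict(0.8, is_nonzero, predict_value))  # 1.6
```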

2

bremen79 t1_j97sb9r wrote

The sigmoid will make it effectively very hard for the network to produce values close to 1, because that would require a pre-activation value close to infinity. Would that be good behavior in your application?

4

__lawless t1_j97v07m wrote

Easiest solution: no sigmoid, no ReLU in the last layer, just clamp the output between 0 and 1. Works surprisingly well.
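A one-liner sketch of the clamping idea in plain Python (in PyTorch the equivalent would be `torch.clamp`):

```python
def clamp01(w):
    """Clamp a raw network output into [0, 1] (no sigmoid, no ReLU)."""
    return min(max(w, 0.0), 1.0)

print([clamp01(w) for w in (-0.3, 0.4, 1.7)])  # [0.0, 0.4, 1.0]
```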

2

squidward2022 t1_j97veu5 wrote

Shifting the domain of the sigmoid S from (-infty, infty) to (0, infty) is going to be kind of weird. In the original case we have S(-infty) = 0, S(0) = 1/2, S(infty) = 1, so for any finite logit value w your network outputs, S(w) gives something meaningful. Now if you mentally shift S to be defined on (0, infty), you get S(0) = 0 and S(infty) = 1. What value w would be needed to achieve S(w) = 1/2? infty / 2? It seems important that the sigmoid is defined on the open interval (-infty, infty), not just because we want the logits to be arbitrary-valued, but also because we want S to be "expressive" around the logit values we see in practice, which must be finite.

Here is something you could do that doesn't require a shifted sigmoid: you have a network f(x) = w which maps an input x to a score w. Take tanh(f(x)) and you get something with range (-1, 1); any negative w is mapped to a negative value in (-1, 0). Now just take the ReLU of this, relu(tanh(f(x))): all negative values from the tanh, which come from negative w's, go to 0, and all the positive values, which come from positive w's, are unaffected.

In this way we have negative w --> (-1, 0) --> 0 and positive w --> (0, 1) --> (0, 1).
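The relu(tanh(·)) composition above is a one-liner; here is a small sketch with the stdlib:

```python
import math

def relu_tanh(w):
    """Map a raw score w to [0, 1): negative w -> 0, positive w -> tanh(w)."""
    return max(0.0, math.tanh(w))

print(relu_tanh(-3.0))  # 0.0
print(relu_tanh(0.0))   # 0.0
print(relu_tanh(2.0))   # ~0.964
```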

2

squidward2022 t1_j9au3bg wrote

Yup! If you look at the graph of tanh you'll see relu(tanh) smushes the left half of the graph to 0. The right half of the graph, on (0, infty), ranges in value from 0 to 1, but you can see saturation toward 1 starting around inputs of 2-2.5. Since ReLU leaves this half unchanged, you'll be able to approach 1 very effectively with reasonable finite values.
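A quick numeric check of the saturation claim:

```python
import math

# tanh already gets close to 1 for modest positive inputs
for w in (1.0, 2.0, 2.5, 3.0):
    print(f"tanh({w}) = {math.tanh(w):.4f}")
```

So a pre-activation of only ~2.5 already yields an output above 0.98, versus the huge pre-activations a sigmoid would need to get equally close to 1.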

2