Comments

bremen79 t1_j97sb9r wrote

The sigmoid will effectively make it very hard for the network to produce values close to 1, because that would require a pre-activation value close to infinity. Would this be good behavior in your application?
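
For intuition, a tiny PyTorch check (purely illustrative) of how large the pre-activation has to get before the sigmoid nears 1:

```python
import torch

# pre-activation values the last layer would need to produce
z = torch.tensor([2.0, 5.0, 10.0, 20.0])
print(torch.sigmoid(z))
# roughly 0.881, 0.993, 0.99995, ~1.0 — exactly 1 is only reached in the limit
```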

4

Repulsive_Tart3669 t1_j97p211 wrote

I believe a common approach is to use a linear activation function for regression problems, unless the target variable has certain semantics that suggest other non-linearities (sigmoid, tanh, etc.). Also consider rescaling your targets instead of trying to match the desired output with activation functions.
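
If rescaling is an option, a minimal sketch (hypothetical target values, simple min-max scaling) might look like:

```python
import torch

# hypothetical raw regression targets on an arbitrary scale
y_train = torch.tensor([3.0, 12.5, 0.0, 47.1, 20.0])

# min-max scale the targets into [0, 1] instead of bending the activation to fit them
y_min, y_max = y_train.min(), y_train.max()
y_scaled = (y_train - y_min) / (y_max - y_min)

# map a prediction in [0, 1] back to the original scale at inference time
def unscale(pred):
    return pred * (y_max - y_min) + y_min
```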

From your description (I might be wrong though), it seems like the 0 output is a special case. In that case you might want to first use a binary classifier to sort input samples into two classes. For class 0 the output is 0; for class 1 you use another model (a regressor) that outputs a prediction.
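
A rough sketch of that two-stage idea (module names, layer sizes, and the 0.5 threshold are just placeholders):

```python
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    """First decide whether the target is exactly 0, then regress otherwise."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        self.regressor = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        p_nonzero = torch.sigmoid(self.classifier(x))  # P(target != 0)
        y_hat = self.regressor(x)                      # regression head
        # at inference: output 0 whenever the classifier picks the zero class
        return torch.where(p_nonzero > 0.5, y_hat, torch.zeros_like(y_hat))

model = TwoStageModel(in_dim=8)
print(model(torch.randn(4, 8)).shape)  # torch.Size([4, 1])
```

In practice the two heads would be trained separately with a classification loss and a regression loss, as in the usual two-model setup.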

2

mrwafflezzz OP t1_j99f6kd wrote

The two model approach is the original setup :). I'm just looking for potential alternatives.

1

__lawless t1_j97v07m wrote

Easiest solution: no sigmoid, no relu in the last layer, just clamp the output between 0 and 1. Works surprisingly well.
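
A minimal sketch of that (layer sizes are placeholders; the last layer is plain linear and the output is clamped):

```python
import torch
import torch.nn as nn

class ClampedRegressor(nn.Module):
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, x):
        # no sigmoid/relu on the last layer, just clamp into [0, 1]
        return torch.clamp(self.net(x), 0.0, 1.0)

model = ClampedRegressor(in_dim=8)
print(model(torch.randn(4, 8)))
```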

2

squidward2022 t1_j97veu5 wrote

Shifting the domain of the sigmoid S from (-infty, infty) to (0, infty) is going to be kind of weird. In the original case we have S(-infty) = 0, S(0) = 1/2, S(infty) = 1, so the finite logit values w your network may output lie anywhere in (-infty, infty) and S(w) gives something meaningful. Now if you mentally shift S to be defined on (0, infty), you get S(0) = 0 and S(infty) = 1. What value w would be needed to achieve S(w) = 1/2? infty / 2? It seems important that the sigmoid is defined on the open interval (-infty, infty), not just because we want the logits to be arbitrarily valued, but also because we want S to be "expressive" around the logit values we see in practice, which must be finite.

Here is something you could do that doesn't require a shifted sigmoid: you have a network f(x) = w which maps an input x to a score w. Take tanh(f(x)) and you get something with range (-1, 1); any negative w is mapped to a negative value in the range (-1, 0). Now just take the ReLU of this, relu(tanh(f(x))), and all negative values from the tanh, which come from negative w's, go to 0, while all the positive values from the tanh, which come from positive w's, are unaffected.

In this way we have negative w --> (-1, 0) --> 0 and positive w --> (0, 1) --> (0, 1).
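
A small sketch of that composition (f is a stand-in scoring network; sizes are placeholders):

```python
import torch
import torch.nn as nn

# f maps an input x to an unbounded score w
f = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 1))

def predict(x):
    w = f(x)                           # w in (-inf, inf)
    return torch.relu(torch.tanh(w))   # negative w -> 0, positive w -> (0, 1)

print(predict(torch.randn(4, 8)))
```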

2

mrwafflezzz OP t1_j99f9kl wrote

Will it be able to approach 1 somewhat effectively as well?

2

squidward2022 t1_j9au3bg wrote

Yup! If you look at the graph of tanh you will see that relu(tanh) smushes the left half of the graph to 0. The right half of the graph, on (0, infty), ranges in value from 0 to 1, but you can see saturation towards 1 starts to occur around 2-2.5. Since relu leaves this half unchanged, you'll be able to approach 1 very effectively with reasonable finite values.
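
For a concrete sense of the saturation, a quick check of relu(tanh(w)) at moderate positive w:

```python
import torch

w = torch.tensor([0.5, 1.0, 2.0, 2.5, 3.0])
print(torch.relu(torch.tanh(w)))
# roughly 0.462, 0.762, 0.964, 0.987, 0.995 — close to 1 well before w gets large
```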

2

Ulfgardleo t1_j97nb2q wrote

sigmoid of 0 is 0.5

−1