Submitted by mrwafflezzz t3_116nm8c in MachineLearning
[removed]
I believe a common approach is to use a linear activation function for regression problems, unless the target variable has certain semantics that suggest the use of other non-linearities (sigmoid, tanh, etc.). Also consider rescaling your targets instead of trying to match the desired output range with activation functions.
From your description (I might be wrong though), it seems like the 0 output is a special case. In this case you might want to use a binary classifier first to classify input samples into two classes. For class 0 the output is 0. For class 1 you use another model (a regressor) that outputs a prediction.
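For concreteness, here is a minimal PyTorch sketch of that two-model setup; the layer sizes, input dimension, and 0.5 decision threshold are illustrative assumptions, not anything specified in the thread.

```python
import torch
import torch.nn as nn

# Classifier decides whether the target is the special 0 case;
# the regressor only supplies a value in (0, 1) when it is not.
classifier = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))               # logit for "non-zero"
regressor = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1), nn.Sigmoid())  # value in (0, 1)

def predict(x):
    reg_out = regressor(x)                            # candidate regression value
    is_nonzero = torch.sigmoid(classifier(x)) > 0.5   # class 1 -> use the regressor's output
    return torch.where(is_nonzero, reg_out, torch.zeros_like(reg_out))

print(predict(torch.randn(4, 16)))
```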
The two model approach is the original setup :). I'm just looking for potential alternatives.
Easiest solution: no sigmoid and no relu in the last layer, just clamp the output between 0 and 1. Works surprisingly well.
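A minimal sketch of that clamping idea, assuming a plain linear output layer; the network sizes here are made up for illustration.

```python
import torch
import torch.nn as nn

# Last layer is plain linear (no sigmoid/relu); predictions are hard-clipped into [0, 1] afterwards.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(4, 16)
pred = model(x).clamp(0.0, 1.0)  # clip the unbounded linear output into the valid range
print(pred)
```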
Shifting the domain of the sigmoid S from (-infty, infty) to (0, infty) is going to be kind of weird. In the original case we have S(-infty) = 0, S(0) = 1/2, S(infty) = 1, so the finite logit values w your network outputs can lie anywhere in (-infty, infty) and S(w) gives something meaningful. Now if you mentally shift S to be defined on (0, infty), you get S(0) = 0 and S(infty) = 1. What value w would be needed to achieve S(w) = 1/2? infty / 2? It seems important that the sigmoid is defined on the open interval (-infty, infty), not just because we want the logits to be arbitrary valued, but also because we want S to be "expressive" around the logit values we see in practice, which must be finite.
Here is something you could do that doesn't require a shifted sigmoid: you have a network f(x) = w which maps an input x to a score w. Take tanh(f(x)) and you get something with range (-1, 1). Any negative w is mapped to a negative value in the range (-1, 0). Now just take the ReLU of this, relu(tanh(f(x))), and all negative values from the tanh, which come from negative w's, go to 0, while all the positive values from the tanh, which come from positive w's, are unaffected.
In this way we have, negative w --> (-1,0) --> 0 and positive w --> (0,1) --> (0,1).
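A minimal PyTorch sketch of that relu(tanh(.)) output head; the network f and its layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

# f maps an input to an unbounded score w; relu(tanh(w)) maps
# negative scores to exactly 0 and positive scores smoothly into (0, 1).
f = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

x = torch.randn(4, 16)
out = torch.relu(torch.tanh(f(x)))
print(out)  # values in [0, 1); exactly 0 wherever f(x) <= 0
```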
Will it be able to approach 1 somewhat effectively as well?
Yup! If you look at the graph of tanh you will see relu(tanh) will smush the left half of the graph to 0. The right half of the graph, on (0, infty), ranges in value from 0 to 1, and you can see saturation towards 1 starts to occur around 2-2.5. Since relu leaves this half unchanged, you'll be able to approach 1 very effectively with reasonable finite values.
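A quick numerical check of that saturation claim (the sample score values are arbitrary):

```python
import torch

# tanh is already close to 1 for scores around 2-2.5, so the network
# only needs moderate finite scores to produce outputs near 1.
w = torch.tensor([0.5, 1.0, 2.0, 2.5, 3.0])
print(torch.relu(torch.tanh(w)))  # ~[0.4621, 0.7616, 0.9640, 0.9866, 0.9951]
```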
Very interesting. Thanks!
[deleted]
sigmoid of 0 is 0.5
bremen79 t1_j97sb9r wrote
The sigmoid will effectively make it very hard for the network to produce values close to 1, because that would require a pre-activation value close to infinity. Would this be good behavior in your application?
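For a sense of scale, a quick check of how large the logits have to get before the sigmoid output approaches 1 (the logit values are just illustrative):

```python
import torch

# The output only gets close to 1 once the pre-activation (logit) is already fairly large.
logits = torch.tensor([2.0, 5.0, 10.0])
print(torch.sigmoid(logits))  # ~[0.8808, 0.9933, 1.0000]
```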