Submitted by cthorrez in r/MachineLearning
gdahl wrote:
People use lots of other things too. Probit regression, Poisson likelihoods, all sorts of stuff. As you said, it is best to fit what you are doing to the problem.
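For instance, a Poisson likelihood is a natural training loss when the targets are counts. A minimal sketch in PyTorch (just illustrative, assuming the network outputs log-rates):

```python
import torch

# Poisson likelihood as the training loss for count-valued targets.
# The network is assumed to output log-rates, so log_input=True tells
# PoissonNLLLoss to interpret them as log(lambda).
loss_fn = torch.nn.PoissonNLLLoss(log_input=True)

log_rates = torch.randn(8)                     # stand-in network outputs
counts = torch.poisson(torch.full((8,), 3.0))  # toy count-valued targets
loss = loss_fn(log_rates, counts)
```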
Logistic-regression-style output layers are very popular in deep learning, perhaps even more than in other parts of the ML community, but Gaussian process classification is often done with probit models (see http://gaussianprocess.org/gpml/chapters/RW3.pdf ). And when necessary, people will design neural network output activation functions and losses to fit the problem they are solving.
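A probit output layer is really just a one-line change of link function: replace the logistic sigmoid with the standard normal CDF. A minimal sketch in PyTorch (my own illustration, not a standard library loss):

```python
import torch

def probit_nll(scores: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Binary probit negative log-likelihood.

    scores:  (N,) real-valued network outputs f(x)
    targets: (N,) labels in {0., 1.}
    """
    # Probit link: P(y=1|x) = Phi(f(x)), the standard normal CDF.
    # torch.special.ndtr computes Phi directly.
    p = torch.special.ndtr(scores)
    p = p.clamp(1e-7, 1 - 1e-7)  # avoid log(0) in this simple sketch
    return -(targets * p.log() + (1 - targets) * (1 - p).log()).mean()

# The logistic counterpart just swaps the link: p = torch.sigmoid(scores).
```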
That said, a lot of people doing deep learning joined the field in the past 2 years and just use what they see other people using, without giving it much thought. So we get these extremely popular cross entropy losses.
cthorrez (OP) wrote:
Thanks for the reply and the resource! You're right about the relatively recent influx of people who enter the ML field via deep learning first. It seems like most of the intro material focuses on logistic-sigmoid-based methods.
That said, do you think there is a fundamental reason why the other likelihood-based methods you mentioned, such as probit and Poisson, haven't caught on in deep learning? Is it just that probit doesn't give an edge in classification, and that such a large portion of use cases don't require anything beyond a classification loss?
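For context, the two links are numerically very close anyway; a quick sanity check (assuming PyTorch, and using the well-known approximation sigmoid(1.702*x) ≈ Phi(x)):

```python
import torch

# Probit and logistic links nearly coincide after rescaling:
# sigmoid(1.702 * x) approximates Phi(x) to within about 0.01 everywhere,
# which is one intuition for why probit rarely changes classification results.
x = torch.linspace(-4.0, 4.0, steps=801)
probit = torch.special.ndtr(x)
logistic = torch.sigmoid(1.702 * x)
print((probit - logistic).abs().max())  # ~0.0095
```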