[D] Classification with final layer having no activation?
Submitted by AbIgnorantesBurros (t3_y0y3q6) on October 11, 2022 at 3:11 AM in MachineLearning, 7 comments
mrpogiface (t1_irz4o45) wrote on October 12, 2022 at 2:51 AM:
The theoretical justification for having the softmax in the loss is nice. Aside from the numerical-stability benefit, using softmax with cross-entropy makes sense probabilistically.
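A minimal sketch of both points, in plain Python, assuming a single example with raw logits (the output of a final layer with no activation) and an integer class index. The helper names `naive_cross_entropy` and `stable_cross_entropy` are hypothetical, chosen for illustration. The stable version returns -log p(target), the negative log-likelihood of the target class under the categorical distribution softmax(logits). It computes this straight from the logits with the log-sum-exp trick instead of exponentiating first. PyTorch's `nn.CrossEntropyLoss` takes raw logits for the same reason.

```python
import math

def naive_cross_entropy(logits, target):
    """Softmax first, then log: can overflow or underflow for large-magnitude logits."""
    # math.exp raises OverflowError once z exceeds roughly 709
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    # If exps[target] underflows to 0.0, math.log raises ValueError
    return -math.log(exps[target] / total)

def stable_cross_entropy(logits, target):
    """Cross-entropy computed directly from logits via the log-sum-exp trick."""
    # Shift by the max so the largest exponent is exp(0) = 1 (no overflow)
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    # -log softmax(z)[target] = logsumexp(z) - z[target]
    return log_sum_exp - logits[target]
```

For moderate logits such as `[2.0, 1.0, 0.1]` the two functions agree. For `[1000.0, 0.0, -1000.0]` the naive version raises `OverflowError`, while the stable one returns a finite loss: 0.0 for class 0 and 1000.0 for class 1.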