[D] Classification with final layer having no activation? Submitted by AbIgnorantesBurros t3_y0y3q6 on October 11, 2022 at 3:11 AM in MachineLearning 7 comments 6
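pocolai t1_irw0b82 wrote on October 11, 2022 at 1:45 PM
This is just for numerical stability when computing the loss: the loss function is given the raw logits and applies the softmax internally, in a numerically stable way. At inference time, you can apply softmax to the last layer's output yourself if you need probabilities.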
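To make the stability point concrete, here is a minimal sketch in plain Python (standard library only). The function names and the example logits are made up for illustration and are not taken from any particular framework. It compares a cross-entropy computed directly from logits with the log-sum-exp trick against the naive route of taking the log of a softmax output.

```python
import math

def softmax(logits):
    """Softmax with the max logit subtracted first to avoid overflow."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy_from_logits(logits, target):
    """Cross-entropy computed directly from logits via log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]

def cross_entropy_from_probs(probs, target):
    """Naive version: log of a probability that was already computed."""
    return -math.log(probs[target])

# Hypothetical, deliberately extreme logits for a 3-class problem.
logits = [1000.0, 0.0, -1000.0]
target = 1

probs = softmax(logits)  # the probability of class 1 underflows to 0.0
loss_stable = cross_entropy_from_logits(logits, target)  # finite

try:
    loss_naive = cross_entropy_from_probs(probs, target)
except ValueError:
    # math.log(0.0) is undefined, so the naive route breaks here.
    loss_naive = float("inf")

# Softmax is monotonic, so the predicted class is the same either way.
pred_from_logits = max(range(len(logits)), key=lambda i: logits[i])
pred_from_probs = max(range(len(probs)), key=lambda i: probs[i])
```

With these inputs the logit-based loss stays finite, while the probability-based one hits log(0). The predicted class is identical whether you take the argmax of the logits or of the softmax output, which is why leaving the activation off the final layer costs nothing at inference unless you actually need probabilities.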