mrpogiface t1_irz4o45 wrote

The theoretical justification for having the softmax in the loss is nice. Aside from the numerical stability bit, using softmax with cross-entropy makes sense probabilistically: minimizing it is maximum-likelihood estimation under a categorical distribution over the classes.
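
To make both points concrete, here is a rough sketch in plain Python (not any particular framework's implementation; the function names are made up for illustration). It computes the loss straight from the logits with the log-sum-exp trick, so the result is the negative log-probability of the target class, and compares that with the naive "softmax first, then log" version.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax via the log-sum-exp trick."""
    m = max(logits)  # shift by the max so exp() never sees a large positive value
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]

def cross_entropy_from_logits(logits, target):
    """Negative log-likelihood of class `target` under softmax(logits)."""
    return -log_softmax(logits)[target]

def naive_cross_entropy(logits, target):
    """Softmax first, then log; exp() can overflow for large logits."""
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

# Hypothetical logits for a 3-class problem; both versions agree here.
logits = [2.0, 1.0, 0.1]
fused = cross_entropy_from_logits(logits, 0)
naive = naive_cross_entropy(logits, 0)
```

With moderate logits the two functions return the same value up to rounding. With something like `[1000.0, 0.0]` the naive version raises `OverflowError` inside `math.exp`, while the fused version still returns a finite loss, which is the stability argument for keeping the softmax inside the loss.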