pocolai t1_irw0b82 wrote

this is just for numerical stability when computing the loss: the loss is computed from the raw logits rather than from softmax probabilities. the user can apply softmax to the last layer's output during inference.
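
A rough sketch of the idea, in plain Python rather than any particular framework. The function names and the example logits are made up for illustration. It assumes the loss is cross-entropy and that "stability" refers to the usual log-sum-exp trick; the thread does not spell out either.

```python
import math

def log_softmax(logits):
    # Subtract the max before exponentiating so exp() never overflows.
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum_exp for z in logits]

def cross_entropy_from_logits(logits, target):
    # Training: the loss works on logits directly, in log space.
    return -log_softmax(logits)[target]

def softmax(logits):
    # Inference: turn logits into probabilities only when they are needed.
    return [math.exp(v) for v in log_softmax(logits)]

def naive_cross_entropy(logits, target):
    # Softmax first, then log. Large logits overflow in exp().
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))

# Hypothetical logits, chosen to be extreme enough to break the naive version.
logits = [1000.0, 0.0, -1000.0]
stable_loss = cross_entropy_from_logits(logits, target=0)
probs = softmax(logits)
```

With these logits, `naive_cross_entropy(logits, 0)` raises `OverflowError` because `math.exp(1000.0)` is out of range, while the logit-based version returns a finite loss. That is why a model might leave softmax out of its last layer and let the loss handle it.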
