AlexeyKruglov t1_j04rrg5 wrote
Probably because the temperature parameter is not 1.0 when the model samples the next token. Having it above 1 biases sampling towards the more probable tokens.
Osemwaro OP t1_j06z76a wrote
Yeah, u/farmingvillein suggested that before you. The temperature parameter behaves like temperature in physics, though: low temperatures (i.e. below 1) decrease entropy by biasing the distribution towards the most probable tokens, while high temperatures (above 1) increase entropy by making the distribution more uniform.
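A minimal sketch of this, assuming the standard temperature-scaled softmax (divide the logits by T before normalizing) and hypothetical logit values:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Apply softmax after scaling logits by 1/temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def entropy(probs):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

logits = [2.0, 1.0, 0.1]  # hypothetical token logits
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: probs={[round(p, 3) for p in probs]}, "
          f"entropy={entropy(probs):.3f}")
```

Running this shows the entropy rising monotonically with temperature: at T=0.5 the distribution is sharply peaked on the top token, and at T=2.0 it is noticeably flatter.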