
KD_A OP t1_jegxas8 wrote

Interesting, and I think I know what you mean. One naive idea is a "top-k tokens" system: for each completion, consider the top k highest-probability tokens (conditional on the previous ones) at each completion-token position, and then sum the average likelihoods across all k^n paths (n = # completion tokens). That would be one way to address this synonym problem, but of course it results in way more computation.
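
Roughly, I'm picturing something like the sketch below. `topk_next_tokens` is just a placeholder for whatever LM call returns the k most likely next tokens and their log-probs (not a real API from any library), and a path's "average likelihood" is taken here to be exp of its mean token log-prob.

```python
import math

def sum_of_path_likelihoods(prompt_tokens, n, k, topk_next_tokens):
    """Enumerate all k^n token paths of length n after the prompt and sum
    each path's average likelihood (here: exp of its mean token log-prob)."""
    # Each frontier entry is (tokens so far, log-probs collected so far).
    frontier = [(list(prompt_tokens), [])]
    for _ in range(n):
        next_frontier = []
        for tokens, logprobs in frontier:
            # Placeholder call: returns [(token_id, logprob), ...] for the
            # k most likely next tokens conditional on `tokens`.
            for token, logprob in topk_next_tokens(tokens, k):
                next_frontier.append((tokens + [token], logprobs + [logprob]))
        frontier = next_frontier  # grows by a factor of k each step
    return sum(math.exp(sum(lps) / len(lps)) for _, lps in frontier)
```

The frontier blows up as k^n, which is why I say it'd be way more computation.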

Edit: actually, thinking a bit more, I think the synonym problem is more-or-less a non-issue for LMs trained to do next-token prediction.


PassingTumbleweed t1_jeh0p1j wrote

I'm curious to get your thoughts on a simple example where you have three classes: cat, dog, and bird. What happens if the top-1 prediction is "eagle"? Does that probability mass get discarded? Because it seems like it should go into the bird category.


KD_A OP t1_jeh0ygl wrote

Yup, it gets totally discarded. Hopefully, the conditional probability of "bird" is still higher than that of "cat" or "dog".
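
To make that concrete, here's a toy sketch with made-up numbers (an illustration, not necessarily the library's exact computation): only the fixed class completions get scored, and one way to turn those scores into class probabilities is to renormalize among the classes themselves, so whatever mass the LM put on "eagle" is implicitly thrown away.

```python
import math

def classify(class_logprobs):
    """Renormalize the (average) log-probs of the class completions over
    the classes only, then pick the argmax. Probability mass the LM put on
    out-of-class tokens like "eagle" is implicitly discarded."""
    probs = {c: math.exp(lp) for c, lp in class_logprobs.items()}
    total = sum(probs.values())
    posterior = {c: p / total for c, p in probs.items()}
    return max(posterior, key=posterior.get), posterior

# Made-up numbers: even if "eagle" would have been the LM's top-1 token,
# "bird" can still win among the allowed classes.
pred, posterior = classify({"cat": math.log(0.02), "dog": math.log(0.01), "bird": math.log(0.07)})
print(pred)       # bird
print(posterior)  # ≈ {'cat': 0.2, 'dog': 0.1, 'bird': 0.7}
```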


PassingTumbleweed t1_jeh1248 wrote

One thing I've seen with these LLMs is that you can prompt them with the classes in a sort of multiple-choice style. It would be interesting to experiment with whether this stabilizes the outputs and reduces the number of out-of-vocabulary predictions you get.
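
Something like this, just as a rough illustration (the exact wording, and whether you ask for the letter or the label, would be things to experiment with):

```python
def multiple_choice_prompt(text, classes):
    """Build a prompt that lists the allowed classes as lettered choices,
    nudging the model to answer with one of them."""
    choices = "\n".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(classes))
    return (
        f"Text: {text}\n"
        "Which category best describes the text?\n"
        f"{choices}\n"
        "Answer:"
    )

print(multiple_choice_prompt("It soared over the canyon.", ["cat", "dog", "bird"]))
# Text: It soared over the canyon.
# Which category best describes the text?
# (A) cat
# (B) dog
# (C) bird
# Answer:
```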
