
KD_A OP t1_jegxas8 wrote

Interesting, and I think I know what you mean. One naive idea is a "top-k tokens" system: for each completion, consider the top k highest-probability tokens (conditional on the previous ones) at each completion-token position, and then sum the average likelihoods across all k^n paths (n = # completion tokens). That would be one way to address this synonym problem, but of course it results in way more computation.
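
Roughly, I'm picturing something like the sketch below. `topk_next_tokens` is just a placeholder for whatever LM call returns the k most likely next tokens and their log-probs (not a real API from any library), and a path's "average likelihood" is taken here to be exp of its mean token log-prob.

```python
import math

def sum_of_path_likelihoods(prompt_tokens, n, k, topk_next_tokens):
    """Enumerate all k^n token paths of length n after the prompt and sum
    each path's average likelihood (here: exp of its mean token log-prob)."""
    # Each frontier entry is (tokens so far, log-probs collected so far).
    frontier = [(list(prompt_tokens), [])]
    for _ in range(n):
        next_frontier = []
        for tokens, logprobs in frontier:
            # Placeholder call: returns [(token_id, logprob), ...] for the
            # k most likely next tokens conditional on `tokens`.
            for token, logprob in topk_next_tokens(tokens, k):
                next_frontier.append((tokens + [token], logprobs + [logprob]))
        frontier = next_frontier  # grows by a factor of k each step
    return sum(math.exp(sum(lps) / len(lps)) for _, lps in frontier)
```

The frontier blows up as k^n, which is why I say it'd be way more computation.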

Edit: actually, thinking a bit more, I think the synonym problem is more-or-less a non-issue for LMs trained to do next-token prediction.


PassingTumbleweed t1_jeh0p1j wrote

I'm curious to get your thoughts on a simple example where you have three classes: cat, dog, and bird. What happens if the top-1 prediction is "eagle"? Does that probability mass get discarded? Because it seems like it should go into the bird category.


KD_A OP t1_jeh0ygl wrote

Yup, it gets totally discarded. Hopefully, the conditional probability of "bird" is still higher than that of "cat" or "dog".
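
To make that concrete, here's a toy sketch with made-up numbers (an illustration, not necessarily the library's exact computation): only the fixed class completions get scored, and one way to turn those scores into class probabilities is to renormalize among the classes themselves, so whatever mass the LM put on "eagle" is implicitly thrown away.

```python
import math

def classify(class_logprobs):
    """Renormalize the (average) log-probs of the class completions over
    the classes only, then pick the argmax. Probability mass the LM put on
    out-of-class tokens like "eagle" is implicitly discarded."""
    probs = {c: math.exp(lp) for c, lp in class_logprobs.items()}
    total = sum(probs.values())
    posterior = {c: p / total for c, p in probs.items()}
    return max(posterior, key=posterior.get), posterior

# Made-up numbers: even if "eagle" would have been the LM's top-1 token,
# "bird" can still win among the allowed classes.
pred, posterior = classify({"cat": math.log(0.02), "dog": math.log(0.01), "bird": math.log(0.07)})
print(pred)       # bird
print(posterior)  # ≈ {'cat': 0.2, 'dog': 0.1, 'bird': 0.7}
```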


PassingTumbleweed t1_jeh1248 wrote

One thing I've seen with these LLMs is that you can prompt them with the classes in a sort of multiple-choice style. It would be interesting to experiment with whether this stabilizes the outputs and reduces the number of out-of-vocabulary predictions you get.
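
Something like this, just as a rough illustration (the exact wording, and whether you ask for the letter or the label, would be things to experiment with):

```python
def multiple_choice_prompt(text, classes):
    """Build a prompt that lists the allowed classes as lettered choices,
    nudging the model to answer with one of them."""
    choices = "\n".join(f"({chr(ord('A') + i)}) {c}" for i, c in enumerate(classes))
    return (
        f"Text: {text}\n"
        "Which category best describes the text?\n"
        f"{choices}\n"
        "Answer:"
    )

print(multiple_choice_prompt("It soared over the canyon.", ["cat", "dog", "bird"]))
# Text: It soared over the canyon.
# Which category best describes the text?
# (A) cat
# (B) dog
# (C) bird
# Answer:
```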
