Submitted by KD_A t3_127pbst in MachineLearning
PassingTumbleweed t1_jegvhb5 wrote
Reply to comment by KD_A in [P] CAPPr: use OpenAI or HuggingFace models to easily do zero-shot text classification by KD_A
What I was thinking is that some kind of hierarchical LLM taxonomy might be interesting, where you can re-jigger the conditional probability tree onto an arbitrary vocab of token sequences.
KD_A OP t1_jegxas8 wrote
Interesting, and I think I know what you mean. One naive idea is a "top-k tokens" system: for each completion, consider the k highest-probability tokens (conditional on the previous ones) at each completion token position, and then sum the average likelihoods across all k^n paths (n = # of completion tokens). That would be one way to address this synonym problem, but ofc it results in way more computation.
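Rough sketch of what I mean (using a made-up `next_token_logprobs` helper as a stand-in for whatever returns the model's next-token distribution; for simplicity it conditions each position on the given completion tokens rather than on each sampled path):

```python
from itertools import product
import math

def topk_completion_score(prompt, completion_tokens, next_token_logprobs, k=5):
    """Score a completion by summing average likelihoods over all k^n
    paths built from the top-k candidate tokens at each position."""
    # Collect the top-k candidate tokens (and their log-probs) at each position.
    per_position = []
    prefix = []
    for token in completion_tokens:
        logprobs = next_token_logprobs(prompt, prefix)  # dict: token -> log-prob (hypothetical)
        top_k = sorted(logprobs.items(), key=lambda kv: kv[1], reverse=True)[:k]
        per_position.append(top_k)
        prefix.append(token)  # condition later positions on the *given* completion token

    # Enumerate every path through the top-k candidates: k^n paths for n tokens.
    total = 0.0
    for path in product(*per_position):
        avg_likelihood = sum(math.exp(lp) for _, lp in path) / len(path)
        total += avg_likelihood
    return total
```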
Edit: actually, thinking a bit more, I think the synonym problem is more-or-less a non-issue for LMs trained to do next-token prediction.
PassingTumbleweed t1_jeh0p1j wrote
I'm curious to get your thoughts on a simple example where you have three classes: cat, dog, and bird. What happens if the top-1 prediction is "eagle"? Does that probability mass get discarded? It should probably go into the bird category.
KD_A OP t1_jeh0ygl wrote
Yup, it gets totally discarded. Hopefully the conditional probability of "bird" is still higher than that of "cat" or "dog".
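To illustrate with made-up numbers (single-token classes, so one next-token distribution is enough):

```python
# Toy illustration: the model's next-token distribution puts the most mass on
# "eagle", but only the probabilities of the given classes are read off; the
# rest is simply discarded before (optionally) renormalizing.
next_token_probs = {"eagle": 0.40, "bird": 0.25, "cat": 0.20, "dog": 0.10, "fish": 0.05}

classes = ["cat", "dog", "bird"]
class_probs = {c: next_token_probs.get(c, 0.0) for c in classes}  # eagle's 0.40 is ignored

total = sum(class_probs.values())
normalized = {c: p / total for c, p in class_probs.items()}
print(normalized)  # {'cat': 0.364, 'dog': 0.182, 'bird': 0.455} -> "bird" still wins here
```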
PassingTumbleweed t1_jeh1248 wrote
One thing I've seen with these LLMs is that you can prompt them with the classes in a multiple-choice style. It would be interesting to experiment with whether this stabilizes the outputs and reduces the number of out-of-vocabulary predictions you get.
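Something like this is what I have in mind (the prompt wording and example text are made up):

```python
# Rough sketch of a multiple-choice style prompt for zero-shot classification.
classes = ["cat", "dog", "bird"]
letters = ["A", "B", "C"]

text = "It soared over the lake and caught a fish."
choices = "\n".join(f"{letter}. {cls}" for letter, cls in zip(letters, classes))
prompt = (
    f"Text: {text}\n"
    f"Which class best describes the text?\n"
    f"{choices}\n"
    f"Answer with the letter only: "
)
print(prompt)
# Then score only the letter tokens' next-token probabilities, so whatever
# probability mass the model puts on A/B/C maps back to a known class.
```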