Submitted by mrx-ai t3_zgr7nr in MachineLearning
Paper: Large language models are not zero-shot communicators (arXiv)
Abstract:
Despite widespread use of LLMs as conversational agents, evaluations of performance fail to capture a crucial aspect of communication: interpreting language in context. Humans interpret language using beliefs and prior knowledge about the world. For example, we intuitively understand the response "I wore gloves" to the question "Did you leave fingerprints?" as meaning "No". To investigate whether LLMs have the ability to make this type of inference, known as an implicature, we design a simple task and evaluate widely used state-of-the-art models. We find that, despite only evaluating on utterances that require a binary inference (yes or no), most perform close to random. Models adapted to be "aligned with human intent" perform much better, but still show a significant gap with human performance. We present our findings as the starting point for further research into evaluating how LLMs interpret language in context and to drive the development of more pragmatic and useful models of human discourse.
Authors: Laura Ruis, Akbir Khan, Stella Biderman, Sara Hooker, Tim Rocktäschel, Edward Grefenstette
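The task the abstract describes reduces to scoring a yes/no judgment about an indirect answer. Below is a minimal sketch of what such an evaluation loop might look like; the prompt wording and the `complete` callable are placeholder assumptions for illustration, not the authors' exact template (their setup is detailed in the paper).

```python
from typing import Callable

def evaluate_implicature(
    complete: Callable[[str], str],
    question: str,
    response: str,
    implied: str,
) -> bool:
    """Score one binary implicature item: does the model infer the implied yes/no?"""
    prompt = (
        f'Speaker A asked: "{question}"\n'
        f'Speaker B answered: "{response}"\n'
        "Does Speaker B's answer mean yes or no? Reply with one word."
    )
    prediction = complete(prompt).strip().lower()
    return prediction.startswith(implied.lower())

# Example item from the abstract; `complete` would wrap any LLM API client:
# evaluate_implicature(my_llm, "Did you leave fingerprints?", "I wore gloves", "no")
```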
Acceptable-Cress-374 t1_izigh23 wrote
Would this improve with some prompt engineering? Could you use the LLM to first provide itself some context and then answer the question, effectively turning it into a few-shot attempt? In other words, is it worth training for zero-shot, or can we have the LLM provide its own context and then answer the prompt in a kind of self-generated few-shot? Does my question even make sense?
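For concreteness, one way the self-provided-context idea could be sketched: ask the model to generate a few worked implicature examples first, then prepend them as few-shot context for the real item. The `complete` callable and prompt wording here are illustrative assumptions, not anything from the paper.

```python
from typing import Callable

def self_provided_few_shot(
    complete: Callable[[str], str],
    question: str,
    response: str,
) -> str:
    """Have the model write its own worked examples, then reuse them as context."""
    # Step 1: the model writes a few solved implicature examples for itself.
    examples = complete(
        "Write three short dialogues in which a yes/no question gets an "
        "indirect answer, and state whether each indirect answer means yes or no."
    )
    # Step 2: prepend those self-generated examples to the actual test item.
    prompt = (
        f"{examples}\n\n"
        f'Speaker A asked: "{question}"\n'
        f'Speaker B answered: "{response}"\n'
        "Does Speaker B's answer mean yes or no? Reply with one word."
    )
    return complete(prompt).strip().lower()

# Usage: self_provided_few_shot(my_llm, "Did you leave fingerprints?", "I wore gloves")
```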