Submitted by __ingeniare__ t3_11zhttl in singularity
It's taken from OpenAI's GPT-4 research post. As far as I understand, it plots GPT-4's own estimate of how likely its answer is to be factually correct against how often it actually was correct, on a subset of the MMLU benchmark. The dotted line represents the ideal case, where stated confidence matches actual accuracy. In other words, GPT-4 could accurately estimate its own confidence in its predictions!
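To make the plot concrete, here is a minimal sketch (my own illustration in Python, not OpenAI's code) of how such a calibration curve is typically computed: bin the questions by the model's stated confidence and compare the mean confidence in each bin to the fraction actually answered correctly.

```python
# Build a calibration curve from (stated confidence, was-correct) pairs.
# The dummy data at the bottom is purely illustrative.
import numpy as np

def calibration_curve(confidences, correct, n_bins=10):
    """Return (mean confidence, accuracy) for each confidence bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.clip(np.digitize(confidences, edges) - 1, 0, n_bins - 1)

    mean_conf, accuracy = [], []
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            mean_conf.append(confidences[mask].mean())
            accuracy.append(correct[mask].mean())
    return np.array(mean_conf), np.array(accuracy)

# Perfect calibration (the dotted line) means accuracy == confidence in every bin.
conf, acc = calibration_curve([0.95, 0.7, 0.55, 0.9, 0.3], [1, 1, 0, 1, 0])
print(list(zip(conf.round(2), acc.round(2))))
```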
Unfortunately, the RLHF training (where it is trained to act like an ethical assistant) significantly degraded this calibration, so it no longer holds for the model released to the public. Assuming this issue can be mitigated in the future, why would this be a big deal?
Hallucinations could be dramatically reduced through chain-of-thought prompting: only answer if the confidence is high enough, and simply admit not knowing if the confidence is low. Hallucinations seem to be the main thing preventing mass adoption of LLM systems in the near future, due to fears of confidently wrong answers, and this seems to indicate that the problem might be much more solvable than people think.
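As a sketch of what that gating could look like, assuming we can read the probability the model assigns to its chosen multiple-choice answer (for example from token logprobs), the wrapper below answers only when that probability clears a threshold and otherwise says "I don't know". The `answer_probabilities` stub is hypothetical; a real system would fill it in from the model.

```python
# Confidence-gated answering: refuse to answer below a threshold instead of guessing.

def answer_probabilities(question: str, choices: list[str]) -> dict[str, float]:
    # Hypothetical stub: pretend the model returned these normalized choice probabilities.
    return {"A": 0.05, "B": 0.82, "C": 0.08, "D": 0.05}

def gated_answer(question: str, choices: list[str], threshold: float = 0.6) -> str:
    probs = answer_probabilities(question, choices)
    best, p = max(probs.items(), key=lambda kv: kv[1])
    if p >= threshold:
        return f"{best} (confidence {p:.0%})"
    return "I don't know."

print(gated_answer("Example MMLU-style question?", ["A", "B", "C", "D"]))
```

The threshold trades coverage for reliability: raise it and the model answers fewer questions but is wrong less often, which only works if the underlying confidence estimates are calibrated like in the plot above.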
On another note, isn't it pretty weird that GPT-4 can do this at all? The only explanation I can think of is that it has learned what people in general find hard, which might correlate with how much training data it happens to have seen on a subject. So it outputs the probability a human might give to answering the question correctly, and that happens to correlate with how likely GPT-4 itself is to get it right, since both track how much training data it has seen on that particular subject. Impossible to say without seeing the examples it was tested on. Anyway, pretty amazing.
Educational_Ice151 t1_jdcdrhw wrote
So you could create a prompt that only provides a response if the confidence is greater than X.
Prompt:
You are a language model. For each response, provide an answer and a confidence score, and only answer if the confidence meets the threshold. Ask the user to input their question and the minimum confidence threshold (default is 60%):
Question: {your_question_here}
Confidence threshold: {desired_threshold_here}
Reply with "Confidence system enabled." to begin.
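Here is a hedged sketch of wrapping that idea in code: ask the model to self-report a confidence score in a fixed format and only surface the answer if it clears the threshold. It assumes the OpenAI Python client (v1-style `chat.completions` API) and a "gpt-4" model name; the prompt wording and the "Confidence: NN%" format are my own illustration, not an official feature.

```python
# Only show an answer when the model's self-reported confidence clears the threshold.
import re
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "Answer the user's question, then on a new line write 'Confidence: NN%' "
    "estimating the probability that your answer is factually correct. "
    "If you are unsure, say so rather than guessing."
)

def ask_with_threshold(question: str, threshold: float = 0.6) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    text = resp.choices[0].message.content
    match = re.search(r"Confidence:\s*(\d+)\s*%", text)
    confidence = int(match.group(1)) / 100 if match else 0.0
    if confidence >= threshold:
        return text
    return f"Not confident enough to answer (reported {confidence:.0%})."
```

Worth keeping in mind that a self-reported score like this is only as trustworthy as the model's calibration, which is exactly what the original post says RLHF degraded.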