Submitted by __ingeniare__ t3_11zhttl in singularity


https://preview.redd.it/p91wew7o2hpa1.png?width=813&format=png&auto=webp&v=enabled&s=597dff71203996d375556831b76e61c3ec973604

It's taken from OpenAI's GPT-4 research post and, as far as I understand, it shows GPT-4's own estimate of how confident it is that its answer is factually correct versus how often it actually was correct, on a subset of the MMLU benchmark. The dotted line represents the ideal case (perfect calibration). In other words, GPT-4 could accurately estimate its own confidence in its predictions!
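
To make that concrete, here is a rough sketch of how a calibration curve like the one in the plot is built. The binning scheme and the (confidence, was_correct) pairs below are purely illustrative, not OpenAI's actual methodology:

```python
from collections import defaultdict

def calibration_curve(predictions, n_bins=10):
    """predictions: iterable of (stated_confidence, was_correct) pairs.

    Returns a list of (avg_confidence, accuracy) points, one per non-empty bin.
    Perfect calibration means every point lies on the dotted diagonal.
    """
    bins = defaultdict(list)
    for conf, correct in predictions:
        # Clamp so that conf == 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, correct))
    curve = []
    for idx in sorted(bins):
        items = bins[idx]
        avg_conf = sum(c for c, _ in items) / len(items)
        accuracy = sum(1 for _, ok in items if ok) / len(items)
        curve.append((avg_conf, accuracy))
    return curve

# Toy example with made-up numbers:
points = [(0.95, True), (0.92, True), (0.55, False), (0.60, True), (0.15, False)]
print(calibration_curve(points, n_bins=5))
```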

Unfortunately, the RLHF training (where it is trained to act like an ethical assistant) significantly degraded this calibration, so it no longer holds for the model released to the public. Assuming that this issue can be mitigated in the future, why would this be a big deal?

Hallucinations could be dramatically reduced through chain-of-thought prompting: only answer if the confidence is high enough, and simply admit not knowing if the confidence is low. Hallucinations seem to be the main thing preventing mass adoption of LLM systems in the near future, due to fears of confidently wrong answers, and this seems to indicate that the problem might be much more solvable than people think.
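
A minimal sketch of that confidence gate, assuming a hypothetical `ask_with_confidence` helper (not a real API call) that returns the model's answer together with a 0-1 confidence estimate:

```python
# `ask_with_confidence` is a hypothetical helper: it would prompt the model
# (chain-of-thought reasoning, then a final self-assessed confidence) and
# return the answer along with that confidence estimate.
def ask_with_confidence(question: str) -> tuple[str, float]:
    raise NotImplementedError("wire this up to whatever model API you use")

def guarded_answer(question: str, threshold: float = 0.8) -> str:
    """Only pass the model's answer through if it is confident enough."""
    answer, confidence = ask_with_confidence(question)
    if confidence < threshold:
        return "I don't know."  # refuse rather than risk a confident hallucination
    return answer
```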

On another note, isn't it pretty weird that GPT-4 can do this at all? The only explanation I can think of is that it has learned what people in general find hard, which probably correlates with how much training data it has seen on a subject. So it outputs the probability a human would give of answering the question correctly, and that happens to correlate with how likely GPT-4 itself is to get it right, since both track how much training data it has seen on that particular subject. Impossible to say without seeing the examples it was tested on. Anyway, pretty amazing.

62

Comments


Educational_Ice151 t1_jdcdrhw wrote

So you could create a prompt that only provides a response if the confidence is greater than X.

Prompt:

You are a language model. For each response, you will provide an answer and a confidence score. Please input your question and specify the minimum confidence threshold (default is 60%):

Question: {your_question_here} Confidence threshold: {desired_threshold_here}

Reply with “Confidence system enable.” to begin.
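
One way this could be wired up outside the chat interface (a hypothetical sketch: `call_model` stands in for whatever API call you actually make, and the "Confidence: NN%" output format is an assumption, not a GPT-4 feature):

```python
import re

# Stand-in for your actual model API call.
def call_model(prompt: str) -> str:
    raise NotImplementedError

def answer_if_confident(question: str, threshold: float = 0.6) -> str:
    reply = call_model(
        "Answer the question, then end with a line 'Confidence: NN%'.\n"
        f"Question: {question}"
    )
    # Parse the self-reported confidence and filter below the threshold.
    match = re.search(r"Confidence:\s*(\d+)\s*%", reply)
    confidence = int(match.group(1)) / 100 if match else 0.0
    if confidence < threshold:
        return f"Confidence {confidence:.0%} is below the {threshold:.0%} threshold."
    return reply
```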

18

Nukemouse t1_jdcftfv wrote

Couldn't you just have it tell you how confident it is? Like, put a little bar next to the output that fills up the more confident it is, to warn users.
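
Something like this, as a toy sketch (the bar format is made up):

```python
def confidence_bar(confidence: float, width: int = 10) -> str:
    """Render a 0-1 confidence value as a simple text gauge."""
    filled = round(confidence * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {confidence:.0%}"

print(confidence_bar(0.9))  # [#########-] 90%
print(confidence_bar(0.5))  # [#####-----] 50%
```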

10

dwarfarchist9001 t1_jdegww3 wrote

Uh, the whole point of this thread is that for the GPT-4 base model the confidence is not hallucinated. In fact, the confidence estimates it gives are within the margin of error of the actual rate of correctness.

1