https://preview.redd.it/p91wew7o2hpa1.png?width=813&format=png&auto=webp&v=enabled&s=597dff71203996d375556831b76e61c3ec973604

It's taken from OpenAI's GPT-4 research post and as far as I understand it shows GPT-4's own estimation of how certain it is that what it is saying is factually correct versus how likely it was to actually be correct on a subset of the MMLU benchmark. The dotted line represents the ideal case. In other words, GPT-4 could accurately estimate its own confidence in the prediction!

Unfortunately, the RLHF training (where it is trained to act like an ethical assistant) significantly affected this, so it is no longer true for the model that is released to the public. Assuming that this issue can be mitigated in the future, why would this be a big deal?

Hallucinations could be dramatically reduced through chain-of-thought prompting by only answering if the confidence is high enough, and simply admitting that it does not know if the confidence is low. Hallucinations seem to be the main thing preventing mass adoption of LLM systems in the near future due to fears of confidently wrong answers, and this seems to indicate that the problem might be much more solvable that people think.

On another note, isn't it pretty weird that GPT-4 can do this at all? The only reason I can think of is that it has learned what people in general think is hard, which might correlate with how much training data it happens to have seen on that subject, and so it outputs the probability a human might give to answering the question correctly and that happened to correlate with how likely it itself was to get it right, since that correlated with how much training data it had seen on that particular subject. Impossible to say without seeing the examples it was tested on. Anyway, pretty amazing.

Comments

Educational_Ice151 t1_jdcdrhw wrote on March 23, 2023 at 12:02 PM

#2,315,993

So you could create a prompt that only provides a response if the confidence is greater than X.

Prompt:

You are a language model, I will provide you with an answer and a confidence score for each response. Please input your question and specify the minimum confidence threshold (default is 60%):

Question: {your_question_here} Confidence threshold: {desired_threshold_here}

Reply with “Confidence system enable.” to begin.

Nukemouse t1_jdcftfv wrote on March 23, 2023 at 12:21 PM

#2,316,188

Couldn't you just have it tell you how confident it is. Like put a little bar next to the output that the more full it is the more confident etc it is to warn users.

ingeniare OP t1_jdck0lh wrote on March 23, 2023 at 12:56 PM

#2,316,676

Replying to Nukemouse (#2,316,188)

Yes you could, the specific implementation is irrelevant, the big thing is that it can estimate the confidence at all

Veleric t1_jdck1ks wrote on March 23, 2023 at 12:57 PM

#2,316,678

Replying to Nukemouse (#2,316,188)

I've seen this done before but I'd like to see more research on the effectiveness of it.

galactic-arachnid t1_jddnkm3 wrote on March 23, 2023 at 5:22 PM

#2,320,833

Replying to Veleric (#2,316,678)

I believe you’re talking about logprobs, and there is an enormous amount of literature on them

signed7 t1_jddver5 wrote on March 23, 2023 at 6:11 PM

#2,321,671

Replying to Honest_Science (#2,316,822)

The confidence level it says it has is probably hallucinated though.

WTFnoAvailableNames t1_jde0uv3 wrote on March 23, 2023 at 6:45 PM

#2,322,193

Replying to Honest_Science (#2,316,822)

It doesn't work. I asked ut how much weight an M10 bolt could hold without breaking. It answered 7 kg. Thats obviously wrong so I asked it how certain it was on a scale from 1 to 100. It said 100.

vivehelpme t1_jde431v wrote on March 23, 2023 at 7:05 PM

#2,322,487

Replying to Educational_Ice151 (#2,315,993)

As A large language model I'm not able to give useful answers

Educational_Ice151 t1_jde4bxh wrote on March 23, 2023 at 7:07 PM

#2,322,516

Replying to vivehelpme (#2,322,487)

That means it’s working

mckirkus t1_jde544y wrote on March 23, 2023 at 7:12 PM

#2,322,596

Replying to Educational_Ice151 (#2,315,993)

Bing does this now. It first asks if you want something precise, balanced or creative.

SgathTriallair t1_jdedl7l wrote on March 23, 2023 at 8:06 PM

#2,323,446

Replying to Educational_Ice151 (#2,315,993)

You need at least three states: I am certain (somewhere between 80 and 90%)

My best guess is (somewhere between 60 and 80%)

I don't know (less than 60%)

SgathTriallair t1_jdeen5z wrote on March 23, 2023 at 8:13 PM

#2,323,557

Replying to galactic-arachnid (#2,320,833)

The effectiveness is more about seeing if the AI says it is 40% certain does that make people trust it or not.

dwarfarchist9001 t1_jdegdqf wrote on March 23, 2023 at 8:24 PM

#2,323,741

Replying to mckirkus (#2,322,596)

Most likely that just changes the temperature value unless Microsoft has said otherwise.

dwarfarchist9001 t1_jdegww3 wrote on March 23, 2023 at 8:27 PM

#2,323,797

Replying to signed7 (#2,321,671)

Uh, the whole point of this thread is that for the GPT-4 base model it is not hallucinated. In fact the confidence estimates it gives are within the margin of error of the actual rate of correctness.

Why is this graph not a bigger deal?

Comments

Educational_Ice151 t1_jdcdrhw wrote on March 23, 2023 at 12:02 PM

Nukemouse t1_jdcftfv wrote on March 23, 2023 at 12:21 PM

ingeniare OP t1_jdck0lh wrote on March 23, 2023 at 12:56 PM

Veleric t1_jdck1ks wrote on March 23, 2023 at 12:57 PM

Honest_Science t1_jdcljn9 wrote on March 23, 2023 at 1:09 PM

galactic-arachnid t1_jddnkm3 wrote on March 23, 2023 at 5:22 PM

signed7 t1_jddver5 wrote on March 23, 2023 at 6:11 PM

WTFnoAvailableNames t1_jde0uv3 wrote on March 23, 2023 at 6:45 PM

vivehelpme t1_jde431v wrote on March 23, 2023 at 7:05 PM

Educational_Ice151 t1_jde4bxh wrote on March 23, 2023 at 7:07 PM

mckirkus t1_jde544y wrote on March 23, 2023 at 7:12 PM

SgathTriallair t1_jdedl7l wrote on March 23, 2023 at 8:06 PM

SgathTriallair t1_jdeen5z wrote on March 23, 2023 at 8:13 PM

dwarfarchist9001 t1_jdegdqf wrote on March 23, 2023 at 8:24 PM

dwarfarchist9001 t1_jdegww3 wrote on March 23, 2023 at 8:27 PM