Submitted by __ingeniare__ t3_11zhttl in singularity

https://preview.redd.it/p91wew7o2hpa1.png?width=813&format=png&auto=webp&v=enabled&s=597dff71203996d375556831b76e61c3ec973604

It's taken from OpenAI's GPT-4 research post. As far as I understand, it plots GPT-4's own estimate of how certain it is that an answer is factually correct against how often it actually was correct, on a subset of the MMLU benchmark. The dotted line represents the ideal case (perfect calibration). In other words, GPT-4 could accurately estimate its own confidence in its predictions!
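Roughly, a calibration plot like that is built by bucketing the model's stated confidence and comparing each bucket to the fraction of answers that were actually correct. A minimal sketch of that computation, using made-up numbers rather than OpenAI's data:

```python
import numpy as np

# Made-up stated confidences and correctness flags, standing in for GPT-4's
# answers on MMLU; the real plot uses OpenAI's internal evaluation data.
confidences = np.array([0.95, 0.80, 0.62, 0.30, 0.88, 0.55, 0.71, 0.15])
correct     = np.array([1,    1,    1,    0,    1,    1,    0,    0])

bins = np.linspace(0.0, 1.0, 11)              # ten confidence buckets
bucket = np.digitize(confidences, bins) - 1

for b in range(10):
    mask = bucket == b
    if mask.any():
        stated = confidences[mask].mean()     # average stated confidence
        actual = correct[mask].mean()         # fraction actually correct
        print(f"bucket {b}: stated {stated:.2f}, actual {actual:.2f}")
# Perfect calibration means stated is approximately equal to actual in every
# bucket, i.e. the points sit on the dotted diagonal.
```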

Unfortunately, the RLHF training (where it is trained to act like an ethical assistant) significantly degraded this calibration, so it no longer holds for the model released to the public. Assuming this issue can be mitigated in the future, why would this be a big deal?

Hallucinations could be dramatically reduced, for example through chain-of-thought prompting, by only answering when the confidence is high enough and simply admitting that it does not know when the confidence is low. Hallucinations seem to be the main thing preventing mass adoption of LLM systems in the near future due to fears of confidently wrong answers, and this seems to indicate that the problem might be much more solvable than people think.
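A minimal sketch of that gating idea, assuming a hypothetical helper that returns both an answer and the model's stated confidence (the helper below is a placeholder, not a real API call):

```python
def answer_with_confidence(question: str) -> tuple[str, float]:
    # Placeholder for a real model call that also elicits a confidence estimate.
    return "Paris", 0.97

def guarded_answer(question: str, threshold: float = 0.8) -> str:
    answer, confidence = answer_with_confidence(question)
    if confidence >= threshold:
        return answer
    # Abstain instead of risking a confidently wrong answer.
    return "I don't know."

print(guarded_answer("What is the capital of France?"))
```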

On another note, isn't it pretty weird that GPT-4 can do this at all? The only explanation I can think of is that it has learned what people in general consider hard, which probably correlates with how much training data it has seen on a subject. So it outputs roughly the probability a human would give of answering the question correctly, and that happens to track how likely the model itself is to get it right, since both correlate with the amount of training data it has seen on that particular subject. Impossible to say without seeing the examples it was tested on. Anyway, pretty amazing.

62

Comments


Educational_Ice151 t1_jdcdrhw wrote

So you could create a prompt that only provides a response if the confidence is greater than X.

Prompt:

You are a language model. For each response, you will provide an answer and a confidence score. Please input your question and specify the minimum confidence threshold (default is 60%):

Question: {your_question_here} Confidence threshold: {desired_threshold_here}

Reply with “Confidence system enable.” to begin.
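A hypothetical sketch of how a client could enforce that threshold by parsing the model's self-reported confidence out of its reply (the reply format and the `ask_model` stub are assumptions, not a real API):

```python
import re

def ask_model(prompt: str) -> str:
    # Placeholder for an actual model call; assumed to reply in the agreed format.
    return "Answer: Paris\nConfidence: 92%"

def gated_reply(question: str, threshold: float = 60.0) -> str:
    reply = ask_model(f"Question: {question}\nConfidence threshold: {threshold:.0f}%")
    match = re.search(r"Confidence:\s*(\d+)\s*%", reply)
    confidence = float(match.group(1)) if match else 0.0
    if confidence >= threshold:
        return reply
    return "Below the confidence threshold, so no answer is given."

print(gated_reply("What is the capital of France?"))
```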

18

mckirkus t1_jde544y wrote

Bing does this now. It first asks if you want something precise, balanced or creative.

3

dwarfarchist9001 t1_jdegdqf wrote

Most likely that just changes the temperature value unless Microsoft has said otherwise.
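For context, temperature rescales the model's next-token probabilities before sampling; lower values make output more deterministic, higher values more random. A toy illustration (the logits are made up, and whether Bing's modes map to temperature is speculation):

```python
import numpy as np

def token_distribution(logits, temperature):
    # Softmax with temperature scaling: low T sharpens, high T flattens.
    scaled = np.array(logits, dtype=float) / temperature
    exp = np.exp(scaled - scaled.max())
    return exp / exp.sum()

logits = [2.0, 1.0, 0.5]                    # toy next-token scores
print(token_distribution(logits, 0.2))      # near-deterministic ("precise"?)
print(token_distribution(logits, 1.0))      # default ("balanced"?)
print(token_distribution(logits, 1.5))      # more random ("creative"?)
```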

1

SgathTriallair t1_jdedl7l wrote

You need at least three states (a rough mapping is sketched below):

- "I am certain" (somewhere between 80 and 90%)
- "My best guess is..." (somewhere between 60 and 80%)
- "I don't know" (less than 60%)
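A minimal sketch of that three-band mapping; the cutoffs are placeholders taken loosely from the ranges above:

```python
def verbalize(answer: str, confidence: float) -> str:
    # Map a numeric confidence onto one of three hedged phrasings.
    # The 0.85 / 0.60 cutoffs are illustrative, not tuned values.
    if confidence >= 0.85:
        return f"I am certain: {answer}"
    if confidence >= 0.60:
        return f"My best guess is: {answer}"
    return "I don't know."

print(verbalize("Paris", 0.92))
print(verbalize("Paris", 0.70))
print(verbalize("Paris", 0.40))
```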

1

Nukemouse t1_jdcftfv wrote

Couldn't you just have it tell you how confident it is? Like putting a little bar next to the output; the fuller it is, the more confident the model is, as a warning to users.
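Something like this toy text version of the bar, purely as an illustration:

```python
def confidence_bar(confidence: float, width: int = 20) -> str:
    # Render a simple bar: the fuller it is, the more confident the model claims to be.
    filled = round(confidence * width)
    return "[" + "#" * filled + "-" * (width - filled) + f"] {confidence:.0%}"

print(confidence_bar(0.85))
print(confidence_bar(0.30))
```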

10

Veleric t1_jdck1ks wrote

I've seen this done before but I'd like to see more research on the effectiveness of it.

8

galactic-arachnid t1_jddnkm3 wrote

I believe you’re talking about logprobs, and there is an enormous amount of literature on them.
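For anyone unfamiliar, logprobs are the log probabilities the model assigns to each generated token; a common rough confidence proxy is the geometric mean of those token probabilities. A sketch with toy numbers (real values would come from an API that exposes logprobs):

```python
import math

# Toy per-token log probabilities for a short generated answer.
token_logprobs = [-0.05, -0.20, -0.01, -0.60]

avg_logprob = sum(token_logprobs) / len(token_logprobs)
confidence_proxy = math.exp(avg_logprob)   # geometric mean of token probabilities
print(f"confidence proxy: {confidence_proxy:.2f}")
```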

1

SgathTriallair t1_jdeen5z wrote

The effectiveness is more about seeing whether, if the AI says it is 40% certain, people actually trust it or not.

1

__ingeniare__ OP t1_jdck0lh wrote

Yes, you could. The specific implementation is irrelevant; the big thing is that it can estimate its confidence at all.

3

Honest_Science t1_jdcljn9 wrote

6

signed7 t1_jddver5 wrote

The confidence level it says it has is probably hallucinated though.

6

dwarfarchist9001 t1_jdegww3 wrote

Uh, the whole point of this thread is that for the GPT-4 base model it is not hallucinated. In fact, the confidence estimates it gives are within the margin of error of the actual rate of correctness.

1

WTFnoAvailableNames t1_jde0uv3 wrote

It doesn't work. I asked it how much weight an M10 bolt could hold without breaking. It answered 7 kg. That's obviously wrong, so I asked it how certain it was on a scale from 1 to 100. It said 100.

2