was_der_Fall_ist t1_jdwz4qw wrote
Reply to comment by sineiraetstudio in [D] GPT-4 might be able to tell you if it hallucinated by Cool_Abbreviations_9
I think you make a good point. We probably need better methods of post-training LLMs. But the current regime still seems more useful than the pre-trained model in many cases, which Christiano also says; it's only in some contexts that this behavior is worse. I'm not sure it's really better than top-p sampling, though. Still, RLHF models do seem pretty useful.
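(For reference, top-p, i.e. nucleus, sampling keeps only the smallest set of tokens whose cumulative probability reaches p, renormalizes, and samples from that set. A minimal sketch, assuming `logits` is a single next-token distribution; the names here are illustrative, not any particular library's API:)

```python
import numpy as np

def top_p_sample(logits: np.ndarray, p: float = 0.9, rng=None) -> int:
    """Sample a token id via top-p (nucleus) sampling."""
    rng = rng or np.random.default_rng()
    # Softmax over the logits to get a probability distribution.
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sort tokens by probability, descending.
    order = np.argsort(probs)[::-1]
    # Keep the smallest prefix whose cumulative mass reaches p.
    cutoff = np.searchsorted(np.cumsum(probs[order]), p) + 1
    kept = order[:cutoff]
    # Renormalize over the nucleus and sample.
    kept_probs = probs[kept] / probs[kept].sum()
    return int(rng.choice(kept, p=kept_probs))
```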
sineiraetstudio t1_jdymf8q wrote
Oh, RLHF absolutely has all sorts of benefits (playing with top-p only makes answers more consistent, but sometimes you want to optimize for something other than "most likely"), so it's definitely here to stay (for now?); it's just not purely positive. Ideally we'd have an RLHF version that's still well calibrated (or even better, some way to determine confidence that doesn't rely on logits and also works with chain-of-thought prompting).
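(The "confidence from logits" idea here usually means reading token probabilities off the model, e.g. taking the mean log-probability the model assigned to the tokens it actually emitted as a crude confidence score. A minimal sketch under that assumption; `step_logits` and `token_ids` are hypothetical inputs, one logits vector and one chosen token per generation step:)

```python
import numpy as np

def sequence_confidence(step_logits: list, token_ids: list) -> float:
    """Mean log-probability of the emitted tokens: a crude confidence proxy."""
    log_probs = []
    for logits, tok in zip(step_logits, token_ids):
        # Log-softmax computed stably: log p(tok) = logit_tok - logsumexp(logits).
        shifted = logits - logits.max()
        log_z = np.log(np.exp(shifted).sum())
        log_probs.append(shifted[tok] - log_z)
    return float(np.mean(log_probs))
```

(The catch the comment points at: this only works when you can see logits at all, and a well-calibrated score should track actual accuracy, which RLHF-tuned models aren't guaranteed to preserve.)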