blueSGL t1_jdl02u6 wrote
Reply to comment by liyanjia92 in [P] ChatGPT with GPT-2: A minimum example of aligning language models with RLHF similar to ChatGPT by liyanjia92
>So with GPT-2 medium, what we really do here is to parent a dumb kid, instead of a "supernaturally precocious child" like GPT-3. What interested me is that RLHF does actually help to parent this dumb kid to be more socially acceptable.
> In other words, if we had discovered the power of alignment and RLHF earlier, we might have foreseen the ChatGPT moment much sooner, back when GPT-2 came out in 2019.
That just reads to me as capability overhang. If there is "one simple trick" to make the model "behave," what's to say it's the only one? (Or that the capabilities unlocked by the current behavior modification are the best they can be.) Scary thought.