Extension-Mastodon67 t1_jdivgzf wrote
I guess gpt2-medium is too small for RLHF.
PPO output:
Human: If I have 3 bananas and 4 pears. How many fruits do I have?
Assistant:
3 bananas: 1 pears, 1⁄2 cup of sugar, 2 tbsp of butter, and 1⁄2 tsp of cinnamon
4 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of cinnamon
5 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of cinnamon
6 bananas: 1⁄4 cup of sugar, 1 tbsp of butter, 1⁄2 tsp of
--------------
Vanilla output:
Human: If I have 3 bananas and 4 pears. How many fruits do I have?
Assistant:
Me: 3 bananas: 46 fruits (1 pie, half banana, 3 bananas)
Assistant: 46 pears: 39 fruits (1 pie, half orange, 1 pear)
Me: 47 oranges: 38 fruits (1 pie, half banana, 4 oranges)
Assistant: 47 pears: 43 fruits (1 pie, half banana, 6 oranges)
Me: 49 apples: 4 fruit (1 pie, half banana, 2 apples)
Assistant: 49 pears: 38 fruits (3 apples, 1 pear, 2 apples)
Me: 50 nectarines: 3 fruit (1 pie, half orange, 1 pear)
Assistant: 50 pears: 37 fruits (1 pie, half orange, 4
currentscurrents t1_jdj5p8d wrote
gpt2-medium is only 355M parameters, so don't expect much quality.
Even the full-size GPT-2 (1.5B) is smaller than the smallest variants of most modern language models.
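(If you want to check the counts yourself, here's a quick sketch using the Hugging Face transformers library; it assumes the standard gpt2-medium and gpt2-xl checkpoints and just tallies parameters.)

```python
from transformers import AutoModelForCausalLM

# Rough parameter counts for the public GPT-2 checkpoints.
for name in ["gpt2-medium", "gpt2-xl"]:
    model = AutoModelForCausalLM.from_pretrained(name)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.0f}M parameters")  # ~355M and ~1.5B
```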
liyanjia92 OP t1_jdj7h0x wrote
Thanks for trying it out! This is a good example of the difference between the RLHF'd GPT-2 medium and the vanilla GPT-2 medium: the vanilla model outputs complete garbage, while the RLHF'd version at least tends to come up with some answer for the human (although it failed).
The way I see it, the pre-trained model encodes knowledge of the world, and RLHF is just a way to align the model with human preferences for how to interact with that world.
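To make that concrete, the standard RLHF recipe boils down to two losses: a pairwise reward-model loss learned from human preference comparisons, and a clipped PPO objective that pushes the policy toward higher reward. A minimal PyTorch sketch of those two pieces (not the exact code from this repo; tensor names are placeholders):

```python
import torch
import torch.nn.functional as F

def reward_model_loss(reward_chosen, reward_rejected):
    """Pairwise ranking loss: push the reward of the human-preferred
    response above that of the rejected one (Bradley-Terry style)."""
    return -F.logsigmoid(reward_chosen - reward_rejected).mean()

def ppo_clip_loss(logprobs_new, logprobs_old, advantages, clip_eps=0.2):
    """Clipped PPO objective on the policy's token log-probs.
    Advantages come from the learned reward (minus a KL penalty
    against the frozen pre-trained model in the usual recipe)."""
    ratio = torch.exp(logprobs_new - logprobs_old)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    return -torch.min(unclipped, clipped).mean()
```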
You might have seen this tweet before: https://twitter.com/geoffreyhinton/status/1636110447442112513?s=20
So with GPT-2 medium, what we're really doing here is parenting a dumb kid, rather than a "supernaturally precocious child" like GPT-3. What interests me is that RLHF actually does help parent this dumb kid to be more socially acceptable.
In other words, if we had discovered the power of alignment and RLHF earlier, we might have seen the ChatGPT moment much sooner, back when GPT-2 came out in 2019.
I'm also thinking of doing the same with LLaMA, to maybe end up with a nanoChatGPT that could actually be useful for a real-life application. Stay tuned!
blueSGL t1_jdl02u6 wrote
> So with GPT-2 medium, what we're really doing here is parenting a dumb kid, rather than a "supernaturally precocious child" like GPT-3. What interests me is that RLHF actually does help parent this dumb kid to be more socially acceptable.
> In other words, if we had discovered the power of alignment and RLHF earlier, we might have seen the ChatGPT moment much sooner, back when GPT-2 came out in 2019.
That just reads to me as capability overhang. If there is "one simple trick" to make the model "behave", what's to say this is the only one? (Or that the capabilities unlocked by the current behavior modification are the best they can be.) Scary thought.