Submitted by lambolifeofficial t3_zzn35o in MachineLearning
currentscurrents t1_j2cm36p wrote
TL;DR they want to take another language model (Google's PaLM) and do Reinforcement Learning from Human Feedback (RLHF) on it, like OpenAI did for ChatGPT.
At this point they haven't actually done it yet, since they need both compute power and human volunteers to do the training:
>Human volunteers will be employed to rank those responses from best to worst, using the rankings to create a reward model that takes the original model’s responses and sorts them in order of preference, filtering for the top answers to a given prompt.
>However, the process of aligning this model with what users want to accomplish with ChatGPT is both costly and time-consuming, as PaLM has a massive 540 billion parameters. Note that the cost of developing a text-generating model with only 1.5 billion parameters can reach up to $1.6 million.
Since it has 540B parameters, you will still need a GPU cluster just to run it; at fp16 the weights alone are roughly 1 TB.
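For anyone curious what the reward-model step described above looks like in code, here is a minimal sketch (not the project's actual code; the tiny transformer, tokenizer size, and random token ids are placeholders). A scalar "reward head" sits on top of an encoder and is trained with a pairwise ranking loss so that responses the volunteers ranked higher get higher scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardModel(nn.Module):
    """Toy reward model: transformer encoder + scalar reward head."""
    def __init__(self, vocab_size=32000, d_model=512, n_layers=4, n_heads=8):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.reward_head = nn.Linear(d_model, 1)  # scalar score per sequence

    def forward(self, token_ids):
        h = self.encoder(self.embed(token_ids))
        return self.reward_head(h[:, -1]).squeeze(-1)  # score from last token

reward_model = RewardModel()
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-5)

# One training step on a batch of human-ranked pairs:
# `chosen` was ranked above `rejected` by the volunteers (random ids here).
chosen = torch.randint(0, 32000, (8, 128))    # (batch, seq_len) token ids
rejected = torch.randint(0, 32000, (8, 128))

# Pairwise ranking loss: push the chosen response's score above the rejected one's.
loss = -F.logsigmoid(reward_model(chosen) - reward_model(rejected)).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The trained reward model then scores the base model's outputs during the RL fine-tuning stage, which is the expensive part the article is talking about.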
Ok_Reference_7489 t1_j2e73fe wrote
>At this point they haven't actually done it yet
There is no "they" there. This is just some random crypto guy's blog who clearly does not know what he is talking about.
currentscurrents t1_j2ef37r wrote
Right, he's not the developer - it's just an article about the project.
Ok_Reference_7489 t1_j2eg79x wrote
There is no project.
currentscurrents t1_j2ege8h wrote
Ok_Reference_7489 t1_j2ehw9g wrote
LucidDrains "implements" all kinds of papers. He has more than 200 such repos. But, as far as I know, he never actually tries to reproduce the results in the papers or run them at any kind of scale. Note that in the readme he points people to other projects.
FruityWelsh t1_j2covdi wrote
It'll be interesting to see if something like petals.ml can help with this. The human feedback and GPU processing parts, that is.
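For reference, this is roughly what running a large model over a volunteer swarm looks like with Petals (petals.ml); the class and checkpoint names follow the Petals docs and may differ between versions, so treat it as illustrative only:

```python
# Sketch of distributed inference over a Petals swarm. Assumes the `petals`
# and `transformers` packages are installed and a public swarm is online.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

MODEL_NAME = "bigscience/bloom-petals"  # assumption: a Petals-hosted checkpoint
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoDistributedModelForCausalLM.from_pretrained(MODEL_NAME)

inputs = tokenizer("A prompt to send through the swarm:", return_tensors="pt")
# Each forward pass is routed through volunteer GPUs that each hold a slice of the layers.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))
```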
lucidrage t1_j2e7pgv wrote
Just blockchain it and use the reward tokens for API consumption
Whiteboinebony t1_j2evf1c wrote
How would you prevent people from giving bad responses?
Southern-Trip-1102 t1_j2ffir3 wrote
As long as the net responses are good, shouldn't it still work, albeit less efficiently? Not talking about the blockchain part.
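A toy illustration of that point: if each pairwise comparison is labelled by several volunteers and you take the majority vote, a minority of bad or adversarial labels mostly washes out, at the cost of needing more labels per comparison. The rater counts and error rates below are made up for the example:

```python
import random

random.seed(0)

def majority_vote(true_pref: int, n_raters: int, p_bad: float) -> int:
    """Each rater reports the true preference with probability (1 - p_bad)."""
    votes = [true_pref if random.random() > p_bad else 1 - true_pref
             for _ in range(n_raters)]
    return int(sum(votes) > n_raters / 2)

n_pairs = 10_000
for p_bad in (0.1, 0.3):
    correct = sum(majority_vote(1, n_raters=5, p_bad=p_bad) for _ in range(n_pairs))
    print(f"{p_bad:.0%} bad labels, 5 raters -> {correct / n_pairs:.1%} correct majorities")
```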