Submitted by mvujas t3_zo5imc in MachineLearning
I was reading a bit about how ChatGPT is trained, which led me to realize how smart a move making it free to use actually is. We know that ChatGPT's training relies on human feedback, which is relatively expensive to collect. By making the model free to use and giving users an option to rate its responses, OpenAI opens the door to massive amounts of training data at a relatively low cost per sample (essentially just the cost of running the servers). I find this approach fascinating, and it makes me wonder about other similar examples. I'd love to hear them in the comments if you have any. For anyone curious what "human feedback as training data" looks like mechanically, I put a rough sketch below.
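This is only a toy illustration of the reward-modelling step used in RLHF-style pipelines, not how OpenAI actually implements it. Everything here (the `TinyRewardModel` class, the random "embeddings", the dimensions) is hypothetical; real systems score full conversations with a large language model backbone. The idea is just that each user comparison yields a (preferred, rejected) pair, and a pairwise loss pushes the preferred response's reward higher.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Hypothetical stand-in for a reward model: maps a response embedding to a scalar reward."""
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style pairwise loss: maximize the margin between the
    # reward of the response the user preferred and the one they rejected.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy data: each piece of user feedback is a (chosen, rejected) pair of response embeddings.
embed_dim = 16
chosen = torch.randn(32, embed_dim)    # responses users marked as better
rejected = torch.randn(32, embed_dim)  # responses users marked as worse

model = TinyRewardModel(embed_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The trained reward model can then score new outputs and act as the reward signal for a policy-optimization stage, which is why even a small fraction of users giving feedback can be worth a lot.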
CriticalTemperature1 t1_j0l6du5 wrote
Most people aren't labelling outputs as good or bad, so how do they get any reward or training signal from these beta users?