Submitted by mvujas t3_zo5imc in MachineLearning
I was reading a bit about how ChatGPT is trained, which led me to realize how smart a move making it free to use actually is. We know that ChatGPT's training relies on human feedback, which is relatively expensive to collect. By making the model free to use and giving users an option to rate its responses, OpenAI opens the door to massive amounts of training data at a relatively low cost per sample (essentially just the cost of running the servers). I find this approach fascinating, and it makes me wonder about other similar examples. I'd love to hear them in the comments if you have any. For anyone curious what "human feedback as training data" looks like mechanically, I put a rough sketch below.
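This is only a toy illustration of the reward-modelling step used in RLHF-style pipelines, not how OpenAI actually implements it. Everything here (the `TinyRewardModel` class, the random "embeddings", the dimensions) is hypothetical; real systems score full conversations with a large language model backbone. The idea is just that each user comparison yields a (preferred, rejected) pair, and a pairwise loss pushes the preferred response's reward higher.

```python
import torch
import torch.nn as nn

class TinyRewardModel(nn.Module):
    """Hypothetical stand-in for a reward model: maps a response embedding to a scalar reward."""
    def __init__(self, embed_dim: int = 16):
        super().__init__()
        self.score = nn.Linear(embed_dim, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.score(x).squeeze(-1)

def preference_loss(reward_chosen: torch.Tensor, reward_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry-style pairwise loss: maximize the margin between the
    # reward of the response the user preferred and the one they rejected.
    return -torch.nn.functional.logsigmoid(reward_chosen - reward_rejected).mean()

# Toy data: each piece of user feedback is a (chosen, rejected) pair of response embeddings.
embed_dim = 16
chosen = torch.randn(32, embed_dim)    # responses users marked as better
rejected = torch.randn(32, embed_dim)  # responses users marked as worse

model = TinyRewardModel(embed_dim)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for step in range(100):
    loss = preference_loss(model(chosen), model(rejected))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The trained reward model can then score new outputs and act as the reward signal for a policy-optimization stage, which is why even a small fraction of users giving feedback can be worth a lot.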
CriticalTemperature1 t1_j0l6du5 wrote
Most people aren't labelling outputs as good or bad, so how do they get any reward or training signal from these beta users?