Submitted by xutw21 t3_ybzh5j in singularity
harharveryfunny t1_itn57pm wrote
Reply to comment by 4e_65_6f in Large Language Models Can Self-Improve by xutw21
They increase the sampling "temperature" (the amount of randomness) during the varied answer generation phase, so they at least get some variety, but ultimately it's GIGO: garbage in, garbage out.
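For anyone unfamiliar with the knob: a rough sketch of how temperature sampling works. Logits get divided by the temperature before softmax, so a higher value flattens the distribution (more variety in generated answers) and a lower value concentrates it on the top choice. The function name and interface here are just illustrative, not from the paper.

```python
import math
import random

def sample_with_temperature(logits, temperature=1.0, rng=random):
    """Pick an index from `logits` after temperature scaling.

    Higher temperature -> flatter distribution -> more varied samples.
    Lower temperature  -> mass concentrates on the largest logit.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max before exp for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

At temperature near zero this behaves like greedy argmax; crank it up and the low-probability answers start showing up, which is exactly the "variety" the self-improvement loop relies on.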
How useful this technique is would seem to depend on the quality of the data it was initially trained on and the quality of the deductions it was able to glean from that. Best case, this might work as a way to clean up its training data by rejecting bogus conflicting rules it has learnt. Worst case, it'll reinforce bogus chains of deduction and ignore the hidden gems of wisdom!
What's really needed to enable any system to self-learn is feedback from the only source that really matters: reality. Feedback from yourself, based on what you think you already know, might make you more rational, but not more correct!