E_Snap t1_jdsht5g wrote on March 26, 2023 at 9:05 PM

Reply to comment by farmingvillein in [D] GPT4 and coding problems by enryu42

I guess I could have worded it better. What I mean to say is that once they’ve output something, it’s in the record. There’s no pausing to think and go through a few different iterations of the sentence, or evaluating if what they’re about to say has faults. They just output directly, instead of reading what they’re about to output and vetting it.

farmingvillein t1_jdsmsh9 wrote on March 26, 2023 at 9:41 PM

Gotcha. Yeah, that is presumably where the power of inner monologue / step-by-step / reflection come from.

Will be cool to see that (presumably) progressively systematized.

sdmat t1_jdt85pr wrote on March 27, 2023 at 12:24 AM

Yes, it's amazing to see something as simple as "Assess the quality of your answer and fix any errors" actually work.

Or for more subjective results such as poetry "Rate each line in the preceding poem" then "Rewrite the worst lines".

yaosio t1_jdtf57p wrote on March 27, 2023 at 1:21 AM

The neat part is it doesn't work for less advanced models. The ability to fix its own mistakes is an emergent property of a sufficiently advanced model. Chain of thought prompting doesn't work in less advanced models either.

sdmat t1_jdtj3ia wrote on March 27, 2023 at 1:54 AM

Definitely, I was extremely skeptical of LLMs as a path to AGI but this makes it look possible. Maybe even likely.

yaosio t1_jdtvycq wrote on March 27, 2023 at 3:47 AM

It's really neat how fast this stuff has been going. I remember when OpenAI claimed GPT-2 was too dangerous to release, which is amusing now because the output of GPT-2 is so bad. But when I used a demo that would write news articles from a headline I thought it was absolutely amazing. Then I, and most of the public, forgot about it.

Then GPT-3 comes out, and AI Dungeon used it before OpenAI censored it sonhsrd AI Dungeon stopped using it. The output was so much better than GPT-2 that I couldn't believe I liked anything GPT-2 made. I told people this was the real deal, it's perfect and amazing! But it goes off the rails very often, and it doesn't understand how a story should be told so it just does whatever.

Then ChatGPT comes out, which we now know is something like a finetune of GPT-3.5. You can chat, code, and it writes stories. The stories are not well written, but they follow the rules of story telling and don't go off the rails. It wasn't fine tuned on writing stories like AI Dungeon did with GPT-3.

Then Bing Chat comes out, which turned out to be based on GPT-4. It's story writing ability is so much better than ChatGPT. None of that "once upon a time" stuff. The stories still aren't compelling, but way better than before.

I'm interested in knowing what GPT-5 is going to bring. What deficiencies will it fix, and what deficiencies will it have? I'd love to see a model that doesn't try to do everything in a single pass. Like coding, even if you use chain of thought and self reflection GPT-4 will try to write the entire program in one go. Once something is written it can't go back and change it if it turns out to be a bad idea, it is forced to incorporate it. It would be amazing if a model can predict how difficult a task will be and then break it up into manageable pieces rather than trying to do everything at once.

sdmat t1_jdtytyy wrote on March 27, 2023 at 4:16 AM

> Like coding, even if you use chain of thought and self reflection GPT-4 will try to write the entire program in one go. Once something is written it can't go back and change it if it turns out to be a bad idea, it is forced to incorporate it. It would be amazing if a model can predict how difficult a task will be and then break it up into manageable pieces rather than trying to do everything at once.

I've had some success leading it through this in coding with careful prompting - have it give a high level outline, check its work, implement each part, check its work, then put the thing together. It will even revise the high level idea if you ask it to and update a corresponding implementation in the context window.

But it definitely can't do so natively. Intuitively it seems unlikely that we can get similar results to GPT4+human with GPT4+GPT4 regardless of how clever the prompting scheme is. But the emergent capabilities seen already are highly surprising, so who knows.

Really looking forward to trying these schemes with a 32K context window.

Add code execution to check results and browsing to to get library usage right and it seems all the pieces are there for an incredible level of capability even it still needs human input in some areas.