Submitted by RedditPolluter t3_125nchg in MachineLearning

I've seen posts claiming this. There is a paper saying that self-scrutiny feedback loops can improve GPT-4's performance by 30%. I've experimented with feedback loops using the API and don't doubt that this can, or in future may be able to, produce emergent behaviour. I'm no expert, but my surface-level understanding of transformers is that they would not create a feedback loop just from prompting; they would merely respond as if they had.

If it were true, it would have significant economic implications, since running the feedback loop as separate API calls multiplies the cost with each loop.
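To be concrete, here is roughly the loop structure I've been experimenting with (a minimal sketch assuming the pre-1.0 `openai` Python client; the prompts and loop count are placeholders):

```python
import openai  # pre-1.0 client; model name and prompts are placeholders

def complete(messages):
    resp = openai.ChatCompletion.create(model="gpt-4", messages=messages)
    return resp["choices"][0]["message"]["content"]

def self_scrutiny_loop(task: str, n_loops: int = 3) -> str:
    answer = complete([{"role": "user", "content": task}])
    for _ in range(n_loops):
        # Ask the model to critique its previous answer...
        critique = complete([
            {"role": "user", "content": task},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "Point out any errors in your answer above."},
        ])
        # ...then feed the critique back in and ask for a revision.
        answer = complete([
            {"role": "user", "content": task},
            {"role": "assistant", "content": answer},
            {"role": "user", "content": "Revise your answer given this critique:\n" + critique},
        ])
    return answer  # 1 + 2 * n_loops API calls in total
```

Each pass through the loop adds two more API calls over the full conversation so far, which is where the cost multiplication comes from.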

10

Comments

itshouldjustglide t1_je4yj0x wrote

It seems to be capable of handling the request, but it's hard to tell how much of this is just a trick of the light and whether it's actually doing the reflection. It would probably help to know more about how the model actually works.

9

Sure_Cicada_4459 t1_je5w1bh wrote

Spin-off project based on Reflexion; apparently GPT-4 gets a 20% improvement on coding tasks: https://github.com/GammaTauAI/reflexion-human-eval

People are fine-tuning LLaMA using this prompt structure, with much better results: https://twitter.com/Orwelian84/status/1639859947948363777?s=20

Someone has already built an autonomous agent using feedback loops (not necessarily related to Reflexion): https://twitter.com/yoheinakajima/status/1640934493489070080

This obviously only yields performance improvements up to a point, but it's also a very basic prompt structure overall; one can imagine all kinds of "cognitive structures".

8

harharveryfunny t1_je50vw9 wrote

There's no indication that I've seen that it maintains any internal state from one generated word to the next. Therefore the only way it can build upon its own "thoughts" is by generating "step-by-step" output which is fed back into it. It seems its own output is its only working memory, at least for now (GPT-4), although that's an obvious area for improvement.
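To make that concrete, here is a minimal sketch of what "its own output is its only working memory" means; `next_token` is a hypothetical stand-in for the model's forward pass, not a real API:

```python
# Conceptually, an autoregressive model is a pure function of the visible
# context: no hidden state is carried between tokens, so anything it "knows"
# about its own prior reasoning must be written into the context window.

def next_token(context: str) -> str:
    raise NotImplementedError("placeholder: substitute the real model here")

def generate(prompt: str, max_tokens: int = 256) -> str:
    context = prompt
    for _ in range(max_tokens):
        token = next_token(context)  # depends only on the text seen so far
        context += token             # its own output becomes its working memory
    return context
```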

7

visarga t1_je6a6w8 wrote

> its own output is its only working memory

All the fantastic feats LLMs can do are thanks to context conditioning.

1

ghostfaceschiller t1_je84v6j wrote

You are talking about two different things here.

  1. Reflexion/ReAct uses another system (like LangChain) to let the bot genuinely loop back over its previous results and try to improve them. This really does get you better results in the end.

  2. You can also simply tell the bot, in your prompt, something like "before you respond, review your first draft for errors, and only output your second draft". This is not what the bot will actually do, but it will often still produce higher-quality output, presumably because in the training data that kind of phrase is typically associated with a certain type of answer (i.e. better answers). Both approaches are sketched below.
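A rough sketch of both, assuming the pre-1.0 `openai` Python client (the task and prompts are illustrative):

```python
import openai  # pre-1.0 client; TASK and prompts are illustrative

TASK = "Write a Python function that validates ISO-8601 dates."

# (2) Prompt-only "reflection": one call. The model doesn't literally draft
# and revise; the instruction just steers it toward a more careful answer.
one_shot = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{
        "role": "user",
        "content": TASK + " Before you respond, review your first draft "
                          "for errors, and only output your second draft.",
    }],
)["choices"][0]["message"]["content"]

# (1) A genuine loop: the first answer is fed back in a second call, so the
# model really does condition on its own previous output.
draft = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[{"role": "user", "content": TASK}],
)["choices"][0]["message"]["content"]

revised = openai.ChatCompletion.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": TASK},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "Review your answer above for errors and rewrite it."},
    ],
)["choices"][0]["message"]["content"]
```

Note that (1) costs two full calls where (2) costs one, which ties back to the OP's point about price.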

5

yaosio t1_je56tet wrote

There's a limit, otherwise you would be able to ask it to self-reflect on anything and always get a correct answer eventually. Finding out why it can't get the correct answer the first time, and where the limits of self-reflection lie, would be incredibly useful.

1

Cantareus t1_je8pk04 wrote

There's no self-reflection happening when the request to self-reflect is merely in the prompt. The improvement happens because the expected output, after asking it to self-reflect, is a more thought-out response. You can get a kind of reflection by pasting the output back into the prompt along with your own feedback.
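A minimal version of that manual loop, again assuming the pre-1.0 `openai` Python client:

```python
import openai  # pre-1.0 client; the starting prompt is just an example

messages = [{"role": "user", "content": "Explain attention in transformers."}]
while True:
    reply = openai.ChatCompletion.create(
        model="gpt-4", messages=messages
    )["choices"][0]["message"]["content"]
    print(reply)
    feedback = input("Your feedback (blank to stop): ")
    if not feedback:
        break
    # Paste the model's output back into the context, followed by your critique.
    messages.append({"role": "assistant", "content": reply})
    messages.append({"role": "user", "content": feedback})
```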

3