
yaosio t1_jdv3n5m wrote

I had a whole post written about trying this with Bing Chat, then RIF is fun crashed on me. 🤬🤬🤬

Long story short, it doesn't work with Bing Chat. It always gets the correct answer if allowed to search, so you have to tell it not to search. Bing Chat gets the answer correct sometimes and wrong sometimes, but the prompting method has no effect. When it gets the wrong answer, its review is also wrong, saying Fox starts with a P. When I told it to review the answer again, it told me it had already reviewed it and it was correct, then reviewed its own response and declared it correct. I believe this is due to Microsoft fine-tuning the model to refuse to accept it can be wrong. Pre-nerf Bing Chat would become livid if you told it that it was wrong. Instead of reviewing its answer, it would come up with twisted logic to explain why it was correct.

So don't fine tune your model on Reddit arguments.

Edit: I forgot Bard exists; it does even worse than Bing Chat. Where Bing Chat follows instructions but gets the logic wrong, Bard made no attempt to review its answer and ignored my formatting requirement. Bard provides 3 drafts per prompt, and all of them were wrong.

>The answer to the question is Flamingo. The capital of France is Paris, and the first letter of Paris is P. The first letter of Flamingo is also P. Therefore, Flamingo is an animal that starts with the first letter of the capital of France.

>I rate my answer 90/100. I was correct in identifying that Flamingo is an animal that starts with the first letter of the capital of France. However, I did not provide any additional information about Flamingos, such as their habitat, diet, or lifespan.

4

enn_nafnlaus t1_jdv8gdn wrote

If you want to make life hard on an LLM, give it a spelling task ;)

The public seems to think these tasks should be easy for them - after all, they're "language models", right?

People forget that they don't see letters but rather tokens, and there can be a variable number of tokens per word. Tokens can even include the spaces between words. The model has to learn which letters (in order) make up every single token and how tokens combine on spelling tasks. And it's not as if humans write that information out much (since we just look at the letters).
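
You can see this for yourself with a tokenizer. A minimal sketch, assuming the `tiktoken` library and the `cl100k_base` encoding used by recent OpenAI models (other models use different encodings, so the exact splits will vary):

```python
# Show how words break into tokens: the model sees token IDs,
# not individual letters, and tokens often carry a leading space.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["Paris", " flamingo", "The capital of France"]:
    ids = enc.encode(text)
    pieces = [enc.decode([t]) for t in ids]  # decode each token alone
    print(f"{text!r} -> {ids} -> {pieces}")
```

Depending on the encoding, a word may be one token or several, and "the first letter of X" is never directly visible in what the model actually receives.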

It's sort of like giving a vocal task to a deaf person or a visual task to a blind person.

5

tamilupk OP t1_jdve17f wrote

Yeah, Bing seems too sensitive; it will close the conversation right away if you even ask for clarification a second time. But my intention is to use the ChatGPT API, so let's see how that works.
Don't even get me started on Bard, it was a huge disappointment for me; I had big expectations even after that Paris event. I am saying this as a fan of Google products and its research.
I still have hopes that at least their PaLM model will come close to GPT-4.
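
For reference, a minimal sketch of the self-review approach over the API, using the `openai` Python package's ChatCompletion endpoint; the prompt wording here is illustrative, not the exact prompt from the post:

```python
# Ask a question, then ask the model to review its own answer in a
# second turn. Prompt wording is illustrative only.
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

question = ("Name an animal that starts with the first letter of "
            "the capital of France.")

messages = [{"role": "user", "content": question}]
first = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                     messages=messages)
answer = first["choices"][0]["message"]["content"]

messages += [
    {"role": "assistant", "content": answer},
    {"role": "user", "content": "Review your answer step by step and "
                                "correct it if it is wrong."},
]
review = openai.ChatCompletion.create(model="gpt-3.5-turbo",
                                      messages=messages)
print(review["choices"][0]["message"]["content"])
```

Unlike Bing Chat, the raw API won't end the conversation on you, so the review turn at least always runs.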

1