
hellschatt t1_iwjtwgq wrote

I wrote a small seminar paper a year or two ago about how to test an AI's intelligence and the paradigm shift in that testing, so I feel the urge to clarify something here.

The Winograd Schema Challenge has been passed with around 88% accuracy for a few years now. Previous AIs could already kinda "pass" that one...

Neither the Turing Test nor the Winograd Schema Challenge is a good way to determine the general, or even just the language-related, intelligence of an AI. They only show whether the AI is capable of solving the specific type of task each test defines. Although impressive, the fact that a model can understand context within language doesn't mean much in terms of its "intelligence". The argument of the Winograd inventors was that being able to disambiguate context would be stronger proof of intelligence than merely fooling a person in a Turing Test.
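To make it concrete, here's what a Winograd schema looks like, using the classic trophy/suitcase example from Levesque's original paper (the data structure below is just my own illustration, not any official dataset format). Changing one word flips which noun the pronoun refers to, so simple word statistics shouldn't be enough to resolve it:

```python
# A Winograd schema pair: two sentences differing by one "special" word,
# where that word flips the referent of the ambiguous pronoun.
schemas = [
    {
        "sentence": "The trophy doesn't fit in the suitcase because it is too big.",
        "pronoun": "it",
        "candidates": ["trophy", "suitcase"],
        "answer": "trophy",   # a big trophy won't fit
    },
    {
        "sentence": "The trophy doesn't fit in the suitcase because it is too small.",
        "pronoun": "it",
        "candidates": ["trophy", "suitcase"],
        "answer": "suitcase",  # a small suitcase can't hold it
    },
]

for s in schemas:
    print(f'{s["sentence"]}  ->  "{s["pronoun"]}" = {s["answer"]}')
```

The point of the pair construction is that both sentences share the same candidate nouns, so a system can't score well by exploiting which noun co-occurs with "fit" or "suitcase" more often; it has to use something like commonsense reasoning about sizes.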

But let's say GPT-4 passes that test with a 100% score: how do you then measure the intelligence of GPT-4, and of all the newer models that also pass it? And is the AI now intelligent just because it passed? If you go by intuition, you already sense that these AIs still feel more like input/output systems than anything "intelligent". It's kinda not "it".

Once you think about that question, the test doesn't make much sense anymore, does it?

I should add, though: once researchers realized the Winograd Schema Challenge was no longer difficult enough for modern AIs, they tried to fix its failure to properly measure intelligence by developing a newer, harder version of it called WinoGrande. Hence the continuous paradigm shift in what counts as an "intelligent" AI...
