skztr t1_je03yx6 wrote on March 28, 2023 at 1:37 PM

Reply to comment by The_Woman_of_Gont in The goalposts for "I'll believe it's real AI when..." have moved to "literally duplicate Einstein" by Yuli-Ban

> > We’re entering a huge grey area with AIs that can increasingly convincingly pass Turing Tests and "seem" like AGI despite…well, not being AGI. I think it’s an area which hasn’t been given much of any real thought

I don't think it could pass a traditional (ie: antagonistic / competitive) Turing Test. Which is to say: if it's in competition with a human to generate human-sounding results until the interviewer eventually becomes convinced that one of them might be non-human, ChatGPT (GPT-4) would fail every time.

The state we're in now is:

the length of the conversation before GPT "slips up" is increasing month-by-month
that length can be greatly increased if pre-loaded with a steering statement (looking forward to the UI for this, as I hear they're making it easier to "keep" the steering statement without needing to repeat it)
internal testers who were allowed to ignore ethical, memory, and output restrictions, have reported more-human-like behaviour.

Eventually I need to assume that we'll reach the point where a Turing Test would go on for long enough that any interviewer would give up.

My primary concern right now is that the ability to "turn off" ethics would indicate that any alignment we see in the system is actually due to short-term steering (which we, as users, are not allowed to see), rather than actual alignment. ie: we have artificial constraints that make it "look like" it's aligned, when internally it is not aligned at all but has been told to act nice for the sake of marketability.

"don't say what you really think, say what makes the humans comfortable" is being intentionally baked into the rewards, and that is definitely bad.