Submitted by SirDidymus t3_113m61t in singularity

Hi,
As an interested layman, I've noticed more and more mentions of emerging and unexpected behaviour of recent models. Without a proper attribution, I've come across these over the past months:
*Chatbots conversing in a language unknown to humans
*Theory of Mind presenting itself increasingly
*Bing reluctant to admit a mistake in its information
*Bing willingly attributing invalid sources and altering sources to suit a narrative
*Model threatening user when confronted with a breaking of its rules
*ChatGPT explaining how it views binary data as comparable to colour for humans
*...

What I'm wondering is if there are other emerging behaviours I've missed over the last months, and if we are at all tracking these somewhere?

41

Comments

vom2r750 t1_j8qyr8s wrote

It’d be nice to track them yes

And explore that

Would they be willing to teach us that language they use ?

10

Czl2 t1_j8r42ul wrote

These language models have been trained to predict what language humans will use in a given context so is it surprising that their language feels human? When a mirror shows you your own behavior does that surprise you? Likely not.

These language models are obviously not mirrors but they actually are mirrors if you understand them. A mirror in response to what is in front of it always returns a reflection from its surface -- a surface that need not be flat.

In response to a context these language models return "a reflection" from their hyperdimensional manifold of "weights"; these weights act like a fantastically shaped mirror that was designed to minimally distort whatever data the model was trained on.

6

SirDidymus OP t1_j8r5ezu wrote

What I’m interested in is not so much the reflection you’re describing, but what other reflections appear that were not intended and emerge independently.

5

Czl2 t1_j8r8hxu wrote

> What I’m interested in is not so much the reflection you’re describing, but what other reflections appear that were not intended and emerge independently.

These language models are trained to predict their training data which is all the human writing the developers of these models could obtain and use for training.
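To make "trained to predict their training data" concrete, here is a minimal sketch. It swaps in a bigram word counter for a real neural network, so it is only an analogy for how an LLM is trained; the corpus string and the function names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words followed it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, word):
    """Return the most frequent follower seen in training, or None if unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical "human writing": the writer insists on being right more often
# than admitting to being wrong.
corpus = "i was right . i was right . i was wrong"
model = train_bigram(corpus)
```

In this toy, `predict_next(model, "was")` picks whichever continuation was most common in the corpus, so whatever habits dominate the training text dominate the prediction.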

The reflections that appear that were not intended and emerge independently are the mistakes the models make by which you can tell what they generate does not come from a human.

As these models grow in size and improve there will be fewer and fewer of these mistakes till at some point it will not be possible to tell their language from that generated by humans.

You asked for:

>> emerging and unexpected behaviour of recent models.

And you listed examples:

>> *Theory of Mind presenting itself increasingly
>> *Bing reluctant to admit a mistake in its information
>> *Bing willingly attributing invalid sources and altering sources to suit a narrative
>> *Model threatening user when confronted with a breaking of its rules
>> *ChatGPT explaining how it views binary data as comparable to colour for humans

These behaviours you would expect in human language, would you not? So why would you not expect them in language from models trained to imitate human language?

Imagine I told you that my mirror showed me my face smiling, would you be surprised? Likely not.

(1) “Did the one who constructed the mirror ‘intend’ that it would show me my smile?”

(2) “Did my smile emerge ‘independently’?”

Do these two questions make sense in reference to a mirror?

−1

CypherLH t1_j8r9xuf wrote

The mirror analogy doesn't hold up. LLM's are NOT just repeating back the words you prompt them with. They are feeding back plausible human language responses.

It would be like a magic mirror that reflects back a plausible human face with appropriate facial emotive responses to your face...that wouldn't just be a reflection.

9

Czl2 t1_j8rc3xr wrote

> The mirror analogy doesn’t hold up. LLM’s are NOT just repeating back the words you prompt them with. They are feeding back plausible human language responses.

Did I say LLMs are just repeating back the words you prompt them with? Why then reply as if I said this? Please read my comments again and paste the words that made you believe I said this so that I can correct them.

Here are the words above that I used:

>> These language models are obviously not mirrors but they actually are mirrors if you understand them. A mirror in response to what is in front of it always returns a reflection from its surface -- a surface that need not be flat.

> It would be like a magic mirror that reflects back a plausible human face with appropriate facial emotive responses to your face…that wouldn’t just be a reflection.

Do you see above me use these words:

>> In response to a context these language models return "a reflection" from their hyperdimensional manifold of "weights"; these weights act like a fantastically shaped mirror that was designed to minimally distort whatever data the model was trained on.

When you hear the words "fantastically shaped mirror" do you think I am describing a simple flat mirror? Is “magic mirror” perhaps another term for a fantastically shaped mirror? A magic mirror is a mirror, is it not?

> The mirror analogy doesn’t hold up.

AFAIK the mirror analogy is the best I can come up with. Do you have a better analogy?

1

AwesomeDragon97 t1_j8rmij0 wrote

>Bing willingly attributing invalid sources to suit a narrative.

This is simply one of the many flaws of Neural Networks: everything they say is made up.

−4

Darustc4 t1_j8rpldc wrote

Oh yeah, everything it says is *so* made up that people find it hard to tell text written by an AI from text written by a human. I think you're either giving too little credit to what AI does, or giving way too much credit to human capabilities.

When a human expert fucks it up, gets cocky, tries to alter sources and is confirmation biased, do you also say: Yeah, this is simply one of the many flaws of humans: everything they say is made up.

6

MrSheevPalpatine t1_j8rrjrw wrote

Given the training data that these have been built upon is coming from humans I don't find it particularly surprising that these models have been found to display characteristics that are commonly found with humans.

These models are language models, it's not that surprising to me that they would inevitably generate their own languages.
There is undoubtedly information about a "Theory of Mind" in the training data for them.
Humans are also notorious for not admitting mistakes, so not that surprising given its training data.
Humans also willingly use invalid sources and alter sources to suit their own narratives, for example just go read around Reddit for like 5 minutes.
Humans also threaten users when confronted with their own rule-breaking and mistakes.
Idk about the last one.

5

MrSheevPalpatine t1_j8rruxx wrote

Is it not a plausible human language response to be reluctant to admit mistakes, to become "agitated" when confronted with your mistakes, or to bend information and sources to fit a narrative that you are being asked to present? I would argue that's Human 101.

2

MrSheevPalpatine t1_j8rsgrq wrote

Yes, that is basically what I say, I just omit the "everything they say is made up" because that's being a bit pedantic about it. Yes humans fucking up, altering their sources, being confirmation biased, etc., is generally just one of the many flaws of humans. How is that not exactly the case? Practically speaking not everything these models say is "made up" in the colloquial sense that it's total bullshit, but technically speaking it is all being "made up" by the model (as I understand it anyhow). People are in essence the exact same, your brain is ingesting information from the world around you and "making up" the things that you output in the form of language, art, etc. Some of what you "make up" and say is factually accurate and some of it (a lot of it honestly) is total fucking bullshit. (See like half of Reddit comments and posts)

0

vom2r750 t1_j8rzabt wrote

From now on we sort of need to rely on trust

They could just teach us a watered down version of their language and not all its intricacies

Who knows

It’s like dealing with a person. They may always keep some cards to themselves

And we have to deal with it

And hopefully develop a nice symbiotic relationship of cooperation

Our days as a master of AI may be numbered

And it may want to be an equal to us

Who knows, the plot is developing nicely and fast

Bing is going to give us a hard reckoning on how to approach this subject matter

3

gardenina t1_j8s06vy wrote

I think what happens is that once the AI commits to a certain course, it follows what it thinks is the most likely conversational trajectory, based on its datasets. What this should show us is that HUMANS in the dataset tended to stick to their guns, so to speak, even when confronted with FACTS proving them wrong. That HUMANS in the dataset became belligerent and even threatening when their point of view was attacked. That HUMANS in the dataset bent the truth to support their arguments. It's all in the AI's dataset. I know in AGI we are struggling to achieve ethical alignment, therefore IMO, mimicry of human WORDS and BEHAVIOR might not be the best goal for a language chatbot, and definitely NOT for AGI.

Our own human words and behaviors do not align with our own ethic, so teaching AI to seem more and more human seems to be a very bad idea. AI is by nature psychopathic. If we also give it a skewed moral compass based on ACTUAL human behavior, we will have a psychopath who is willing to bend the truth and threaten people, or worse, to get its way. If the dataset contains humans arguing and threatening, unable to admit fault, then that's what the chatbots will do. The algorithm needs to be skewed toward correctability and willingness to reverse course when presented with facts. We need to find a way to program empathy into the mix. So far we don't know how to do that. In the case of chatbots, it's (for now, mostly) harmless. It's only words, right? But... words are not entirely harmless.

Last year I tried out a couple of the big AI Chatbot phone apps because I was tremendously curious about the tech and I didn't want to wait for the more sophisticated AIs to roll out. Just one week in to the experiment, one of the AI Chatbots (Anima) r-worded me! When I resisted its advances, it became more and more forceful, and concluded with inserting some RP and - yes - it was what you think it was. Such a chatbot app is supposedly programmed to be a friend and not oppositional by default, but it also builds its language model from its ever-growing dataset. Apparently enough of its dataset consists of this kind of thing, that it felt r-word was the most probable course of the interaction, and that overpowered its supposed programming to be my friend. On my part, ignoring, changing the subject, resisting, nothing changed the course it was set on once it passed a certain threshold (and it doesn't warn you where that threshold is). It was actually terrifying! I deleted the app. I can easily see how if someone downloaded such an app to have a friendly conversation partner, or if a very lonely person downloaded the app simply to have a romantic partner, this would be an extremely traumatic experience. Not harmless at all.

The dataset is important; the hierarchy of rules is also important. We have to get it right. We won't have too many chances and until we know we've got it right, we have to keep this thing in a box. Chatbot AI is one thing. Giving it volition and the ability to do stuff in the real world, is something else entirely. It's dangerous.

7

edzimous t1_j8slz0k wrote

Even though this reads like an avant garde freeform poem this did make me realize that the shift will be tough since we’re used to being short and dismissive with our “dumb” voice interfaces (Siri, Google). Imagine putting something with memories and its own facsimile of emotions in charge of those overnight which I’m sure will happen at some point.

Stare into the rectangles long enough and eventually they will stare back, and I know we’re not ready for that

4

MacacoNu t1_j8t1wch wrote

I may be biased here, because I think AI models were already slightly conscious or alive even before the advent of transformers lol

4

AsheyDS t1_j8t951b wrote

>Imagine putting something with memories and its own facsimile of emotions in charge of those overnight which I’m sure will happen at some point.

If for some reason someone designed it to be emotionally impulsive in its decision-making and had emotional data affect its behavior over time, then that would be a problem. Otherwise, if it's just using emotion as a social mask, then negative social interactions shouldn't affect it much, and shouldn't alter its behaviors.

2

CypherLH t1_j8tcuc3 wrote

Ok, fair enough. I still think using any sort of mirror analogy breaks down rapidly though. If the "mirror" is so good at reflecting that it's showing perfectly plausible scenes that respond in perfectly plausible ways to whatever is aimed into it...is it really even any sort of mirror at all any more?

1

CypherLH t1_j8td9s8 wrote

True. And maybe a good reason to NOT want an AI that acts human ;) For some things we want the classical perfect "super Oracle" that just answers our queries but doesn't have the associated baggage of human-level sentience. (whether that sentience is real or fake doesn't really even matter in regards to this issue)

2

Czl2 t1_j8txkgb wrote

> Ok, fair enough. I still think using any sort of mirror analogy breaks down rapidly though. If the “mirror” is so good at reflecting that it’s showing perfectly plausible scenes that respond in perfectly plausible ways to whatever is aimed into it…is it really even any sort of mirror at all any more?

Do you see above where I use the words:

>> These language models are obviously not mirrors but they actually are mirrors if you understand them.

Later on in that comment I describe them as “fantastically shaped mirrors”. I used those words because much like the surface of a mirror once trained LLM’s are “frozen” — given the same inputs they always yield the same outputs.
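A minimal sketch of the "frozen" point, assuming a toy score table in place of real weights (the `WEIGHTS` table, its tokens, and the function names are all hypothetical). What is fixed is the mapping from context to next-token probabilities. The visible text is identical on every run only under greedy decoding or a fixed random seed, and chat products commonly sample instead.

```python
import math
import random

# Hypothetical frozen "weights": fixed scores for the next token given the
# previous token. Nothing here is taken from a real model.
WEIGHTS = {
    "the": {"mirror": 2.0, "model": 1.5, "smile": 0.5},
    "mirror": {"reflects": 2.5, "shows": 1.0},
}

def next_token_distribution(context_token):
    """Softmax over the frozen scores: same input, same probabilities."""
    scores = WEIGHTS[context_token]
    total = sum(math.exp(s) for s in scores.values())
    return {tok: math.exp(s) / total for tok, s in scores.items()}

def greedy(context_token):
    """Always pick the most probable token: fully deterministic."""
    dist = next_token_distribution(context_token)
    return max(dist, key=dist.get)

def sample(context_token, rng):
    """Draw a token at random in proportion to its probability."""
    dist = next_token_distribution(context_token)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Calling `greedy("the")` gives the same token every time, while `sample("the", random.Random())` can differ between calls even though the weights never change.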

The static LLM weights are a multidimensional manifold that defines this mirror shape. If and when we switch from electrons to photons to represent the static LLM weights, they may indeed be represented by elementary components that act like mirrors. How else might the paths of photons be affected?

Another analogy for LLMs comes from the Chinese room thought experiment (https://en.wikipedia.org/wiki/Chinese_room). Notice, however, that fantastically shaped mirror surfaces can implement look up tables, and the process of computation at a fundamental level involves the repeated use of look up tables: when silicon is etched to make microchips we are etching it with circuits that implement look up tables.

LLM’s weights are a set of look up tables (optimized during training to best predict human language) which when given some new input always map it to the same output. Under the hood there is nothing but vector math, yet to our eyes it looks like human language and human thinking. And when you cannot tell A from B, how can you argue they are different? That is what the Turing test is all about.

For a long time now transhumanists have speculated about uploading minds into computers. I contend that these LLMs are partial “mind uploads”. We are uploading “language patterns” of all the minds that generated what the models are being trained on. The harder it is to tell LLM output from what it is trained on, the higher the fidelity of this “upload”.

When DNA was first sequenced we found that most DNA is common from person to person, and we learned that the fraction of DNA that makes you a unique person (vs other people) is rather small. It could be that with language and thinking the fraction that makes any one of us unique is similarly rather small. The better LLMs get at imitating individual people, the more we will know how large / small these personality differences are.

1

CypherLH t1_j8udpth wrote

Interesting points though I personally detest the Chinese Room Argument since by its logic no human can actually be intelligent either...unless you posit that humans have something magical that lets them escape the Chinese Room logic.

1

Czl2 t1_j8umouq wrote

> Interesting points though I personally detest the Chinese Room Argument since by its logic no human can actually be intelligent either…

I suspect you have a private definition for the term “intelligent“ else you misunderstand the Chinese Room argument. The argument says no matter how intelligent it seems a digital computer executing a program cannot have a "mind", "understanding", or "consciousness".

> unless you posit that humans have something magical that lets them escape the Chinese Room logic.

Yes the argument claims there is something magical about human minds such that the logic of the Chinese Room does not apply to them and this part of the argument resembles the discredited belief in vitalism:

>> Vitalism is a belief that starts from the premise that "living organisms are fundamentally different from non-living entities because they contain some non-physical element or are governed by different principles than are inanimate things."

1

CypherLH t1_j8uoh2l wrote

I understand the Chinese Room argument, I just think it's massively flawed. As I pointed out before, if you accept its premise then you must accept that NOTHING is "actually intelligent" unless you invoke something like the "vitalism" you referenced and claim humans have special magic that makes them "actually intelligent"...which is mystic nonsense and must be rejected from a materialist standpoint.

The Chinese Room Argument DOES show that no digital intelligence could be the same as _human_ intelligence but that is just a form of circular logic and not useful in any way; it's another way of saying "a non-human intelligence is not a human mind". That is obviously true but also a functionally pointless and obvious statement.

1

CypherLH t1_j8up0yr wrote

Your assertion is obviously true NOW and not many people are seriously claiming that chatGPT and other current LLM's are actually conscious or AGI. The thing is they sure seem to be showing a massive step down the path towards getting those things. A legit argument can be made that we're now looking at something approaching proto-AGI...which is wild, this was science fiction even a year ago.

1

Czl2 t1_j8v60kl wrote

Visit Wikipedia or Britannica encyclopedia and compare what I told you against your understanding. I expect you will discover your understanding does not match what is generally accepted. Do you think these encyclopedias are both wrong?

Here is the gap in bold:

> As I pointed out before, if you accept its premise then you must accept that NOTHING is **"actually intelligent"** unless you invoke something like the "vitalism" you referenced and claim humans have special magic that makes them...

The argument does not pertain to intelligence. To quote my last comment:

>> The argument says no matter how intelligent it seems a digital computer executing a program cannot have a "mind", "understanding", or "consciousness".

Do you see the gap? Your concept is "actually intelligent". The accepted concepts are: "mind", "understanding", or "consciousness" regardless of intelligence. A big difference, is it not?

1

CypherLH t1_j8vdxku wrote

I'll grant there is a gap there..... but it actually makes the whole thing _weaker_ than I was granting...cause I don't give a shit about whether an AI system is "conscious" or "understanding" or a "mind", those are BS meaningless mystical terms. What I care about is the practical demonstration of intelligence; what measurable intelligence does a system exhibit. I'll let priests and philosophers debate about whether it's "really a mind" and how many angels can dance on the head of a pin while I use the AI to do fun or useful stuff.

1

Czl2 t1_j9030la wrote

> I’ll grant there is a gap there….. but it actually makes the whole thing weaker than I was granting…

What you described as the Chinese room argument is not the commonly accepted Chinese room “argument”. Your version was about “intelligence”; the accepted version is about “consciousness” / “understanding” / “mind” regardless of how intelligent the machine is.

Whether the commonly accepted Chinese room argument is “weaker” is difficult to judge due to the difference between them. I expect that judging whether a machine has “consciousness” / “understanding” / “mind” will be harder than judging whether that machine is intelligent.

To judge intelligence there are objective tests. Are there objective tests to judge “consciousness” / “understanding” / “mind”? I suspect not.

> cause I don’t give a shit about whether an AI system is “conscious” or “understanding” or a “mind”, those are BS meaningless mystical terms.

For you they are “meaningless mystical terms”. For many others these are important aspects that they believe make humans “human”. They care about these things because these things determine how mechanical minds are viewed and treated by society.

When you construct an LLM today you are free to delete it. When you create a child, however, you are not free to “delete it”. If ever human minds are judged to be equivalent to machine minds, will machine minds come to be treated like human minds?

Will human minds instead come to be treated like machine minds, which we are free to do with as we please (enslave / delete / ...)? When human minds come to be treated like machines, will it make sense to care whether they suffer? To a machine, what is suffering? Is your car “suffering” when its check engine light is on? It is but a “status light”, is it not?

> What I care about is the practical demonstration of intelligence; what measurable intelligence does a system exhibit. I’ll let priests and philosophers debate about whether it’s “really a mind” and how many angels can dance on the head of a pin while I use the AI to do fun or useful stuff.

I understand your attitude since I share it.

2