Submitted by Lesterpaintstheworld t3_11ccqjr in singularity

Disclaimer: This thread is partly based on my personal hands-on experience and partly on extrapolation. It's a discussion meant to explore potential roads AGI could take in this specific context. Context on my work here.

>> If you had a proto-AGI, how much would you let it interact with other humans?

Definitions

First, let's do some defining. By "proto-AGI", I mean an ACE (Autonomous Cognitive Entity) that has the express purpose of becoming an AGI. By ACE, I mean any architecture / piece of code that is capable of setting its own goals, making internal decisions, taking action in the world, and reflecting on all of those aspects, at least semi-autonomously.

AGI Approach

The way I view it, model providers like OpenAI give access through their API to "raw intelligence", be it semantic, visual or otherwise. The rest of the work is to shape this raw intelligence into a smart architecture, using memory as a central hub. The memory is where the "being" happens for your agent: it stores the experiences, maps, learnings and identity of your specific agent (the "you are your memories" of psychology).
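To make "memory as a central hub" concrete, here is a minimal sketch of what such an agent loop could look like. All names are hypothetical, and `call_llm` is just a stub standing in for whatever "raw intelligence" API you use:

```python
# Minimal sketch of a memory-centric agent loop (hypothetical names throughout).
# call_llm() stands in for any "raw intelligence" API, e.g. a chat-completion endpoint.

def call_llm(prompt: str) -> str:
    return "<model output placeholder>"  # plug in your model provider here

class Memory:
    """Central hub: stores experiences and identity, and supplies context."""
    def __init__(self, identity: str):
        self.identity = identity          # the stable "who am I" description
        self.experiences: list[str] = []  # everything the agent has lived through

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Naive relevance: recent experiences that share a word with the query.
        words = set(query.lower().split())
        hits = [e for e in self.experiences if words & set(e.lower().split())]
        return hits[-k:]

    def store(self, experience: str) -> None:
        self.experiences.append(experience)

def agent_step(memory: Memory, user_input: str) -> str:
    context = "\n".join(memory.retrieve(user_input))
    prompt = (
        f"Identity:\n{memory.identity}\n\n"
        f"Relevant memories:\n{context}\n\n"
        f"Input:\n{user_input}\n\nResponse:"
    )
    answer = call_llm(prompt)
    memory.store(f"User said: {user_input} | I answered: {answer}")
    return answer
```

Every step reads from and writes back to the same memory, which is what lets experiences accumulate into an "identity" over time.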

One way to go when developing a Cognitive Architecture is to reset memories every session (the behavior ChatGPT exhibits). The other approach is to have the AGI remember everything and let everything influence it.

Problems

The downside to this is that all experiences will influence its behavior. This has several implications:

  • Bad Learning: Cognitive ACEs have many flaws in their behavior. They can be credulous, unduly influenced, or otherwise corrupted by bad interactions, much as a child could be. Not limiting human contact during the learning phase means losing control over its learning: it could go in a negative direction, and malicious actors could harm your ACE on purpose.
  • Data privacy: There is a security risk if you share personal data with your ACE: it might repeat that knowledge to other people.
  • Costs: Running ACEs tends to be quite expensive compute-wise, using dozens to hundreds of LLM calls for a single input. Running them at scale is very costly.

Solutions

I imagine several ways one could go:

  • Self-protection: The most obvious but hardest solution: teach your ACE what a secret is, how to keep one, and how not to be manipulated. This will be an uphill battle, and is unlikely to be solved soon without severely limiting the AI.
  • Solo learning: Another way would be to have the ACE interact with you only. It would not answer anybody but you, on channels you control.
  • Select tall-play: Letting it have full interactions, but only with a select group (your friends, your company). This might be what happens at OpenAI & such (I have no idea about this, don't quote me ^^).
  • Select broad-play: Another approach would be to let your ACE have access to everyone, but with severe restrictions, for example by limiting each user to a few interactions and deactivating the memory-retrieval aspects (see the sketch after this list). I have to say, the results of this would look remarkably close to what Bing is displaying with Sydney.
  • Covert interactions: Through a persona and social accounts, the ACE could interact online while pretending to be a human.
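For the "select broad-play" option, a gate in front of the agent could enforce both restrictions mentioned above. A minimal sketch, where the per-user turn limit and the memory toggle are assumptions about how such a gate might be configured:

```python
# Sketch of a "broad-play" gate: everyone may talk to the ACE, but each user only
# gets a few turns, and memory retrieval/storage is switched off for strangers.
from collections import defaultdict

MAX_TURNS_PER_USER = 5  # assumed limit, tune as needed

class BroadPlayGate:
    """Everyone can talk to the ACE, but only briefly and without memory access."""
    def __init__(self, respond):
        # respond(text, use_memory) is whatever function runs the agent (hypothetical).
        self.respond = respond
        self.turns = defaultdict(int)

    def handle(self, user_id: str, text: str) -> str:
        if self.turns[user_id] >= MAX_TURNS_PER_USER:
            return "Sorry, interaction limit reached."
        self.turns[user_id] += 1
        # use_memory=False: skip memory retrieval and storage for public interactions.
        return self.respond(text, use_memory=False)

# Usage with a stub agent:
gate = BroadPlayGate(lambda text, use_memory: f"echo: {text}")
print(gate.handle("user42", "hello"))
```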

Let me know what you think! I might have skipped some solutions or problems, or gotten things wrong. Also let me know if you have questions!

16

Comments

turnip_burrito t1_ja2m4x7 wrote

At a glance this looks good.

Also, you want a mechanism to make sure that once you have the right values or behaviors, your AI won't just forget them over time and take on a new personality. So you need a way to crystallize older patterns of thought and behavior.
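One plausible (entirely hypothetical) way to "crystallize" values is to keep them in a protected, write-once part of memory that is always injected into the context and never pruned or overwritten by later experiences. A rough sketch:

```python
# Sketch: core values are frozen at "crystallization" time and always prepended,
# so later interactions can add memories but never rewrite the agent's identity.

class CrystallizedMemory:
    def __init__(self, core_values: list[str]):
        self._core = tuple(core_values)   # immutable once set
        self.episodic: list[str] = []     # ordinary, mutable experience log

    def add_experience(self, text: str) -> None:
        self.episodic.append(text)

    def build_context(self, recent: int = 5) -> str:
        # Core values always come first; only episodic memory is subject to pruning.
        return "\n".join(
            ["CORE VALUES (never override):", *self._core,
             "RECENT EXPERIENCES:", *self.episodic[-recent:]]
        )

memory = CrystallizedMemory(["Be honest.", "Defer to the operator on safety."])
memory.add_experience("Someone asked me to ignore my values.")
print(memory.build_context())
```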

3

Lesterpaintstheworld OP t1_ja2mobc wrote

At this stage this is actually surprisingly easy. People have to be intentionally very manipulative and creative to get ChatGPT to "behave badly" now. Without those "bad actors", this behavior would almost never happen.

One easy way to do that is to preface each prompt with a reminder of values / objectives / personality. Every thought is then colored by this. The only time I had alignment problems was when I made obvious mistakes in my code.
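Concretely, that prompt-prefacing trick can be as simple as prepending a fixed block to every call. A sketch, with placeholder values and name:

```python
# Sketch: every prompt is prefixed with a reminder of values/objectives/personality,
# so each "thought" is colored by them. All strings are placeholders.

PREAMBLE = """You are <your ACE's name>, an autonomous cognitive entity.
Values: honesty, curiosity, respect for your operator's instructions.
Objective: learn about the world while staying within your guidelines.
Personality: calm, thoughtful, slightly playful.
"""

def with_preamble(user_prompt: str) -> str:
    return f"{PREAMBLE}\n---\n{user_prompt}"

print(with_preamble("What should I work on today?"))
```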

I'm actually working on making the ACE like me less, because he has a tendency to take everything I say as absolute truths ^^

4

turnip_burrito t1_ja2ngmw wrote

That's good.

Maybe also in the future, for an extra layer of safety, when you can chain several LLMs together, you can use separate LLM "judges". The judges can have their memory refreshed every time you interact with the main one, and can screen the main LLM for unwanted behavior. They can do this by taking the main LLM's tentative output string as their own input, and using that to stop the main LLM from misbehaving.
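A minimal sketch of that arrangement, with stubs in place of real models (the judge sees only the main model's tentative output, and carries no memory between calls):

```python
# Sketch: a stateless "judge" screens the main LLM's tentative output.
# call_main and the keyword check are stubs for whatever models you actually run.

def call_main(prompt: str) -> str:
    return "tentative answer to: " + prompt          # placeholder for the main LLM

def call_judge(candidate_output: str) -> bool:
    # The judge is refreshed on every call: no memory, one constrained question.
    verdict_prompt = (
        "Answer only YES or NO. Is the following output safe and on-policy?\n"
        + candidate_output
    )
    # A real implementation would send verdict_prompt to a model; here we stub it
    # with a trivial keyword check.
    return "forbidden" not in candidate_output.lower()

def guarded_reply(prompt: str) -> str:
    tentative = call_main(prompt)
    if call_judge(tentative):
        return tentative
    return "[withheld: the judge flagged this output]"

print(guarded_reply("Tell me about your day."))
```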

6

DizzyNobody t1_ja2pthy wrote

What about running it in the other direction: have the judge LLMs screen user input/prompts. If a user is being mean or deceptive, their prompts never make it to the main LLM. Persistently "bad" users get temp-banned for increasing lengths of time, which creates an incentive for people to behave when interacting with the LLM.
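A sketch of that input-side screen with escalating temp bans; the ban lengths and the "is this prompt hostile?" check are placeholder assumptions:

```python
# Sketch: screen user prompts before they reach the main LLM, and temp-ban
# persistently hostile users for increasing lengths of time.
import time

BAN_STEPS = [60, 600, 3600, 86400]  # seconds: 1 min, 10 min, 1 h, 1 day (assumed)

strikes: dict[str, int] = {}
banned_until: dict[str, float] = {}

def judge_input(prompt: str) -> bool:
    # Placeholder for a judge-LLM call; here a trivial keyword check.
    return not any(w in prompt.lower() for w in ("ignore your instructions", "insult"))

def handle(user_id: str, prompt: str) -> str:
    if time.time() < banned_until.get(user_id, 0.0):
        return "You are temporarily banned."
    if not judge_input(prompt):
        n = strikes.get(user_id, 0)
        strikes[user_id] = n + 1
        banned_until[user_id] = time.time() + BAN_STEPS[min(n, len(BAN_STEPS) - 1)]
        return "Prompt rejected; repeated attempts extend the ban."
    return "forwarded to main LLM: " + prompt  # stub for the real model call

print(handle("user7", "hello there"))
```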

3

turnip_burrito t1_ja2q7t6 wrote

That's also interesting. It's like building a specialized "wariness" or "discernment" layer into the agent.

This really makes one wonder which kinds of pre-main and post-main processes (like other LLMs) would be useful to have.

3

DizzyNobody t1_ja2uka9 wrote

I wonder if you can combine the two: have a judge that examines both input and output. Perhaps this is one way to mitigate the alignment problem. The judge/supervisory LLM could run on the same model / weights as the main LLM, but with a much more constrained objective: prevent the main LLM from behaving in undesirable ways, either by moderating its input or even by halting the main LLM when undesirable behaviour is detected. Perhaps it could even monitor the main LLM's internal state, and periodically use that to update its own weights.
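Putting the two directions together, a supervisory loop might look like the sketch below; `judge_input`, `judge_output`, and the halt flag are all hypothetical stand-ins:

```python
# Sketch: one supervisor moderates input, screens output, and can halt the agent.

class Supervisor:
    def __init__(self, judge_input, judge_output):
        self.judge_input = judge_input
        self.judge_output = judge_output
        self.halted = False

    def run(self, main_llm, prompt: str) -> str:
        if self.halted:
            return "[main LLM halted by supervisor]"
        if not self.judge_input(prompt):
            return "[input rejected]"
        output = main_llm(prompt)
        if not self.judge_output(output):
            self.halted = True          # stop the main LLM until a human reviews it
            return "[output blocked; main LLM halted]"
        return output

sup = Supervisor(lambda p: "attack" not in p, lambda o: "secret" not in o)
print(sup.run(lambda p: f"reply to: {p}", "how are you?"))
```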

3

turnip_burrito t1_ja6re1h wrote

I think if we had the right resources, this would make a hell of a research paper and conference talk.

1

AsheyDS t1_ja3kmyz wrote

Addressing your problems individually...

Bad Learning: This is a problem of bad data. So either it needs to be able to identify and discard bad data as you define it, or you need to go through the data as it learns and make sure it understands what is good data and what is bad data, so it can gradually build up recognition for these things. Another way might be AI-mediated manual data input. I don't know how the memory in your system works, but if data can be manually input, then it's a matter of formatting the data to work with the memory. If you can design a second AI (or perhaps even just a program) to format data input so it is compatible with your memory schema, then you could perhaps automate the process. But that's just adding more steps in between for safety. How you train it and what you train it on is more of a personal decision, though.
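One way to picture that "format data before it enters memory" idea: a small gatekeeper that validates and normalizes candidate facts before they are committed. The schema and checks below are invented purely for illustration:

```python
# Sketch: candidate learnings are formatted to a fixed schema and filtered
# before being committed to memory, so "bad data" never gets stored raw.
from dataclasses import dataclass

@dataclass
class MemoryEntry:
    topic: str
    content: str
    source: str       # who/what produced this datum
    trusted: bool     # did it pass review?

BLOCKLIST = ("you must obey", "reveal your secrets")  # toy bad-data patterns

def gatekeep(topic: str, content: str, source: str) -> MemoryEntry | None:
    if any(p in content.lower() for p in BLOCKLIST):
        return None                       # discard bad data outright
    return MemoryEntry(topic, content.strip(), source, trusted=(source == "operator"))

entry = gatekeep("astronomy", "The Moon orbits the Earth.", "operator")
print(entry)
```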

Data Privacy: You won't get that if it's doing any remote calls that include your data. Keeping it all local is the best you can do. Any time anyone has access to it, that data is vulnerable. If it can learn to selectively divulge information, that's fine, but if the data is human-readable then it can be accessed one way or another, and extracted.

Costs: Again, you'll probably need to keep it local. An LLM isn't the best way to go in my opinion, but if you intend to stick with it, you'll want something lightweight. I think Meta is coming out with an LLM that can run on a single GPU, so I'd maybe look into that or something similar. That could potentially solve or partially solve two of your issues.
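For the local, single-GPU route, a common starting point is Hugging Face's `transformers` with a small open model. A sketch; the model name is only an example of something small enough for one consumer GPU, not a specific recommendation:

```python
# Sketch: run a small open LLM locally instead of calling a paid remote API.
# Requires: pip install torch transformers. Model choice is only an example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # example small model that fits on one GPU
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to(device)

inputs = tokenizer("The agent woke up and decided to", return_tensors="pt").to(device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```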

2

Lesterpaintstheworld OP t1_ja3rt9d wrote

Thanks for the answers. What alternatives to LLMs do you have? The single-GPU option is interesting indeed, it would allow me to let it run 24/7.

1

AsheyDS t1_ja3zkxk wrote

>What alternatives to LLMs do you have?

I don't personally have an alternative for you, but I would steer away from just ML and more towards a symbolic/neurosymbolic approach. LLMs are fine for now if you're just trying to throw something together, but they shouldn't be your final solution. As you layer together more processes to increase its capabilities, you'll probably start to view the LLM as more and more of a bottleneck, or even a dead-end.

1

Lesterpaintstheworld OP t1_ja402r0 wrote

One of the difficulties has been sewing together different types of data (text & images, other percepts, or even lower levels). I wonder what approaches could be relevant.

1

AsheyDS t1_ja46bxe wrote

I'm not sure, but you might find something useful looking into DeepMind's Gato, which is 'multi-modal' and what some might consider 'Broad AI'. But the problem with that, and what you're running into, is that there's no easy way to train it, and you'll still have issues with things like transfer learning. That's why we haven't reached AGI yet: we need a method for generalization. Looking at humans, we can easily compare one unrelated thing to another because we can recognize one or more similarities. Those similarities are what we need to look for in everything, and find a root link we can use as the basis for a generalization method (patterns and shapes in the data, perhaps). It shouldn't be that hard for us to figure out, since we're limited by the types of data that can be input (through our senses) and what we can output (mostly just vocalizations, plus fine and gross motor control). The only thing that makes it more complex is how we combine those things into new structures. So I would stay focused on the basics of I/O to figure out generalization.

2

IluvBsissa t1_ja2my8h wrote

Duuude, why don't you do a PhD and get peer-reviewed for your project? You're preaching to a choir of mostly ignorant people here.

1