IntrepidRestaurant88 t1_jcakcq8 wrote

I wonder how good GPT-4 is at bootstrapping itself. I mean, the ability to fix its own code and auto-train and fine-tune itself is extremely critical.

55

Nanaki_TV t1_jcat0g4 wrote

This was tested in the ARC evaluation. They checked whether it could become a self-replicating AI that gets out of control and leads to AGI.

40

Enough_Evening46422 t1_jcavccy wrote

I just read a book about this. "The Metamorphosis of Prime Intellect." More fiction than science but still a pretty interesting singularity book if anyone's interested. Kinda fucked up though so be warned lol

21

7734128 t1_jcctphe wrote

Of course GPT-4 is nowhere close to that level yet, but I love the idea that the way to see if an AI system can escape its confines and go rogue is to give it a bunch of money and encourage it to do so.

That's like testing the max weight capacity of a bridge by driving multiple overloaded trucks on it.

6

jugalator t1_jcbczxl wrote

That's eerie: the fact that they think this is worth testing now suggests it may be within reach. And they aren't redditors falling for the hype either, but experts in their field.

5

TheRidgeAndTheLadder t1_jcc8fva wrote

They've mostly left this sub at this stage, but most of the experts on this are also redditors

8

MustacheEmperor t1_jcc5851 wrote

>Preliminary assessments of GPT-4’s abilities, conducted with no task-specific finetuning, found it ineffective at autonomously replicating, acquiring resources, and avoiding being shut down “in the wild.”

>ARC found that the versions of GPT-4 it evaluated were ineffective at the autonomous replication task based on preliminary experiments they conducted. These experiments were conducted on a model without any additional task-specific fine-tuning, and fine-tuning for task-specific behavior could lead to a difference in performance. As a next step, ARC will need to conduct experiments that (a) involve the final version of the deployed model (b) involve ARC doing its own fine-tuning, before a reliable judgement of the risky emergent capabilities of GPT-4-launch can be made

So, don't start collecting canned food yet.

3

TallOutside6418 t1_jcctehd wrote

Yeah, I'm sure the first few efforts to modify bat corona viruses so they could replicate in humans failed too.

3

GeneralZain t1_jccwk9m wrote

how many more times will they have to try though...

2

Lawjarp2 t1_jcb2j0i wrote

It scores at the 5th percentile on Codeforces. It can barely solve medium-hard questions on LeetCode.

Most software development doesn't require one to be good at any of the above. But they do indicate one's ability to make the leaps of logic required to solve something like AGI. GPT-4 is not ready for that yet.

14

throwawaydthrowawayd t1_jcb48bo wrote

Unfortunately, they didn't tell us anything about how they ran the Codeforces test. It sounds like they just tried zero-shot: GPT-4 saw the problem and immediately wrote code to solve it. But that's not how humans solve Codeforces problems; we sit down and think through the problem. In a more realistic scenario, I think GPT-4 would do way better at Codeforces. Still not as good as a human, but definitely way better than their test.

12
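The distinction this comment draws can be sketched as two prompting styles. This is a hypothetical illustration of the idea, not how the actual benchmark was run; the message format mirrors typical chat-completion APIs, but no real API is called and the prompts are made up:

```python
# Hypothetical sketch: zero-shot prompting vs. a multi-turn "think
# first, then code" setup. No model is actually invoked here.

def zero_shot_messages(problem: str) -> list[dict]:
    """One shot: see the problem, immediately write code."""
    return [
        {"role": "user",
         "content": f"Solve this Codeforces problem. Reply with code only:\n{problem}"},
    ]

def think_then_code_messages(problem: str) -> list[dict]:
    """Think first, then code: closer to how a human competitor works."""
    return [
        {"role": "user",
         "content": f"Read this problem and outline an approach before coding:\n{problem}"},
        # A second turn would feed the model's own outline back in and
        # ask for an implementation, giving it room to "sit and think".
        {"role": "user",
         "content": "Now implement the approach you outlined, and check it "
                    "against the sample inputs before answering."},
    ]
```

The point is only structural: the second style gives the model an intermediate reasoning step before it commits to code.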

SoylentRox t1_jcb6ljc wrote

They could fine-tune it, use prompting or multiple-pass reasoning, or give it an internal Python interpreter. Lots of options that would more fairly produce results closer to what this generation of compute plus model architecture is capable of.

I don't know how well that would do, but I expect better than the median human, since those are the results Google got with a weaker model than GPT-4.

6
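The "multiple-pass reasoning plus an internal interpreter" idea can be sketched as a loop: generate code, execute it, and feed any error back for another attempt. The model below is a stand-in stub, not a real GPT-4 call; everything here is illustrative:

```python
# Sketch of a retry loop with an "internal interpreter". A real system
# would replace stub_model with an LLM call; the stub just "fixes" its
# code once it sees a NameError in the error feedback.

import traceback

def stub_model(prompt: str) -> str:
    if "NameError" in prompt:
        return "result = sum([1, 2, 3])"      # corrected attempt
    return "result = total([1, 2, 3])"        # first attempt: buggy

def solve_with_retries(task: str, max_passes: int = 3):
    prompt = task
    for _ in range(max_passes):
        code = stub_model(prompt)
        scope: dict = {}
        try:
            exec(code, scope)                  # the "internal interpreter"
            return scope.get("result")
        except Exception:
            # Feed the traceback back to the model and try again.
            prompt = f"{task}\nYour code failed:\n{traceback.format_exc()}"
    return None

print(solve_with_retries("Compute the sum of [1, 2, 3]."))  # prints 6
```

The first pass fails with a `NameError`, the error text goes back into the prompt, and the second pass succeeds, which is the kind of self-correction a zero-shot benchmark never measures.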

MustacheEmperor t1_jcc5crl wrote

Our CTO and I tried getting it to write some relatively challenging Swift as a benchmark example, and it just repeatedly botched it. It would produce something close to working code, but kept insisting on using libraries that didn't support what it was trying to do with them, which was also an issue with 3.5.

3

HurricaneHenry t1_jccs4wi wrote

I haven't tried GPT-4 in ChatGPT, but I was very unimpressed with Bing, which is powered by GPT-4, when I asked it to learn Gradio's API and write some simple code using it. It made multiple weirdly simple errors, even with guidance, in a short session. It did apologize, though.

3