Comments

drekmonger t1_j9hvs1w wrote

Number of parameters is not the whole story. Quality of training data, training time, and training techniques matter as much or more.

The larger models require more resources for inference, as well. I'd be more impressed by a model smaller than GPT-3 that performed just as well.

110

Hands0L0 t1_j9i277j wrote

I for one welcome competition in the race to AGI

43

xott t1_j9i2zg3 wrote

It's the new Space Race

9

phoenixmusicman t1_j9imtgj wrote

It's not a space race until Governments start pouring massive amounts of their GDP into it.

1

Artanthos t1_j9l2n4a wrote

China is pouring money into AI research.

1

spreadlove5683 t1_j9i58z2 wrote

I sure don't. We need to get it right. Not barrel ahead in an arms race.

9

amplex1337 t1_j9ir5dt wrote

You know, the closer we get to AGI, the more that will happen. Every government will want to be the first in control of an ASI, which would basically make them the dominant superpower of the world. It will be as dystopian as it sounds.

2

beachmike t1_j9k5205 wrote

There will be both good and bad that comes out of it as we get closer to AGI, and attain AGI, just like any other technological revolution. To paint it as "either" dystopian or utopian is naive.

1

Artanthos t1_j9l3i0n wrote

It depends. We cannot see the other side of a singularity.

We could have an alignment issue and end up as paper clips.

AI could solve everything from climate change to social inequality by reducing the human race to 50 million Stone Age hunter gatherers.

Or, you could have the top 1% living in a utopia while everyone else is living in a dystopia.

1

dangeratio t1_j9igzb4 wrote

Check out Amazon’s multimodal chain-of-thought model, only 738 million parameters, and it scores better on all question classes than ChatGPT. See Table 4 on page 7 here - https://arxiv.org/pdf/2302.00923.pdf

20

Destiny_Knight t1_j9iupzk wrote

What the actual fuck is that paper? The thing performed better than a human at several different question classes.

At fucking less than one billion parameters. 100x less than GPT 3.5.

Edit: For clarity, I am impressed not angry lol.

12

IluvBsissa t1_j9j5t08 wrote

Are you angry or impressed ?

3

Destiny_Knight t1_j9j6iq0 wrote

impressed lol

2

IluvBsissa t1_j9j6v5v wrote

If these models are so smol and efficient, why are they not released? I just don't get it. I thought PaLM was kept private because it was too costly to run to be profitable...

3

kermunnist t1_j9kqsaw wrote

That's because the smaller models are less useful. With neural networks (likely including biological ones) there's a hard trade-off between specialized performance and general performance. If these 100+x smaller models were trained on the same data as GPT-3, they would perform 100+x worse on these metrics (maybe not exactly, because in this case the model was multimodal, which definitely gave a performance advantage). The big reason this model performed so much better is that it was fine-tuned on problems similar to the ones on this exam, whereas GPT-3 was fine-tuned on anything and everything. This means that this model would likely not be a great conversationalist and would probably flounder at most other tasks GPT-3.5 does well on.

5

drekmonger t1_j9iios3 wrote

Heh. I tried their rationalization step with ChatGPT, just with prompting. For their question about the fries and crackers, it said the problem is flawed because there's such a thing as crackers with low or no salt. It also correctly inferred that fries are usually salted, but don't have to be. (Of course, it didn't have the picture to go by, which was the point of the research.)

Great paper though. Thanks for sharing.

8

challengethegods t1_j9i1lk3 wrote

>I'd be more impressed by a model smaller than GPT-3 that performed just as well.

From the article: "Aleph Alpha’s model is on par with OpenAI’s GPT-3 davinci model, despite having fewer parameters." So... you're saying you would be even more impressed if it used even fewer parameters? Anyway, I think anyone could guess GPT-3 is poorly optimized, so it shouldn't be surprising to anyone that plenty of models have matched its performance on some benchmarks with fewer parameters.

13

ninadpathak t1_j9hxyog wrote

True, we've seen models a tad bit bigger than GPT-3 which are so bad, even GPT-2 would blow them out of the water.

Think AI21 Jurassic park or whatever they call their largest model. I hate how stupid it is

6

Professional-Song216 t1_j9hwijh wrote

Great way to look at it, it’s much more important to squeeze the maximum out of your system. Efficiency over excess

2

burnt_umber_ciera t1_j9iqusp wrote

Are you aware of either the "training material" or "training time" or "training techniques" utilized?

1

Zer0D0wn83 t1_j9iwxqu wrote

I'm sure they've read those papers too, you know.

1

ironborn123 t1_j9j2512 wrote

All else being equal, number of model parameters does matter. Well funded startups can acquire the needed data, compute resources, and human talent to build the models. Just like how OpenAI beat Google at this game.

1

sonderlingg t1_j9itgq3 wrote

Artificial German Intelligence

29

H0sh1z0r4 t1_j9jjn56 wrote

never ask an AGI what it was doing in 1939

12

Ylsid t1_j9i9tr4 wrote

Great, and is it open source?

15

ddeeppiixx t1_j9j9d79 wrote

Of course not. Unless the research is done within a university context (or publicly funded), you won't have the model open source. SD is maybe the exception, and it seems to me like they regret releasing it and are now doing whatever they can to regain control.

8

needle1 t1_j9j9nyr wrote

Hm? Care to elaborate on what they’re doing to “regain control?”

3

ddeeppiixx t1_j9jav1p wrote

First they tried to take control of the SD subreddit (Source). Apparently it was resolved on good terms.

Also, newer versions are much more controlled in terms of what you can generate. No more NSFW allowed, no more "famous artists" based models. There were also rumors about new license terms (not sure if that actually happened) that would essentially give them the legal power to force users to update to a newer version (as crazy as it sounds). There is a reason the community is still using version 1.5 over 2.0.

Honestly, the way I see it, Stability AI are not doing this with bad intentions (at least I hope), and are kind of forced into it, as they are a legal entity and have to address all the threats of legal action regarding explicit sexual content and living artists.

7

Ylsid t1_j9jamyy wrote

Unfortunate, but I figured. Something's up when the Russians are the only ones releasing LLMs

1

MysteryInc152 t1_j9lj5ef wrote

The GLM models are from China and open sourced.

2

Ylsid t1_j9mil4h wrote

I didn't know China was doing it too! I know Russia recently open sourced one. If their tactic is to undermine western power by open sourcing their next big products, they can keep doing it

1

WeedWacker25 t1_j9jgt2d wrote

I had to do some research about this. I found that the transformer architectures and code are mostly open source, but the trained models are not.

1

Kafke t1_j9is0o6 wrote

Paywalled, though, and likely will be just as censored. It's also currently not available. So... who cares?

7

Private_Island_Saver t1_j9j73ot wrote

Like what would happen if like 4% of global GDP was put into this?

4

IluvBsissa t1_j9j5rld wrote

Germany saving Europe again! No wait...

3

Ortus14 t1_j9lhcci wrote

Singularity is approaching fast.

People might not realize that a sufficiently advanced LLM can simulate AI researchers and programmers. For example: "simulate a thousand of the top AI researchers, discussing and then programming an AGI".

3

Thorusss t1_j9nrlcs wrote

Any sufficiently advanced LLM is indistinguishable from true AGI™?

2

No_Ninja3309_NoNoYes t1_j9jtiy0 wrote

Static parameters are meaningless. Human brains are not static until after death. Besides, modeling reality requires more than a bit of algebra.

1

Villad_rock t1_j9k4oz6 wrote

I'm from Germany, and I know Germany is incompetent at anything related to IT; it's all about the old economy. Don't get your hopes up.

0

Honest_Science t1_j9kx3c4 wrote

That is really a nice one

1

Honest_Science t1_j9kx92l wrote

I tried to use their system on their playground. It takes a lot of prompting to get anything sensible out of it.

2

Thorusss t1_j9nro02 wrote

The German CovidApp was surprisingly solid

1

datsmamail12 t1_j9klf3w wrote

I'm going to ask a genuine question, because no one has ever given me a clear answer: when will these language models actually start to be useful?

0

Thorusss t1_j9nrssl wrote

They are useful now.

Two friends of mine use ChatGPT for work.

2

Gold-and-Glory t1_j9hyg48 wrote

And no bias?

−1

Thorusss t1_j9nrqp1 wrote

Oh, there are many biases.

These neural networks mostly consist of weights and biases.

2

Gold-and-Glory t1_j9o32jz wrote

Not that kind of bias, the other kind that Reddit agrees with and downvotes you for mentioning, like a religious dogma.

0

Akimbo333 t1_j9hutcu wrote

Interesting

−2

Dankbubbles123 t1_j9hb2j2 wrote

Eh, doesn’t GPT-4 have like 1.4 trillion parameters? That would dwarf this by almost 5 times.

Edit: turns out, I was wrong! :D

−20

Buck-Nasty t1_j9hb9az wrote

GPT-4's parameter count isn't known yet.

26

Dankbubbles123 t1_j9hbbp1 wrote

Ah okay, nvm then. Sorry

13

Buck-Nasty t1_j9hch2c wrote

The context window is apparently massive though, more than 10 times the size of GPT-3's; it could potentially write whole novels at that scale.

https://mobile.twitter.com/transitive_bs/status/1628118163874516992?s=46&t=Biiqy66Cy9oPH8c1BL6_JQ

17

hydraofwar t1_j9hgim5 wrote

A credible researcher had commented that ChatGPT can write code, and GPT-4 could write entire programs.

12

GPT-5entient t1_j9hk7td wrote

32k tokens would mean approximately 150 kB of text. That is a decent-sized code base! Also, with this much context memory, the known context-saving tricks would work much better, so this could theoretically be used to create code bases of virtually unlimited size.
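
For anyone wondering where the ~150 kB figure comes from, here's a rough back-of-the-envelope sketch (my own assumptions, not from OpenAI): it just takes an average of roughly 4-5 characters per token, a common rule of thumb for English text with GPT-style tokenizers.

```python
# Rough estimate only: assumes ~4.7 characters per token on average,
# a common rule of thumb for English with GPT-style BPE tokenizers.
CONTEXT_TOKENS = 32_000
CHARS_PER_TOKEN = 4.7            # assumed average; varies a lot with content

approx_chars = CONTEXT_TOKENS * CHARS_PER_TOKEN
approx_kb = approx_chars / 1000  # plain ASCII text: ~1 byte per character

print(f"~{approx_kb:.0f} kB of text")  # -> ~150 kB with these assumptions
```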

This amazes me and (being a software dev) also scares me...

But, as they say, what a time to be alive!

16

GPT-5entient t1_j9hji5i wrote

Wow, yeah, this looks amazing. My biggest issue with GPT-3 is the relatively small context window. This will open so many new possibilities.

7

Practical-Mix-4332 t1_j9hg2cr wrote

Is anything about gpt4 known? It seems like just a bunch of rumors and not even a release date

2

Midnight-Movie t1_j9hv0t7 wrote

>Is anything about gpt4 known? It seems like just a bunch of rumors and not even a release date

I work with someone who has Beta access to GPT-4. He won't tell me much other than it's mind-blowing & that software development will never be the same. He confirms the rumors that it indeed can write an entire piece of software.

6

farcetragedy t1_j9hzfq1 wrote

That’s exciting. Would be amazing if the next one didn’t just make shit up when it doesn’t know the answer

3

Practical-Mix-4332 t1_j9hxkf3 wrote

Oh great another rumor

1

Midnight-Movie t1_j9hy4uz wrote

Well... You asked if anything was known. I gave you info from a coworker with beta access. My apologies if my info didn't come with a bouquet of roses and a handwritten card.

7

Practical-Mix-4332 t1_j9i0ctk wrote

I understand you’re trying to help, but this being Reddit and all there’s no way we can trust what you are saying or take it officially as something “known”. No offense though.

6

GPT-5entient t1_j9hkupz wrote

There was that very popular but completely unfounded rumor about 100T param count. It was debunked by Sam Altman himself.

If you think about it for just one second, you'd realize that a 100T-parameter model would need at least 200 TB of VRAM, or 2,560 Nvidia A100s...
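
The arithmetic behind those numbers, as a rough sketch (assuming fp16 weights at 2 bytes per parameter and 80 GB A100s, ignoring activation and KV-cache memory entirely):

```python
import math

# Back-of-the-envelope check: fp16 weights only, no activation/KV-cache memory.
params = 100e12              # the rumored (and debunked) 100 trillion parameters
bytes_per_param = 2          # fp16
a100_vram_bytes = 80e9       # 80 GB per A100, counting 1 GB = 1e9 bytes

weight_bytes = params * bytes_per_param               # 2e14 bytes = 200 TB
num_gpus = math.ceil(weight_bytes / a100_vram_bytes)  # ~2,500 GPUs

print(f"{weight_bytes / 1e12:.0f} TB of weights, ~{num_gpus} A100s")
# The 2,560 figure above comes out if you count 1 TB as 1,024 GB instead.
```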

1

bass6c t1_j9hfy9w wrote

ChatGPT is based on GPT-3, a 175 billion parameter model.

1