Comments


Dankbubbles123 t1_j9hb2j2 wrote

Eh, doesn’t GPT-4 have like 1.4 trillion parameters? That would dwarf this by almost 5 times.

Edit: turns out, I was wrong! :D

−20

GPT-5entient t1_j9hk7td wrote

32k tokens would mean approximately 150 kB of text. That’s a decent-sized code base! Also, with this much context memory the known context-saving tricks would work much better, so this could theoretically be used to create code bases of virtually unlimited size.
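A quick back-of-the-envelope check on that figure (assuming roughly 4-5 bytes of English text per token; ~4.7 is a commonly cited average, and real tokenizers vary):

```python
# Rough estimate: how much raw text fits in a 32k-token context window,
# assuming ~4.7 characters (bytes) per token for typical English text.
TOKENS = 32_000
BYTES_PER_TOKEN = 4.7  # assumption; code tends to pack fewer chars per token

approx_kb = TOKENS * BYTES_PER_TOKEN / 1000
print(f"~{approx_kb:.0f} kB of text")
```

At ~4 bytes per token you'd get closer to 130 kB, so "approximately 150 kB" is in the right ballpark either way.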

This amazes me and (being a software dev) also scares me...

But, as they say, what a time to be alive!

16

GPT-5entient t1_j9hkupz wrote

There was that very popular but completely unfounded rumor about 100T param count. It was debunked by Sam Altman himself.

If you think about it for just one second, you'd realize that a 100T-param model would need at least 200 TB of VRAM, or 2,560 Nvidia A100s...
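That arithmetic sketched out, assuming fp16 weights (2 bytes per parameter) and 80 GB per A100, counting weights only:

```python
# Rough VRAM estimate for serving a hypothetical 100T-parameter model in fp16.
# Weights only -- activations, KV cache, etc. would push this even higher.
PARAMS = 100e12          # 100 trillion parameters
BYTES_PER_PARAM = 2      # fp16
A100_VRAM_BYTES = 80e9   # 80 GB per Nvidia A100

total_bytes = PARAMS * BYTES_PER_PARAM
print(f"{total_bytes / 1e12:.0f} TB of VRAM")        # 200 TB
print(f"{total_bytes / A100_VRAM_BYTES:.0f} A100s")  # 2500
```

This gives ~2,500 GPUs with decimal units; the 2,560 figure above falls out if you treat the 200 TB as binary (200 × 1024 GB ÷ 80 GB). Either way, it's an absurd amount of hardware.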

1

Midnight-Movie t1_j9hv0t7 wrote

>Is anything about gpt4 known? It seems like just a bunch of rumors and not even a release date

I work with someone who has Beta access to GPT-4. He won't tell me much other than it's mind-blowing & that software development will never be the same. He confirms the rumors that it indeed can write an entire piece of software.

6

drekmonger t1_j9hvs1w wrote

Number of parameters is not the whole story. Quality of training material and training time and training techniques matter as much or more.

The larger models require more resources for inference, as well. I'd be more impressed by a model smaller than GPT-3 that performed just as well.

110

ninadpathak t1_j9hxyog wrote

True, we've seen models a tad bigger than GPT-3 that are so bad even GPT-2 would blow them out of the water.

Think AI21's Jurassic, or whatever they call their largest model. I hate how stupid it is.

6

challengethegods t1_j9i1lk3 wrote

>I'd be more impressed by a model smaller than GPT-3 that performed just as well.

From the article: "Aleph Alpha’s model is on par with OpenAI’s GPT-3 davinci model, despite having fewer parameters." So... you're saying you would be even more impressed if it used even fewer parameters? Anyway, I think anyone could guess GPT-3 is poorly optimized, so it shouldn't be surprising that plenty of models have matched its performance on some benchmarks with fewer parameters.

13

Ylsid t1_j9i9tr4 wrote

Great, and is it open source?

15

drekmonger t1_j9iios3 wrote

Heh. I tried their rationalization step with ChatGPT, just with prompting. For their question about the fries and crackers, it said the problem is flawed, because there's such a thing as crackers with low or no salt. It also correctly inferred that fries are usually salted, but don't have to be. (Of course, it didn't have the picture to go by, which was the point of the research.)

Great paper though. Thanks for sharing.

8

amplex1337 t1_j9ir5dt wrote

You know, the closer we get to AGI, the more that will happen. Every government will want to be the first to control an ASI, which would basically make them the dominant superpower of the world. It will be as dystopian as it sounds.

2

Kafke t1_j9is0o6 wrote

Paywalled, though, and likely just as censored. It's also currently not available. So... who cares?

7

Destiny_Knight t1_j9iupzk wrote

What the actual fuck is that paper? The thing performed better than a human at several different question classes.

At fucking less than one billion parameters. 100x fewer than GPT-3.5.

Edit: For clarity, I am impressed not angry lol.

12

ironborn123 t1_j9j2512 wrote

All else being equal, the number of model parameters does matter. Well-funded startups can acquire the needed data, compute resources, and human talent to build the models. Just like OpenAI beat Google at this game.

1

IluvBsissa t1_j9j5rld wrote

Germany saving Europe again! No wait...

3

ddeeppiixx t1_j9j9d79 wrote

Of course not. Unless the research is done within a university context (or is publicly funded), you won't get the model open source. SD is maybe the exception, and it seems to me like they regret releasing it and are now doing whatever they can to regain control.

8

ddeeppiixx t1_j9jav1p wrote

First they tried to take control of the DF subreddit (Source). Apparently it was solved on good terms.

Also, newer versions are much more controlled in terms of what you can generate: no more NSFW allowed, no more "famous artists" based models. There were also rumors about new license terms (not sure if they actually happened) that essentially give them the legal power to force users to update to a newer version (as crazy as it sounds). There's a reason the community is still using the 1.5 version over the 2.0 version.

Honestly, the way I see it, Stability AI aren't doing this with bad intentions (at least I hope), and are kind of forced into it, since they're a legal entity and have to address all the threats of legislative action regarding explicit sexual content and living artists.

7

No_Ninja3309_NoNoYes t1_j9jtiy0 wrote

Static parameters are meaningless. Human brains are not static until after death. Besides, modeling reality requires more than a bit of algebra.

1

Villad_rock t1_j9k4oz6 wrote

I'm from Germany, and I know Germany is incompetent at anything related to IT; it's all about the old economy. Don't get your hopes up.

0

beachmike t1_j9k5205 wrote

There will be both good and bad as we get closer to AGI and then attain it, just like any other technological revolution. To paint it as "either" dystopian or utopian is naive.

1

datsmamail12 t1_j9klf3w wrote

I'm going to ask a genuine question, because no one has ever given me a clear answer: when will these language models start to be useful?

0

kermunnist t1_j9kqsaw wrote

That's because the smaller models are less useful. With neural networks (likely including biological ones) there's a hard trade-off between specialized performance and general performance. If these 100+x smaller models were trained on the same data as GPT-3, they would perform that much worse on these metrics (maybe not exactly, because in this case the model was multimodal, which definitely gave it a performance advantage). The big reason this model performed so well is that it was fine-tuned on problems similar to the ones on this exam, whereas GPT-3 was fine-tuned on anything and everything. This means this model would likely not be a great conversationalist and would probably flounder at most other tasks GPT-3.5 does well on.

5

Artanthos t1_j9l3i0n wrote

It depends. We cannot see the other side of a singularity.

We could have an alignment issue and end up as paper clips.

AI could solve everything from climate change to social inequality by reducing the human race to 50 million Stone Age hunter-gatherers.

Or, you could have the top 1% living in a utopia while everyone else is living in a dystopia.

1

Ortus14 t1_j9lhcci wrote

The singularity is approaching fast.

People might not realize that a sufficiently advanced LLM can simulate AI researchers and programmers. For example: "simulate a thousand of the top AI researchers discussing and then programming an AGI".

3

Ylsid t1_j9mil4h wrote

I didn't know China was doing it too! I know Russia recently open-sourced one. If their tactic is to undermine Western power by open-sourcing their next big products, they can keep doing it.

1