
freeman_joe t1_j8yffs1 wrote

Reply to comment by UseNew5079 in Microsoft Killed Bing by Neurogence

When open source models surface it will be fun.

44

TunaFishManwich t1_j8yuujl wrote

It will be a long time before you or I can run these models. They are WAY beyond anything consumer hardware can run, and will remain so for at least a decade.

11

TeamPupNSudz t1_j8z8928 wrote

A significant amount of current AI research is going into how to shrink and prune these models. The ones we have now are horribly inefficient. There's no way it takes a decade before something (granted, maybe less impressive) is runnable on consumer hardware.

32

rnobgyn t1_j8zvfrk wrote

Exactly - the first computers took up a whole room

12

Takadeshi t1_j93gacq wrote

Doing my undergrad thesis on this exact topic :) With most models, you can discard up to 90% of the weights and get similar performance, with only about a 1-2% loss of accuracy. It turns out that models tend to learn better when trained dense (i.e. with a large quantity of non-zero weights), but the trained network ends up with a few very strong weights plus a large number of "weak" weights that make up the majority of the parameter count while contributing very little to the model's actual accuracy, so you can basically just discard them. There are also a few other clever tricks you can use to cut the parameter count a lot; for one, you can cluster weights into groups and then build hardware-based accelerators that carry out the transformation once per cluster, rather than treating each individual weight as a separate multiplication. One paper shows that you can reduce the size of a CNN-based architecture by up to 95x with almost no loss of accuracy.
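For illustration, the simplest version of this idea is magnitude pruning: zero out the smallest weights and keep the rest. A toy sketch (my own example with made-up sizes, not the method from the paper):

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights, keeping the strongest (1 - sparsity)."""
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    threshold = np.partition(flat, k)[k]  # magnitude at the k-th smallest position
    mask = np.abs(weights) >= threshold   # True where a weight survives
    return weights * mask, mask

rng = np.random.default_rng(0)
w = rng.normal(size=(512, 512))           # a dense layer's weight matrix
pruned, mask = magnitude_prune(w, sparsity=0.9)
print(f"kept {mask.mean():.1%} of weights")  # ~10%
```

Real pruning pipelines usually fine-tune after this step to recover the small accuracy loss, but the core operation really is this simple.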

Of course this relies on the weights being public, so we can't apply this method to something like ChatGPT, but we can with stable diffusion. I am planning on doing this when I finish my current project, although I would be surprised if the big names in AI weren't aware of these methods, so it's possible that the weights have already been pruned (although looking specifically at stable diffusion, I don't think they have been).

1

Lower_Praline_1049 t1_j9az9sj wrote

Yo that’s super interesting and definitely the way forward. Tell me more about your project!

1

Takadeshi t1_j9b3c3l wrote

Thank you! :) Early stages right now. I just finished the literature review section and am starting implementation; I'm going to try to publish it somewhere when it's done, if I can get permission from my university. I'm definitely going to see what I can do with stable diffusion once it's finished. I'd love to get it running on the smallest device possible.

1

freeman_joe t1_j8yv3d2 wrote

Famous last words? I remember when the 1.44 MB diskette was the most advanced tech. Now we have 44 TB disks available.

25

TheChurchOfDonovan t1_j8ztibo wrote

Seriously. Anyone who thinks they have any idea what comes next is lying. The only thing we have to go on is the historical trend of extremely rapid returns to scale.

7

TunaFishManwich t1_j8yv9pr wrote

Get back to me when you have 500k cores and exabytes of RAM on your laptop. It's going to be a while.

5

timshel42 t1_j8yx6wb wrote

Couldn't people into the open-source thing still host it on their own powerful servers and let others use it?

15

IonizingKoala t1_j8z0znz wrote

Of course "regular" people will be able to use it, the same way regular people get access to state of the art quantum computers and supercomputers.

What TunaFish is saying is that it's unlikely everyone will be able to run it in their own home. LLM engineers concur: Moore's law isn't quite there anymore.

If you mean server time, that's obviously possible (I can run loads of GPT-3 right now for $5). But that's not exactly running it at home, if you know what I mean.

7

Soft-Goose-8793 t1_j90cxmk wrote

Could an LLM be run the way torrents, Bitcoin, or Tor are? We could have LLM miners or something.

A small company could rent server time in some country with lax laws, to run an unlobotomised version of a LLM from, and people could subscribe to that service instead of dealing with microsoft or openai.

4

IonizingKoala t1_j91lzfv wrote

The thing is that in LLM training, memory and IO bandwidth are the big bottlenecks. If every GPU has to communicate over the internet, and wait for the previous stage to finish first (because pipelined model parallelism is still sequential, despite the name), training is gonna take like 100 years. Another slowdown is breaking each layer into pieces that individual GPUs can handle. Currently layers are spread across 2000-3000 huge GPUs and there's already significant latency. What happens with 20,000 small GPUs? Each layer gets spread so thin the latency becomes enormous. The final nail in the coffin is that neural network architecture changes a lot, and each time the hardware has to be reconfigured too.
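A back-of-envelope version of the bandwidth problem (every number here is a rough assumption for illustration, not a measurement):

```python
# Why pipeline parallelism over home internet connections stalls:
# compare one stage-to-stage activation transfer at home vs. in a datacenter.

ACTIVATION_MB = 50        # assumed activations passed between pipeline stages
HOME_UPLOAD_MBPS = 20     # assumed typical home upload bandwidth, megabits/s
DATACENTER_GBPS = 400     # assumed InfiniBand/NVLink-class link, gigabits/s

def transfer_seconds(size_mb, bandwidth_mbps):
    """Time to push size_mb megabytes through a link of bandwidth_mbps megabits/s."""
    return size_mb * 8 / bandwidth_mbps

home = transfer_seconds(ACTIVATION_MB, HOME_UPLOAD_MBPS)
dc = transfer_seconds(ACTIVATION_MB, DATACENTER_GBPS * 1000)
print(f"home internet hop: {home:.1f} s, datacenter hop: {dc * 1000:.2f} ms")
print(f"slowdown per hop: ~{home / dc:,.0f}x")  # ~20,000x
```

And that's per hop, every microbatch, before counting latency or any GPU dropping offline.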

Crypto mining didn't have these problems because 1. bandwidth was important, but not the big bottleneck, 2. "layers" could fit on single GPUs, and if they couldn't (on a 1050ti for example), it was very slow, and 3. the architecture didn't really change, you just did the same thing over and over.

Cerebras is trying to make a huge chip that disaggregates memory from compute, and also bundles compute into a single chip, saving energy and time. The cost for the CS-2 system is around $3-10 million for the hardware alone. It's pretty easy for a medium-sized startup to offer some custom LLM. I mean there's already dozens, if not hundreds of startups starting to do that right now. It's expensive. All complex computing is expensive, we can't really get around that, we can only slowly make improvements.

4

Deadboy00 t1_j91v5cp wrote

⭐️ Refreshing to see someone who knows their shit on this sub. Where do you see this tech going for general use cases? Everything I read tells me it just isn’t ready. What is MS’s endgame for implementing all this?

2

IonizingKoala t1_j927ast wrote

Classical computing / engineering advances are good at repetitive actions. A human can never put in a screw 10,000x times with 0.01mm precision or calculate 5000 graphs by hand without quitting. But it's bad at actions that require flexibility and adaptation, like what chefs, dry cleaners, or software engineers do.

LLM and AI attempt to bridge that gap, by allowing for computers to be flexible and adapt. The issue is that we don't know how much they're actually capable of adapting, and how fast. We know humans have a limit; nobody in the world fluently speaks & reads & writes in more than 10 languages (probably not even >5). Do computers have a limit? How expensive is that limit? Because materials, manufacturing, and energy are finite resources.

What do you define as general use cases? Receptionist calls? (already done, one actually fooled me into thinking it was a human) Making a cup of coffee?

Anything repetitive will be automated, if it's economical to do so. You probably still make tea by hand, because it's a waste of money to buy a $100 tea maker (they probably don't even exist, because of how easy it is to make tea). But you probably have a blender, because it's a huge waste of time and energy to chop stuff yourself.

I think humans (on this subreddit especially) tend to underestimate how much finances & logistics play into tech. We've had flying cars since the 90s, yet they'll never "transform transportation" like sci-fi said, because it's dumb to have a car-plane hybrid.

We might get an impressive AGI in the next few years, but it might be so expensive that it's just used the same way we use robots: you get the cutting-edge stuff you'll never see cause it's in some factory, the entertaining stuff like the cruise ship robo-bartenders, and the consumer-grade crap like Roombas. AGI might also kill millions of humans but I know nothing about that side of AI so I won't comment.

Btw, I'm not an expert, I'm just a software engineer that likes talking to AI engineers.

2

Deadboy00 t1_j929dnb wrote

Dig it. I have a similar background and have had conversations with interns at AI firms like Palantir who have been doing the stuff you described for years. I agree: it's too expensive to train AIs for every specific use case. That's what I meant by "general".

I think the most fascinating part of this current trend is seeing the general populations reaction to these tools being publicly released. And that’s what’s at the heart of my question…if the tech is unreliable, expensive, and generally not scalable …why is MS doing this?

I mean obviously they are generating data on user interactions to retrain the model but I can’t imagine that being the silver bullet.

Google implemented plenty of AI tech in their search engine and nobody raised an eyebrow, but now all this? I'm rambling at this point, but it's just not adding up in my brain ¯\\_(ツ)_/¯

2

IonizingKoala t1_j92caso wrote

Microsoft is similar to Google; both like to experiment and make cool stuff, but Microsoft doesn't cut the fat and likes to put out products which are effectively trash under the guise of open beta. Heck, even their hardware is sometimes like that, while Google's products are typically solid, even if they have a short lifespan.

Going back to New Bing, it's genuinely innovative. It just sucks. That's not paradoxical, because a lot of new stuff does suck. We just rarely see it, because companies like Google are generally disciplined enough.

Most "deep" innovations are developed over decades. That development could be secretive (military tech), or open (SpaceX, Tesla), but it takes time nonetheless. Microsoft leans towards the latter, Google the former.

The latter is generally more efficient, if your audience is results-focused, not emotions-focused. AI is pretty emotionally charged, so maybe the former method is better.

2

Deadboy00 t1_j92j3s2 wrote

That’s a good take. I think Google’s discipline is rooted in its size and prominence. There’s too much to lose. MS on the other hand wants to desperately be the king of the hill again.

2

IonizingKoala t1_j92nqhq wrote

The funny thing is, though, Microsoft has a market cap 58% larger than all of Alphabet, not just Google. We're left wondering why Microsoft continually takes these weird risks in the consumer space when they could just play it safe like most other big players. None of their 21st-century success has come from quirky disruptions; it's been slow and steady progress (Surface, Office, Enterprise, Cloud, Consulting).

Yet with stuff like Edge, Windows 11, etc., it's been a mess. I'm not 12 anymore; I prefer stable products over the shiniest new thing, and Windows 11 has been a colossal disappointment.

1

duboispourlhiver t1_j90jyl8 wrote

True. Progress in AI is even more impressive than Moore's law was, so maybe it will run at home thanks to progress on LLMs rather than progress on microelectronics.

1

IonizingKoala t1_j91jdx7 wrote

LLMs will not be getting smaller. Getting better ≠ getting smaller.

Now, will really small models be run on some RTX 6090 ti in the future? Probably. Think GPT-2. But none of the actually useful models (X-Large, XXL, 10XL, etc) will be accessible at home.

1

duboispourlhiver t1_j91k8jk wrote

I disagree

1

IonizingKoala t1_j91m923 wrote

Which part? LLM-capable hardware getting really really cheap, or useful LLMs not growing hugely in parameter size?

1

duboispourlhiver t1_j91x4ao wrote

I meant that IMHO, gpt3 level LLMs will have fewer parameters in the future.

2

IonizingKoala t1_j924sbn wrote

I see. Even at a 5x reduction in parameter size, that's still not enough to run on consumer hardware (we're talking 10b vs. 500m) , but I recognize what you're trying to say.
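The gap is easy to see with rough memory math. Assuming fp16 weights (2 bytes per parameter) and counting weights only, ignoring activations and other overhead:

```python
def weights_gb(params_billion, bytes_per_param=2):
    """Memory needed just to hold the weights, in gigabytes (fp16 by default)."""
    return params_billion * 1e9 * bytes_per_param / 1e9

# GPT-3 scale, a ~20x smaller model, and a tiny 500M model:
for params in (175, 10, 0.5):
    print(f"{params:>6}B params -> ~{weights_gb(params):.0f} GB of weights")
```

A top consumer card of that era had around 24 GB of VRAM, so a 10B model (~20 GB) barely squeezes in with no room for anything else, while a 500M model (~1 GB) is trivial. Hence 10b vs. 500m being the dividing line.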

2

freeman_joe t1_j90pioh wrote

We already have access to quantum computers: we call them human brains. Nature solved it, so it's only a matter of time before we do the same with tech, and then it will be available for home use.

1

Zestybeef10 t1_j8z4gib wrote

Moore's law is dead.

We'd need photonic CPUs for this to become a consumer reality.

1

Nervous-Newt848 t1_j8zsbnq wrote

I'm glad I'm not the only one who thinks this...

2

Zestybeef10 t1_j906uys wrote

lol these kids downvoting are clueless about the logistics.

4

Nervous-Newt848 t1_j907keb wrote

Electrons produce too much heat; photons don't. Photons also travel faster than electrons. 3D photonic chips would be possible because of the lack of heat, and photonic chips use significantly less electricity.

Advantages all across the board

7

mckirkus t1_j9d3pfw wrote

The first consumer 1TB SSD came out in 2013. Ten years later I'm considering getting a 2TB drive.

1

onyxengine t1_j8z672v wrote

We thought the same about the capability we are seeing from AI. The cloud is pretty accessible.

10

TunaFishManwich t1_j8z6s3n wrote

The cloud is extremely accessible. If I want thousands of cores and mountains of RAM, it's available to me in minutes. That's not the problem. Even running one of these models, let alone training one, would cost hundreds of thousands of dollars per day, and yes, if I had deep enough pockets I could easily do it on AWS or Azure.

It just requires far too much computing power for regular people to attain, regardless of what you know.

The energy requirements alone are massive. The software is far more ready for regular joes to use it than the hardware is. That’s going to take a decade or two to catch up.
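A toy cost estimate makes the point; every figure here is an illustrative assumption, not a real quote:

```python
# Back-of-envelope cost of serving a large model on rented cloud GPUs.
GPU_HOURLY_USD = 3.0    # assumed on-demand price for one datacenter GPU
GPUS_PER_REPLICA = 8    # assumed GPUs needed to hold one model replica
REPLICAS = 100          # assumed replica count to serve real traffic

daily = GPU_HOURLY_USD * GPUS_PER_REPLICA * REPLICAS * 24
print(f"~${daily:,.0f}/day")  # ~$57,600/day
```

Scale the replica count up for real traffic and you're into the hundreds of thousands per day fast, and that's before training costs.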

11

Nervous-Newt848 t1_j8zsgss wrote

That's why we need photonic computing... It literally solves all these problems...

6

Brashendeavours t1_j8zzmxu wrote

Please stop making up terms.

−7

Nervous-Newt848 t1_j903faz wrote

Please go educate yourself

6

Brashendeavours t1_j92iidg wrote

lol Just stop. Articles from Buzzfeed and YouTube shorts don't count.

Optical computing is so far away it's not even funny. Quantum is much closer, has been worked on for longer, and with more effort applied.

You would have to be a moron to abandon that progress to switch to a new line of development.

4

duboispourlhiver t1_j90jull wrote

Photonic computing is a type of computing technology that uses light or photons to process and transmit information instead of relying on electrons, which is how traditional electronic computing systems work. In a photonic computing system, light waves are used to carry data and perform calculations, instead of relying on electric currents.

In a photonic computing system, information is encoded in pulses of light that travel through optical fibers or other optical components such as waveguides and switches. These signals are then processed using photonic circuits, which use elements such as mirrors, lenses, and beam splitters to manipulate and combine the light waves.

Photonic computing has the potential to be faster and more energy-efficient than traditional electronic computing, because photons can travel faster and use less energy than electrons. It is also less susceptible to interference and noise, which can degrade signal quality in electronic systems. However, photonic computing is still in the research and development phase, and there are many technical challenges that must be overcome before it can become a practical technology for everyday use.

5

TheOGCrackSniffer t1_j92255n wrote

Isn't this kinda similar to Li-Fi? What a powerful combination it would be to combine the two.

2

iNstein t1_j8zaz9w wrote

Was reading about a new type of model and they indicated that it should run on a 4090. I think a lot of people should be able to afford that. In a couple of years, that should be a common thing.

5

Scarlet_pot2 t1_j8zdrjh wrote

I heard models like BingGPT and ChatGPT were much smaller than models like GPT-3. That's why you were able to have long-form conversations with them, and how they could look up information and spit it out fast: they didn't take much compute to run. That's why Microsoft saw these chat models as tack-ons to Bing.

1

Roubbes t1_j90pi7f wrote

A text-only model requires more powerful hardware than Stable Diffusion? By how much?

1

SWATSgradyBABY t1_j91rcl5 wrote

Ten years ago none of this existed. Ten years for efficiencies to improve to consumer level seems out of step with agreed-upon tech progressions.

−1

TunaFishManwich t1_j92cwmt wrote

10 years is about right to go from “this will take $100,000 a day to run” to “this can run on my machine”.

1

SWATSgradyBABY t1_j92imwe wrote

We talk about exponentials only in the abstract. As soon as an actual tech is on the table, we go right back to linear prediction.

1

sachos345 t1_j93rur7 wrote

People can help with the OpenAssistant RLHF dataset on their page; the more the merrier.

1