Submitted by MrEloi t3_100q6a8 in singularity
ElvinRath t1_j2np305 wrote
Reply to comment by SoylentRox in You can get used computers with 1TB RAM for around 5k - 10k UK pounds ($7k) by MrEloi
Well, today you can probably get it down to around 350 GB (fp16), so around 150,000.

And soon it might work well at around 175 GB with fp8, so... around 75,000.

But yeah, for now it's too expensive. IF fp8 works well with this, it might be possible to think about building a personal machine out of second-hand parts in 3-5 years...
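For anyone following along, the memory math is just parameters times bytes per weight. A quick sketch (assuming a GPT-3-sized 175B model, counting weights only, no activations or overhead):

```python
# Weights-only memory footprint at different precisions.
# 175B is an assumption (GPT-3's published size); real usage
# adds activations, KV cache, and framework overhead on top.

def weights_gb(n_params: float, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / 1e9

n = 175e9
for name, b in [("fp32", 4), ("fp16", 2), ("fp8/int8", 1), ("int4", 0.5)]:
    print(f"{name}: {weights_gb(n, b):,.0f} GB")
# fp32: 700 GB, fp16: 350 GB, fp8/int8: 175 GB, int4: 88 GB
```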
Anyway, this year we'll probably get open-source models with better performance than GPT-3 and far fewer parameters. Probably still too much for consumer GPUs anyway :(
It's time to double VRAM on consumer GPUs.
Twice.
Pretty please.
SoylentRox t1_j2nu1ph wrote
It doesn't work that way. You can't reduce precision like that without tradeoffs: reduced model accuracy, for one thing.
You can, in some cases, add more weights and retrain at fp16.
Int8 may be out of the question.
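You can see the tradeoff even on a toy model. A minimal sketch (dynamic int8 quantization of a random fp32 MLP in PyTorch; nothing like a real LLM, just showing that the quantized outputs drift):

```python
import torch

# Toy fp32 model (stand-in for the Linear layers that dominate an LLM).
model = torch.nn.Sequential(
    torch.nn.Linear(1024, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 1024),
).eval()

# Quantize the Linear weights to int8 post-training.
q_model = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)

x = torch.randn(8, 1024)
with torch.no_grad():
    err = (model(x) - q_model(x)).abs().max()
print(f"max abs deviation from fp32: {float(err):.4f}")  # small but nonzero
```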
Also, ChatGPT is like the Wright Brothers: nobody is going to settle for an AI that can't even see or control a robot. So it's only going to get heavier in weights and more computationally expensive.
ElvinRath t1_j2o33lj wrote
Sure, there is a tradeoff, but for fp16 I think it isn't that terrible.
For fp8 I just don't know. There are people working with int8 to fit 20B parameters on a 3090/4090, though I have no idea at what cost in quality... I just wanted to say that the possibility exists.
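(For reference, the 20B-on-a-24-GB-card route looks roughly like this; a sketch assuming the transformers + bitsandbytes int8 integration, with gpt-neox-20b as the 20B model:)

```python
# LLM.int8(): load a 20B model at ~1 byte per weight instead of 2 (fp16).
# Assumes transformers, accelerate, and bitsandbytes are installed.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "EleutherAI/gpt-neox-20b"  # assumed 20B model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",  # spill layers to CPU if the GPU runs out
    load_in_8bit=True,  # int8 weights via bitsandbytes
)

inputs = tok("Large language models", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0]))
```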
I remember reading about fitting big models at low precision; it was focused on performance/memory usage, but it showed that it was a very useful technique...

Anyway, I can't find it now, but I found this while looking for it, haha:
https://twitter.com/thukeg/status/1579449491316674560
They claim almost no degradation with int4 and 130B parameters.
No idea how this would apply to bigger models, or even about the validity of the claim, but it sounds promising. We would be fitting 40B parameters on a 3090/4090...
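(That 40B figure is just the weights-only arithmetic run backwards, with some headroom; a sketch:)

```python
# How many weights fit in a 24 GB card, counting weights only.
def max_params_b(vram_gb: float, bytes_per_param: float) -> float:
    # GB / (bytes per param) = billions of params, since 1 GB = 1e9 bytes
    return vram_gb / bytes_per_param

for name, b in [("fp16", 2), ("int8", 1), ("int4", 0.5)]:
    print(f"{name}: ~{max_params_b(24, b):.0f}B params")
# fp16: ~12B, int8: ~24B, int4: ~48B -> ~40B once you leave headroom
```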
Anyway, I think fp8 might not be out of the question at all, but we'll see :P
I know you say "ChatGPT is like the Wright Brothers: nobody is going to settle for an AI that can't even see or control a robot. So it's only going to get heavier in weights and more computationally expensive."

And... sure, no one is going to settle for less. But consumer hardware is very far behind, and people are going to try to work with what they have, for now.

And there is some interest in it. You have NovelAI, AI Dungeon and KoboldAI, and people play with them, when frankly they work quite poorly.

I hope that with the release of good open-source LLMs with RLHF (I'm looking at you, CarperAI and StabilityAI) and these kinds of techniques, we start to see this tech becoming more commonplace, maybe even used in some indie games, to start pushing for more VRAM on consumer hardware. (Because if there is a need, there is a way. VRAM is not that expensive anyway, given the prices of GPUs nowadays...)
SoylentRox t1_j2oeflk wrote
>And... sure, no one is going to settle for less. But consumer hardware is very far behind, and people are going to try to work with what they have, for now.
No they won't. They are just going to rent access to the proper hardware. It's not that expensive.