suflaj t1_iyh3csl wrote
Again, for the 1000th time, NVLink is not necessary for multi-GPU training.
You will not need 64 lanes for 4 GPUs because the 4090 doesn't have enough bandwidth to fill them up. 32 PCIe 4.0 lanes or 16 PCIe 5.0 lanes will be enough, and since 4090s are still PCIe 4.0 cards, that only just pushes you into Threadripper territory.
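Back-of-the-envelope (the per-lane throughput numbers and the even x8 split are rough assumptions, not exact figures):

```python
# Rough per-GPU PCIe bandwidth if the available lanes are split evenly.
GBPS_PER_LANE = {"pcie4": 2.0, "pcie5": 4.0}  # approximate usable GB/s per lane

def per_gpu_bandwidth(gen: str, total_lanes: int, num_gpus: int = 4) -> float:
    lanes_per_gpu = total_lanes // num_gpus
    return lanes_per_gpu * GBPS_PER_LANE[gen]

print(per_gpu_bandwidth("pcie4", 32))  # x8 per card -> ~16 GB/s, plenty for a 4090
print(per_gpu_bandwidth("pcie4", 64))  # x16 per card -> ~32 GB/s, rarely saturated in training
```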
Your bigger issue is cooling. To run 4 GPUs you will need to water cool them with at least 2 radiators, and you will need an especially large case to fit everything.
But even if you do sort out the cooling, there is no way in hell you will find a consumer power supply that can power those cards simultaneously, meaning you will need to spend several thousand dollars on an industrial-grade power supply for your server.
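To put rough numbers on the power draw (every figure below is a ballpark assumption; check the spec sheets of the actual parts):

```python
# Ballpark power budget for a 4x 4090 workstation; all numbers are rough assumptions.
GPU_BOARD_POWER_W = 450   # stock 4090 power limit, transient spikes go higher
NUM_GPUS = 4
CPU_W = 350               # Threadripper-class CPU under load
REST_W = 150              # board, RAM, storage, fans, pumps
HEADROOM = 1.2            # margin for transients and PSU efficiency

psu_watts = (GPU_BOARD_POWER_W * NUM_GPUS + CPU_W + REST_W) * HEADROOM
print(f"PSU capacity needed: ~{psu_watts:.0f} W")  # ~2760 W, beyond typical consumer ATX units
```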
Overall it would be best to get a single or dual GPU setup and spend the rest of the money on A100 compute when you actually need it.
normie1990 OP t1_iyh422n wrote
Sorry if this has been asked a lot, I'm new to this sub.
As for the case, I'm going with a Corsair 680X; it has room for a 360mm and a 240mm radiator. I'm not sure if I should put a radiator on the bottom as well? If so, that would be an additional 240mm.
duschendestroyer t1_iyhjf3l wrote
Just use cloud GPUs. But if you must, check out this configuration for 4x 4090: https://www.mifcom.de/big-boss-ryzen-tr-pro-5995wx-rtx-4090-quad-id17685
normie1990 OP t1_iyi2ff9 wrote
That is very helpful, thanks.
EDIT: So they are cooling a CPU and 4x 4090 with a 420mm and two 360mm radiators? And I've never seen radiators stacked like that, is it legal? lol
Ataru074 t1_iyhp40v wrote
As someone who actually built a system like that with the 3000 series... yes, it can run Crysis at 640x480 on minimum settings.
You are looking at a five-figure system when all is said and done, and it will be worth half of that in a year or so.
That’s the equivalent of 400+ hours of training on the most expensive A100 cloud solution you can buy.
And that's just for the bare metal. Add having to supply about 2.5 kW to keep such a system running, and 400 hours is a whole lot of time.
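Roughly, assuming some illustrative prices (the hardware cost, cloud rate, and electricity rate are all assumptions; plug in your own):

```python
# Rough comparison of 400 hours of cloud A100 time vs. running a local 4x 4090 box.
# All prices are illustrative assumptions.
build_cost = 12_000.0   # five-figure workstation, USD
cloud_rate = 30.0       # USD/hour for a top-end multi-A100 instance
power_kw = 2.5          # full-load draw of the local box
kwh_price = 0.15        # USD per kWh

hours = 400
print(f"Cloud, {hours} h:       ${hours * cloud_rate:,.0f}")            # ~$12,000
print(f"Local power, {hours} h: ${hours * power_kw * kwh_price:,.0f}")  # ~$150, on top of the build cost
```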
I never used my system for so much training, but hey… I can run crysis.
Dmytro_P t1_iyjw1xh wrote
400 hours is less than 3 weeks of training. If you plan to have the system loaded for at least half a year, building your own system may be quite a bit cheaper.
I have built a similar 3000-series system as well (with the power limit reduced to around 300W per GPU, the performance impact is not that large); renting for the time it was actually in use would have cost me significantly more.
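For reference, the per-GPU power limit can be lowered from software. A minimal sketch using the pynvml bindings (300 W is just the value mentioned above; setting limits usually needs root, and `nvidia-smi -i <idx> -pl 300` does the same thing):

```python
# Cap every GPU's power limit at ~300 W (NVML takes milliwatts); run as root.
import pynvml

pynvml.nvmlInit()
for idx in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(idx)
    lo, hi = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)
    target = max(lo, min(300_000, hi))  # clamp to what the card allows
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, target)
    print(f"GPU {idx}: power limit set to {target // 1000} W")
pynvml.nvmlShutdown()
```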
normie1990 OP t1_iyi2cpn wrote
It will also be my main workstation for coding, playing games, etc, I just want it to do AI as well :)
Ataru074 t1_iyi2vrv wrote
You can get away with waaaaaaay less power than that.
normie1990 OP t1_iyi6fix wrote
Yes, I think I will go with a Ryzen 9 platform and a single 4090. It's not as expandable (no adding a ton of RAM or multiple GPUs), but it should be good enough for training detectron2 and YOLO... I think. And it costs way less than a Threadripper platform.
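For what it's worth, a single 24 GB 4090 handles that kind of fine-tuning comfortably. A minimal Detectron2 sketch (the dataset name and class count are placeholders; the dataset would have to be registered with Detectron2 first):

```python
# Minimal single-GPU Detectron2 fine-tuning sketch; "my_dataset_train" and the
# class count are placeholders for your own registered dataset.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("my_dataset_train",)
cfg.DATASETS.TEST = ()
cfg.SOLVER.IMS_PER_BATCH = 8          # fits comfortably in 24 GB for this backbone
cfg.SOLVER.BASE_LR = 1e-3
cfg.SOLVER.MAX_ITER = 10_000
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 3   # placeholder

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```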
suflaj t1_iyh4h4x wrote
No. The reason you put the radiator on top is so air doesn't collect in the water block. Air in the water block means no cooling, since air barely conducts heat. So you'd either need a case big enough to mount both radiators on top, or you'd have to keep one of them outside the case.
duschendestroyer t1_iyhi2tn wrote
You just need one rad or reservoir that's higher than the pump and blocks. This is really a non-issue with custom loops.
suflaj t1_iyhi4z6 wrote
One radiator cannot handle 4 4090s, unless it's the size of at least 2 ordinary ones.
duschendestroyer t1_iyhj2sz wrote
Sure, you want as many as you can get, but only one needs to be mounted high.
normie1990 OP t1_iyhja0s wrote
He meant that just one of the radiators needs to be higher