rlvsdlvsml

rlvsdlvsml t1_j3nd87h wrote

I have always felt like the network/security side and the integration with internal IT systems were worse than the physical maintenance. People should expect to invest time in integrating with an on-prem data center environment and in the physical maintenance stuff. I think small teams benefit more from a small GPU cluster with a fixed budget than from large cloud GPU training bills. Mid-to-large companies do better with cloud than on-prem because they get better separation of environments, but it costs more.

3

rlvsdlvsml t1_j15vzxs wrote

So if you used ColossalAI you could run it locally on a reasonable budget, 2-3k. Without that, it depends on how much you are using it. If you use it a lot, a home setup runs anywhere from 32-64k. The cheapest 8-GPU Nvidia home server is around 12k, and you need 4 of those to reach 500 GB of VRAM (~32k new, with 16 GB GPUs at ~1k each). With everything bought used you could maybe get down to 8k (this assumes 16 servers with 4 GPUs each, all used). https://alpa.ai/tutorials/opt_serving.html
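The VRAM math above can be sketched out; all prices and server sizes are the comment's own rough assumptions, not quotes:

```python
import math

# Assumptions from the comment: 16 GB GPUs at ~$1k each (new),
# 8 GPUs per home server, and ~500 GB of VRAM needed for the largest OPT.
GPU_VRAM_GB = 16
GPU_PRICE_USD = 1_000
GPUS_PER_SERVER = 8
TARGET_VRAM_GB = 500

gpus_needed = math.ceil(TARGET_VRAM_GB / GPU_VRAM_GB)      # round up to whole GPUs
servers_needed = math.ceil(gpus_needed / GPUS_PER_SERVER)  # round up to whole servers
gpu_cost_usd = gpus_needed * GPU_PRICE_USD

print(gpus_needed, servers_needed, gpu_cost_usd)  # 32 4 32000
```

That recovers the comment's figures: 32 GPUs (~32k in GPUs alone) spread across 4 of the 8-GPU servers.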

TL;DR: the largest OPT model is going to require some model pipelining to keep costs down
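"Model pipelining" here means splitting the model's layers across GPUs so no single card has to hold all the weights. A toy sketch of the partitioning step (the helper name is made up; OPT-175B does have 96 transformer layers):

```python
def partition_layers(num_layers: int, num_stages: int) -> list[range]:
    """Assign contiguous, near-equal ranges of layers to pipeline stages."""
    base, extra = divmod(num_layers, num_stages)
    stages, start = [], 0
    for s in range(num_stages):
        size = base + (1 if s < extra else 0)  # spread any remainder evenly
        stages.append(range(start, start + size))
        start += size
    return stages

# 96 layers (OPT-175B) split across 8 GPUs -> 12 layers per stage.
stages = partition_layers(num_layers=96, num_stages=8)
print([len(r) for r in stages])  # [12, 12, 12, 12, 12, 12, 12, 12]
```

Frameworks like Alpa (linked above) and ColossalAI automate this partitioning and the inter-stage communication.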

3

rlvsdlvsml t1_iri172p wrote

It’s misleading though, because it’s the same time complexity as current approaches, just optimized more for GPU kernel compilation. Arguably the end result is using a known algorithm to find a more GPU-compiler-friendly variant, since the solution space started from all the operations used in the current approaches.

6