Submitted by [deleted] t3_zrx665 in MachineLearning
-Rizhiy- t1_j1979ji wrote
Very difficult to give a definite answer without knowing more about your situation. Things to consider:
- How big is the model you plan to train? Many large models are limited by VRAM size rather than compute, which means you will need either a 4090 or professional cards like the A100. Professional cards cost far more, and their price/performance trade-off is much worse than for gamer cards.
- How many cards do you need? You can probably build a workstation with up to 4 GPUs, or a server with up to 16. Beyond that you are looking at a multi-server setup.
- Where are you located? Electricity can be a large cost component. For example, I'm in the UK and electricity is £0.34/kWh ($0.41), which works out to roughly $1,000 to run a typical GPU for a year straight.
- Are you able to set up an adequate electricity supply where you are located? For a big server you are probably looking at a dedicated 5 kW line.
- What other components do you need? You will probably want at least a UPS and a fast processor, which add to the cost. Multiple servers will also require a good switch.
- How are you going to cool it? Even a 4x GPU workstation will produce something like 1,500 W of heat and be rather loud. That might be fine during winter, but for summer you will probably need a dedicated AC.
- Are you up to building/maintaining all that infrastructure yourself? How much is your time worth?
- Finally, explore all your options. AWS has spot and reserved instances, which can be much cheaper than on-demand. LambdaLabs offers cheaper GPUs than AWS, and other cloud providers may have start-up discounts or credits.
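If you want to sanity-check the electricity figures above yourself, here is a minimal back-of-the-envelope sketch. The 300 W per-GPU draw is an illustrative assumption (sustained load varies by card); the $0.41/kWh rate is the UK figure quoted above.

```python
# Back-of-the-envelope annual electricity cost for running GPUs 24/7.
# Assumptions: ~300 W sustained draw per GPU (illustrative), $0.41/kWh
# (the UK rate mentioned above). Adjust both for your own hardware/region.

HOURS_PER_YEAR = 24 * 365  # 8,760 hours


def annual_electricity_cost(watts: float, price_per_kwh: float) -> float:
    """Cost (in the currency of price_per_kwh) of running a load year-round."""
    kwh = watts / 1000 * HOURS_PER_YEAR
    return kwh * price_per_kwh


gpu_watts = 300   # assumed sustained draw per GPU
rate = 0.41       # $/kWh

print(f"1 GPU:  ${annual_electricity_cost(gpu_watts, rate):,.0f}/yr")      # ≈ $1,077
print(f"4 GPUs: ${annual_electricity_cost(4 * gpu_watts, rate):,.0f}/yr")  # ≈ $4,310
```

The same wattage number doubles as a heat estimate: a 4-GPU box plus CPU and PSU losses lands near the ~1,500 W of heat mentioned above, which is what your cooling has to remove.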
P.S. I work for AWS, so probably have a bias.