Comments

rhofour t1_j153vzt wrote

GPT is not a public model. You can't train or run it yourself.

I just checked, and OpenAI does have a fine-tuning API, so you can fine-tune and use the model through their service; in that case your own hardware doesn't matter.
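
For reference, the whole fine-tuning workflow is a handful of calls to their Python client (a rough sketch against the openai package as it existed at the time; the file name and prompt here are made up):

```python
import openai

openai.api_key = "sk-..."  # your API key

# Upload training data: a JSONL file of {"prompt": ..., "completion": ...} records
upload = openai.File.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

# Start the fine-tune job; it runs on OpenAI's hardware, not yours
job = openai.FineTune.create(training_file=upload["id"], model="davinci")

# The job is asynchronous; poll until it reports the fine-tuned model name
job = openai.FineTune.retrieve(id=job["id"])
if job["fine_tuned_model"]:
    resp = openai.Completion.create(
        model=job["fine_tuned_model"], prompt="Hello", max_tokens=32
    )
    print(resp["choices"][0]["text"])
```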

You can look at open-source reproductions of GPT like OPT, but it will be very expensive to get the hardware just to run such a model, let alone train it. If you really want to use one of these huge models yourself (and not through an API), I'd advise starting with AWS before you consider buying any hardware.
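
If you want a feel for OPT before renting serious hardware, the smaller checkpoints load in a few lines with Hugging Face transformers (a sketch; facebook/opt-1.3b fits on one consumer GPU, while the 175B variant emphatically does not):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# facebook/opt-1.3b fits on a single consumer GPU in fp16 (~2.6 GB of weights);
# the same code with the 66B/175B checkpoints needs hundreds of GB of VRAM
tok = AutoTokenizer.from_pretrained("facebook/opt-1.3b")
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-1.3b", torch_dtype=torch.float16
).cuda()

inputs = tok("Large language models are", return_tensors="pt").to("cuda")
out = model.generate(**inputs, max_new_tokens=30)
print(tok.decode(out[0], skip_special_tokens=True))
```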

5

No-Trifle2470 t1_j154gqg wrote

Yeah, I mean OPT. We are already using AWS for some tests, but it is expensive since we run a lot of experiments. My idea is to use custom PCs to train models and then serve them on AWS for inference, as it is a B2C SaaS.

1

rlvsdlvsml t1_j15vzxs wrote

So if you used ColossalAI you could do it locally on a reasonable budget, $2-3k. Without that it depends on how much you are using it. If you use it a lot you can do a home setup for anywhere from $32-64k. The cheapest 8-GPU Nvidia home server you can build is around $12k, and you need 4x that to reach 500 GB of VRAM ($32k buying new, with 16GB GPUs at ~$1k each). With everything used you could maybe get down to $8k (this assumes you could do 16 servers with 4 GPUs each, all used). https://alpa.ai/tutorials/opt_serving.html
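
A quick back-of-envelope check on those VRAM numbers (the 1.4x overhead factor is my own rough assumption):

```python
# Back-of-envelope check on the "500 GB of VRAM" figure for OPT-175B
params = 175e9
weights_gb = params * 2 / 1e9      # fp16 = 2 bytes/param -> ~350 GB of weights

# KV cache, activations and framework overhead push the real number higher;
# a ~1.4x fudge factor (assumption) lands near the 500 GB quoted above
total_gb = weights_gb * 1.4
gpus_16gb = total_gb / 16          # ~31 cards, i.e. roughly four 8-GPU servers
print(f"{weights_gb:.0f} GB weights, ~{total_gb:.0f} GB total, ~{gpus_16gb:.0f} x 16GB GPUs")
```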

TL;DR: the largest OPT is going to require some model pipelining to keep costs down.

3

-Rizhiy- t1_j1979ji wrote

Very difficult to give a definite answer without knowing more about your situation. Things to consider:

  • How big is the model you are planning to train? Many large models are limited by VRAM size rather than compute, which means you will need either a 4090 or professional cards like the A100. Professional cards cost far more, and their price/performance trade-off is much worse than that of gamer cards.
  • How many cards do you need? You can probably build a workstation with 4 GPUs, or a server with 16; beyond that you are looking at a multi-server setup.
  • Where are you located? Electricity can be a large cost component. E.g. I'm in the UK, where electricity is £0.34/kWh ($0.41), which works out to about $1,000 to run a typical GPU for a year straight (see the sanity check after this list).
  • Are you able to set up a suitable electricity supply where you are located? For a big server you are probably looking at a dedicated 5kW line.
  • What other components do you need? You will probably need at least a UPS and a fast processor, which adds to the cost. Multiple servers will probably also require a good switch.
  • How are you going to cool it? Even a 4-GPU workstation will produce something like 1500W of heat and be rather loud. It might be fine during winter, but in summer you will probably need to install dedicated AC.
  • Are you up to building/maintaining all that infrastructure yourself? How much is your time worth?
  • Finally, explore all options. AWS has spot and reserved instances, which can be much cheaper than on-demand. LambdaLabs offers cheaper GPUs than AWS. Other cloud providers might have start-up discounts/funds.
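
A quick sanity check on the electricity figure from the list above (the 300W average draw is an assumption):

```python
# Rough check of the ~$1,000/year figure; 300W average draw is an assumption
gpu_kw = 0.3                      # a typical high-end GPU under sustained load
kwh_per_year = gpu_kw * 24 * 365  # ~2,628 kWh
price_usd_per_kwh = 0.41          # the UK price quoted above
print(f"${kwh_per_year * price_usd_per_kwh:,.0f} per year")  # ~$1,077
```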

P.S. I work for AWS, so I probably have a bias.

2

icedrift t1_j15lr43 wrote

Any model north of ~2B parameters isn't worth the hassle of running locally. Even Ada-sized models require a ton of VRAM and RAM. Use a cloud compute service like Azure or Paperspace.

1

LetterRip t1_j164stx wrote

If slow training is acceptable, you can use DeepSpeed with the weights mapped to an NVMe drive (DeepSpeed ZeRO-Infinity). Fine-tuning will take significantly longer, but it dramatically lowers the hardware investment.
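
Roughly, the relevant bit is the offload section of the DeepSpeed config. A minimal sketch, assuming a ZeRO stage 3 setup; the NVMe path and the tiny stand-in model are placeholders, and this would normally be launched via the deepspeed launcher:

```python
import deepspeed
import torch

# Minimal ZeRO-Infinity sketch: ZeRO stage 3 with parameters and optimizer
# state offloaded to NVMe; /local_nvme is a placeholder path
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-5}},
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
        "offload_optimizer": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

model = torch.nn.Linear(1024, 1024)  # stand-in for the real network
engine, _, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```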

1

softclone t1_j16f79b wrote

Depends. https://vast.ai/ is great but has certain limitations. If you can run on 1-4 24GB RTX 3090 cards, that's going to be the best value for rolling your own. 4090s are of course good too, but you'd need to find a good deal to make them worth it vs. the 3090s. You can always start with one and go from there. Otherwise you'll be paying 10x more for some A100s. The first step is to get a real good handle on how much compute you're actually using and what the smallest GPU/VRAM size is that works efficiently for your data, e.g. with the snippet below.
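
One cheap way to measure that before buying anything is PyTorch's peak-memory counter around a representative training step (a sketch; the step itself is elided):

```python
import torch

# Run one representative forward/backward pass, then read the peak-memory
# counter to see how much VRAM your workload actually needs
torch.cuda.reset_peak_memory_stats()

# ... one training step of your actual model goes here ...

peak_gb = torch.cuda.max_memory_allocated() / 1e9
print(f"peak VRAM this step: {peak_gb:.1f} GB")
```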

1