Submitted by Zondartul t3_zrbfcr in MachineLearning
artsybashev t1_j154fhy wrote
Reply to comment by caedin8 in [D] Running large language models on a home PC? by Zondartul
That's just the inference. Training from scratch requires more like 100x A100s and a cluster to run them on, so on the order of a million dollars just to get started.
AltruisticNight8314 t1_j1ohh7u wrote
What hardware would be required to (i) train or (ii) fine-tune the weights (i.e. run a few epochs on my own data) of medium-sized transformers (500M-15B parameters)?
I do research in proteomics and have a very specific problem where even fine-tuning the weights of a pretrained transformer (such as ESM-2) might work well.
Of course, there's always the poor man's alternative of building a supervised model on the embeddings returned by the encoder.
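A minimal sketch of that route (the checkpoint name and toy data are illustrative choices, not from the thread): freeze a small ESM-2 encoder from HuggingFace, mean-pool its hidden states into per-sequence embeddings, and fit a classical classifier on top.

```python
# Poor man's alternative: frozen encoder + classical supervised model.
# Assumes the HuggingFace ESM-2 checkpoints; sequences/labels are toy data.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
name = "facebook/esm2_t12_35M_UR50D"  # small ESM-2 checkpoint for illustration
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name).to(device).eval()

sequences = ["MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ", "MGSSHHHHHHSSGLVPRGSH"]  # toy
labels = [0, 1]

embeddings = []
with torch.no_grad():
    for seq in sequences:
        inputs = tokenizer(seq, return_tensors="pt").to(device)
        hidden = model(**inputs).last_hidden_state      # (1, seq_len, dim)
        embeddings.append(hidden.mean(dim=1).squeeze(0).cpu().numpy())  # mean-pool

clf = LogisticRegression(max_iter=1000).fit(embeddings, labels)
```

No GPU gradients or optimizer states are needed here, so it runs on modest hardware; the encoder is only ever used in inference mode.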
artsybashev t1_j1ph7f3 wrote
One A100 80GB will get you started with models in the 500M-15B range. You can rent one for about $50 per day. See where that takes you in a week.
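For context on why a single 80GB card can be enough (my back-of-envelope numbers, not from the thread): full Adam fine-tuning in mixed precision costs roughly 16 bytes per parameter (fp16 weights and gradients plus fp32 master weights and optimizer states), so a 15B model would want ~240 GB. At that end of the range you'd lean on parameter-efficient methods such as LoRA, which trains only small adapter matrices. A hedged sketch with the `peft` library, using a 3B ESM-2 checkpoint; the hyperparameters are illustrative, not a tested recipe:

```python
# Parameter-efficient fine-tuning via LoRA adapters (peft library).
# Checkpoint and LoRA settings below are illustrative assumptions.
import torch
from transformers import AutoModelForMaskedLM
from peft import LoraConfig, get_peft_model

model = AutoModelForMaskedLM.from_pretrained(
    "facebook/esm2_t36_3B_UR50D",  # 3B ESM-2 checkpoint as an example
    torch_dtype=torch.float16,
)
config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["query", "value"],  # ESM attention projections (BERT-style names)
    lora_dropout=0.05,
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of all parameters
```

With only the adapter matrices receiving gradients and optimizer states, the training memory footprint shrinks to roughly the frozen fp16 weights plus a small overhead, which is what brings a multi-billion-parameter model back within one card.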
AltruisticNight8314 t1_j1soeji wrote
Thanks!