Submitted by hx-zero t3_zl03b0 in MachineLearning
hx-zero OP t1_j07d431 wrote
Reply to comment by SleekEagle in [Project] Run and fine-tune BLOOM-176B at home using a peer-to-peer network by hx-zero
Training from scratch is slow because every peer needs to synchronize all model weights/gradients on each step (though it's feasible for somewhat smaller models with certain optimizations).
In the case of fine-tuning (especially prompt tuning), you train only a small percentage of the weights, so the communication overhead is no longer that large. Still, this is enough to adapt the LM to most downstream tasks; see the sketch below.
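To make the point concrete, here is a minimal prompt-tuning sketch in plain PyTorch (not the actual Petals API): the large LM is frozen and only a handful of learned "soft prompt" embeddings receive gradients, so the data that has to move between peers each step is tiny compared to full training. The `backbone` here is a hypothetical module that accepts input embeddings directly.

```python
import torch
import torch.nn as nn

class SoftPromptTuner(nn.Module):
    """Illustration only: freeze the big LM, train a short soft prompt."""

    def __init__(self, backbone: nn.Module, embed_dim: int, prompt_len: int = 16):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # freeze all LM weights
        # The only trainable parameters: prompt_len learned "virtual token" embeddings
        self.soft_prompt = nn.Parameter(torch.randn(prompt_len, embed_dim) * 0.02)

    def forward(self, input_embeds: torch.Tensor) -> torch.Tensor:
        # Prepend the learned prompt to every sequence in the batch
        batch = input_embeds.shape[0]
        prompt = self.soft_prompt.unsqueeze(0).expand(batch, -1, -1)
        return self.backbone(torch.cat([prompt, input_embeds], dim=1))

# Only the soft prompt goes into the optimizer, so the gradients exchanged per
# step are a tiny fraction of the 176B parameters (hypothetical usage):
# model = SoftPromptTuner(frozen_lm, embed_dim=14336, prompt_len=16)
# optimizer = torch.optim.Adam([model.soft_prompt], lr=1e-3)
```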
SleekEagle t1_j083jkl wrote
Got it, thanks for the explanation!