hx-zero OP t1_j07d431 wrote on December 14, 2022 at 4:34 PM

Reply to comment by SleekEagle in [Project] Run and fine-tune BLOOM-176B at home using a peer-to-peer network by hx-zero

Training from scratch is slow because you need to synchronize all model weights/gradients on each step (though it's possible for somewhat smaller models with some optimizations).

In the case of fine-tuning (especially prompt tuning), you train only a small fraction of the weights, so the communication overhead is much smaller. Still, this is enough to adapt the LM to most downstream tasks.
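
To put rough numbers on that difference, here is a back-of-the-envelope sketch. It is not taken from the project's code: the prompt length, the 16-bit value size, and the helper name `sync_bytes` are assumptions chosen for illustration, and the hidden size is assumed to be BLOOM-176B's embedding width.

```python
# Rough comparison of how much data must be exchanged per training step.
# All constants below are illustrative assumptions, not measured values.

TOTAL_PARAMS = 176_000_000_000  # approximate parameter count of BLOOM-176B
HIDDEN_SIZE = 14_336            # assumed embedding width of the model
NUM_PROMPT_TOKENS = 16          # hypothetical number of trainable prompt tokens
BYTES_PER_VALUE = 2             # assumes 16-bit gradients/weights


def sync_bytes(num_params: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to send one value per trainable parameter."""
    return num_params * bytes_per_value


# Prompt tuning trains one embedding vector per prompt token.
prompt_params = NUM_PROMPT_TOKENS * HIDDEN_SIZE

full_bytes = sync_bytes(TOTAL_PARAMS)      # training from scratch
prompt_bytes = sync_bytes(prompt_params)   # prompt tuning
trainable_fraction = prompt_params / TOTAL_PARAMS

print(f"Full training:  {full_bytes / 1e9:.0f} GB per step")
print(f"Prompt tuning:  {prompt_bytes / 1e3:.0f} KB per step")
print(f"Trainable fraction: {trainable_fraction:.2e}")
```

Under these assumptions the trainable parameters shrink from hundreds of gigabytes to well under a megabyte, which is why the per-step traffic for the trained weights stops being the bottleneck. Activations still have to travel between peers in either case, so this only accounts for the trained parameters.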

SleekEagle t1_j083jkl wrote on December 14, 2022 at 7:22 PM

Got it, thanks for the explanation!