
Long_Two_6176 t1_iu5omuu wrote

This is called model parallelism. Think of it as putting model.conv1 on GPU 1 and model.conv2 on GPU 2. This is actually not too hard to do: you just manually place your model components with calls like .to("cuda:0") and .to("cuda:1"). Start with this.
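A minimal sketch of what that manual placement looks like, assuming a toy two-layer conv net (the class name and layer sizes are made up for illustration; the code falls back to CPU when fewer than two GPUs are available):

```python
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    """Toy model-parallel net: conv1 lives on one device, conv2 on another."""

    def __init__(self):
        super().__init__()
        n = torch.cuda.device_count()
        # Assumed device layout: cuda:0 and cuda:1 if present, else CPU.
        self.dev1 = torch.device("cuda:0" if n >= 1 else "cpu")
        self.dev2 = torch.device("cuda:1" if n >= 2 else self.dev1)
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(self.dev1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1).to(self.dev2)

    def forward(self, x):
        x = torch.relu(self.conv1(x.to(self.dev1)))
        # The key step: move the activation across devices between stages.
        x = torch.relu(self.conv2(x.to(self.dev2)))
        return x

model = TwoDeviceNet()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 16, 32, 32])
```

Note this alone doesn't speed anything up: while conv2's GPU is working, conv1's GPU sits idle, which is why people combine it with data parallelism or pipelining.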

A more advanced setup is model parallelism + data parallelism, where both GPUs also split the dataset between them to accelerate training. Typically this is not possible with simple model parallelism alone, but a framework like fairseq can handle it for you.

2

the_hackelle t1_iu5qdub wrote

Also, because it's super user-friendly and easy to implement, have a look at PyTorch Lightning. It makes distributed training and the like very easy.

2

sabeansauce OP t1_iu676zn wrote

Okay, I can see how I was thinking about it kind of wrong. Thanks for the reply.

1