
Long_Two_6176 t1_iu5omuu wrote

This is called model parallelism. Think of it as putting model.conv1 on GPU 1 and model.conv2 on GPU 2. This is actually not too hard to do: you just manually place your model components with calls like .to("cuda:0") and .to("cuda:1"). Start with this.
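A minimal sketch of what that manual placement looks like, assuming a toy two-layer conv net (the class name and layer sizes are made up for illustration; the code falls back to CPU when fewer than two GPUs are available):

```python
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    """Toy model-parallel net: conv1 lives on one device, conv2 on another."""

    def __init__(self):
        super().__init__()
        n = torch.cuda.device_count()
        # Assumed device layout: cuda:0 and cuda:1 if present, else CPU.
        self.dev1 = torch.device("cuda:0" if n >= 1 else "cpu")
        self.dev2 = torch.device("cuda:1" if n >= 2 else self.dev1)
        self.conv1 = nn.Conv2d(3, 8, kernel_size=3, padding=1).to(self.dev1)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1).to(self.dev2)

    def forward(self, x):
        x = torch.relu(self.conv1(x.to(self.dev1)))
        # The key step: move the activation across devices between stages.
        x = torch.relu(self.conv2(x.to(self.dev2)))
        return x

model = TwoDeviceNet()
out = model(torch.randn(4, 3, 32, 32))
print(out.shape)  # torch.Size([4, 16, 32, 32])
```

Note this alone doesn't speed anything up: while conv2's GPU is working, conv1's GPU sits idle, which is why people combine it with data parallelism or pipelining.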

A more advanced setup is model parallelism + data parallelism, where both GPUs also split the dataset between them to accelerate training. Typically this is not possible with simple model parallelism alone, but a framework like fairseq can handle it for you.

2

the_hackelle t1_iu5qdub wrote

Also, because it's super user-friendly and easy to implement, have a look at PyTorch Lightning. It makes distributed training and the like very easy.

2

sabeansauce OP t1_iu676zn wrote

Okay, I can see how I was thinking about it kind of wrong. Thanks for the reply.

1