Submitted by IdeaEnough443 t3_zg6s6d in MachineLearning
I have a big dataset that I would like to train on, so my plan is to do distributed training. I am currently setting up MultiWorkerMirroredStrategy in TensorFlow, and I find it hard to use even with https://github.com/tensorflow/ecosystem/tree/master/spark/spark-tensorflow-distributor
https://www.tensorflow.org/api_docs/python/tf/distribute/experimental/MultiWorkerMirroredStrategy
So I was wondering if there are other recommended ways of doing NN training when you have a big dataset?
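For context on what the setup above involves: MultiWorkerMirroredStrategy discovers the cluster through a `TF_CONFIG` environment variable that each worker must export before creating the strategy. A minimal sketch of building that value — the hostnames, ports, and the `build_and_compile_model` helper are hypothetical, not from the original post:

```python
# Sketch of the TF_CONFIG each worker exports for
# tf.distribute.MultiWorkerMirroredStrategy. Hosts/ports are made up.
import json

WORKERS = ["10.0.0.1:12345", "10.0.0.2:12345"]  # one entry per machine

def tf_config_for(worker_index):
    """Return the TF_CONFIG JSON string for worker `worker_index`."""
    return json.dumps({
        "cluster": {"worker": WORKERS},
        "task": {"type": "worker", "index": worker_index},
    })

# On worker 0 you would then do roughly:
#   os.environ["TF_CONFIG"] = tf_config_for(0)
#   strategy = tf.distribute.MultiWorkerMirroredStrategy()
#   with strategy.scope():
#       model = build_and_compile_model()  # hypothetical helper
#   model.fit(dataset, epochs=...)
print(tf_config_for(0))
```

Every worker runs the same script; only the `index` in its `TF_CONFIG` differs.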
f_max t1_izhg9j7 wrote
If you have more than one GPU and your model is small enough to fit on a single GPU, distributed data parallel is the go-to. Basically multiple model instances train in parallel, with gradients synchronized at the end of each batch. PyTorch has it integrated (DistributedDataParallel), and TF probably does too.