Submitted by alexnasla t3_yikumt in MachineLearning
patient_zer00 t1_iujl1if wrote
Disk I/O is often a bottleneck.
Also, although a GPU does speed up LSTM training, the recurrence has to be processed one time step after another. Each step depends on the hidden state from the previous step, in the forward pass and in the gradient computation, so that part can't be parallelized across the sequence. That's probably why you aren't seeing much of a speed difference between a K80 and an A100.
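To make the sequential dependency concrete, here is a rough sketch using a toy scalar recurrent cell rather than a real LSTM. The names `rnn_forward` and `batch_forward` and the weights `w_in` and `w_rec` are made up for illustration and don't come from any library.

```python
import math

def rnn_forward(xs, w_in=0.5, w_rec=0.9, h0=0.0):
    """Toy scalar recurrent cell: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    h = h0
    states = []
    for x in xs:
        # Step t needs h from step t-1, so this loop has to run in order.
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def batch_forward(batch):
    # Sequences in a batch don't depend on each other, so this outer loop
    # is the part that parallel hardware can spread out.
    return [rnn_forward(xs) for xs in batch]

batch = [[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
all_states = batch_forward(batch)
```

The time loop is the serial part. A faster GPU mostly helps with the work inside each step and across the batch, so with small batches or small hidden sizes the gain can be modest.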
Edit: typos