patient_zer00 t1_iujl1if wrote on October 31, 2022 at 8:35 PM

Disk I/O is often a bottleneck.

Also, even though a GPU does speed up LSTM training, the gradient computation requires the whole sequence to be processed one time step after the other, and that part can't be parallelized across time steps. That's probably why your speedup is not that big going from a K80 to an A100.
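To make the sequential part concrete, here is a rough toy sketch in plain Python. It is a scalar recurrent cell, not a real LSTM, and the names `rnn_forward`, `w_x`, and `w_h` are made up for illustration. The point is only that each hidden state needs the previous one, so the loop over time steps has to run in order no matter how fast the hardware is.

```python
import math

def rnn_forward(xs, w_x=0.5, w_h=0.9, h0=0.0):
    """Toy scalar recurrent cell: h_t = tanh(w_x * x_t + w_h * h_{t-1}).

    Hypothetical stand-in for an LSTM cell; the weights are arbitrary.
    """
    h = h0
    states = []
    for x in xs:
        # Step t cannot start before step t-1 has produced h.
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def input_projection(xs, w_x=0.5):
    """The w_x * x_t terms do not depend on each other,
    so this part could be computed for all time steps at once."""
    return [w_x * x for x in xs]

states = rnn_forward([1.0, 0.0, 0.0])
```

A faster GPU helps with the work inside each step (big matrix multiplies, large batches), but the number of steps that must run back to back stays the same. Backpropagation through time walks the same chain in reverse.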

Edit: typos