
DaLameLama t1_iyc7nha wrote

I don't think that's true. It would imply that Bi-LSTMs reach good performance faster than Transformers, and that Transformers only catch up later in training.

I've never seen evidence for that, and it doesn't match my own experience.
