
Reasonable_Boss2750 t1_is97cn3 wrote

A possible reason why the author uses attention with W_q and W_k is to fuse information from both the encoder and the decoder. In that case the formula is (X_enc W_q)(X_dec W_k)^T.
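
In case it helps, here is a minimal NumPy sketch of that score computation. It follows the formula as written above (queries from the encoder, keys from the decoder); the names X_enc, X_dec, W_q, W_k and all the dimensions are illustrative placeholders, not anything from the paper:

```python
import numpy as np

d_model, d_k = 64, 32
rng = np.random.default_rng(0)

X_enc = rng.standard_normal((10, d_model))  # 10 encoder positions (placeholder)
X_dec = rng.standard_normal((7, d_model))   # 7 decoder positions (placeholder)
W_q = rng.standard_normal((d_model, d_k))   # query projection
W_k = rng.standard_normal((d_model, d_k))   # key projection

# (X_enc W_q)(X_dec W_k)^T: each encoder position scores every decoder position,
# fusing information from both sides in a single attention matrix.
scores = (X_enc @ W_q) @ (X_dec @ W_k).T    # shape (10, 7)

# Softmax over the decoder axis turns scores into attention weights.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)
print(weights.shape)  # (10, 7)
```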

1