[R] Wq can be omitted in single-head attention
Submitted by wangyi_fudan on October 13, 2022 at 11:27 AM in MachineLearning
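
A minimal sketch of one way to read the claim in the title. It is an illustration, not necessarily the poster's own derivation. It assumes single-head dot-product self-attention with no biases and nothing (such as a positional rotation) applied between the projections and the dot product. Under those assumptions the scores are (X Wq)(X Wk)^T = X (Wq Wk^T) X^T, so Wq can be folded into the key projection and the raw input used as the query. The names `X`, `Wq`, `Wk`, `Wk_merged` and the sizes are hypothetical, and the code is plain Python so it runs without dependencies.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def rand_matrix(rows, cols, rng):
    """Random Gaussian matrix, used only as illustrative data."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n, d = 4, 3  # sequence length and model width (illustrative values)

X = rand_matrix(n, d, rng)   # token representations
Wq = rand_matrix(d, d, rng)  # query projection (assumed square here)
Wk = rand_matrix(d, d, rng)  # key projection (assumed square here)

# Usual pre-softmax scores: (X Wq)(X Wk)^T
scores_qk = matmul(matmul(X, Wq), transpose(matmul(X, Wk)))

# Fold Wq into the key side: Wk_merged = Wk Wq^T, and use X itself as the query.
Wk_merged = matmul(Wk, transpose(Wq))
scores_merged = matmul(X, transpose(matmul(X, Wk_merged)))

# Largest element-wise gap between the two score matrices.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(scores_qk, scores_merged)
    for a, b in zip(row_a, row_b)
)
print(max_diff)
```

Since the pre-softmax scores match up to floating-point error, the softmax and the value path are unchanged. Note that if the head dimension is smaller than the model width, the merged matrix is a full d x d matrix of limited rank, so this is a reparameterization and not automatically a parameter saving.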