[R] Wq can be omitted in single-head attention Submitted by wangyi_fudan t3_y2w87i on October 13, 2022 at 11:27 AM in MachineLearning 7 comments 17
mrfox321 t1_is7pudf wrote on October 13, 2022 at 10:16 PM Sure, but keeping a separate W_q allows for a low-rank representation of W := W_k @ W_q^T Permalink 5
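A minimal sketch of the point being made, assuming standard single-head dot-product attention; the shapes (d_model, d_head) and variable names here are illustrative, not from the post. It checks that the attention logits computed with separate W_q and W_k match those from the fused matrix W = W_q @ W_k^T, and that the fused matrix has rank at most d_head:

```python
import torch

# Assumed illustrative shapes: d_head < d_model makes the low-rank point visible.
d_model, d_head, n = 64, 16, 10
x = torch.randn(n, d_model)
W_q = torch.randn(d_model, d_head)
W_k = torch.randn(d_model, d_head)

# Usual single-head attention logits: Q @ K^T with Q = x W_q, K = x W_k.
logits_two_mats = (x @ W_q) @ (x @ W_k).T

# Fused single matrix W = W_q @ W_k^T gives identical logits...
W = W_q @ W_k.T
logits_one_mat = x @ W @ x.T

print(torch.allclose(logits_two_mats, logits_one_mat, atol=1e-4))  # True
# ...but W is constrained to rank <= d_head (16), not full rank d_model (64),
# which is the low-rank representation the separate W_q/W_k factorization buys.
print(torch.linalg.matrix_rank(W))
```

So omitting W_q is algebraically possible, but a single unconstrained W of shape (d_model, d_model) has more parameters than the factored W_q, W_k pair and loses the built-in rank bottleneck.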