Submitted by 51616 t3_yt6slt in MachineLearning
machinelearner77 t1_iw7omuy wrote
Reply to comment by vwings in Relative representations enable zero-shot latent space communication by 51616
That it works seems interesting, especially since I would have thought it might depend too heavily on the hyper-parameter (the choice of anchors), which apparently it doesn't. But why shouldn't you be able to "backprop over this"? It's just cosine similarity, so everything is naturally differentiable.
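To make that concrete, here's a rough sketch of the cosine step being discussed (my own, in PyTorch, with made-up shapes and a stand-in encoder, not code from the paper): each embedding is replaced by its cosine similarities to a set of anchor embeddings, and gradients flow straight back through it.

```python
import torch
import torch.nn.functional as F

def relative_representation(z, anchors):
    """Cosine similarity of each embedding to every anchor.

    z:       (batch, d) embeddings from any encoder
    anchors: (n_anchors, d) embeddings of the anchor samples
    returns: (batch, n_anchors) relative representation
    """
    z = F.normalize(z, dim=-1)
    anchors = F.normalize(anchors, dim=-1)
    return z @ anchors.T  # matmul of unit vectors = cosine similarity

# toy usage: gradients flow back through the cosine layer
encoder = torch.nn.Linear(32, 16)    # stand-in encoder
x = torch.randn(8, 32)               # a batch of inputs
anchor_x = torch.randn(100, 32)      # anchor inputs (the hyper-parameter)

rel = relative_representation(encoder(x), encoder(anchor_x))
loss = rel.sum()                     # dummy loss
loss.backward()                      # works: everything is differentiable
```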
vwings t1_iw857q2 wrote
Yes, sure you can backprop; what I meant is that it's surprising you can train a network reasonably with this, even though in the backward pass the gradient gets diluted across all anchor samples. I would have thought you'd at least need softmax attention in the forward pass to route the gradients back reasonably.
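Roughly what I have in mind (just an illustrative sketch of my own, not from the paper): with the plain cosine layer every anchor receives some gradient, whereas putting a softmax over the similarities concentrates the forward pass, and with it most of the gradient, on the nearest anchors.

```python
import torch
import torch.nn.functional as F

z = torch.randn(1, 16, requires_grad=True)
anchors = torch.randn(100, 16, requires_grad=True)

sims = F.normalize(z, dim=-1) @ F.normalize(anchors, dim=-1).T  # (1, 100)

# plain relative representation: the loss touches every similarity,
# so every anchor gets a share of the gradient ("diluted")
sims.sum().backward(retain_graph=True)
print((anchors.grad.abs().sum(dim=-1) > 0).sum())  # essentially all anchors

# softmax-attention-style routing: far anchors get exponentially small
# weight in the forward pass, so they receive far less of the gradient
anchors.grad = None
attn = F.softmax(sims / 0.05, dim=-1)   # small temperature sharpens the routing
(attn * sims).sum().backward()
print(anchors.grad.abs().sum(dim=-1).topk(5).values)  # dominated by nearest anchors
```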