mildresponse t1_j4xjmvw wrote
Reply to comment by inquisitor49 in [D] Simple Questions Thread by AutoModerator
My interpretation is that a word should map to a different vector when it appears at a different position (context) in the input. Without positional embeddings, the learned word embedding would be forced to act as a kind of average over all the positions the word can occupy. Adding a positional offset gives the model the flexibility to treat the same word differently depending on where it appears.
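For a concrete picture, here's a rough sketch of the additive idea (using the fixed sinusoidal encodings from the original Transformer paper rather than learned offsets, but the principle is the same; `sinusoidal_positions` and the sizes are just illustrative):

    # Minimal sketch (NumPy): the same token embedding shifted by different
    # positional encodings, as in the original Transformer's sinusoidal scheme.
    import numpy as np

    def sinusoidal_positions(seq_len, d_model):
        """Standard sin/cos positional encodings, one row per position."""
        positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
        dims = np.arange(d_model)[None, :]             # (1, d_model)
        angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / d_model)
        angles = positions * angle_rates               # (seq_len, d_model)
        pe = np.zeros((seq_len, d_model))
        pe[:, 0::2] = np.sin(angles[:, 0::2])          # even dims get sine
        pe[:, 1::2] = np.cos(angles[:, 1::2])          # odd dims get cosine
        return pe

    d_model, seq_len = 512, 10
    rng = np.random.default_rng(0)
    word_vec = rng.normal(size=d_model)   # stand-in for one learned word embedding
    pe = sinusoidal_positions(seq_len, d_model)

    # The same word at positions 2 and 7 enters the model as two different vectors:
    x_pos2 = word_vec + pe[2]
    x_pos7 = word_vec + pe[7]
    print(np.linalg.norm(x_pos2 - x_pos7))  # clearly nonzero

So the "word" part of the vector is shared, and the positional part disambiguates the context.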
Because the embeddings are high-dimensional float vectors, I'd guess the risk of degeneracy (i.e., that one word's shifted embedding starts to overlap with another word's) is virtually zero.