
throwaway2676 t1_j3h780s wrote

Reply to comment by trnka in [D] Simple Questions Thread by AutoModerator

> In a fully-connected layer, the input to the matrix multiply is the output of everything in the previous layer, not just the output of a single unit.

But if the previous layer is 0 everywhere except for one unit, the result is the same, no?

My mental picture is that input layer 0 has V = <token vocabulary size> neurons, and layer 1 has E_d = <embedding dimension> neurons. Layer 0 is 1 at a single neuron and 0 everywhere else, as in the usual one-hot encoding. The embedding layer 1 is then given by x@W, where x is layer 0 as a row vector and W is the weight matrix with dimensions V x E_d. The matrix multiplication then "picks out" the row of W that belongs to that token. That would be a fully connected linear layer with no bias.
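A minimal sketch of that picture, in plain Python so it runs without dependencies (with NumPy the product would simply be `x @ W`). The sizes, the values in `W`, and the helper names `one_hot` and `vec_mat` are made up for illustration:

```python
# Toy sizes (assumptions for illustration): V = 4 tokens, E_d = 3 dimensions.
V, E_d = 4, 3

# W has shape V x E_d; row i plays the role of the embedding of token i.
W = [[10 * i + j for j in range(E_d)] for i in range(V)]

def one_hot(index, size):
    """Row vector with a 1 at `index` and 0 everywhere else."""
    return [1 if k == index else 0 for k in range(size)]

def vec_mat(x, W):
    """Row vector x (length V) times matrix W (V x E_d), i.e. x @ W."""
    return [sum(x[i] * W[i][j] for i in range(len(W)))
            for j in range(len(W[0]))]

token_id = 2
x = one_hot(token_id, V)      # layer 0: one-hot input
embedding = vec_mat(x, W)     # layer 1: linear layer, no bias

# Every term of the sum is zero except the one from row `token_id`,
# so the multiply acts as a row lookup.
print(embedding == W[token_id])
```

All rows of W take part in the multiply, but every row except one is scaled by zero, which is why the result matches a direct lookup.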


trnka t1_j3i3vk4 wrote

If your input is only ever a single word, that's right.

Usually people work with texts, i.e. sequences of words. The embedding layer maps the sequence of words to a sequence of embedding vectors. It could be implemented as a sequence of one-hot encodings, each multiplied by the same W, though.
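A rough sketch of the sequence case, again in plain Python with made-up sizes and hypothetical helper names (`one_hot`, `mat_mul`). It stacks one one-hot row per token into a T x V matrix, multiplies by the shared W, and compares the result with indexing W directly:

```python
# Toy sizes (assumptions for illustration): V = 5 tokens, E_d = 2 dimensions.
V, E_d = 5, 2
W = [[10 * i + j for j in range(E_d)] for i in range(V)]   # V x E_d

def one_hot(index, size):
    """Row vector with a 1 at `index` and 0 everywhere else."""
    return [1 if k == index else 0 for k in range(size)]

def mat_mul(X, W):
    """(T x V) times (V x E_d) -> (T x E_d), i.e. X @ W."""
    return [[sum(row[i] * W[i][j] for i in range(len(W)))
             for j in range(len(W[0]))]
            for row in X]

token_ids = [3, 0, 3, 1]                   # a sequence of T = 4 token ids
X = [one_hot(t, V) for t in token_ids]     # T x V, one one-hot row per token

via_matmul = mat_mul(X, W)                 # same W applied at every position
via_lookup = [W[t] for t in token_ids]     # plain row indexing

print(via_matmul == via_lookup)
```

The lookup version gives the same vectors without ever building the T x V one-hot matrix, which gets expensive when the vocabulary is large.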
