ChangingHats t1_j4r2hxx wrote

I am trying to use TensorFlow's `MultiHeadAttention` for regression on time series data, forecasting a `(batch, horizon, features)` tensor.

During training, I have `inputs ~> (1, 10, 1)` and `targets ~> (1, 10, 1)`. `targets` is a horizon-shifted copy of `inputs`.

During inference, `targets` is just a zeros tensor of the same shape.

What's the best way to run attention such that the output utilizes all timesteps in `inputs` as well as each subsequent timestep of the resulting attention output, instead of ONLY the timesteps of the inputs?
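To make concrete what I mean by "each subsequent timestep", here is a rough sketch in plain NumPy rather than TensorFlow, so the masking is explicit. Everything in it is an assumption for illustration: the helper names `causal_self_attention` and `rollout`, the single head, the dropped batch dimension, and the random untrained weights. It shows a lower-triangular (causal) mask plus a loop that appends each predicted step back onto the sequence before attending again. As far as I understand, the same kind of boolean mask could be passed to the Keras layer through its `attention_mask` call argument.

```python
import numpy as np

def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x has shape (steps, features); the batch dimension is dropped for brevity.
    Returns the attended values and the attention weights.
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    steps = x.shape[0]
    # Lower-triangular mask: step t may only attend to steps 0..t.
    mask = np.tril(np.ones((steps, steps), dtype=bool))
    scores = np.where(mask, scores, -np.inf)
    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

def rollout(history, horizon, w_q, w_k, w_v, w_out):
    """Autoregressive forecast: each new step is appended to the sequence
    so that later steps attend to the inputs AND to earlier predictions."""
    seq = history.copy()
    for _ in range(horizon):
        attended, _ = causal_self_attention(seq, w_q, w_k, w_v)
        next_step = attended[-1:] @ w_out  # shape (1, features)
        seq = np.concatenate([seq, next_step], axis=0)
    return seq[len(history):]

# Hypothetical, untrained weights purely to exercise the shapes.
rng = np.random.default_rng(0)
features, d_model = 1, 4
w_q = rng.normal(size=(features, d_model))
w_k = rng.normal(size=(features, d_model))
w_v = rng.normal(size=(features, d_model))
w_out = rng.normal(size=(d_model, features))

inputs = rng.normal(size=(10, features))  # one series from the (1, 10, 1) batch
outputs, weights = causal_self_attention(inputs, w_q, w_k, w_v)
forecast = rollout(inputs, horizon=10, w_q=w_q, w_k=w_k, w_v=w_v, w_out=w_out)
print(forecast.shape)
```

With the mask in place, changing a later timestep cannot change the output at an earlier one, which is the property I think I need so that training on shifted targets and step-by-step inference behave consistently.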

Another problem I see is that attention is run between Q and K, and during inference, Q = K, so that will affect the output differently, no?
