Submitted by AutoModerator t3_10cn8pw in MachineLearning
ChangingHats t1_j4r2hxx wrote
I am trying to use TensorFlow's `MultiHeadAttention` to do regression on time series data, forecasting a `(batch, horizon, features)` tensor.
During training, I have `inputs ~> (1, 10, 1)` and `targets ~> (1, 10, 1)`. `targets` is a horizon-shifted copy of `inputs`.
During inference, `targets` is just a zeros tensor of the same shape.
What's the best way to run attention such that the output utilizes all timesteps in `inputs` as well as each subsequent timestep of the resulting attention output, instead of ONLY the timesteps of the inputs?
Another problem I see is that attention is run between Q and K, and during inference, Q = K, so that will affect the output differently, no?
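To make the question concrete, here is a rough sketch of the behaviour I'm after, written in plain Python rather than TensorFlow so it stays self-contained. It is only an illustration under heavy assumptions: a single head, one scalar feature per timestep, and no learned projections (Q = K = V = the raw sequence). The names `causal_self_attention` and `forecast` are made up for this example and are not part of any library. The causal mask means step `t` only sees steps `0..t`, and the rollout feeds each predicted step back in, so later steps see all of `inputs` plus the outputs generated so far.

```python
import math


def causal_self_attention(x):
    """Single-head scaled dot-product self-attention over scalar features.

    Assumption: identity projections, so Q = K = V = x.
    Causal mask: position t attends only to positions 0..t.
    """
    out = []
    for t in range(len(x)):
        # Scores against visible positions only; d_k = 1, so the scale factor is 1.
        scores = [x[t] * x[s] for s in range(t + 1)]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(score - peak) for score in scores]
        total = sum(weights)
        out.append(sum(w / total * x[s] for s, w in enumerate(weights)))
    return out


def forecast(inputs, horizon):
    """Autoregressive rollout: each new step sees all inputs plus prior outputs."""
    seq = list(inputs)
    for _ in range(horizon):
        # The last attention output becomes the next timestep of the sequence.
        seq.append(causal_self_attention(seq)[-1])
    return seq[len(inputs):]


if __name__ == "__main__":
    print(forecast([0.1, 0.2, 0.3], horizon=4))
```

In this sketch nothing needs a zeros `targets` tensor at inference time, since the query for each new step comes from the sequence built so far. Whether that maps cleanly onto the Keras layer is exactly what I'm unsure about.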