
randy-adderson t1_j900kon wrote

Question on transformer architecture:

If the task is simply to generate data given a context of data generated so far (as in the case of GPT-3), can the architecture be simplified?

(The separation of the encoder and decoder layers seems arbitrary when they are processing the exact same data)
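
To make the task concrete, here is a minimal sketch of the generation loop I have in mind. It only illustrates the setup and is not any real model's implementation: `generate`, `next_token`, and `toy_model` are hypothetical names, and `toy_model` is a trivial stand-in for a trained network.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[int]], int],
    context: List[int],
    steps: int,
) -> List[int]:
    """Autoregressive generation: each new token is predicted from everything so far."""
    sequence = list(context)  # copy so the caller's context is not mutated
    for _ in range(steps):
        # The model sees a single stream: the prompt plus its own earlier outputs.
        sequence.append(next_token(sequence))
    return sequence


def toy_model(sequence: List[int]) -> int:
    """Hypothetical stand-in for a trained model: sums the last two tokens."""
    return sum(sequence[-2:])


print(generate(toy_model, [1, 1], 5))  # [1, 1, 2, 3, 5, 8, 13]
```

The point is that there is only one sequence here, and the input and the output are the same stream of data, which is why a separate encoder looks redundant to me.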
