Submitted by AutoModerator t3_110j0cp in MachineLearning
randy-adderson t1_j900kon wrote
Question on transformer architecture:
If the task is simply to generate data given the context of data generated so far (as in the case of GPT-3), can the architecture be simplified?
(The separation into encoder and decoder layers seems arbitrary when both are processing the exact same data.)
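
For context, GPT-style models do take this simplified form: a single decoder-only stack of self-attention layers with a causal mask, and no separate encoder or cross-attention. Below is a minimal sketch of that one masked attention step in plain Python. The function name `causal_self_attention` and the toy `tokens` are made up for illustration, and the learned query/key/value projections are left out (treated as identity), so this is not how any particular library implements it.

```python
import math

def causal_self_attention(x):
    """Single-head self-attention with a causal mask (illustrative sketch).

    x: list of token vectors, each a list of floats of equal length.
    Assumption: Q, K and V projections are the identity, to keep this short.
    """
    d = len(x[0])
    out = []
    for i, q in enumerate(x):
        # Causal mask: position i may only attend to positions 0..i.
        scores = [sum(a * b for a, b in zip(q, x[j])) / math.sqrt(d)
                  for j in range(i + 1)]
        # Numerically stable softmax over the visible positions.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the weighted average of the visible token vectors.
        out.append([sum(w * x[j][k] for j, w in enumerate(weights))
                    for k in range(d)])
    return out

# Toy sequence of three 2-dimensional "tokens" (hypothetical values).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
mixed = causal_self_attention(tokens)
```

The same sequence acts as both the context and the thing being generated, so one stack is enough; the mask is what stops a position from seeing tokens that come after it.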