[D] Simple Questions Thread Submitted by AutoModerator t3_10cn8pw on January 15, 2023 at 4:00 PM in MachineLearning 103 comments 23
UnderstandingDry1256 t1_j5c0y0o wrote on January 21, 2023 at 10:23 PM What training strategies are used for GPT models? Are the transformer blocks or layers trained independently? Are they trained on a subset of the data and then fine-tuned? I would appreciate any references or details :)