[D] are two linear layers better than one? Submitted by alex_lite_21 t3_10kjhhb on January 24, 2023 at 11:17 PM in MachineLearning
suflaj t1_j5r4u61 wrote on January 24, 2023 at 11:51 PM Dropout is not strictly a linear function (for a given random mask it can happen to act as one), and chances are it will add non-linearity for p>0, so yeah, that probably made the difference.
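To make the linearity point concrete, here is a minimal sketch in plain Python. It is not the original poster's code: the layer sizes, the weights, and the `affine` and `dropout` helpers are all made up for illustration, and the dropout variant assumed is the common "inverted" one that rescales surviving units by 1/(1-p). Without dropout, two stacked linear layers collapse into a single affine map. With p>0 in training mode, every forward pass draws a new random mask, so the stack no longer behaves like one fixed linear layer, even though it is still linear in x for any single mask.

```python
import random

def matvec(W, x):
    # Matrix-vector product for a list-of-rows matrix.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    # Matrix-matrix product for list-of-rows matrices.
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def affine(W, b, x):
    # A "linear layer": W x + b.
    return [y + bi for y, bi in zip(matvec(W, x), b)]

def dropout(h, p, rng):
    # Hypothetical helper, assuming inverted dropout in training mode:
    # zero each unit with probability p, scale survivors by 1 / (1 - p).
    keep = 1.0 - p
    return [hi / keep if rng.random() < keep else 0.0 for hi in h]

rng = random.Random(0)
d_in, d_hid, d_out = 3, 4, 2  # illustrative sizes only

W1 = [[rng.uniform(-1, 1) for _ in range(d_in)] for _ in range(d_hid)]
b1 = [rng.uniform(-1, 1) for _ in range(d_hid)]
W2 = [[rng.uniform(-1, 1) for _ in range(d_hid)] for _ in range(d_out)]
b2 = [rng.uniform(-1, 1) for _ in range(d_out)]
x = [rng.uniform(-1, 1) for _ in range(d_in)]

# Two stacked linear layers, no dropout.
two_layer = affine(W2, b2, affine(W1, b1, x))

# The same map written as ONE layer: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = affine(W2, b2, b1)
one_layer = affine(W, b, x)

collapsed_matches = all(abs(u - v) < 1e-9 for u, v in zip(two_layer, one_layer))
print("two layers == one layer (no dropout):", collapsed_matches)

# With dropout between the layers, repeated passes on the same x
# use different random masks and generally give different outputs.
p = 0.5
samples = [affine(W2, b2, dropout(affine(W1, b1, x), p, rng)) for _ in range(5)]
for s in samples:
    print(s)
```

At evaluation time dropout is normally switched off, so the two layers reduce to a single affine map again. Under that assumption, any gain would have to come from how dropout changed training rather than from extra capacity at inference.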