Submitted by twocupv60 t3_zbkvd2 in MachineLearning
twocupv60 OP t1_iysmgyv wrote
Reply to comment by Thakshu in [D] Ensemble Training Logistics and Mathematical Equivalences by twocupv60
The initial error is (y - y_hat)^2, where y_hat = mean(y_1, ..., y_n). So the error is divided up among y_1, ..., y_n based on how much each one contributes to the error of y_hat. If the models are trained separately, the full error against y is backpropagated to each one. If the models are trained together, one model with a lot of error will influence the proportion assigned to the rest, which I believe effectively lowers the learning rate. Is this what you mean by "loss values will be smoother"?
Is there a mistake here?
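For concreteness, here is a minimal PyTorch sketch of the gradient difference between the two setups, assuming scalar predictions and plain MSE on the ensemble mean; the tensor values and variable names are made up for illustration:

```python
import torch

# Toy check of the gradient difference, assuming scalar targets and MSE.
# The three entries of `preds` stand in for the n models' outputs y_1..y_n.
y = torch.tensor(1.0)
preds = torch.tensor([0.2, 0.9, 3.0], requires_grad=True)

# Trained together: a single MSE on the ensemble mean.
joint_loss = (y - preds.mean()) ** 2
joint_loss.backward()
# Gradient w.r.t. each y_i is -2*(y - y_hat)/n: identical for every
# member and scaled down by 1/n (a lower effective learning rate).
print(preds.grad)

# Trained separately: each model gets its own MSE against y.
preds_sep = preds.detach().clone().requires_grad_(True)
sep_loss = ((y - preds_sep) ** 2).sum()
sep_loss.backward()
# Gradient w.r.t. each y_i is -2*(y - y_i): the full, individual error.
print(preds_sep.grad)
```

Note that with an unweighted mean, the joint gradient comes out identical for every member and scaled by 1/n, which fits the lower-effective-learning-rate reading, whereas separate training hands each model its full individual error.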
Thakshu t1_iysr30r wrote
I think you are right here. But the mathematical equivalence bothers me. Since the two setups end up with dissimilar parameters, are they really equivalent?