Thakshu
Thakshu t1_iysghrm wrote
If I understand correctly, the question is whether training N classifiers independently and averaging their outputs is mathematically equivalent to training N classifiers jointly on their mean output.
To me, they do not appear mathematically equivalent. (Edited a wrong statement here.)
In the joint case, the gradient for each backprop step is computed from the mean output of all classifiers. So the loss values will be smoother than in the first case if the classifiers are independently initialized.
Am I making a mistake in my reasoning? I can't identify it yet.
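A quick way to check the gradient part of this is to compare the two setups numerically. The sketch below is only an illustration under assumptions that are not in the thread: each classifier is a logistic regression, "mean output" means the average of the predicted probabilities, and the loss is binary cross-entropy. The function names `independent_grads` and `joint_grads` and the toy data are made up for the example. It says nothing about whether the joint loss is actually smoother, only that the per-step gradients differ.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def independent_grads(W, X, y):
    """Gradient for each classifier when it is trained on its own BCE loss.

    W has shape (N, d): one weight vector per classifier.
    """
    P = sigmoid(X @ W.T)                        # (n, N) per-classifier probabilities
    return ((P - y[:, None]).T @ X) / len(y)    # (N, d)

def joint_grads(W, X, y):
    """Gradient for each classifier when BCE is applied to the mean probability."""
    n_models = W.shape[0]
    P = sigmoid(X @ W.T)                        # (n, N)
    p_bar = P.mean(axis=1, keepdims=True)       # (n, 1) ensemble output
    # dL/dp_bar for binary cross-entropy
    dL_dpbar = (p_bar - y[:, None]) / (p_bar * (1.0 - p_bar))
    # dp_bar/dz_k = sigmoid'(z_k) / N
    dpbar_dz = P * (1.0 - P) / n_models         # (n, N)
    return ((dL_dpbar * dpbar_dz).T @ X) / len(y)

# Toy data and independently initialized classifiers (assumed setup).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(float)
W = rng.normal(size=(4, 3))

g_ind = independent_grads(W, X, y)
g_joint = joint_grads(W, X, y)

# With independent initialization the joint gradient is not just a
# rescaled copy of the independent one.
differs = not np.allclose(g_joint, g_ind / W.shape[0])

# If every classifier starts from identical weights, the joint gradient
# reduces to the independent gradient divided by N.
W_same = np.tile(W[0], (4, 1))
same_init_matches = np.allclose(
    joint_grads(W_same, X, y),
    independent_grads(W_same, X, y) / W_same.shape[0],
)
```

In the joint case every classifier's update is weighted by the shared error term computed from the ensemble mean, so each classifier's gradient depends on what the others currently predict. That coupling is absent when the classifiers are trained independently.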
Thakshu t1_iysr30r wrote
Reply to comment by twocupv60 in [D] Ensemble Training Logistics and Mathematical Equivalences by twocupv60
I think you are right here. But the mathematical equivalence still bothers me. Since they end up with dissimilar parameters, are they equivalent?