
Thakshu t1_iysghrm wrote

If I understand correctly, the question is whether training N classifiers independently and averaging their outputs is mathematically equivalent to training N classifiers jointly on their mean output.

To me, the two do not appear to be mathematically equivalent. (Edited a wrong statement here)

In the second (joint) case, the gradient for backprop at each step is calculated from the mean output of all classifiers. So the loss values will be smoother than in the first case, if the starting points are independently initialized.
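A rough sketch of the difference in gradients, under simplifying assumptions: toy linear models with a squared-error loss stand in for the classifiers, and the names `independent_grads` and `joint_grads` are made up for illustration. In the independent case each model's gradient depends on its own prediction error; in the joint case every model's gradient depends on the error of the mean prediction and is scaled by 1/N.

```python
import numpy as np

def independent_grads(W, x, y):
    # Each row of W is one linear model, trained on its own loss:
    # loss_i = (w_i . x - y)^2, so grad_i = 2 * (w_i . x - y) * x
    preds = W @ x
    return 2.0 * (preds - y)[:, None] * x[None, :]

def joint_grads(W, x, y):
    # All models share one loss on the mean output:
    # loss = (mean_i(w_i . x) - y)^2, so grad_i = (2 / N) * (mean_pred - y) * x
    n = W.shape[0]
    mean_pred = (W @ x).mean()
    return np.tile((2.0 / n) * (mean_pred - y) * x, (n, 1))

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 independently initialized models, 4 features
x = rng.normal(size=4)        # a single input
y = 1.0                       # its target

g_ind = independent_grads(W, x, y)
g_joint = joint_grads(W, x, y)

# The per-model gradients differ between the two setups,
# even after undoing the 1/N scaling of the joint case.
print(np.allclose(g_ind, g_joint * W.shape[0]))
```

In this toy setting every model receives the same joint gradient, and it equals the average of the independent gradients divided by N. That matches the "smoother" intuition here, but it is specific to this linear, squared-error example and need not hold for real classifiers with nonlinear outputs.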

Am I making a mistake in my reasoning? I can't identify one yet.