Submitted by twocupv60 t3_zbkvd2 in MachineLearning
I want to train an ensemble of 50 networks where each network has the same architecture. The input is an image and the output is a scalar: a simple binary classifier. Are the following two schemes mathematically equivalent:
- Train 50 models independently and average their outputs at inference time to form the final ensemble prediction. Logistically, I train 50 models.
- Create a super model composed of the 50 models, where the top neuron is the average of all the individual models' outputs. Thus, I train all 50 models at once implicitly. Logistically, I train one model.
My initial thought is that these are equivalent, since I am taking the mean of the prediction probabilities, so backpropagation isn't aware of the other models. However, I could see the credit assignment in scheme 2 effectively changing the learning rate, because instead of all the error going to a single model, as in scheme 1, it is now distributed over all 50 models.
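Here is a minimal PyTorch sketch (not from the original post; the tiny `make_net` architecture, batch size, and dummy data are placeholders) contrasting the two schemes. In scheme 1 each model backprops a loss on its own probability; in scheme 2 a single loss is computed on the averaged probability, so each model's gradient picks up a 1/50 factor and is driven by the ensemble average rather than by its own output.

```python
import torch
import torch.nn as nn

N = 50

def make_net():
    # Stand-in for the real image classifier; emits a single logit per image.
    return nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 16), nn.ReLU(), nn.Linear(16, 1))

nets = [make_net() for _ in range(N)]
x = torch.randn(8, 1, 28, 28)              # dummy batch of "images"
y = torch.randint(0, 2, (8, 1)).float()    # binary labels
bce = nn.BCELoss()

# Scheme 1: each model gets a loss on its OWN probability and backprops alone.
for net in nets:
    loss_i = bce(torch.sigmoid(net(x)), y)
    loss_i.backward()                      # gradient sees only this model's output

# Scheme 2: one loss on the AVERAGED probability, backpropped through all models.
for net in nets:
    net.zero_grad()                        # clear scheme-1 gradients for comparison
probs = torch.stack([torch.sigmoid(net(x)) for net in nets])  # shape (N, 8, 1)
loss = bce(probs.mean(dim=0), y)
loss.backward()                            # each model's gradient is scaled by 1/N
                                           # and driven by the shared averaged probability
```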
Thakshu t1_iysghrm wrote
If I understand correctly, the question is whether training N classifiers independently and averaging their outputs is mathematically equivalent to training N classifiers jointly with a mean output.
To me they appear not to be mathematically equivalent. (Edited a wrong statement here.)
In the second case, the gradient for backprop at each step is calculated from the mean output of all classifiers, so the loss values will be smoother than in the first case, assuming the models are independently initialized.
Is there a mistake in my thinking? I can't identify it yet.
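A short derivation (not from the thread) that makes this concrete, assuming the scheme-2 loss is binary cross-entropy applied to the averaged probability: let $p_i = \sigma(f_{\theta_i}(x))$ be model $i$'s probability and $\bar p = \frac{1}{N}\sum_{i=1}^{N} p_i$ the ensemble average.

Scheme 1: $\dfrac{\partial L(p_i, y)}{\partial \theta_i} = L'(p_i, y)\,\dfrac{\partial p_i}{\partial \theta_i}$

Scheme 2: $\dfrac{\partial L(\bar p, y)}{\partial \theta_i} = \dfrac{1}{N}\, L'(\bar p, y)\,\dfrac{\partial p_i}{\partial \theta_i}$

So in scheme 2 each model's update is both scaled by $1/N$ and driven by the error at the shared average $\bar p$ rather than at its own prediction $p_i$. The two only coincide in degenerate cases, e.g. when all models already produce identical outputs (so $\bar p = p_i$) and the $1/N$ factor is absorbed into the learning rate.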