Submitted by Worth-Advance-1232 t3_10asgah in MachineLearning
chaosmosis t1_j47d0ev wrote
In addition to being more straightforward, applying the same total amount of compute to a single model doing end-to-end learning is often better for performance than splitting that compute across multiple models. As far as I'm aware, there's no systematic way to tell which approach will be preferable in a given case; this is just a rule-of-thumb opinion.
jimmymvp t1_j4fcjly wrote
Hm, I'm not sure about that. There's the mixture-of-experts idea, which isn't exactly stacking but rather specializes multiple models to different parts of the data, so each data point gets assigned to a specific shallow model. What you need then is an assignment rule, usually implemented as a classifier, and it's been shown that this is cheaper in terms of compute at evaluation time. I'm not sure if the idea has been abandoned by now, but Google Brain published a paper on this and there were subsequent works.
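To make the idea concrete, here's a minimal sketch of hard (top-1) mixture-of-experts routing in NumPy. All names and shapes here are hypothetical illustration, not from any particular paper: a linear gating classifier assigns each input to one of several shallow linear "experts", and only the chosen expert runs on each point, which is where the eval-time compute saving comes from.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy setup: 4 shallow linear "experts" and a linear
# gating classifier that routes each input to exactly one expert.
n_experts, d_in, d_out = 4, 8, 3
experts_W = rng.normal(size=(n_experts, d_in, d_out))
gate_W = rng.normal(size=(d_in, n_experts))

def moe_forward(x):
    # Gating: score every expert, then take a hard top-1 assignment.
    scores = x @ gate_W                    # (batch, n_experts)
    choice = scores.argmax(axis=1)         # one expert index per data point
    out = np.empty((x.shape[0], d_out))
    for e in range(n_experts):
        mask = choice == e
        if mask.any():
            # Only the assigned expert runs on its points, so eval-time
            # compute is one expert per input rather than all of them.
            out[mask] = x[mask] @ experts_W[e]
    return out, choice

x = rng.normal(size=(16, d_in))
y, choice = moe_forward(x)
```

In practice the gate and experts are trained jointly (often with a soft or noisy top-k gate so the assignment is differentiable), but the hard-routing version above is enough to show why per-example compute stays constant as you add experts.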
chaosmosis t1_j4fpg6g wrote
I'd love the reference if you can find it.
jimmymvp t1_j4hy6xm wrote
chaosmosis t1_j4lnsfe wrote
Thanks!