smsorin t1_iyza1zg wrote
If you are inference-constrained, it might be better. Since a good chunk of the model is shared, you need less compute, and perhaps even less time if you can't parallelize sufficiently. The other comments here make other good arguments.
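
To make the compute argument concrete, here is a rough back-of-the-envelope sketch. It assumes a hypothetical setup where one shared trunk feeds several task-specific heads, compared against fully separate models. The function names and cost numbers are made up for illustration and are not from the thread.

```python
def separate_cost(trunk_cost, head_cost, num_tasks):
    """Each task runs its own full model: trunk plus head."""
    return num_tasks * (trunk_cost + head_cost)


def shared_cost(trunk_cost, head_cost, num_tasks):
    """The trunk runs once and its output is reused by every head."""
    return trunk_cost + num_tasks * head_cost


# Hypothetical per-input costs in arbitrary units (e.g. GFLOPs).
trunk, head, tasks = 9.0, 1.0, 2

print("separate models:", separate_cost(trunk, head, tasks))
print("shared trunk:   ", shared_cost(trunk, head, tasks))
```

With these made-up numbers the shared setup needs a bit more than half the compute. If the separate models would have to run one after another rather than in parallel, wall-clock time would shrink by roughly the same factor.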