Viewing a single comment thread. View all comments

smsorin t1_iyza1zg wrote

If you are inference constrained, it might be better. Since a good chunk of the model is shared you need less compute and perhaps even less time, if you can't paralellize sufficiently. The other comments here have other good arguments.

1