ResourceResearch t1_iro8zof wrote

Afaik it is not clear. In my personal experience, the total number of parameters matters more than the size of the individual layers, i.e. a small number of wide layers does roughly the same job as a large number of narrow layers with a comparable parameter count.
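
To make the comparison concrete, here is a minimal sketch of how one might match parameter counts between a wide/shallow and a narrow/deep fully connected net. The helper `mlp_param_count` and the sizes (64 inputs, 10 outputs, widths 256 and 104) are made up for illustration, and it assumes plain dense layers with biases. It only counts parameters; it says nothing about how well either net actually trains.

```python
def mlp_param_count(input_dim, hidden_widths, output_dim):
    """Count weights and biases of a fully connected net (hypothetical helper)."""
    dims = [input_dim, *hidden_widths, output_dim]
    total = 0
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        total += fan_in * fan_out + fan_out  # weight matrix + bias vector
    return total


# Illustrative sizes only: 64 input features, 10 outputs.
wide_shallow = mlp_param_count(64, [256] * 2, 10)  # 2 hidden layers, width 256
narrow_deep = mlp_param_count(64, [104] * 8, 10)   # 8 hidden layers, width 104

print(wide_shallow)  # 85002
print(narrow_deep)   # 84250
```

Both configurations land within about 1% of each other in parameter count, so they would be a fair pair to train side by side if you want to test the claim on your own data.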

Consider this paper for empirical insights for large models: https://arxiv.org/pdf/2001.08361.pdf