_eminorhan_

_eminorhan_ t1_j7zwlu9 wrote

People should be more skeptical of "emergent abilities" in big models: 1) Papers claiming such abilities generally use small models that are undertrained by Chinchilla scaling standards (compute is not controlled, and hyperparameter choices for the small models are suboptimal), and 2) these papers generally use a semilogx plot to demonstrate "emergence", but even a linear relationship will look exponential in such a plot. I'm not sure I'd want to call a simple linear relationship "emergent".
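
To make the semilogx point concrete, here is a minimal sketch. It assumes a made-up metric that is exactly linear in model size; the function name `linear_metric`, the slope, and the model sizes are all hypothetical and not taken from any paper. With x on a log axis the plotted coordinate is u = log10(x), so y = a*x becomes y = a*10^u, which is exponential in u:

```python
import math

def linear_metric(n_params, slope=1e-11):
    """Hypothetical benchmark score that is exactly linear in model size."""
    return slope * n_params

# Model sizes spaced evenly on a log axis, as in a typical semilogx plot.
# These sizes are illustrative assumptions, not real models.
sizes = [10 ** k for k in range(7, 12)]  # 1e7 ... 1e11 parameters
scores = [linear_metric(n) for n in sizes]

# Horizontal position of each point on a log-scaled x axis.
log_positions = [math.log10(n) for n in sizes]

# Equal steps along the log axis multiply the score by a constant factor,
# which is the signature of an exponential curve in the plotted coordinate.
ratios = [scores[i + 1] / scores[i] for i in range(len(scores) - 1)]

for u, s in zip(log_positions, scores):
    print(f"log10(size) = {u:.0f}  score = {s:.4f}")
print("score ratio per unit step on the log axis:", ratios)
```

On the log axis the first few points sit near zero and the last one shoots up, even though nothing nonlinear is happening as a function of model size itself.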
