---AI--- t1_iwtd2xx wrote

But they are useful. Look at the thousands of real-world uses. Look at Grammarly, translation, protein folding, and so on. How can you possibly deny it?!

> not fundamentally better

In just the last two years, the models went from scoring 43 on this system of testing to 75. How much more of a fundamental improvement are you after?!

1

dat_cosmo_cat t1_iwteguv wrote

You and I are literally saying the same things. These models have been in prod on every major software platform since BERT.

We don't even need to look at offline eval metrics anymore. If you're an actual MLE / data scientist, you likely have pipelines set up that directly measure the engagement / attributable sales differences and report the real business impact across millions of users each time a new model is released.

I work on a team that has made millions of dollars building applications on top of LLMs since 2018, so when I see the claim "LLMs finally got good this year" it's hard not to laugh. That is what I am getting at.

Edit: Did you read the article?

5