Submitted by hardmaru t3_yxf875 in MachineLearning
---AI--- t1_iwtd2xx wrote
Reply to comment by dat_cosmo_cat in [R] The Near Future of AI is Action-Driven by hardmaru
But they are useful. Look at the thousands of real-world uses. Look at Grammarly, translation, protein folding, and so on. How can you possibly deny it?!
> not fundamentally better
In just the last two years, the models went from scoring 43 on this system of testing to 75. How much more of a fundamental improvement are you after?!
dat_cosmo_cat t1_iwteguv wrote
You and I are literally saying the same things. These models have been in prod on every major software platform since BERT.
We don't even need to look at offline eval metrics anymore. If you're an actual MLE / data scientist, you likely have pipelines set up that directly measure the engagement / attributable sales differences and report the real business impact across millions of users each time a new model is released.
I work on a team that has made millions of dollars building applications on top of LLMs since 2018, so when I see the claim "LLMs finally got good this year" it's hard not to laugh. This is what I am getting at.
Edit*: did you read the article?