wazis t1_jdz4v8g wrote

If it is true (too lazy to check), it is not surprising. If it is not, then it is also not surprising.

109

Seankala t1_jdz6kty wrote

Yeah I read through the whole thing and it's not surprising. Train-test contamination has been a problem for a while now.

67

hadaev t1_jdzcowi wrote

Well, we usually expect this from people who aren't really data scientists, like biologists using DS methods and making such a trivial mistake.

It doesn't seem hard to search for matches in text, unlike other data types.
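
To make the "search for matches in text" point concrete, here is a minimal sketch of an exact n-gram overlap check. It assumes whitespace tokenization and an 8-word window, both arbitrary choices for illustration; the function names are hypothetical and this is not a description of how any particular lab actually checks for contamination.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in `text` (whitespace tokens)."""
    tokens = text.lower().split()
    # If the text has fewer than n tokens, the range is empty and so is the set.
    return {" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def is_contaminated(test_item, training_docs, n=8):
    """True if any n-gram of the test item appears verbatim in a training doc."""
    test_grams = ngrams(test_item, n)
    return any(test_grams & ngrams(doc, n) for doc in training_docs)


# Toy data, made up for this example.
training_docs = [
    "the quick brown fox jumps over the lazy dog near the river bank today",
]
leaked = "Question: the quick brown fox jumps over the lazy dog near what?"
clean = "what is the capital of france and why is it famous"

print(is_contaminated(leaked, training_docs))
print(is_contaminated(clean, training_docs))
```

An exact-match check like this only catches verbatim overlap; paraphrased or reformatted test items would slip through, so it is probably best read as a lower bound on contamination.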

13

master3243 t1_jdzec5r wrote

Seeing how they made sure the bar exam and the math olympiad tests were recent ones that were explicitly stated to not be in the training dataset to avoid contamination, I trusted that all the other reported tests were also as carefully picked to avoid contamination.

14

MotionTwelveBeeSix t1_jdzurlg wrote

The bar exams recycle the same questions every year; there’s very little original about them. It’s a test of pure memorization.

26

jrkirby t1_jdzx1ef wrote

I'm guessing the hard part is that you can't "untrain" a model. They hadn't thought "I want to benchmark on these problems later" when they started. Then they spent 20K$+ compute on training. Then they wanted to test it. You can easily find the stuff you want to test on in your training dataset, sure. But you can't so easily remove it and train everything again from scratch.

7

Thorusss t1_je1z0ib wrote

>Then they spent 20K$+ compute on training.

Your estimate is a few orders of magnitude too low.

9

AuspiciousApple t1_je2aij3 wrote

Idk, thousands of GPUs going brrrr for months, how much can it cost?

$10?

2

jrkirby t1_je2f63r wrote

2 million dollars or 20 million dollars is greater than 20 thousand. And it makes the main thesis more salient - the more money you've spent training, the less willing you'll be to retrain the entire model from scratch just to run some benchmarks the "proper" way.

1

wazis t1_jdzzs1q wrote

Well, they can, but it is expensive.

3

RossoMarra t1_je16mod wrote

I really think you are underestimating biologists.

2

is_it_fun t1_jdzs7dw wrote

Biologists are such trash nowadays when it comes to any kind of computational / math methods. Back in our grandfathers' day they were really hardcore.

−8