Submitted by Balance- t3_124eyso in MachineLearning
wazis t1_jdz4v8g wrote
If it is true (too lazy to check), it is not surprising. If it is not than it is also not surprising
Seankala t1_jdz6kty wrote
Yeah I read through the whole thing and it's not surprising. Train-test contamination has been a problem for a while now.
hadaev t1_jdzcowi wrote
Well, we usually expect this from people who aren't really DS people, like biologists using DS methods and making such a trivial mistake.
It doesn't seem hard to search for matches in text, unlike other data types.
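A minimal sketch of what such a text search could look like, assuming word-level n-gram overlap is the matching criterion. The function names and the default `n=8` are illustrative assumptions, not the method used by the model's authors:

```python
def ngrams(text, n=8):
    """Return the set of word-level n-grams in text (lowercased, whitespace-split)."""
    words = text.lower().split()
    # If the text has fewer than n words, the range is empty and so is the set.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_fraction(test_item, train_docs, n=8):
    """Fraction of the test item's n-grams that also occur in any training document."""
    test_grams = ngrams(test_item, n)
    if not test_grams:
        return 0.0
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    return len(test_grams & train_grams) / len(test_grams)
```

A benchmark problem with a high overlap fraction would be a candidate for contamination. Exact n-gram matching misses paraphrased or reformatted copies, so a low score does not prove a problem is clean.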
master3243 t1_jdzec5r wrote
Seeing how they made sure the bar exam and the math olympiad tests were recent ones that were explicitly stated to not be in the training dataset to avoid contamination, I trusted that all the other reported tests were also as carefully picked to avoid contamination.
MotionTwelveBeeSix t1_jdzurlg wrote
The bar exams recycle the same questions every year; there's very little original about them. It's a test of pure memorization.
jrkirby t1_jdzx1ef wrote
I'm guessing the hard part is that you can't "untrain" a model. They hadn't thought "I want to benchmark on these problems later" when they started. Then they spent 20K$+ compute on training. Then they wanted to test it. You can easily find the stuff you want to test on in your training dataset, sure. But you can't so easily remove it and train everything again from scratch.
Thorusss t1_je1z0ib wrote
>Then they spent 20K$+ compute on training.
Your estimate is a few orders of magnitude too low
AuspiciousApple t1_je2aij3 wrote
Idk, thousands of GPUs going brrrr for months, how much can it cost?
$10?
jrkirby t1_je2f63r wrote
2 million dollars or 20 million dollars is greater than 20 thousand. And it makes the main thesis more salient - the more money you've spent training, the less willing you'll be to retrain the entire model from scratch just to run some benchmarks the "proper" way.
wazis t1_jdzzs1q wrote
Well they can, but it is expensive
RossoMarra t1_je16mod wrote
I really think you are underestimating biologists.
[deleted] t1_jdzh4h0 wrote
[deleted]
is_it_fun t1_jdzs7dw wrote
Biologists are such trash nowadays when it comes to any kind of computational / math methods. Back in our grandfathers' days they were really hardcore.
[deleted] t1_jdztsmd wrote
[removed]
ppff01 t1_jdzhmhs wrote
then*
marr75 t1_je14tki wrote
me irl
Historical-Tree9132 t1_je17bln wrote
0/24 and 0/12 on code problems it had never seen before really surprised me
CollectionLeather292 t1_jdzzazr wrote
TL;DR