dojoteef t1_iyvxzsz wrote on December 4, 2022 at 4:27 PM

#858,080

See the author's explanation on OpenReview:

> We update the result tables in the camera-ready version. The revision is due to a different data version of query augmentation. Previously, the data is cooked by one of our co-authors while using a different train-test split to train the query generator, causing some data leakage issue. All experiments in the previous submission are based on this query augmentation version, so the performance is relatively higher. When preparing the camera-ready version, we review and reproduce the code end-to-end for official release. At that time, we realize the data leakage problem. So, we re-cook the query augmentation data and reproduce all the experiments again in the new table. After solving the data leakage problem, NCI still shows more than 15% improvement over the current best SOTA. We have released the complete open-source code at GitHub:
>
> https://github.com/solidsea98/Neural-Corpus-Indexer-NCI
>
> Welcome to follow and reproduce our work. Looking forward to further discussions and collaborations.
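For readers unfamiliar with this kind of leakage: the authors describe a query generator trained on a split that did not match the evaluation split. A rough sketch of a guard against that is below. It assumes documents carry stable IDs; the function names and the ID format are hypothetical and are not taken from the NCI code.

```python
def find_leaked_ids(generator_train_ids, eval_ids):
    """Return the sorted IDs that appear in both splits."""
    return sorted(set(generator_train_ids) & set(eval_ids))


def assert_no_leakage(generator_train_ids, eval_ids):
    """Raise if any evaluation document was also used to train the generator."""
    leaked = find_leaked_ids(generator_train_ids, eval_ids)
    if leaked:
        raise ValueError(
            f"{len(leaked)} evaluation documents were seen during "
            f"query-generator training, e.g. {leaked[:5]}"
        )


# Hypothetical usage: run this once before generating augmented queries.
assert_no_leakage(["doc-1", "doc-2"], ["doc-3", "doc-4"])
```

A check like this only catches exact ID overlap; near-duplicate documents would need a separate comparison.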

Even_Stay3387 OP t1_iyvyx5w wrote on December 4, 2022 at 4:33 PM

#858,103

Replying to dojoteef (#858,080)

Are you sure this is allowed? Data leakage... really funny. I could also post very, very good results to cheat the review and then say I'm sorry, there was data leakage.

mil24havoc t1_iyw1c2s wrote on December 4, 2022 at 4:50 PM

#858,171

Replying to Even_Stay3387 (#858,103)

You should be a better scientist. Best papers shouldn't be awarded for performance, that would be bad science. Best papers are awarded for innovation and quality. Fixing your results is the responsible thing to do.

dojoteef t1_iyw254f wrote on December 4, 2022 at 4:56 PM

#858,192

Replying to Even_Stay3387 (#858,103)

Mistakes happen. In this case the authors report the issue publicly and should be commended for that.

The NeurIPS organizers can choose to address the issue in whatever way they deem appropriate, especially as the authors are not hiding the fact that their results were changed.

Of course you're free to assume it's malicious if you want (at least that seems to be the stance you're taking, but if it's not then I might have misinterpreted your response).

dulipat t1_iyw3ycl wrote on December 4, 2022 at 5:08 PM

#858,259

Replying to Even_Stay3387 (#858,103)

Very good results with bad scientific method will be rejected anyway

lameheavy t1_iyw49t6 wrote on December 4, 2022 at 5:10 PM

#858,270

Good on the authors for admitting the error and correcting the results. I do wonder how many times this happens where authors don’t make a correction.

Comfortable_Use_5033 t1_iywcac8 wrote on December 4, 2022 at 6:04 PM

#858,610

This is actually what I look for in good scientific practice, especially when there are a huge number of unreproducible papers with no code and no training config at all.

lemlo100 t1_iywgmvk wrote on December 4, 2022 at 6:32 PM

#858,748

Replying to lameheavy (#858,270)

I really don't wanna know. I think the problem is huge. Anyone who has worked in software engineering has the awareness that bugs always happen and that that makes unit testing crucial. I understand many machine learning researchers have not worked in software engineering so the awareness just isn't there.

Blasket_Basket t1_iywgzrl wrote on December 4, 2022 at 6:35 PM

#858,759

They fixed a data leakage issue. It would have been irresponsible to NOT update their results and fix the issue once they'd found it.

Seeing as you clearly created this account just to complain about this in 3 different subs, I'm guessing you're not gonna understand this point.

FirstOrderCat t1_iywiisx wrote on December 4, 2022 at 6:44 PM

#858,797

very interesting case.

Huge respect to everyone involved for the still-good results and the transparent process!

AlmightySnoo t1_iywjf4u wrote on December 4, 2022 at 6:50 PM

#858,828

Mistakes like these can happen for a variety of reasons (bug, typo in the code, forgot to disable some flag that you were using for dirty and fast results during your trials, etc...) and it's actually a good thing they rectified the results.

Why do you always have to assume malicious intent and rush to Reddit with a throwaway account to shame the authors? smh

respeckKnuckles t1_iywkhtq wrote on December 4, 2022 at 6:56 PM

#858,847

Replying to AlmightySnoo (#858,828)

Where did OP assume malicious intent?

pyepyepie t1_iywkmz6 wrote on December 4, 2022 at 6:57 PM

#858,855

Replying to lemlo100 (#858,748)

I was a software engineer for a few years (I would probably say I am a little more skilled as a coder than in DS), and I still find it difficult not to mess up experiments if I don't recheck myself. Mostly, I just assume my results are garbage and try to attack them until I conclude that they're actually real. It's even more important when the task is not supervised and is difficult to implement (MARL, GANs...). In RL, for example, you might think you developed a nice algorithm only to find out you accidentally modified the rewards.
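The "accidentally modified the rewards" failure is the kind of thing a small unit test can pin down. Here is a minimal sketch, assuming transitions are plain `(obs, action, reward, next_obs)` tuples and that the augmentation is only supposed to perturb the observation. `augment_observation` is a made-up stand-in, not code from any particular project.

```python
import random


def augment_observation(transition, noise_scale=0.01, rng=None):
    """Add Gaussian noise to the observation only (hypothetical augmentation)."""
    rng = rng or random.Random(0)
    obs, action, reward, next_obs = transition
    noisy_obs = [x + rng.gauss(0.0, noise_scale) for x in obs]
    return (noisy_obs, action, reward, next_obs)


def test_augmentation_preserves_reward():
    """The augmentation must leave action, reward and next_obs untouched."""
    original = ([0.1, 0.2], 1, 5.0, [0.3, 0.4])
    augmented = augment_observation(original)
    assert augmented[1] == original[1]
    assert augmented[2] == original[2]
    assert augmented[3] == original[3]
    assert len(augmented[0]) == len(original[0])


test_augmentation_preserves_reward()
```

The test is trivial on purpose: it exists so that a later refactor that touches the reward fails loudly instead of quietly improving the numbers.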

pyepyepie t1_iywmmd9 wrote on December 4, 2022 at 7:10 PM

#858,905

Replying to Comfortable_Use_5033 (#858,610)

I agree. It's even worse when the cause of the improvement is different from that stated in the paper (you might as well call some papers "really? ADAM is better than SGD most of the time"), causing such a huge time-waste.

AlmightySnoo t1_iywna12 wrote on December 4, 2022 at 7:14 PM

#858,921

Replying to respeckKnuckles (#858,847)

here: https://www.reddit.com/r/MachineLearning/comments/zcdw0k/comment/iyvyx5w/?utm_source=reddit&utm_medium=web2x&context=3

lemlo100 t1_iywnr89 wrote on December 4, 2022 at 7:17 PM

#858,935

Replying to pyepyepie (#858,855)

Totally true. I also tend to believe my results are garbage and double- and triple-check. For my last project I actually implemented some tests. It was a data augmentation approach for reinforcement learning, so it was testable. My supervisor was not happy about it and considered it a waste of time. After reading the NeurIPS outstanding paper "Deep Reinforcement Learning at the Edge of the Statistical Precipice", I also ran about 50 seeds in my experiments, as opposed to only five like my supervisor used to do. We were not able to work together and ended it early because he didn't want me, a junior, interfering with him dashing out cooked results.

Edit: That same supervisor, by the way, had a paper published that contained a bug. Sampling was not quite implemented the way it was described in the paper. When I brought attention to this, since my project was based on this piece of code, instead of thanking me for spotting the bug he argued how in his opinion it shouldn't make a difference. That was shocking.
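To make the seed-count point concrete, one option is to report an interval over seeds rather than a single mean. The sketch below is a plain percentile bootstrap on the mean. It is a simpler stand-in for illustration, not the procedure from the paper mentioned above, and the function name and defaults are assumptions.

```python
import random
import statistics


def bootstrap_mean_ci(scores, n_resamples=2000, alpha=0.05, rng_seed=0):
    """Percentile-bootstrap confidence interval for the mean score over seeds."""
    rng = random.Random(rng_seed)
    n = len(scores)
    # Resample the per-seed scores with replacement and record each mean.
    means = sorted(
        statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples)
    )
    lo_idx = round((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, round((1 - alpha / 2) * n_resamples) - 1)
    return means[lo_idx], means[hi_idx]


# Hypothetical per-seed returns; with only five seeds the interval is usually wide.
five_seed_scores = [0.61, 0.70, 0.55, 0.82, 0.64]
low, high = bootstrap_mean_ci(five_seed_scores)
print(f"95% CI for the mean: [{low:.3f}, {high:.3f}]")
```

Running the same function on 50 per-seed scores typically gives a much tighter interval, which is the practical argument for not stopping at five.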

pyepyepie t1_iywowgo wrote on December 4, 2022 at 7:24 PM

#858,975

Replying to lemlo100 (#858,935)

Thank you sir for making SIGNIFICANT contributions, it takes a lot to go against your supervisor's opinions, but it seems like you did the moral thing.

maxToTheJ t1_iywu78n wrote on December 4, 2022 at 7:58 PM

#859,143

Replying to lameheavy (#858,270)

To be fair you can ask if they would have won Outstanding Paper with the less impressive gains obtained post correction

maxToTheJ t1_iywupll wrote on December 4, 2022 at 8:01 PM

#859,159

Replying to lemlo100 (#858,935)

> Totally true. I also tend to believe my results are garbage and double- and triple-check.

The market doesn't reward that though. We can't really say for sure that the paper being discussed would have won Outstanding Paper with the less impressive gains, so at the end of the day not checking could inadvertently help your career.

pyepyepie t1_iyx0k1s wrote on December 4, 2022 at 8:38 PM

#859,315

Replying to maxToTheJ (#859,159)

True. Who am I to say what is good and what's not, but I tend to enjoy simple papers with good ideas much more than papers that contain many moving parts (I am 100% unable to get that kind of result but I can enjoy it :) ).

I kind of treat complicated papers without robust code as noise, or maybe as a source of ideas, but when I try to implement them they mostly don't work as well as expected. For example, I had to implement a model for a speech-related task, a field where I have no expertise. Most of the models I tried were really bad in comparison to a good, simple solution (inspired by ResNet), and the one model I found that performed better did so only due to preprocessing. It's hard to come up with new ideas, so I am happy there is so much information, but sometimes it's too much.

deepbootygame t1_iyx27pv wrote on December 4, 2022 at 8:48 PM

#859,366

Academia is a joke

[deleted] t1_iyx5tcd wrote on December 4, 2022 at 9:10 PM

#859,486

[deleted]

master3243 t1_iyxbtij wrote on December 4, 2022 at 9:50 PM

#859,629

Replying to lemlo100 (#858,748)

As a person who mainly researches AI but also worked in software engineering previously, I have never seen AI and unit testing together in the same room... sadly

master3243 t1_iyxcdtw wrote on December 4, 2022 at 9:54 PM

#859,641

Replying to deepbootygame (#859,366)

It's very easy to point and criticize but what exactly do you propose is done in this type of situation?

Ban the authors because they acknowledged and rectified their error? Good job you just guaranteed that no author will ever speak up about any mistakes they legitimately made.

Not to mention that their updated results are still a massive improvement.

needlzor t1_iyy4ujr wrote on December 5, 2022 at 1:20 AM

#860,627

Replying to deepbootygame (#859,366)

Academia is fine. It's benchmark-driven science, which seems to be most of ML research, that is a problem.

domestication_never t1_iyy7qc8 wrote on December 5, 2022 at 1:43 AM

#860,736

Replying to pyepyepie (#858,855)

I am a manager who works with both scientists and engineers. Every new scientist gets sent to "coding bootcamp" and doesn't come back till they learn unit testing at a minimum.

Every engineer gets sent to machine learning bootcamp and doesn't come back till they can explain WAPE, MAPE, overfitting etc.

I do this as much for quality software as to stop the damn fights. At least they have an appreciation for the finer points of the other's profession.
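For anyone who hasn't met the two error metrics named above: MAPE averages the per-item percentage errors, while WAPE divides total absolute error by total actual volume, so small items cannot dominate it. A minimal sketch using the standard definitions follows; the function names and sample numbers are purely illustrative.

```python
def mape(actual, forecast):
    """Mean absolute percentage error; undefined if any actual value is 0."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs(a - f) / abs(a) for a, f in zip(actual, forecast)]
    return sum(errors) / len(errors)


def wape(actual, forecast):
    """Weighted absolute percentage error: total |error| over total |actual|."""
    total_actual = sum(abs(a) for a in actual)
    if total_actual == 0:
        raise ValueError("WAPE is undefined when all actual values are zero")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / total_actual


# A small item with a large relative miss inflates MAPE far more than WAPE.
actual = [10, 90]
forecast = [20, 90]
print(mape(actual, forecast), wape(actual, forecast))
```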

coniine_ t1_iyzbzhl wrote on December 5, 2022 at 8:27 AM

#862,471

I notice in the Chinese reddit (aka. zhihu.com) someone raised a similar question (url:https://www.zhihu.com/question/570223822, you may read it with Google Translate), but most answerers (18/20, I think) hold a critical, or even satirical attitude, like this one:

>This paper is 100% naked falsification, the experimental numbers are filled in at will. The results written for the first time are much higher than other work, and the second time, it is reported lower.

I am not encouraging a more critical attitude to this work, and I just mention this phenomenon and hope to stimulate more discussions.

deepbootygame t1_iyzfqky wrote on December 5, 2022 at 9:25 AM

#862,565

Replying to master3243 (#859,641)

Start by penalizing people for cheating.

MathChief t1_iz0jzwb wrote on December 5, 2022 at 4:11 PM

#864,297

Replying to coniine_ (#862,471)

Native Mandarin speaker here. I don't think the neural translation has captured much of the sentiment or the sarcastic nuance of the statements on zhihu.com at all.

A rough translation of some serious accusations in Chinese (a 3rd person paraphrasing).

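> Also, what are the authors trying to prove by releasing the code? The biggest error in this paper is that the numbers in the submitted version and the camera-ready version differ seriously, which greatly affected the reviewers' judgment. Even if your code can reproduce the camera-ready numbers, it still cannot explain the most critical error. The authors should stop making pointless explanations; the mistake cannot be undone, and more excuses only make things look worse. Proactively admitting the mistake to NIPS and withdrawing the paper is basic decency.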

This poster said making the source code public is like a futile attempt to make themselves look innocent. "Even if releasing the source code lets others replicate the benchmarks, it still cannot explain the key mistakes." This poster is pretty sure that the authors cheated (without saying so). The poster's bottom line is that the authors should withdraw from NIPS and acknowledge the cheating.

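> Evidence sometimes arrives late, but sooner or later it comes. Some people also say I am smearing Chinese scholars; I know their playbook all too well. Reporting a higher number in the rebuttal to fool the reviewers and then not putting that number in the camera-ready is child's play. It is only that OpenReview makes these inner workings public, and that this paper won an award and became big news. More egregious fraud, such as colluding and reviewing each other's papers, is also common. A major reason AAAI went downhill is that a few "water kings" (水王) first became ACs, and then, once one person attained the Way, the chickens and dogs behind them also started to have a paper explosion (一人得道之后，后面的鸡犬也开始paper爆炸); there were more and more such ACs, and in the end bad money drove out good. By comparison, tampering a bit with the training data or cherry-picking results really only counts as a small trick. In fact, many people cling to a fairly meaningless direction and churn out a dozen or so papers at a time, partly because it is familiar ground within their comfort zone, and partly, isn't it because everyone in the field is an old acquaintance...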

This poster says that there are many Chinese scholars with ethical issues, like collusion rings, etc. The "人得道之后，后面的鸡犬也开始paper爆炸" part refers to a famous saying, "一人得道，雞犬升天", from some ancient Chinese writing. "水王" can be understood as someone producing lots of template-ish papers with no new scientific contributions. This phrase came from a saying among Chinese netizens, "灌水", which means meaningless content filler like Lorem Ipsum. So when these "Lords of Lorem Ipsum" became ACs, the "researchers" around them got lots of publications due to collusion.

Overall, the accusations on zhihu.com are career-endingly serious. Unlike the "innocent until proven guilty" atmosphere here, zhihu'ers took the opposite stance, which is likely attributable to mainland Chinese culture.

[deleted] t1_iz8135o wrote on December 7, 2022 at 3:38 AM

#876,192

Replying to MathChief (#864,297)

[deleted]

42gauge t1_izye18i wrote on December 12, 2022 at 7:35 PM

#914,756

Replying to mil24havoc (#858,171)

> Best papers shouldn't be awarded for performance, that would be bad science. Best papers are awarded for innovation and quality

But exceptionally good performance, whether real or fake, is usually used as a predictor for innovation and quality. If the authors hadn't made this mistake, this paper would have obviously been of higher quality - and yet, do you really think it would have stood out even more without that error?

42gauge t1_izye37i wrote on December 12, 2022 at 7:36 PM

#914,759

Replying to dulipat (#858,259)

Did this paper have very good results with bad scientific method?

42gauge t1_izyeftq wrote on December 12, 2022 at 7:38 PM

#914,782

Replying to master3243 (#859,641)

NeurIPS rescinds the award and gives it to another paper, and the authors work to explain the difference?

[D] NeurIPS 2022 Outstanding Paper modified results significantly in the camera ready
