Submitted by SuchOccasion457 t3_y2i7h1 in MachineLearning
suflaj t1_is363fo wrote
Reply to comment by farmingvillein in [D] Wide Attention Is The Way Forward For Transformers by SuchOccasion457
I didn't mean that it is useless. I find it funny that someone would actually say that instead of "they perform roughly the same", especially since they do not show that the difference is statistically significant; we have seen your average BERT gain much more performance just by rerolling on a different seed.
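To make the seed-variance point concrete, here is a minimal sketch (not from the paper; the per-seed scores are hypothetical placeholders) of how one might check whether a small accuracy gap between two model variants actually survives a significance test across seeds:

```python
# Minimal sketch: testing whether a small accuracy gap is statistically
# significant across random seeds. The scores below are hypothetical
# placeholders, not results from the paper.
from scipy import stats

wide_acc = [0.912, 0.908, 0.915, 0.910, 0.907]  # hypothetical per-seed scores, wide model
deep_acc = [0.909, 0.911, 0.906, 0.908, 0.912]  # hypothetical per-seed scores, deep model

t_stat, p_value = stats.ttest_ind(wide_acc, deep_acc)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
# If p is large (e.g. > 0.05), a ~0.3% gap is well within seed-to-seed noise.
```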
farmingvillein t1_is36n5p wrote
Sorry, didn't mean to imply that you were saying that it was useless--that was in response to my own criticism of the paper's title (versus the paper itself).
> I find it funny that someone would actually say that instead of "they perform roughly the same"
Yeah...for better or worse, though, if you say something performs "at parity", people assume (because it is frequently true...) that what you really mean is "-0.1% but that totally isn't a big deal".
I don't fault them for highlighting the 0.3% as a light pushback on the above, but I do blame 1) OP for highlighting this point in their post (which, to your point, is at best misleading about the key claims of the paper) and 2) the authors for picking the ludicrous title.