36% of HellaSwag benchmark contains errors [D] Submitted by BB4evaTB12 t3_zff5mh on December 7, 2022 at 9:51 PM in MachineLearning 6 comments 33
Jean-Porte t1_ize03mv wrote on December 8, 2022 at 12:27 PM A good thing about BIG-bench is that Google ran proper human evaluations, and they report the results of the best human raters as well as the average accuracy. Permalink 2