Submitted by BB4evaTB12 t3_zff5mh in MachineLearning
BB4evaTB12 OP t1_izgu9jj wrote
Reply to comment by Different_Fig4002 in 36% of HellaSwag benchmark contains errors [D] by BB4evaTB12
Totally! We may be thinking of the same example from the GoEmotions dataset, where they mislabeled "Yay, cold McDonald's. My favorite." as Love.