YonatanBitton OP t1_iu02pl6 wrote on October 27, 2022 at 3:45 PM

Reply to comment by shahaff32 in [R] WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models by YonatanBitton

This is a great point, thank you. The interpretation of commonsense tasks varies from person to person, and commonsense reasoning involves some ambiguity. WinoGAViL, however, only uses instances that three human solvers solved well (over 80% Jaccard index). To validate our dataset, we asked other players (who did not take part in the data generation task) to solve it and verified that human accuracy was high (90%).
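
For anyone unfamiliar with the metric mentioned above: the Jaccard index between two sets is the size of their intersection divided by the size of their union. Here is a minimal sketch of how a solver's selection could be scored against the intended answer. The function name, the example labels, and the empty-set convention are illustrative assumptions, not the WinoGAViL code.

```python
def jaccard_index(selected, gold):
    """Return |selected ∩ gold| / |selected ∪ gold| for two collections."""
    selected, gold = set(selected), set(gold)
    union = selected | gold
    if not union:
        # Assumed convention: two empty selections count as identical.
        return 1.0
    return len(selected & gold) / len(union)


# Hypothetical instance: labels stand in for candidate images.
gold = {"wolf", "moon", "forest"}
solver = {"wolf", "moon", "dog"}

# 2 shared items out of 4 distinct items overall.
score = jaccard_index(solver, gold)
print(score)
```

A solver who picks exactly the intended set scores 1.0, while every missed or extra pick lowers the score, which is why a threshold on this value can filter out instances that humans do not agree on.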

shahaff32 t1_iu0mobx wrote on October 27, 2022 at 5:54 PM

Thank you for your answer, we will look into it :)