YonatanBitton OP t1_iu02pl6 wrote
Reply to comment by shahaff32 in [R] WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models by YonatanBitton
This is a great point, thank you. The interpretation of common sense tasks varies from person to person, and common sense reasoning involves some ambiguity. However, WinoGAViL only keeps instances that three human solvers solved well (over 80% Jaccard index). To validate the dataset, we recruited other players (who did not take part in the data generation task) and verified that they solved it with high accuracy (90%).
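For readers unfamiliar with the Jaccard index mentioned above, here is a minimal sketch of how such a filter could look. The function names, the example labels, and the choice to average the three solvers' scores are assumptions for illustration only; the comment does not say how the scores are aggregated, and this is not the WinoGAViL code.

```python
def jaccard_index(selected, gold):
    """Jaccard index between a solver's selection and the gold answer set."""
    selected, gold = set(selected), set(gold)
    union = selected | gold
    if not union:
        # Two empty sets are treated as a perfect match (a convention, assumed here).
        return 1.0
    return len(selected & gold) / len(union)


def keep_instance(solver_scores, threshold=0.8):
    """Hypothetical filter: keep an instance if the solvers' mean score exceeds the threshold.

    Averaging is an assumption; the source only states "over 80% Jaccard index".
    """
    return sum(solver_scores) / len(solver_scores) > threshold


# One solver picks two of the three gold images plus one wrong image.
gold = ["img_a", "img_b", "img_c"]
picked = ["img_a", "img_b", "img_d"]
score = jaccard_index(picked, gold)  # 2 shared / 4 in the union
```

Under these assumptions, three solver scores of 1.0, 1.0, and 0.5 would pass the filter, while 0.5, 0.5, and 1.0 would not.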
shahaff32 t1_iu0mobx wrote
Thank you for your answer, we will look into it :)