Submitted by YonatanBitton t3_yeppof in MachineLearning
Our paper "WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models" was accepted to NeurIPS 2022, Datasets and Benchmark.
Paper: http://arxiv.org/abs/2207.12576
Website: http://winogavil.github.io
Huggingface: https://huggingface.co/datasets/nlphuji/winogavil
Colab: https://colab.research.google.com/drive/19qcPovniLj2PiLlP75oFgsK-uhTr6SSi
Which images best fit the cue "werewolf"? Did you know that vision-and-language (V&L) models get only ~50% on our challenging WinoGAViL association task, while humans get ~90%?
Introducing WinoGAViL, an online game you can play now against AI! WinoGAViL is a dynamic benchmark for evaluating V&L models. Inspired by the popular card game Codenames, a spymaster gives a textual cue related to several visual candidates, and another player identifies which candidates match it.
We analyze the skills required to solve WinoGAViL instances, observing challenging visual and non-visual patterns such as attribution, general knowledge, word sense-making, humor, analogy, visual similarity, and abstraction.
We use the game to collect 3.5K instances, finding that they are intuitive for humans (>90% Jaccard index) but challenging for state-of-the-art AI models: the best model (ViLT) achieves a score of 52%, succeeding mostly when the cue is visually salient.
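For concreteness, here is a small sketch of the per-instance Jaccard scoring, assuming the standard intersection-over-union formulation between the model's selected images and the gold associations (the image names are invented for illustration):

```python
# Sketch of per-instance Jaccard scoring, assuming the standard
# |pred ∩ gold| / |pred ∪ gold| formulation. Image names are made up.
def jaccard(predicted: set, gold: set) -> float:
    """Jaccard index between the selected images and the gold associations."""
    if not predicted and not gold:
        return 1.0  # both empty: treat as a perfect match
    return len(predicted & gold) / len(predicted | gold)

# Cue "werewolf" with a gold set of 3 images: a model that recovers two of
# them but adds one wrong image scores 2 / 4 = 0.5.
gold = {"wolf.jpg", "full_moon.jpg", "man.jpg"}
predicted = {"wolf.jpg", "full_moon.jpg", "dog.jpg"}
print(jaccard(predicted, gold))  # 0.5
```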
Our game is live, and you are welcome to play it now (it should be fun 😊). Explore random samples created by humans in our benchmark (https://winogavil.github.io/explore), or try to create new associations that challenge the AI (https://winogavil.github.io/challenge-the-ai).
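If you'd rather explore the data programmatically, below is a minimal loading sketch using the Hugging Face datasets library (linked above); the split and field names shown may differ from the released schema, so check the dataset card for the exact layout:

```python
# Minimal sketch of loading the benchmark with the Hugging Face datasets
# library. Split and column names are assumptions; see the dataset card.
from datasets import load_dataset

ds = load_dataset("nlphuji/winogavil", split="test")  # assumed split name
example = ds[0]
print(example["cue"])         # assumed field: the textual cue
print(example["candidates"])  # assumed field: the candidate images
```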
CatalyzeX_code_bot t1_itz7729 wrote
Found relevant code at https://winogavil.github.io/