banmeyoucoward t1_jdhg7kt wrote on March 24, 2023 at 12:52 PM

I'd bet that screen recordings + mouse clicks + keyboard inputs made their way into the training data too.

nmkd t1_jdhmgpm wrote on March 24, 2023 at 1:40 PM

Nope, it's multimodal in terms of understanding language and images. It wasn't trained on mouse movement because that's neither language nor imagery.

> use 2 images
> movement
> boom

Absolutely mental
