plocco-tocco t1_jdjx7qz wrote on March 24, 2023 at 10:47 PM

Reply to comment by ThirdMover in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

The complexity of the input wouldn't change in this case, since it's just a screen grab of the display. You would, however, need to run inference at a certain frame rate to be able to detect the cursor, which isn't that cheap with GPT-4. I'm not sure what the latency or cost would be; I'd need access to the API to answer that.
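
To make the frame-rate concern concrete, here is a rough back-of-the-envelope sketch in Python. Everything in it is an assumption for illustration: the function name, the parameters, and especially the numbers in the usage line are hypothetical placeholders, not real GPT-4 pricing or latency figures. It also assumes frames are sent one at a time, so the model only "keeps up" if one inference finishes before the next frame is due.

```python
def screen_inference_budget(fps, cost_per_frame, latency_per_frame_s, duration_s=60.0):
    """Estimate cost and feasibility of running inference on screen grabs.

    All inputs are hypothetical placeholders supplied by the caller;
    none of them reflect actual API prices or measured latency.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")

    frames = fps * duration_s              # screen grabs sent during the window
    total_cost = frames * cost_per_frame   # cost scales linearly with frame count
    frame_interval_s = 1.0 / fps           # time available before the next frame

    # Assumes sequential requests: each inference must finish within one interval.
    keeps_up = latency_per_frame_s <= frame_interval_s

    return {
        "frames": frames,
        "total_cost": total_cost,
        "frame_interval_s": frame_interval_s,
        "keeps_up": keeps_up,
    }


# Made-up example values: 2 frames per second, 0.01 cost units per frame,
# 0.4 s latency per frame, over a one-minute window.
budget = screen_inference_budget(fps=2, cost_per_frame=0.01, latency_per_frame_s=0.4)
print(budget)
```

The point is just that cost grows linearly with the frame rate, while latency puts a ceiling on how high that frame rate can go without batching or parallel requests.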