MassiveIndependence8 t1_jdl9oq9 wrote on March 25, 2023 at 5:41 AM

Reply to comment by ThirdMover in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-

You’re actually suggesting putting every single frame into GPT-4? It’ll cost you a fortune after 5 seconds of running it. Plus the latency is super high; it might take you an hour to process 5 seconds’ worth of images.

ThirdMover t1_jdlabwm wrote on March 25, 2023 at 5:49 AM

What do you mean by "frame"? How many images do you think GPT-4 would need to get a cursor where it needs to go? I'd estimate four or five should be plenty.