Single_Blueberry t1_jdhtc58 wrote
Reply to comment by BinarySplit in [D] I just realised: GPT-4 with image input can interpret any computer screen, any userinterface and any combination of them. by Balance-
What would keep us from just telling it the screen resolution and origin and asking for coordinates?
Or asking for coordinates in fractional image dimensions.
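To make the fractional idea concrete, here is a minimal sketch of what the receiving side could do with such an answer. It assumes the model replies with two numbers between 0 and 1; `fractional_to_pixels` and its `origin` parameter are hypothetical names made up for illustration, not part of any GPT-4 API.

```python
def fractional_to_pixels(fx, fy, width, height, origin="top-left"):
    """Map fractional image coordinates in [0, 1] to integer pixel coordinates.

    Hypothetical helper: assumes the model answered with fractions of the
    image width and height rather than absolute pixels.
    """
    if not (0.0 <= fx <= 1.0 and 0.0 <= fy <= 1.0):
        raise ValueError("fractional coordinates must lie in [0, 1]")
    if origin == "bottom-left":
        # Flip the vertical axis so the result is always top-left based.
        fy = 1.0 - fy
    elif origin != "top-left":
        raise ValueError("origin must be 'top-left' or 'bottom-left'")
    # Clamp so that a fraction of exactly 1.0 still lands on a valid pixel.
    x = min(int(fx * width), width - 1)
    y = min(int(fy * height), height - 1)
    return x, y


print(fractional_to_pixels(0.5, 0.25, 1920, 1080))  # (960, 270)
```

Fractions have the advantage that the answer stays valid if the screenshot is rescaled before it is sent to the model.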
MassiveIndependence8 t1_jdl9s3u wrote
The problem is that it can’t do math and spatial reasoning that well.
Single_Blueberry t1_jdnyc2d wrote
Hmm, I don't know. It's pretty bad at getting dead-on accurate results, but in many cases the relative error of the result is pretty low.
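One way to put a number on "relative error" for a predicted click point, as a rough sketch: take the pixel distance between the predicted and the true position and divide it by the screen diagonal. Normalizing by the diagonal is my own choice here, and `relative_click_error` is a hypothetical name; the sample coordinates are invented, not measured model output.

```python
import math


def relative_click_error(predicted, target, width, height):
    """Distance between predicted and target points as a fraction of the screen diagonal."""
    dx = predicted[0] - target[0]
    dy = predicted[1] - target[1]
    return math.hypot(dx, dy) / math.hypot(width, height)


# Made-up example: a guess that is 10 px off on both axes on a 1920x1080 screen.
err = relative_click_error((970, 280), (960, 270), 1920, 1080)
print(f"{err:.4f}")  # 0.0064
```

An error that small misses the exact pixel but would still land inside most buttons, which is the distinction between "dead-on" and "close enough".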