BeatLeJuce t1_j6mlxjc wrote on January 31, 2023 at 12:11 PM

Your question is answered in the abstract itself ("using only pixels and game points as input"), and repeated multiple times in the text ("In our formulation, the agent’s policy π uses the same interface available to human players. It receives raw RGB pixel input x_t from the agent’s first-person perspective at timestep t, produces control actions a_t ∼ π simulating a gamepad, and receives game points ρ_t attained"). Did you even attempt to read the paper? The concrete architecture showing the CNN is also in Figure S10.

pfm11231 t1_j6n3emy wrote on January 31, 2023 at 2:39 PM

Right, my confusion is about how it views the RGB pixel input. Would you summarize it as looking at a screen, a whole image like a human player would, as if the little AI is in its own VR headset? Or is it more just looking at numbers and finding a pattern?

cruddybanana1102 t1_j6n46op wrote on January 31, 2023 at 2:45 PM

I don't really understand the question. What do you mean by "looking at a screen"? Or "looking at numbers and finding a pattern"?

The model takes in a multidimensional array as input. That array holds all the RGB values of the frame at a given instant. Take that to mean whatever suits you.
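
To make that concrete, here is a rough sketch of what such an array looks like. The 4x6 frame size and the helper names `make_frame` and `normalize` are made up for illustration and are not from the paper; real code would normally use NumPy or a deep learning framework rather than nested lists.

```python
# Hypothetical 4x6 "screen"; real game frames are far larger.
HEIGHT, WIDTH = 4, 6


def make_frame(height, width):
    """Build a frame as nested lists: frame[row][col] == [R, G, B], each 0-255."""
    frame = [[[0, 0, 0] for _ in range(width)] for _ in range(height)]
    # Paint one pure-red pixel so the frame is not entirely black.
    frame[1][2] = [255, 0, 0]
    return frame


def normalize(frame):
    """Scale every channel from 0-255 to 0.0-1.0, a common preprocessing step."""
    return [[[c / 255.0 for c in pixel] for pixel in row] for row in frame]


frame = make_frame(HEIGHT, WIDTH)
shape = (len(frame), len(frame[0]), len(frame[0][0]))

print(shape)        # (height, width, channels)
print(frame[1][2])  # the RGB triple at row 1, column 2
```

So "the whole image" and "a bunch of numbers" are the same object here: one array indexed by row, column, and color channel.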

BeatLeJuce t1_j6n6x9b wrote on January 31, 2023 at 3:04 PM

It looks at the screen. Your question indicates you're not well versed in AI. I'd advise you to read up more on fundamental deep learning techniques if you don't know what a CNN does.
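
As a starting point, the core operation of a CNN layer can be sketched in a few lines. This is a toy example under my own assumptions (a tiny grayscale image and a hand-picked edge filter, with the hypothetical name `conv2d`), not the architecture from the paper; in a real CNN the filter values are learned and the layers are stacked.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image with no padding and sum the products.

    This is the cross-correlation that convolutional layers compute.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Toy grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked filter that responds where brightness increases left to right.
kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d(image, kernel)
print(response)
```

The output is largest in the column where the dark-to-bright edge sits, which is the sense in which a CNN "finds patterns" in pixel values while still operating on the whole image.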

bacon_boat t1_j6n82xv wrote on January 31, 2023 at 3:12 PM

Am I looking at your comment right now, or is it just some number of voltages over the neurons of my visual cortex?

shawdys t1_j6n6r2j wrote on January 31, 2023 at 3:03 PM

The AI agent is a computer program. It does not have eyes or a physical body. Therefore, it only works with things that exist inside a computer, i.e. numbers.