Your question is answered in the abstract itself ("using only pixels and game points as input"), and repeated multiple times in the text ("In our formulation, the agent’s policy π uses the same interface available to human players. It receives raw RGB pixel input x_t from the agent’s first-person perspective at timestep t, produces control actions a_t ∼ π simulating a gamepad, and receives game points ρ_t attained"). Did you even attempt to read the paper? The concrete architecture showing the CNN is also in Figure S10.
Right, my confusion is how it views the RGB pixel input. Would you summarize it as looking at a screen, a whole image like a human player would, as if the little AI were in its own VR headset? Or is it more just looking at numbers and finding a pattern?
It looks at the screen. Your question indicates you're not well versed in AI. I'd advise you to read up more on fundamental deep learning techniques if you don't know what a CNN does.
The AI agent is a computer program. It does not have eyes or a physical body. Therefore, it only works with things that exist inside a computer, i.e. numbers.
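To make the "screen vs. numbers" point concrete, here is a minimal pure-Python sketch of what a convolution does with a frame. It is not the paper's architecture: the 5x4 frame, the grayscale averaging, the helper names `to_gray` and `convolve2d`, and the hand-picked edge kernel are all assumptions made for illustration. A real CNN learns many such kernels and stacks them in layers.

```python
# Illustrative only: a tiny 5x4 "screen" of RGB pixels, not the paper's input size.
# Each pixel is three integers in 0..255 (red, green, blue).
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)
frame = [
    [BLACK, BLACK, BLACK, WHITE, WHITE],
    [BLACK, BLACK, BLACK, WHITE, WHITE],
    [BLACK, BLACK, BLACK, WHITE, WHITE],
    [BLACK, BLACK, BLACK, WHITE, WHITE],
]


def to_gray(frame):
    """Average the three colour channels so each pixel becomes one number."""
    return [[sum(px) / 3 for px in row] for row in frame]


def convolve2d(image, kernel):
    """Slide a small kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0.0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Hand-picked vertical-edge kernel (an assumption for this sketch);
# a trained CNN learns its kernels from data instead.
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

gray = to_gray(frame)
response = convolve2d(gray, edge_kernel)
for row in response:
    print(row)
```

The response is zero over the uniformly black region and large where the window straddles the black-to-white boundary. So both descriptions hold: the input is the whole screen image, and that image is a grid of numbers in which the network finds spatial patterns.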
BeatLeJuce t1_j6mlxjc wrote