
Mental-Swordfish7129 t1_j3q1hej wrote

Reply to comment by _xenoschema in [N] What's next for AI? by vsmolyakov

It's a model that "chooses" its input stream from a 2D array of sensor data (camera, mics, and servo encoders) in real time, using policies decoded from the predictions of the bottom layer. It then processes this input up a hierarchy of identical layers. Higher-layer predictions are used to modulate attention.
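To make the input-selection idea concrete, here is a minimal sketch, not the poster's actual code. It assumes (hypothetically) that the first two values of the bottom layer's prediction encode where to look next as fractions in [0, 1], and that "choosing the input stream" means cropping a window out of the 2D sensor grid. The names `decode_policy` and `select_input` are made up for illustration.

```python
def decode_policy(prediction, grid_shape, window):
    """Turn a prediction into a window origin (row, col).

    Assumption: prediction[0] and prediction[1] are fractions in [0, 1]
    giving the vertical and horizontal position of the window.
    """
    rows, cols = grid_shape
    h, w = window
    # Clamp so a noisy prediction can never push the window off the grid.
    fr = min(max(prediction[0], 0.0), 1.0)
    fc = min(max(prediction[1], 0.0), 1.0)
    return round(fr * (rows - h)), round(fc * (cols - w))


def select_input(grid, origin, window):
    """Crop the chosen window out of the 2D sensor grid."""
    r, c = origin
    h, w = window
    return [row[c:c + w] for row in grid[r:r + h]]


# A 4x4 toy "sensor array" whose cells hold their own flat index.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
origin = decode_policy([1.0, 0.0], grid_shape=(4, 4), window=(2, 2))
patch = select_input(grid, origin, window=(2, 2))
```

In a real system the decoded policy would drive servos rather than index into an array, but the loop is the same: predict, act on the prediction, then read whatever the sensors now report.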

It may qualify as a general intelligence (idk) as any data can be encoded into the format of its input stream. What I mean is that I have a particular way of encoding video, audio, anything really, into a universal format which preserves the salient semantics.

Currently, it is greatly inhibited in what it can learn because I cannot feed it experiences at the rate it could take them. It has far more potential than realized knowledge.

1

jimmymvp t1_j3q5wmj wrote

Sorry, what's the "active" part here? Is the model actually generative? I'm aware of Karl Friston and the free-energy principle. Is the active part the input stream selection? I thought the active part referred to learning, in the sense that I get to pick my training data along the way. It sounds like what you're doing is akin to Gato from DeepMind, with tokenization and multi-modal policies (modulo the hierarchical processing and attention).

Is there a math writeup somewhere?

1

Mental-Swordfish7129 t1_j3q731g wrote

Also, I do mean "active" in the ways you describe. The bottom layer actively controls the sensors via servos and a voice coil. The other layers actively modulate their input by masking it (ignoring it non-trivially).
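A minimal sketch of the masking step, under my own assumptions rather than anything confirmed in the thread: each input element gets an attention weight from the layer above, and elements whose weight falls below a threshold are zeroed out. The function name `mask_input` and the threshold value are hypothetical.

```python
def mask_input(values, attention, threshold=0.5):
    """Zero out inputs that the higher layer is not attending to.

    Assumption: `attention` holds one weight per input element, and
    "ignoring" an element is modelled as replacing it with 0.0.
    """
    if len(values) != len(attention):
        raise ValueError("values and attention must have the same length")
    return [v if a >= threshold else 0.0 for v, a in zip(values, attention)]


masked = mask_input([1.0, 2.0, 3.0], [0.9, 0.1, 0.5])
```

A hard threshold is just the simplest choice here; a soft version would multiply each value by its weight instead.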

2

Mental-Swordfish7129 t1_j3q6p7m wrote

The model is generative. Each layer generates predictions about the patterns of the layers below. The bottom layer generates predictions about the sensory data, some of which is proprioception data.
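As a rough illustration of layers predicting the layer below, here is a toy sketch. It is a generic predictive-coding-style loop, not the poster's implementation: the assumption that each layer passes its prediction error upward, and the simple fixed-rate update, are my simplifications. `Layer` and `run_hierarchy` are hypothetical names.

```python
class Layer:
    """Holds a running prediction of the signal arriving from below."""

    def __init__(self, size, rate=0.5):
        self.prediction = [0.0] * size
        self.rate = rate  # how quickly the prediction moves toward the input

    def step(self, below):
        # Error between what arrived and what this layer predicted.
        error = [b - p for b, p in zip(below, self.prediction)]
        # Nudge the prediction toward the observed signal.
        self.prediction = [
            p + self.rate * e for p, e in zip(self.prediction, error)
        ]
        return error


def run_hierarchy(layers, sensory):
    """Feed sensory data in at the bottom; each layer passes its error up."""
    signal = sensory
    errors = []
    for layer in layers:
        signal = layer.step(signal)
        errors.append(signal)
    return errors


layers = [Layer(2), Layer(2)]
first = run_hierarchy(layers, [1.0, 2.0])
second = run_hierarchy(layers, [1.0, 2.0])
```

Feeding the same sensory vector twice shows the point: the bottom layer's error shrinks on the second pass, because its prediction has moved toward the data.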

I have never published anything. I do not have that much time, and it would largely be redundant. You can look at Friston et al. for the math. I use nearly the same math and logic.

What I'm doing bears only a superficial similarity to Gato in my opinion, but I can't say I've looked into it deeply. I've been far too busy with life. Unfortunately, I only have what little spare time I get for this project.

1

jimmymvp t1_j3q74ms wrote

So the active part is the self-predictive part?

1

Mental-Swordfish7129 t1_j3q81e2 wrote

Active just means that it directly modifies its input stream. And, yes, it is also predicting what that input will be, so it is reasonable to say that it is, in part, self-predictive.

Crucially, its input stream also includes features that are not itself or have not been changed by itself. The proprioceptive signals help it learn which is which.
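One way to picture how proprioception separates self-caused from external changes is the sketch below. It is an illustration under stated assumptions, not the poster's method: it assumes a fixed, known `gain` relating servo movement to the expected sensory change, whereas the model described here would have to learn that relationship. `split_change` is a hypothetical name.

```python
def split_change(observed_change, servo_delta, gain=1.0, tolerance=1e-6):
    """Split an observed sensory change into self-caused and external parts.

    Assumption: moving the servo by `servo_delta` is expected to shift the
    sensed value by `gain * servo_delta`; anything left over is attributed
    to the outside world.
    """
    self_caused = gain * servo_delta
    external = observed_change - self_caused
    # Treat tiny residuals as "nothing external happened".
    if abs(external) <= tolerance:
        external = 0.0
    return self_caused, external


moved_and_world_changed = split_change(3.0, 2.0)
only_self_moved = split_change(2.0, 2.0)
```

When the residual is zero, the whole change is explained by the system's own movement; a nonzero residual is the part of the input that "is not itself".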

1