newDeckardCain t1_it3ihks wrote

This is interesting, and something that stability.ai should do. A further interesting iteration would be to associate an image, i.e. the current frame in the video, with each token; maybe that would prompt the model to also build a world model.

Like what Yann LeCun has been advocating for.

4

visarga t1_it4qoq7 wrote

After text, image and video (+ audio), I think we've got all the bases covered. Nobody can claim AI is not grounded anymore. And with this grounding comes a nuanced, semantic understanding of the world. It's like an upload, but not of a person: the whole culture gets uploaded at once.

3