newDeckardCain t1_it3ihks wrote on October 20, 2022 at 5:45 PM

This is interesting, and something that stability.ai should do. A further interesting iteration of this would be to associate an image (i.e. the current frame in the video) with each token, which might push the model to also learn a world model.

Like what Yann LeCun has been advocating for.

visarga t1_it4qoq7 wrote on October 20, 2022 at 10:38 PM

After text, image and video (+ audio), I think we've got all the bases covered. Nobody can claim AI is not grounded anymore. And with this grounding comes a nuanced, semantic understanding of the world. It's like an upload, but not of a person: the whole culture gets uploaded at once.