master3243 t1_irhlkct wrote on October 8, 2022 at 5:37 AM

Pretty cool, I just tested with the prompt:

"a DSLR photo of a teddy bear riding a skateboard"

Here's the result:

https://media.giphy.com/media/eTQ5gDgbkD0UymIQD6/giphy.gif

After reading the paper and understanding the basics of how it works, I would have guessed that it would tend to create a Neural Radiance Field where the front of the object is duplicated across many different camera angles. My reasoning: when the NeRF is updated from a new angle, the diffusion model could output an image that closely matches a view already created from an earlier angle.

I think Imagen can prevent this simply because of its sheer power: even if given a noisy image of the backside of a teddy bear, it can figure out that it truly is the backside and not just the front again. Not sure if that made sense, I did a terrible job articulating the point.