Comments

hardmaru OP t1_ircu02e wrote

The original paper from Google Brain came out less than a week ago and was discussed on r/MachineLearning:

  • DreamFusion: Text-to-3D using 2D Diffusion

https://old.reddit.com/r/MachineLearning/comments/xrny8s/r_dreamfusion_textto3d_using_2d_diffusion/

Within a week, someone made this working implementation in PyTorch, which uses Stable Diffusion in place of Imagen.
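For context, the core trick being reimplemented is Score Distillation Sampling (SDS): render the NeRF from a random camera, add noise, and use the frozen text-to-image model's denoising error as a gradient on the rendered view. A rough sketch of that loop with Stable Diffusion's latent-space UNet standing in for Imagen (names like `nerf.render`, `diffusion.encode_latents`, and `diffusion.unet` are placeholders, not the repo's actual API):

    import torch

    def sds_step(nerf, diffusion, text_embeddings, camera, optimizer):
        """One (simplified) Score Distillation Sampling update.

        `nerf`, `diffusion`, `camera` are hypothetical objects; this is just
        the shape of the idea, not the repo's real interface.
        """
        # 1. Render an image of the current NeRF from a random camera (differentiable).
        image = nerf.render(camera)                      # (1, 3, H, W)

        # 2. Stable Diffusion works in latent space, unlike Imagen (pixel space).
        latents = diffusion.encode_latents(image)        # (1, 4, 64, 64)

        # 3. Perturb the latents at a random diffusion timestep.
        t = torch.randint(20, 980, (1,), device=latents.device)
        noise = torch.randn_like(latents)
        noisy_latents = diffusion.scheduler.add_noise(latents, noise, t)

        # 4. Predict the noise with the frozen UNet (classifier-free guidance omitted).
        with torch.no_grad():
            noise_pred = diffusion.unet(noisy_latents, t, text_embeddings).sample

        # 5. SDS gradient: (predicted noise - true noise), pushed back into the NeRF
        #    through the render + encode path, skipping the UNet's own Jacobian.
        w = 1.0 - diffusion.scheduler.alphas_cumprod[t]  # timestep weighting
        grad = w * (noise_pred - noise)
        latents.backward(gradient=grad)

        optimizer.step()
        optimizer.zero_grad()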

Saw more discussion about this project on Hacker News:

https://news.ycombinator.com/item?id=33109243

24

WashiBurr t1_ird47y0 wrote

Wow that was fast. Can't wait to play with it.

10

master3243 t1_irddrxw wrote

Is this an implementation of the model architecture/training, or does it include a final checkpoint that I can use for generation right now?

3

master3243 t1_irdoq0o wrote

Empirical results don't necessarily prove theoretical results. In fact, most deep learning research (mine included) is trying out different stuff based on intuition and past experience of what has worked, until you have something that achieves really good results.

Then you attempt to show formally and theoretically why the thing you did is mathematically justified.

And often enough, once you start going through the formal math, you get ideas on how to further improve the model, or different paths to take with it, so it's a back and forth.

However, someone could just as easily get good results with a certain architecture/loss and then fail to justify it formally, skip certain steps, or make an invalid jump from one step to another, which results in theoretical work that is wrong even though the model works great empirically.

17

thatpizzatho t1_irdrzbk wrote

This is amazing! But as a PhD student, I can't keep up anymore :/

58

master3243 t1_irhlkct wrote

Pretty cool, I just tested with the prompt

"a DSLR photo of a teddy bear riding a skateboard"

Here's the result:

https://media.giphy.com/media/eTQ5gDgbkD0UymIQD6/giphy.gif

Having read the paper and understood the basics of how it works, I would have guessed it would tend to create a Neural Radiance Field in which the front of the object is duplicated across many camera angles, since when the NeRF is updated from a new angle, the diffusion model tends to output an image that closely matches an angle it has already produced.

I think Imagen can avoid this simply because of its sheer power: even given a noisy image of the backside of a teddy bear, it can figure out that it really is the backside and not just the front again. Not sure if that made sense, I did a terrible job articulating the point.
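One mitigation the paper itself describes is view-dependent prompting: the text prompt gets a direction hint appended based on the sampled camera pose, so the diffusion model is at least told it is looking at the back. A rough sketch of that idea (the angle thresholds here are illustrative, not the paper's exact values):

    def view_dependent_prompt(prompt, azimuth_deg, elevation_deg):
        """Append a direction hint based on camera pose (illustrative thresholds)."""
        if elevation_deg > 60:
            suffix = "overhead view"
        elif abs(azimuth_deg) < 45:       # camera roughly in front of the object
            suffix = "front view"
        elif abs(azimuth_deg) > 135:      # camera roughly behind the object
            suffix = "back view"
        else:
            suffix = "side view"
        return f"{prompt}, {suffix}"

    # view_dependent_prompt("a DSLR photo of a teddy bear riding a skateboard", 170, 10)
    # -> "a DSLR photo of a teddy bear riding a skateboard, back view"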

3