Comments
master3243 t1_irddrxw wrote
Is this an implementation of the model architecture/training or does it have a final/checkpoint model that I can use for generation right now?
GOGaway1 t1_irdgxts wrote
master3243 t1_irdi8o7 wrote
In the paper, in Appendix A.4 where the loss and gradients are derived,
I don't see how this step holds (Eq. 14): https://i.imgur.com/ZuN2RC2.png
The RHS seems to equal (2 * alpha_t) * LHS.
I'm also unsure how this part of the same equation follows: https://i.imgur.com/DHixElF.png
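For reference, the score-distillation (SDS) gradient as stated in the main text of the paper is, if I recall it correctly:

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}\bigl(\phi,\, x = g(\theta)\bigr)
  = \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,
      \bigl(\hat{\epsilon}_\phi(z_t;\, y,\, t) - \epsilon\bigr)\,
      \frac{\partial x}{\partial \theta} \,\right]
```

My reading is that constant factors, like the 2 from differentiating the squared error and the alpha_t from d(z_t)/dx = alpha_t * I, get absorbed into the weighting w(t), which may account for the apparent (2 * alpha_t) mismatch, but I could be misreading the derivation.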
dkangx t1_irdmnsp wrote
Well, someone’s gonna fire it up and test it out and we will see if it’s real
master3243 t1_irdoq0o wrote
Empirical results don't necessarily prove theoretical results. In fact, most deep learning research (mine included) amounts to trying different things based on intuition and past experience of what worked, until you have something that achieves really good results.
Then you attempt to show formally and theoretically why what you did is mathematically justified.
And often enough, once you start working through the formal math, you get ideas on how to improve further or different paths to take with your model, so it's a back and forth.
However, someone could just as easily get good results with a certain architecture/loss and then fail to justify it formally, skip certain steps, or take an invalid jump from one step to another, which results in theoretical work that is wrong but works great empirically.
thatpizzatho t1_irdrzbk wrote
This is amazing! But as a PhD student, I can't keep up anymore :/
DigThatData t1_irds2na wrote
It's a method that uses pre-trained models, so you can use it right now.
[deleted] t1_iredowd wrote
[removed]
sparkinflint t1_irepk4y wrote
Imagine only having a bachelor's in traditional engineering 😅
Longjumping_Kale1 t1_irg17le wrote
I hope traditional stands for software 😂
master3243 t1_irhlkct wrote
Pretty cool, I just tested with the prompt
"a DSLR photo of a teddy bear riding a skateboard"
Here's the result:
https://media.giphy.com/media/eTQ5gDgbkD0UymIQD6/giphy.gif
Reading the paper and understanding the basics of how it works, I would have guessed it would tend to create a Neural Radiance Field where the front of the object is duplicated across many different camera angles, since when updating the NeRF from a different angle, the diffusion model will output an image that closely matches an angle already generated before.
I think Imagen can prevent this simply because of its sheer power: even given a noisy image of the backside of a teddy bear, it can figure out that it truly is the backside and not just the front again. Not sure if that made sense; I did a terrible job articulating the point.
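For context, the paper's main mitigation for exactly this failure mode (the front of the object getting pasted onto every viewpoint) is view-dependent prompt augmentation: the text prompt is extended with "front view", "side view", "back view", or "overhead view" depending on the sampled camera pose. Below is a minimal sketch of the idea in Python; the angle thresholds are illustrative, not the paper's exact values.

```python
# Sketch of view-dependent prompt augmentation in the spirit of DreamFusion.
# The angle thresholds below are illustrative, not the paper's exact values.
def augment_prompt(prompt: str, azimuth_deg: float, elevation_deg: float) -> str:
    if elevation_deg > 60.0:
        return f"{prompt}, overhead view"
    a = azimuth_deg % 360.0
    if a < 45.0 or a >= 315.0:
        view = "front view"
    elif 135.0 <= a < 225.0:
        view = "back view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# Example: the prompt used above, rendered from behind the object.
print(augment_prompt("a DSLR photo of a teddy bear riding a skateboard",
                     azimuth_deg=180, elevation_deg=10))
# -> "a DSLR photo of a teddy bear riding a skateboard, back view"
```

The point is that the diffusion model is told which side of the object it is supposed to be denoising, so it is less likely to hallucinate the front face from every camera angle.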
sparkinflint t1_irhpz2q wrote
...industrial 🫠
hardmaru OP t1_ircu02e wrote
The original paper from Google Brain came out less than a week ago and was discussed on r/MachineLearning:
https://old.reddit.com/r/MachineLearning/comments/xrny8s/r_dreamfusion_textto3d_using_2d_diffusion/
Within a week, someone made this working implementation in PyTorch, which uses Stable Diffusion in place of Imagen.
Saw more discussion about this project on Hacker News:
https://news.ycombinator.com/item?id=33109243
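For anyone wondering what "Stable Diffusion in place of Imagen" amounts to in code, here is a rough sketch of a single score-distillation step using the Hugging Face diffusers API. It is not the repo's actual code; `rendered_rgb` and `text_embeddings` are placeholders for the NeRF render and the precomputed prompt embeddings.

```python
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler

# Rough sketch of one score-distillation (SDS) step with Stable Diffusion,
# in the spirit of the linked PyTorch implementation. Not the repo's actual code.
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
scheduler = DDPMScheduler.from_config(pipe.scheduler.config)

def sds_step(rendered_rgb, text_embeddings, guidance_scale=100.0):
    # rendered_rgb: (1, 3, 512, 512) NeRF render in [0, 1], differentiable w.r.t. NeRF params.
    # text_embeddings: (2, 77, 768) unconditional + conditional prompt embeddings
    # (from pipe.tokenizer / pipe.text_encoder), concatenated along the batch dim.
    latents = pipe.vae.encode(rendered_rgb * 2 - 1).latent_dist.sample() * 0.18215
    t = torch.randint(50, 950, (1,), device=device)      # random diffusion timestep
    noise = torch.randn_like(latents)
    noisy_latents = scheduler.add_noise(latents, noise, t)
    with torch.no_grad():                                 # U-Net is frozen; no grad through it
        noise_pred = pipe.unet(torch.cat([noisy_latents] * 2), t,
                               encoder_hidden_states=text_embeddings).sample
        uncond, cond = noise_pred.chunk(2)
        noise_pred = uncond + guidance_scale * (cond - uncond)  # classifier-free guidance
    w = (1.0 - scheduler.alphas_cumprod.to(device)[t]).view(-1, 1, 1, 1)  # weighting w(t)
    grad = w * (noise_pred - noise)
    # Inject the SDS gradient: backprop grad through the VAE encoder into the NeRF.
    latents.backward(gradient=grad)
```

The key design point is that the diffusion model only supplies a gradient direction, (noise_pred - noise); the gradient then flows back through the VAE encoder and the rendered image into the NeRF parameters, which are the only thing being optimized.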