Comments
TFenrir OP t1_j0dk5rr wrote
I mostly agree, but I think there is some opportunity here. Using img2img in real time to extend audio indefinitely is interesting, and so is the relationship between images and audio in general - would a model trained only on these spectrogram images produce a "better" result? Would different fine-tuned models give different experiences? How is this affected by other improvements to the models?
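Rough sketch of what that real-time extension loop could look like, assuming the diffusers img2img pipeline and a separate spectrogram-to-audio step - this is just the general shape of the idea, not necessarily how the project actually does it:

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline

# Sketch only: seed each new spectrogram chunk with the previous one so the
# audio stays coherent. Model ID and strength value are illustrative guesses.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

def extend(seed_spectrogram, prompt, n_chunks=8):
    chunks = [seed_spectrogram]  # PIL image of the starting spectrogram
    for _ in range(n_chunks):
        # strength < 1.0 preserves part of the previous chunk, trading
        # novelty for continuity between successive audio segments
        out = pipe(prompt=prompt, image=chunks[-1], strength=0.6).images[0]
        chunks.append(out)
    return chunks  # convert each chunk back to audio and concatenate
```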
Umbristopheles t1_j0dpnht wrote
It sings like The Sims....
aperrien t1_j0dsyu9 wrote
I can't believe that running Fourier transforms of sound through Stable Diffusion and transforming them back into sound actually works. At this point, I'm really calling into question what the SD model is actually capturing. Creativity? Pattern consistency? This technology may have legs far beyond what I initially assumed.
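For anyone curious how simple the round trip is, here's a minimal sketch with librosa - my assumption of the kind of pipeline involved, not necessarily theirs. The image only keeps magnitude, so Griffin-Lim has to re-estimate the phase on the way back:

```python
import numpy as np
import librosa

# audio -> magnitude spectrogram (the "image") -> audio
y, sr = librosa.load("clip.wav", sr=22050)  # "clip.wav" is a placeholder
mag = np.abs(librosa.stft(y, n_fft=2048, hop_length=512))
# Griffin-Lim iteratively guesses the phase the image discarded
y_rec = librosa.griffinlim(mag, hop_length=512)
```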
xoexohexox t1_j0dv1fc wrote
It's just a table of weighted averages, my dude
blueSGL t1_j0e5p87 wrote
I bet if you do a log plot it just destroys the bass.
Edit: Thinking on it, this is one-dimensional data with a second dimension of time. You could slice the audio into three frequency bands and use RGB encoding to triple the frequency fidelity without having to change the context window size.
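Something like this, just to show the shape of the packing (band count and sizes are made up):

```python
import numpy as np

def bands_to_rgb(spec):
    # spec: (384, time) magnitude spectrogram; split it into three 128-bin
    # bands and stack them as the R, G, B channels of a 128 x time image
    low, mid, high = np.split(spec, 3, axis=0)
    return np.stack([low, mid, high], axis=-1)

def rgb_to_bands(img):
    # undo the packing before resynthesizing audio
    return np.concatenate([img[..., 0], img[..., 1], img[..., 2]], axis=0)
```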
blueSGL t1_j0elvgk wrote
I've not got the hardware needed for fine-tuning Stable Diffusion (or even DreamBooth), so I can't test it.
I've only got 10 GB of VRAM, not the 16 GB minimum needed.
Sigura83 t1_j0h6f0k wrote
"Trance inspired by rain falling". OMG NON STOP MELODIES.
I'm living in the Future!!!
Oh yeah, a shit ton of stuff is spectrographic data - molecules, for instance. This could be used for drug generation, I think... uh... damn my lack of skills...
visarga t1_j0mzjox wrote
Let me tell you one weird trick all artists hate. It's actually averages of gradients collected from training examples, not averages of the training examples themselves. Gradients represent what has been learned from each example, and can be added together regardless of the content of the examples without becoming all jumbled up.
For instance, one can add the gradient derived from an image of a duck to that derived from an image of a horse. This is only possible in the space of gradients, as opposed to the space of images. If it weren't for this trick we would not be discussing art in this sub.
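A toy sketch of the point in PyTorch - the model and inputs are dummies, but the accumulation is exactly what a real training step does over a batch:

```python
import torch

model = torch.nn.Linear(784, 10)
loss_fn = torch.nn.CrossEntropyLoss()

duck = torch.randn(1, 784)   # stand-ins for two unrelated images
horse = torch.randn(1, 784)
for x, label in [(duck, torch.tensor([0])), (horse, torch.tensor([1]))]:
    loss_fn(model(x), label).backward()  # .grad sums across backward() calls

# model.weight.grad now holds the sum of both per-example gradients -
# addition that makes sense in gradient space but not in image space
```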
But are gradients derived from an image subject to copyright restrictions, even when all mixed up over billions of examples? Individual influences are almost "averaged out" by the sheer number of examples. That's how SD breaks training examples down into first principles and can then generate an astronaut on a horse even though it has never seen one - only possible if you go all the way back to basic concepts.
Kinexity t1_j0de3x5 wrote
This sounds quite analogous to running Doom on a Samsung smart fridge or running a Turing machine in PowerPoint. It's not useful, but it's definitely pretty cool.