Atom_101

Atom_101 OP t1_j3b2vmr wrote

From what I've read, it doesn't package CUDA, only Python dependencies. Were you able to get CUDA inside your PyInstaller executable? Is it even possible to package CUDA inside an executable? CUDA needs to go into a specific folder, right? And it needs to be added to the path variable for PyTorch or other libraries to see it. On Linux, for example, it goes in /usr/local/cuda.
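As a quick sanity check (my own sketch, not something from the thread), this is roughly what I'd run from inside the frozen executable to see whether CUDA is actually visible; the environment variable names are the usual Linux ones and may differ on your setup:

```python
# Sketch: probe whether the packaged app can see CUDA at runtime.
# CUDA_HOME and LD_LIBRARY_PATH are the conventional Linux locations;
# Windows uses PATH instead.
import os
import torch

print("CUDA_HOME:", os.environ.get("CUDA_HOME"))             # e.g. /usr/local/cuda
print("LD_LIBRARY_PATH:", os.environ.get("LD_LIBRARY_PATH"))
print("CUDA available:", torch.cuda.is_available())          # False if the libs aren't found
```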

2

Atom_101 OP t1_j3b291o wrote

It seems you can't lock a container. If the end user has root access, they will be able to get inside the container and see your source code. The solution seems to be to obfuscate your code using something like PyArmor, so that even if the user accesses the Docker image, they won't easily figure out your source code.
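Something like this is what I have in mind (a sketch, assuming the PyArmor 7.x CLI; PyArmor 8+ renamed the command to `pyarmor gen`):

```python
# Sketch: obfuscate the source tree with PyArmor before baking it into the image.
import subprocess

subprocess.run(
    ["pyarmor", "obfuscate", "--recursive", "--output", "dist_obf", "src/main.py"],
    check=True,
)
# Copy only dist_obf/ (obfuscated scripts + PyArmor runtime) into the Docker
# image; the original src/ tree never leaves your machine.
```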

2

Atom_101 OP t1_j384ogh wrote

I haven't used ONNX before but have worked with TorchScript. With TorchScript I have had to change the models quite a bit to make them scriptable. If ONNX requires a similar amount of effort, I don't think it will be useful.

I don't want to go through the hassle of scripting because we might change the model architectures soon. I need a quick and possibly inefficient (space-wise, not performance-wise) way to package the models without exposing source code. The low-effort path I'd try first is sketched below.
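Tracing, rather than scripting, usually needs no model changes as long as the forward pass has no data-dependent control flow. A sketch with a toy stand-in model:

```python
import torch
import torch.nn as nn

class TinyNet(nn.Module):  # stand-in for the real model
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Flatten())

    def forward(self, x):
        return self.net(x)

model = TinyNet().eval()
example = torch.randn(1, 3, 32, 32)          # one representative input
traced = torch.jit.trace(model, example)     # records the graph, no rewrites needed
traced.save("model_traced.pt")               # ships a serialized graph, not .py source
```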

2

Atom_101 t1_it1fzij wrote

Where can I find Cohere's publications? Basically, a way to get an idea of the things you work on.

>In fact, our one criterion for selection is that you cannot have published a machine learning research paper previously.

This is from the blog. Is this criterion strictly enforced? What if someone has publications but no first-author publications? What if they have published only in lower-tier conferences? Why exclude such people, since many of them will be interested in the program?

6

Atom_101 t1_is7ldte wrote

I think VAEs are weak not because of scaling issues, but because of an overly strong bias: the latent manifold has to be a Gaussian distribution with a diagonal covariance matrix. This problem is reduced using things like vector quantization, which DALL-E 1 actually used before DMs came to be. But even then, I believe they are too underpowered. Another technique for image generation is normalising flows, which also require heavy restrictions on model architecture. GANs and DMs are much less restricted and can model arbitrary data distributions.
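To make the restriction concrete (my sketch, not from the thread): the standard VAE posterior is a factorized Gaussian, which is exactly what makes the reparameterization and the KL term cheap, and also what constrains the latent geometry:

```python
import torch

def reparameterize(mu, logvar):
    # Sample z ~ N(mu, diag(exp(logvar))): one variance per dimension,
    # no cross-dimension covariance.
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def kl_to_standard_normal(mu, logvar):
    # KL(N(mu, diag(sigma^2)) || N(0, I)) has this closed form only
    # because the covariance is diagonal.
    return -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp(), dim=-1)
```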

Can you point to an example where you see GANs perform visibly worse? We can't really compare quality between SOTA GANs and SOTA DMs, though; the difference in scale is just too huge. There was a tweet thread recently, regarding Google's Imagen IIRC, which showed that increasing model size drastically improves image quality for text-to-image DMs: going from 1B to 10B params showed visible improvements. But if you compare photorealistic faces generated by Stable Diffusion and, say, StyleGAN3, I am not sure you would be able to see differences.

2

Atom_101 t1_is61h1l wrote

I doubt it's anywhere close to diffusion models, though. I haven't worked with TTUR or feature matching, but I have tried spectral norm and WGAN-GP. They can be unstable in weird ways. In fact, while the Wasserstein loss is definitely more stable, it massively slows down convergence compared to the standard DCGAN loss.
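For reference, the gradient-penalty term I was fighting with looks roughly like this (a sketch of WGAN-GP for image batches; `critic` is a placeholder module):

```python
import torch

def gradient_penalty(critic, real, fake, lambda_gp=10.0):
    # Random interpolation between real and fake samples.
    alpha = torch.rand(real.size(0), 1, 1, 1, device=real.device)
    interp = (alpha * real + (1 - alpha) * fake).requires_grad_(True)
    scores = critic(interp)
    grads = torch.autograd.grad(
        outputs=scores,
        inputs=interp,
        grad_outputs=torch.ones_like(scores),
        create_graph=True,   # needed so the penalty itself can be backpropagated
    )[0]
    # Push the critic's gradient norm toward 1 (soft Lipschitz constraint);
    # this extra double-backward pass is part of why convergence is slow.
    grad_norm = grads.view(grads.size(0), -1).norm(2, dim=1)
    return lambda_gp * ((grad_norm - 1) ** 2).mean()
```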

The BigGAN paper by Google tried to scale up GANs by throwing every known stabilization trick at them. They observed that even with these tricks you can't train beyond a point: BigGANs start degrading when trained for too long. Granted, it came out in 2018, but if this didn't hold true today we would have 100B-parameter GANs already. I think the main advantage of DMs is that you can keep training them for an eternity without worrying about performance degradation.

3

Atom_101 t1_is4b95t wrote

Diffusion is inherently slower than GANs: it takes N forward passes versus only 1 for a GAN. You can use tricks to make it faster, like latent diffusion, which does the N forward passes with a small part of the model (the latent-space denoiser) and 1 forward pass with the rest (the decoder). But as a method, diffusion is slower.
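In pseudocode-ish PyTorch, the cost difference is just this (a toy sketch; `denoiser`, `decoder`, and `generator` are hypothetical modules, not any particular implementation):

```python
import torch

@torch.no_grad()
def diffusion_sample(denoiser, decoder, steps=50, shape=(1, 4, 64, 64)):
    x = torch.randn(shape)             # start from noise in latent space
    for t in reversed(range(steps)):
        x = denoiser(x, t)             # N forward passes of the (small) denoiser
    return decoder(x)                  # one pass of the rest of the model

@torch.no_grad()
def gan_sample(generator, z_dim=128):
    z = torch.randn(1, z_dim)
    return generator(z)                # a single forward pass, full stop
```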

34