Submitted by Blutorangensaft t3_11wmpoj in MachineLearning
Blutorangensaft OP t1_jd7jaor wrote
Reply to comment by YouAgainShmidhoobuh in [D]: Vanishing Gradients and Resnets by Blutorangensaft
Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that this architecture was chosen because of vanishing gradients. Wikipedia seems to have the same error.
Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while enabling me to scale the depth, completely changes the training dynamics. When I train both G and C with the same learning rate and a 1:1 update schedule, the critic seems to win, behaviour I have never seen before in GANs. I suppose I will have to retune the learning rates.
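For context on why layer norm is the usual substitute here: batch norm couples samples through batch statistics, which conflicts with WGAN-GP's per-sample gradient penalty, whereas layer norm computes statistics over each sample's features independently. A minimal plain-Python sketch of that per-sample computation (illustrative only, not the actual critic implementation):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one sample's feature vector to zero mean, unit variance.

    The statistics depend only on this single sample (unlike batch norm),
    so each critic input is normalized independently -- compatible with
    WGAN-GP's per-sample gradient penalty.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Each sample is normalized on its own, regardless of what else is in the batch.
print(layer_norm([2.0, 4.0, 6.0]))
```

In a real critic one would use a framework-provided layer norm (with learnable scale and shift) after each hidden layer; the sketch above only shows the normalization itself.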