Submitted by Blutorangensaft t3_11wmpoj in MachineLearning
Blutorangensaft OP t1_jd7jaor wrote
Reply to comment by YouAgainShmidhoobuh in [D]: Vanishing Gradients and Resnets by Blutorangensaft
Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that this architecture was chosen because of vanishing gradients. Wikipedia seems to have the same error.
Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while enabling me to scale the depth, completely changes the training dynamics. When I train both G and C with the same learning rate and a 1:1 update schedule, the critic seems to win, behaviour I have never seen before in GANs. I suppose I will have to retune the learning rates.
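For context on why layer norm is the usual substitute here: batch norm couples samples through batch statistics, which conflicts with WGAN-GP's per-sample gradient penalty, whereas layer norm computes statistics over each sample's features independently. A minimal plain-Python sketch of that per-sample computation (illustrative only, not the actual critic implementation):

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize one sample's feature vector to zero mean, unit variance.

    The statistics depend only on this single sample (unlike batch norm),
    so each critic input is normalized independently -- compatible with
    WGAN-GP's per-sample gradient penalty.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

# Each sample is normalized on its own, regardless of what else is in the batch.
print(layer_norm([2.0, 4.0, 6.0]))
```

In a real critic one would use a framework-provided layer norm (with learnable scale and shift) after each hidden layer; the sketch above only shows the normalization itself.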