
Blutorangensaft OP t1_jd7jaor wrote

Thank you for your comment. I have not worked with ResNets before, and the paper I used as a basis erroneously stated that they chose this architecture because of vanishing gradients. Wikipedia seems to have the same error.

Indeed, I am working with WGAN-GP. Unfortunately, implementing layer norm, while enabling me to scale the depth, completely changes the training dynamics. When I train both G and C with the same learning rate and the same schedule (1:1), the critic seems to win, a behaviour I have never seen before in GANs. I suppose I will have to retune the learning rates.
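
To make the retuning concrete, here is a minimal, framework-agnostic sketch of the knobs involved: separate learning rates for G and C, and the ratio of critic to generator updates. The names `GanTuning` and `update_schedule` are made up for illustration, and the numbers are placeholders rather than values I have tested. Whether slowing the critic actually helps here is something I would have to find out empirically.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class GanTuning:
    """Hypothetical tuning knobs; the defaults are placeholders, not recommendations."""
    lr_generator: float = 1e-4
    lr_critic: float = 1e-4
    critic_steps: int = 1      # critic updates per cycle
    generator_steps: int = 1   # generator updates per cycle


def update_schedule(cfg: GanTuning, total_updates: int) -> Iterator[str]:
    """Yield "C" or "G" for each update: critic steps first, then generator steps."""
    cycle = ["C"] * cfg.critic_steps + ["G"] * cfg.generator_steps
    for i in range(total_updates):
        yield cycle[i % len(cycle)]


# The setup described above: same learning rate, 1:1 alternating updates.
baseline = GanTuning()

# One possible direction if the critic dominates (an assumption, not a tested fix):
# lower the critic's learning rate and give the generator two updates per cycle.
retuned = GanTuning(lr_critic=5e-5, critic_steps=1, generator_steps=2)

print("".join(update_schedule(baseline, 6)))  # CGCGCG
print("".join(update_schedule(retuned, 6)))   # CGGCGG
```

In a real training loop, each "C" or "G" would trigger one optimizer step for the corresponding network, with each network using its own learning rate.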
