
master3243 t1_irdi8o7 wrote

In the paper, in Appendix A.4 where the loss and gradients are derived,

I don't see how this step is true (eq. 14): https://i.imgur.com/ZuN2RC2.png

The RHS seems to equal (2 * alpha_t) * LHS.

I'm also unsure how this step in the same equation follows: https://i.imgur.com/DHixElF.png
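Without the paper's actual eq. 14 terms in front of me, here's a generic way to sanity-check a suspected stray factor like 2 * alpha_t symbolically with sympy. The `lhs`/`rhs` expressions below are placeholder stand-ins, not the paper's real terms; you'd substitute the actual sides of eq. 14:

```python
import sympy as sp

alpha_t, x = sp.symbols('alpha_t x', positive=True)

# Placeholder stand-ins for the two sides of the derivation step;
# replace with the actual terms from eq. 14 to run the real check.
lhs = alpha_t * x**2
rhs = 2 * alpha_t**2 * x**2

# If the ratio simplifies to 2*alpha_t, the RHS carries an extra
# factor of (2 * alpha_t) relative to the LHS.
print(sp.simplify(rhs / lhs))  # -> 2*alpha_t
```

A quick symbolic check like this won't tell you which side of the derivation is wrong, but it does confirm whether the two sides actually differ by the suspected factor.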

9

dkangx t1_irdmnsp wrote

Well, someone’s gonna fire it up and test it out, and we’ll see if it’s real

2

master3243 t1_irdoq0o wrote

Empirical results don't necessarily prove theoretical results. In fact, most deep learning research (mine included) is trying out different things based on intuition and past experience of what has worked, until you have something that achieves really good results.

Then you attempt to show formally and theoretically why the thing you did is mathematically justified.

And often enough, once you start working through the formal math, you get ideas for further improvements or different paths to take with your model, so it's a back-and-forth.

However, someone could just as easily get good results with a certain architecture/loss and then fail to justify it formally, skip certain steps, or make an invalid jump from one step to the next, resulting in theoretical work that is wrong but works great empirically.

17