Ecclestoned t1_is2mf31 wrote

Nice work, will definitely check it out. You're lucky you didn't get dinged by reviewers for not citing recent work on activation compression. Some examples:

GACT: Activation Compressed Training for Generic Network Architectures

ActNN: Reducing Training Memory Footprint via 2-Bit Activation Compressed Training

[AC-GC: Lossy Activation Compression with Guaranteed Convergence](https://proceedings.neurips.cc/paper/2021/hash/e655c7716a4b3ea67f48c6322fc42ed6-Abstract.html)
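
These papers all build on the same core idea: compress the activations a network stashes for the backward pass, then decompress them when gradients are computed. Here's a minimal PyTorch sketch of that idea (my own illustration, not any of these papers' actual implementations; they use far more aggressive schemes, e.g. ActNN's per-group 2-bit quantization):

```python
import torch

# Toy "activation compressed" ReLU: the input saved for backward is
# stored as 8-bit min-max-quantized values instead of full precision.
class CompressedReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        lo, hi = x.min(), x.max()
        scale = (hi - lo).clamp_min(1e-8) / 255.0
        q = ((x - lo) / scale).round().to(torch.uint8)  # compress
        ctx.save_for_backward(q)
        ctx.lo, ctx.scale = lo, scale
        return x.clamp_min(0.0)

    @staticmethod
    def backward(ctx, grad_out):
        (q,) = ctx.saved_tensors
        # Decompress an approximation of the saved input, then apply
        # the usual ReLU gradient mask.
        x_hat = q.to(grad_out.dtype) * ctx.scale + ctx.lo
        return grad_out * (x_hat > 0)

x = torch.randn(4, 16, requires_grad=True)
y = CompressedReLU.apply(x)
y.sum().backward()  # gradient computed from the compressed activation
```

The lossiness shows up in the gradient mask near zero, where quantization can flip a sign; bounding exactly that kind of gradient error is what AC-GC's convergence guarantee is about.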


Ecclestoned t1_irpictt wrote

I am not much of a mathematician, but I have published theoretical ML papers. To answer your second question: I had no idea whether I would be able to solve the problem when I started out.

The process was a lot of trial and error and iterative refinement. I started with the simplest form of the problem I could and made every simplifying assumption available (usually that various terms were negligible). As I got closer to solving the equations, I got a better sense of what form the initial problem had to take for the final solution to be tractable. Then, working backwards, I determined which assumptions were actually necessary, solved the problem, and checked the assumptions.
