Bot-69912020 t1_ivxbxml wrote
Reply to comment by jrkirby in [R] ZerO Initialization: Initializing Neural Networks with only Zeros and Ones by hardmaru
I don't know the details of each specific implementation, but via the definition of subgradients you can get 'derivatives' of convex but non-differentiable functions, and ReLU is such a function.
More formally: a subgradient of a convex function f at a point x is any vector x' such that f(y) ≥ f(x) + ⟨x', y − x⟩ for all y. The set of all subgradients of f at x is called the subdifferential of f at x.
For more details, see here.