seb59 t1_iqwsnee wrote

First, it is not linear. ReLU functions (rectified linear units) are piecewise linear, and by combining them you can approximate any nonlinear function with a finite number of discontinuities as closely as you want. Think of it as a form of tessellation: if you add more faces, you can match the function better. Note that it is not mandatory to use piecewise linear functions; you may use other nonlinearities such as tanh. But with such bounded nonlinearities, training becomes subject to vanishing gradients, which make learning very difficult. As ReLU is unbounded (on the positive side), it avoids this problem.
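
A quick toy sketch of the "sum of ReLUs is piecewise linear" idea (not from the original comment; it uses NumPy and hand-picked knots and slopes rather than learned weights, chosen so the pieces interpolate f(x) = x² on [0, 2]):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

# Each term s * relu(x - k) adds a "hinge" at knot k with extra slope s.
# Summing a few of them gives a piecewise-linear curve following x**2.
knots  = np.array([0.0, 0.5, 1.0, 1.5])   # where each new linear piece starts
slopes = np.array([0.5, 1.0, 1.0, 1.0])   # slope added after each knot

x = np.linspace(0.0, 2.0, 201)
approx = sum(s * relu(x - k) for s, k in zip(slopes, knots))

target = x**2
print(f"max abs error: {np.max(np.abs(approx - target)):.3f}")  # ~0.031
```

Adding more knots (i.e. more ReLU units) shrinks the error, which is the "more faces match the function better" point above.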
