
Tgs91 t1_ir0n6hb wrote

You are missing the activation function, which is part of the neuron. It's sometimes represented as a separate layer, but that's just a way of writing nested functions. So it isn't:

F(X) = WX + b

It is:

F(X) = A(WX + b), where A is a nonlinear function.
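For concreteness, here's a minimal NumPy sketch of one such layer (the shapes, the batch-first convention, and the choice of ReLU for A are just illustrative assumptions):

```python
import numpy as np

def relu(z):
    # Nonlinear activation A: elementwise max(0, z)
    return np.maximum(0.0, z)

def layer(X, W, b):
    # F(X) = A(WX + b): affine transform, then the nonlinearity
    return relu(X @ W + b)

# Example: batch of 10 inputs, 4 features -> 3 neurons
X = np.random.randn(10, 4)
W = np.random.randn(4, 3)
b = np.zeros(3)
print(layer(X, W, b).shape)  # (10, 3)
```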

You could make A a polynomial function and it would be equivalent to your suggestion. However, polynomials have poor convergence properties and are expensive to compute. Early neural nets used sigmoid activations for non-linearity; now various versions of ReLU are most popular. It turns out that basically any non-linear function gives the model enough freedom to approximate any non-linear relationship, because so many neurons are then recombined. In the case of ReLU, it's like using the Epcot ball to approximate a sphere: lots of flat facets adding up to a smooth shape.
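To illustrate that "flat facets" idea (my own toy example, not from the comment above): take a bunch of random ReLU neurons and fit only the output weights with least squares. The result is a piecewise-linear recombination that tracks a smooth curve surprisingly well.

```python
import numpy as np

# Approximate a smooth function with a linear recombination of ReLU "facets"
x = np.linspace(-3, 3, 200).reshape(-1, 1)
target = np.sin(x)

rng = np.random.default_rng(0)
W = rng.normal(size=(1, 50))          # 50 hidden neurons, random weights
b = rng.normal(size=50)
H = np.maximum(0.0, x @ W + b)        # hidden ReLU activations (piecewise linear)

# Fit only the output weights: how the neurons get recombined
coef, *_ = np.linalg.lstsq(H, target, rcond=None)
approx = H @ coef

print("max error:", np.abs(approx - target).max())  # small, despite only kinks
```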

1