Tgs91 t1_ir0n6hb wrote
Reply to comment by MLNoober in [D] Why restrict to using a linear function to represent neurons? by MLNoober
You are missing the activation function, which is part of the neuron. It's sometimes treated as a separate layer, but that's just a way of representing nested functions. So it isn't:
F(X) = WX + b
It is:
F(X) = A(WX + b), where A is a nonlinear function.
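To make that concrete, here's a rough NumPy sketch of a single layer (the specific W, b values and the choice of ReLU for A are just placeholders for illustration, not anything from the original post):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)          # A: elementwise non-linearity

def layer(X, W, b):
    return relu(W @ X + b)             # linear map first, then the activation

X = np.array([1.0, -2.0, 0.5])         # 3 inputs
W = np.random.randn(4, 3)              # 4 neurons, 3 weights each (placeholder values)
b = np.random.randn(4)
print(layer(X, W, b))                  # 4 outputs, negatives clipped to 0 by ReLU
```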
You could make A a polynomial function and it would be equivalent to your suggestion. However, polynomials have poor convergence properties and are expensive to compute. Early neural nets used sigmoid activations for non-linearity; now various versions of ReLU are the most popular. It turns out that basically any non-linear function gives the model enough freedom to approximate any non-linear relationship, because so many neurons get recombined across the network. In the case of ReLU, it's like using the Epcot Ball to approximate a sphere.
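To illustrate that Epcot-Ball idea, here's a toy sketch: a sum of ReLU neurons producing a piecewise-linear approximation of sin(x). The weights are set by hand from segment slopes purely for illustration (in a real network they'd be learned), and the number and placement of the knots are arbitrary choices of mine:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

xs = np.linspace(0.0, 2 * np.pi, 500)
knots = np.linspace(0.0, 2 * np.pi, 20)               # one hidden "neuron" per knot

seg_slopes = np.diff(np.sin(knots)) / np.diff(knots)  # slope of each linear piece
weights = np.diff(seg_slopes, prepend=0.0)            # change in slope at each knot

# sin(0) = 0, so no bias term is needed for this particular target
approx = sum(w * relu(xs - k) for w, k in zip(weights, knots[:-1]))
print("max abs error:", np.max(np.abs(approx - np.sin(xs))))   # ~0.01 with 20 knots
```

Each ReLU only adds a kink where its input crosses zero, but with enough of them the flat facets hug the curve, the same way the Epcot Ball's triangles hug a sphere.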