
Ricenaros t1_jeawpf3 wrote

It refers to the number of scalars needed to specify the model. At the heart of machine learning is matrix multiplication. Consider an input vector x of size (n x 1) and a linear transformation: y = Wx + b. Here the (m x n) matrix W (weights) and the (m x 1) vector b (bias) are the model parameters. Learning consists of tweaking W and b in a way that lowers the loss function. For this simple linear layer there are m*n + m scalar parameters (the elements of W plus the elements of b).
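As a minimal sketch of the counting above (the sizes n = 4, m = 3 are just illustrative choices, not from the original):

```python
import numpy as np

# Hypothetical sizes for illustration: n input features, m outputs.
n, m = 4, 3

rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))  # weight matrix, shape (m, n)
b = rng.standard_normal(m)       # bias vector, shape (m,)

x = rng.standard_normal(n)       # input vector, shape (n,)
y = W @ x + b                    # linear layer: y = Wx + b

num_params = W.size + b.size     # m*n + m scalar parameters
print(num_params)                # 3*4 + 3 = 15
```

Frameworks like PyTorch report exactly this count for an equivalent `nn.Linear(n, m)` layer.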

Hyperparameters on the other hand are things like learning rate, batch size, number of epochs, etc.

Hope this helps.


sparkpuppy t1_jee9qj3 wrote

Hello, thank you so much for the detailed explanation! Yes, it definitely gives me a clearer picture of what that expression means. Have a nice day!
