Submitted by Severe-Improvement32 t3_10ohqyw in deeplearning
So, I have been learning what DL is and how a NN learns to do stuff. From what I understand, the repeated iterations start from random weights and at some point those weights become kinda perfect for the given task (please correct me if I'm wrong).
Ok, so let's take an example task like a path-finding AI. We make a NN and train it to go from point A to point B. Now it is trained, doing nicely, and goes to point B perfectly. So here the weights are set to go from point A to point B, right?
What if we put point B somewhere else? How will the AI get perfect weights, since the current weights are only perfect for the current point B?
What if we put an obstacle between point A and B? How will the NN set its weights? Or is it something like a range of weights that is perfect for any task we give the NN?
IDK if I explained it right. Please comment if you have a question about my question, and answer too 💕
FastestLearner t1_j6exhli wrote
Corrections:
The weights are set to random only at the beginning (i.e. before iter=0). From every iteration onwards, the optimization algorithm (some form of gradient descent) kicks in and nudges the weights slightly in a direction that makes the whole network perform incrementally better at the task it's being trained for. After hundreds of thousands of iterations, the hope is that the weights reach an optimal state, where further nudging no longer improves them (and by extension no longer makes the neural network learn any better). This is called convergence.
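If it helps to see that loop spelled out, here is a toy sketch in plain NumPy (the linear-regression task and all the names are made up for illustration, not anything from your post): random weights at the start, then gradient descent nudges them a little each iteration until the nudges stop helping.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))          # toy inputs
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w                         # toy targets

w = rng.normal(size=3)                 # random initial weights (before iter=0)
lr = 0.1                               # learning rate controls the size of each nudge

for step in range(1000):
    pred = X @ w
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the mean squared error
    w -= lr * grad                          # nudge the weights downhill
    if np.linalg.norm(grad) < 1e-6:         # convergence: further nudges barely change anything
        break

print(step, w)   # w ends up close to true_w
```

A real deep network just does the same thing with millions of weights and a loss computed by backpropagation, but the loop is the same idea.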
Coming to your example of path finding, first of all, this is a reinforcement learning (RL) problem. RL is different from DL. DL or deep learning is a subset of machine learning algorithms that is mostly concerned with the training of deep neural networks (hence the name). RL is a particular method of training 'any' learning algorithm (it doesn't always have to be neural networks) using what are called reward functions. Think of it like training a dog (an agent) to perform tricks (a task) using biscuits (as rewards). Every time your dog does what you ask him to do and you follow up by giving him a biscuit, you basically 'reinforce' his behavior, so he will do more of it when you ask him again.
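To make the reward idea concrete, here is a minimal tabular Q-learning sketch (a hypothetical 1-D corridor I made up, not your exact setup): the +1 reward at the goal is the "biscuit" that reinforces the actions that led to it.

```python
import numpy as np

# Hypothetical corridor: agent starts at cell 0, the goal is cell 4.
# Reaching the goal gives a reward of +1 (the "biscuit"); every other step gives 0.
n_states, n_actions = 5, 2            # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))   # estimated value of each action in each state
alpha, gamma, eps = 0.5, 0.9, 0.2

rng = np.random.default_rng(0)
for episode in range(500):
    s = 0
    while s != 4:
        # explore sometimes (and on ties), otherwise take the best-known action
        if rng.random() < eps or Q[s, 0] == Q[s, 1]:
            a = int(rng.integers(n_actions))
        else:
            a = int(Q[s].argmax())
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: the reward reinforces the action that led to it
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print(Q.argmax(axis=1)[:-1])   # best action in each non-terminal state: all 1 (move right)
```

Deep RL swaps the Q table for a neural network, but the reinforcement signal works the same way.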
Now, the example of the path finding agent that you gave is silly. No RL agent is trained on one single scenario. If you do train an RL agent on just a single scenario, you get a condition called overfitting, meaning that your agent learns perfectly well how to navigate that one scenario but it doesn't generalize to any unseen scenarios. In practice, we train an RL agent on hundreds of thousands of different scenarios, with each scenario being slightly different from the rest. These scenarios can have different conditions like different lighting, differently structured environments, different geometries, different obstacles, and so on. What we hope to achieve is that after training, the RL agent has learned a generalized navigation function that adapts to any scenario.
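As a rough sketch of what "hundreds of thousands of different scenarios" looks like in code (the scenario generator and grid size here are hypothetical, and the actual episode/update logic is omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_scenario(grid_size=8):
    """Hypothetical scenario generator: random start, goal and obstacles each episode."""
    start = tuple(rng.integers(grid_size, size=2))
    goal = tuple(rng.integers(grid_size, size=2))
    obstacles = {tuple(rng.integers(grid_size, size=2)) for _ in range(10)}
    obstacles -= {start, goal}          # keep the start and goal cells free
    return start, goal, obstacles

# Training on one fixed layout invites overfitting; instead, every episode
# samples a fresh layout so the agent has to learn a general policy.
for episode in range(100_000):
    start, goal, obstacles = make_scenario()
    # run one RL episode in this scenario and update the agent (omitted)
```

The randomization is the whole point: because no two episodes are identical, the only way for the agent to keep collecting reward is to learn a strategy that works for point B wherever it happens to be.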
I suggest you watch some TwoMinutePapers videos on YT covering OpenAI's RL papers. There are some videos in which RL agents learn to fight in a boxing match, and in another one, several agents collaborate to play hide and seek. You'd get a feel for how RL works.