Neural Networks and Deep Learning – V : Gradient Descent

Gradient Descent

Here, we try to find the W and b values that minimize the cost. The gradient descent method is a very powerful and very general optimization method.


Gradient descent is a popular method in machine learning because one of the aims of machine learning is to reach the highest accuracy, that is, to minimize the error rate, given the training data. Gradient descent finds the least error by minimizing the cost function J.

• Take a starting point (w, b); the cost J(w, b) at this point is the starting point for the gradient descent method.
• We take small steps in parameter space that continually reduce the value of J. The main purpose of the method is to minimize the observed error; in each iteration, the weights are reduced/updated according to the gradients.
• Because the values of J have to decrease continually, J is driven toward its minimum (J → 0).
• The process has to come to rest at a point in the end; it cannot continue indefinitely.

This method should be run from several random starting points, and the best of the minima found should be selected (see the sketch below).
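As a minimal sketch of this idea in Python (the non-convex cost J below, its numerical gradient, and all constants are illustrative assumptions rather than anything from these notes), gradient descent from several random starting points might look like:

```python
import random

def J(w, b):
    # Illustrative non-convex cost with two basins of different depth
    # (a placeholder assumption, not the cost from the notes).
    return (w**2 - 1)**2 + 0.3 * w + (b - 0.5)**2 + 0.5

def grad_J(w, b, eps=1e-6):
    # Numerical gradient via central differences.
    dw = (J(w + eps, b) - J(w - eps, b)) / (2 * eps)
    db = (J(w, b + eps) - J(w, b - eps)) / (2 * eps)
    return dw, db

def gradient_descent(w, b, alpha=0.05, steps=2000):
    # Repeatedly step against the gradient:
    # w := w - alpha * dJ/dw,  b := b - alpha * dJ/db
    for _ in range(steps):
        dw, db = grad_J(w, b)
        w -= alpha * dw
        b -= alpha * db
    return w, b, J(w, b)

# Run from several random starting points and keep the lowest minimum found.
results = [gradient_descent(random.uniform(-2, 2), random.uniform(-2, 2))
           for _ in range(5)]
w_best, b_best, cost_best = min(results, key=lambda r: r[2])
print(f"best minimum: w = {w_best:.3f}, b = {b_best:.3f}, J = {cost_best:.3f}")
```

Depending on where a run starts, gradient descent settles into one of the two basins; taking the minimum over all runs selects the better one.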


Gradient Descent Algorithm (α = Learning Coefficient): repeat until convergence, updating both parameters simultaneously in the direction of the negative gradient:

w := w − α · ∂J(w, b)/∂w
b := b − α · ∂J(w, b)/∂b

Selecting the Learning Coefficient (α)

1. Small α – slow convergence
2. Large α – overshooting; gradient descent may fail to converge or even diverge
• To select a good α, you need to try several α values.
• For each of these values, you need to study how gradient descent performs (for example, how the cost J changes over the iterations).

Typical α values to try:
1. α = 0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10
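As a small sketch of this search (the one-dimensional cost J and its gradient below are illustrative placeholders, not the cost from these notes), you can run gradient descent once per candidate α and compare how the final cost behaves:

```python
def J(w):
    # Illustrative one-dimensional cost with its minimum at w = 3
    # (a placeholder, not from the notes).
    return (w - 3.0) ** 2

def dJ(w):
    # Analytic gradient of J.
    return 2.0 * (w - 3.0)

# Run gradient descent once per candidate learning coefficient
# and compare the final cost each one reaches.
for alpha in [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10]:
    w = 0.0
    for _ in range(100):
        w -= alpha * dJ(w)  # update rule: w := w - alpha * dJ/dw
    print(f"alpha = {alpha:<6} final cost J(w) = {J(w):.4g}")
```

On this toy cost, the small values converge slowly, the mid-range values converge quickly, and α ≥ 1 oscillates or diverges, matching the two failure modes listed above.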
