Gradient Descent


How To Use

In this applet, the function entered in the "Function" box is minimized by the gradient descent algorithm. The value in the "Learning Rate" box sets the size of each gradient step (how fast the function is minimized). If it is too large, the steps will overshoot the minimum and oscillate back and forth around it without ever reaching it; if it is too small, finding the minimum will take a very long time. The error is the current y value of the point: the smaller the error, the better the minimization is doing. The "Iterations" label shows how many gradient steps have been taken so far. You can reset this count by clicking the point on the graph.
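The learning-rate trade-off described above can be sketched numerically. This minimal example (the function f(x) = x², its derivative 2x, and the starting point 25 are illustrative choices, not taken from the applet's internals) compares a too-small, a reasonable, and a too-large learning rate:

```python
def run(learning_rate, steps=50, a=25.0):
    """Take `steps` gradient steps on f(x) = x**2 (derivative 2*x)."""
    for _ in range(steps):
        a -= learning_rate * 2 * a  # one gradient step
    return a

small = run(0.01)  # converges, but slowly: still far from 0 after 50 steps
good = run(0.1)    # converges quickly to near 0
large = run(1.1)   # overshoots: each step flips sign and grows, so it diverges
```

With learning rate 1.1, each step multiplies the position by (1 - 2.2) = -1.2, so the point jumps from side to side with growing magnitude, which is exactly the back-and-forth behaviour the text warns about.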

How It Works

To minimize the function, its derivative is taken and stored as a function g(x). To move the point towards the minimum of the function, you take the negative of the derivative evaluated at the point's current position. To control how large the steps are, this value is multiplied by the learning rate, and the result is added to the point's current value on every iteration. In pseudo-code:

    def function(x):
        return x ** 2

    def g(x):  # derivative of function
        return 2 * x

    a = 25
    learning_rate = 0.01

    while True:
        # Each iteration: step against the gradient at the current point
        gradient = g(a)
        a += -learning_rate * gradient
        plot((a, function(a)))
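The same loop can be made runnable without symbolic differentiation by approximating the derivative numerically. This is a sketch under one assumption: the applet differentiates the function symbolically, whereas here a central finite difference stands in for g(x) so the example is self-contained and works for any smooth function:

```python
def derivative(f, x, h=1e-6):
    # Central difference: (f(x + h) - f(x - h)) / (2h) approximates f'(x)
    return (f(x + h) - f(x - h)) / (2 * h)

def gradient_descent(f, a=25.0, learning_rate=0.01, iterations=1000):
    """Repeatedly step against the approximate gradient, as in the loop above."""
    for _ in range(iterations):
        a += -learning_rate * derivative(f, a)
    return a

minimum = gradient_descent(lambda x: x ** 2)
# For f(x) = x**2 the position shrinks by a factor of (1 - 0.02) each
# iteration, so after 1000 iterations it is very close to the minimum at 0.
```

Replacing the infinite `while True` with a fixed iteration count mirrors the applet's "Iterations" counter and lets the function return a result.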