Gradient descent is a technique for iteratively minimizing a convex function $$f: \mathbb{R}^n \rightarrow \mathbb{R}$$ by repeatedly taking steps along its negative gradient. We define the gradient of $$f$$ to be the unique function $$\nabla f$$ that satisfies:

$\lim_{p \rightarrow 0} \frac{f(x+p) - f(x) - \nabla f(x)^{T}p}{\|p\|} = 0$

The gradient of $$f$$ is a vector which points in the direction of maximum increase of $$f$$. As a corollary, the negative gradient $$- \nabla f(x)$$ points in the direction of maximum decrease of $$f$$. Intuitively, the gradient descent procedure minimizes $$f$$ by repeatedly taking steps in this direction of maximum decrease. Formally we pick a starting point $$x_0$$ and a step size $$\alpha$$ and iterate the following procedure:

$x_{t+1} = x_t - \alpha \nabla f(x_t)$

If $$\nabla f$$ is Lipschitz continuous with constant $$L$$ and the step size satisfies $$\alpha \leq 1/L$$, then $$f(x_{t+1})$$ is guaranteed to be no greater than $$f(x_t)$$, with strict decrease whenever $$\nabla f(x_t) \neq 0$$.
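The update rule above can be sketched in a few lines of Python. The objective $$f(x) = (x-3)^2$$, the starting point, and the step size here are illustrative choices, not part of the original text:

```python
# Minimal sketch of gradient descent on the convex function
# f(x) = (x - 3)^2, whose gradient is f'(x) = 2(x - 3).

def grad_f(x):
    return 2.0 * (x - 3.0)

def gradient_descent(x0, alpha, steps):
    x = x0
    for _ in range(steps):
        x = x - alpha * grad_f(x)  # x_{t+1} = x_t - alpha * grad f(x_t)
    return x

x_min = gradient_descent(x0=0.0, alpha=0.1, steps=100)
print(x_min)  # converges toward 3, the minimizer of f
```

Here $$\nabla f$$ has Lipschitz constant $$L = 2$$, so any step size $$\alpha \leq 1/2$$ decreases $$f$$ at every iteration.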

#### Gradient Descent as Euler’s Method

There is another perspective from which we can derive the gradient descent procedure. Say we have a differential equation of the form $$\frac{dx}{dt} = g(x)$$. We can use Euler’s method to approximately solve this equation by choosing a starting point $$x_0$$ and iteratively applying the following update for a step size $$\alpha$$ until we reach a steady state $$x_s$$ such that $$g(x_s) \simeq 0$$:

$x_{t+\alpha} \simeq x_t + \alpha \frac{dx}{dt} \simeq x_t + \alpha \, g(x_t)$
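As a concrete sketch, here is Euler’s method iterated to a steady state. The equation $$\frac{dx}{dt} = -x$$, the tolerance, and the step size are illustrative assumptions:

```python
# Sketch of Euler's method for dx/dt = g(x), iterated until g(x) ≈ 0.

def euler_to_steady_state(g, x0, alpha, tol=1e-6, max_steps=100000):
    x = x0
    for _ in range(max_steps):
        if abs(g(x)) < tol:       # steady state x_s: g(x_s) ≈ 0
            break
        x = x + alpha * g(x)      # x_{t+alpha} ≈ x_t + alpha * g(x_t)
    return x

# dx/dt = -x has the exact solution x(t) = x0 * e^(-t), decaying to 0.
x_s = euler_to_steady_state(g=lambda x: -x, x0=1.0, alpha=0.1)
print(x_s)  # close to 0, the steady state
```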

Now say $$g(x) = -\nabla f(x)$$. Applying Euler’s method to this equation yields:

$x_{t+\alpha} = x_{t} - \alpha \nabla f(x_t)$

This is exactly the update step for gradient descent! Note that this differential equation reaches a steady state at $$x_s$$ such that $$\nabla f(x_s) = 0$$. By definition, this is a critical point of $$f$$, and since $$f$$ is convex, it is a global minimum.
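The equivalence can be checked numerically: an Euler step on $$\frac{dx}{dt} = -\nabla f(x)$$ produces the same iterates as the gradient descent update. The function $$f(x) = (x-3)^2$$ is again an illustrative choice:

```python
# Sketch: Euler's method applied to dx/dt = -grad f(x) reproduces
# the gradient descent iterates step for step.

def grad_f(x):
    return 2.0 * (x - 3.0)

def gd_step(x, alpha):
    return x - alpha * grad_f(x)       # gradient descent update

def euler_step(x, alpha):
    return x + alpha * (-grad_f(x))    # Euler step on dx/dt = -grad f(x)

x_gd = x_euler = 0.0
for _ in range(10):
    x_gd = gd_step(x_gd, 0.1)
    x_euler = euler_step(x_euler, 0.1)

print(x_gd == x_euler)  # True: the two procedures are identical
```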