next up previous contents
Next: The optimal gradient Up: Multi-dimensional search Previous: Multi-dimensional search   Contents


The gradient direction: steepest (maximum) descent

The method of the steepest descent chooses at each iteration a new point in the decision space $ \mathbf{x}+d\mathbf{x}$ from the old point $ \mathbf{x}$, obviously such that:

$\displaystyle f(\mathbf{x}+d\mathbf{x}) < f(\mathbf{x})$    

This new point must also be chosen such that the variation of the function $ f$ is as more as possible. In other words, if $ dl$ is the length of the direction:

$\displaystyle dl=\sqrt{\sum_{i=1}^n (dx_i)^2},$    

the steepest descent maximizes the rate of change $ df/dl$.

The problem of minimize $ f$ becomes so the problem:

Problem 5.29 (Steepest descent)  

$\displaystyle \max \sum_{i=1}^n \frac{d\mbox{}f}{dl} = \max_{\tfrac{d x_i}{d l}} \sum_{i=1}^n \frac{\partial f}{\partial x_i}\frac{d x_i}{d l},$    

such that

$\displaystyle dl=\sqrt{\sum_{i=1}^n (dx_i)^2}.$    

This problem can be solved with the Lagrangian multipliers; from equations (4.3) and (4.4) (page [*]) we can write:

$\displaystyle \frac{d x_i}{d l} = \frac{1}{2\lambda}\frac{\partial f}{\partial x_i},$    

with

$\displaystyle \lambda = -\frac{1}{2}\biggl[\sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}\right)^2\biggr]^{\frac{1}{2}}.$    

This means:

$\displaystyle \frac{d x_i}{d l}(\mathbf{x}) = - \frac{\dfrac{\partial f}{\parti...
...left(\frac{\partial f}{\partial x_i}(\mathbf{x})\right)^2\biggr]^{\frac{1}{2}}}$ (5.25)

The steepest descend algorithm chooses at each iteration a new point $ \mathbf{x}^{k+1}$ from the old point $ \mathbf{x}^k$ from the equation (4.9) (page [*])

$\displaystyle \mathbf{x}^{k+1} = \mathbf{x}^k - dl \nabla f(\mathbf{x}^k),\quad dl > 0$    

with $ dl$ chosen accordingly to the desired convergence rate: if $ dl$ is small the algorithm will closely approximate the minimum, with slow convergence, while if $ dl$ is large the convergence is fast but the algorithm can oscillate near the minimum. Thus some methods are necessary to reduce (or enlarge) the step $ dl$ at each iteration: large steps if we are far away from the minimum, small steps if we are close to the minimum. The scheme of choosing the proper step can affect greatly the convergence of the algorithm. The best choice is the method of the optimal gradient.


next up previous contents
Next: The optimal gradient Up: Multi-dimensional search Previous: Multi-dimensional search   Contents
marco+site@equars.com