next up previous contents
Next: Convergence considerations Up: Multi-dimensional search Previous: The gradient direction: steepest   Contents


The optimal gradient

This algorithm simply calculates the step $ dl$ according to:

$\displaystyle \min_{dl\in\mathbb{R}^+} f(\mathbf{x}^k-dl\nabla f(\mathbf{x}^k))$    

This is a one-dimensional optimization and it is usually performed with a method as shown previously. Strictly speaking, the optimization of $ f$ is always a multidimensional one, since we descend along the gradient path, but inner this process there are a lot of sub-optimization steps that found the optimal length of this descend.

If $ f\in C^2$, that is $ f$ is twice differentiable and its derivatives are continue, then a closed form for the optimum step $ dl$ is determinable; we expand $ f$ in Taylor series:

$\displaystyle f(\mathbf{x}^k+\Delta\mathbf{x})=f(\mathbf{x}^k)+ \left[\nabla f(...
...bf{x}+ \frac{1}{2}\Delta \mathbf{x}^k\mathbf{H}(\mathbf{x}^k)\Delta \mathbf{x},$    

where $ \mathbf{H}(\mathbf{x})$ is the Hessian10matrix of $ f$.

Along the gradient direction:

$\displaystyle \Delta \mathbf{x}=dl^k\nabla f(\mathbf{x}^k).$    

Thus:

\begin{displaymath}\begin{split}f(\mathbf{x}^k+dl^k\nabla f(\mathbf{x}^k))=&f(\m...
...t]^T \mathbf{H}(\mathbf{x}^k)\nabla f(\mathbf{x}^k) \end{split}\end{displaymath}    
$\displaystyle \frac{d f}{d l^k} = \left[\nabla f(\mathbf{x}^k)\right]^T\nabla f...
...nabla f(\mathbf{x}^k)\right]^T \mathbf{H}(\mathbf{x}^k)\nabla f(\mathbf{x}^k)=0$ (5.26)

and


$\displaystyle d l^k=\left[\frac{-\left[\nabla f(\mathbf{x}^k)\right]^T\nabla f(...
...f(\mathbf{x}^k)\right]^T\mathbf{H}(\mathbf{x}^k)\nabla f(\mathbf{x}^k)}\right].$    

From $ \frac{d f}{d l^k}(\mathbf{x}^{k+1}) = 0$, we can see that:

$\displaystyle \left(\nabla f(\mathbf{x}^k+d l^k\nabla f(\mathbf{x}^k)),\, \nabla f(\mathbf{x}^k)) \right) = 0,$    

that is $ \nabla f(\mathbf{x}^k))$ and $ \nabla f(\mathbf{x}^{k+1}))$ are orthogonal, or, the same, $ \mathbf{x}^k$ and $ \mathbf{x}^{k+1}$ are orthogonal. This means that successive steps of the optimal gradient algorithm are orthogonal.



Subsections
next up previous contents
Next: Convergence considerations Up: Multi-dimensional search Previous: The gradient direction: steepest   Contents
marco+site@equars.com