next up previous contents
Next: The conjugate direction method Up: The optimal gradient Previous: The optimal gradient   Contents

Convergence considerations

A general descend algorithm converges if:

$\displaystyle \lim_{k\rightarrow\infty} \nabla f(\mathbf{x}^k) = 0.$    

Property 5.30   The function $ f$ monotonically decreases along the (negative) gradient path.

Proof. From equation (4.9)

$\displaystyle \frac{d f}{d l} = \sum_{i=1}^n \frac{\partial f}{\partial x_i}\fr...
...sum_{i=1}^n\left(\frac{\partial f}{\partial x_i} \right)^2\biggr]^{\frac{1}{2}}$ (5.27)

Thus $ \dfrac{df}{dl} \leq 0$, or the function $ f$ decreases along the path $ dl$.

$ \qedsymbol$

Lemma 5.31   The convergence of a descend method along the gradient path can not be obtained in a finite number of steps.

Proof. From equation (4.12) (page [*])

$\displaystyle \frac{d f}{d l}=-\biggl[\displaystyle\sum_{i=1}^n\left(\frac{\partial f}{\partial x_i} \right)^2\biggr]^{\frac{1}{2}}$    

but when $ \mathbf{x}$ approaches the optimum $ \mathbf{x}^\star$, then

$\displaystyle \lim_{\mathbf{x}\rightarrow\mathbf{x}^\star} \frac{\partial f}{\partial x_i}(\mathbf{x})=0$    

so that

$\displaystyle \lim_{\mathbf{x}\rightarrow\mathbf{x}^\star} \frac{d f}{d l}(\mathbf{x})=0$    

meaning that the optimum is reached with a rate convergence that decreases.

$ \qedsymbol$

For the optimal gradient method the convergence is only linear11 in $ f(\mathbf{x}^k)$ and a halting criterion for the algorithm could be:

$\displaystyle f(\mathbf{x}^k)-f(\mathbf{x}^{k+1})\leq \epsilon;$    

alternatively from the necessary condition $ \nabla f(\mathbf{x})=0$

$\displaystyle \max_i \vert\frac{\partial f}{\partial x_i}(\mathbf{x}^k) \vert \leq \epsilon$   or$\displaystyle \qquad \sum_{i=1}^n\left(\frac{\partial f}{\partial x_i}(\mathbf{x}^k)\right)^2\leq \epsilon$    

Finally note that these methods, since they use a local gradient information, they find only a local minimum, and that the gradient algorithms are rather inefficient in the proximity of the optimum, due to the small step size.


next up previous contents
Next: The conjugate direction method Up: The optimal gradient Previous: The optimal gradient   Contents
marco+site@equars.com