

Mono-objective vs. Multi-objective

Mono-objective optimization (§4.1.1) means optimizing (in our case, always decreasing) a single objective, i.e. a well-defined target, to the detriment of all the other possible targets.

Figure 7.2: Delay optimization of $0.7\,\mathrm{\mu m}$ gates. (a) Delay amelioration; (b) energy-dissipation deterioration.

Figure 7.3: Delay optimization of $0.25\,\mathrm{\mu m}$ gates. (a) Delay amelioration; (b) energy-dissipation deterioration.

Figure 7.4: Technology comparison of delay optimization. (a) Delay variation; (b) energy-dissipation variation.

The very first optimization policy applied to the CMOS circuits was delay optimization. Figures 7.2 and 7.3 sketch the delay optimization of the gates of table 7.1, implemented in $0.7\,\mathrm{\mu m}$ and $0.25\,\mathrm{\mu m}$ technology respectively, with arrows representing the delay and energy variation. Each arrow starts at the initial value (i.e. the delay or the energy measured at the minimum technology width) and ends at the value reached after the optimization.

As expected, the delay improves appreciably (it decreases, figures 7.2(a), 7.3(a)) while the energy dissipation increases considerably (figures 7.2(b), 7.3(b)): to decrease the delay the optimizer enlarges the transistor widths, thus increasing the overall power dissipation. Table 7.3 and figure 7.4 report the relative variation of delay and power (as minimum, maximum and mean value) for both technologies: for the $0.7\,\mathrm{\mu m}$ technology the delay is, on average, decreased by $3.43$ times, while for the $0.25\,\mathrm{\mu m}$ technology it is decreased by $2.75$ times (figure 7.4(a)). Conversely, the energy dissipation is increased by $20.42$ times in $0.7\,\mathrm{\mu m}$ and by $13.41$ times in $0.25\,\mathrm{\mu m}$ (figure 7.4(b)).

Table 7.4 shows the total time taken by the optimization of each gate, together with the total number of function evaluations, that is, the number of times the circuit simulator (in this case HSPICE) was invoked. These numbers are quite reasonable per se; moreover, the optimization of a cell library needs to be performed only once, before the library is reused. Furthermore, for very large circuits, the modular architecture of the optimizer makes it possible to switch from one simulator to another on the fly: a very fast simulator (such as FAST) can be used in the earlier steps of the optimization, and a more precise but slower one (such as HSPICE) in the later stages.
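The idea of switching simulators can be illustrated with a minimal sketch. It assumes a one-dimensional width search and two toy analytic delay models standing in for a fast and a precise simulator; `fast_model`, `precise_model` and `hill_climb` are hypothetical names introduced here, and no real FAST or HSPICE interface is implied.

```python
from typing import Callable, Tuple

Simulator = Callable[[float], float]  # transistor width -> delay (arbitrary units)


def fast_model(width: float) -> float:
    """Toy stand-in for a fast, approximate simulator (assumption)."""
    return 100.0 / width + 20.0 + 0.4 * width


def precise_model(width: float) -> float:
    """Toy stand-in for a slow, precise simulator (assumption)."""
    return 100.0 / width + 20.0 + 0.5 * width


def hill_climb(simulate: Simulator, start: float, step: float,
               w_min: float = 1.0, w_max: float = 20.0) -> Tuple[float, int]:
    """Move the width by +/- step while the delay keeps decreasing.

    Returns the final width and the number of simulator calls.
    """
    width = start
    best = simulate(width)
    evaluations = 1
    improved = True
    while improved:
        improved = False
        for candidate in (width + step, width - step):
            if w_min <= candidate <= w_max:
                delay = simulate(candidate)
                evaluations += 1
                if delay < best:
                    width, best, improved = candidate, delay, True
                    break
    return width, evaluations


# Early stage: coarse steps with the fast model.
coarse_width, fast_calls = hill_climb(fast_model, start=1.0, step=1.0)
# Late stage: fine steps with the precise model, starting from the coarse result.
final_width, precise_calls = hill_climb(precise_model, start=coarse_width, step=0.25)
print(coarse_width, final_width, fast_calls, precise_calls)
```

In this toy setting most of the search is paid for with the cheap model, and the precise model is only called to refine a point that is already close to the optimum.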


Table 7.3: Delay decrease and energy increase (both relative) in a delay optimization.

Technology              Delay decrease          Energy increase
                            Max.    Min.    Mean    Max.    Min.    Mean
$0.7\,\mathrm{\mu m}$       8.43    3.43    4.80   43.61    8.89   20.42
$0.25\,\mathrm{\mu m}$      7.78    2.75    3.16   35.25    6.17   13.41


Table 7.4: Elapsed time and total number of function evaluations for a full-delay optimization with HSPICE on an ULTRA-sparc 5.

Technology    $0.7\,\mathrm{\mu m}$         $0.25\,\mathrm{\mu m}$
Gate          El. time [s]    Fun. eval.  El. time [s]    Fun. eval.
inv                  332.6            12         212.4            13
and-n               1338.3            34        1675.6            36
and-p               1426.6            34        2449.5            41
or-n                1408.3            32        1950.0            34
or-p                1259.5            31        1355.5            27
latch-n             1286.5            32        1466.7            32
latch-p             1307.1            33        1574.7            31
and-or              5830.9            73        9280.3            91
and-static           786.5            25         729.6            31
or-static            651.6            21         626.1            24
parity             64098.2           159       35274.3           178
static-fa          27034.8           239       23794.1           180
tspc-fa1            2413.3            69        2881.2            70
tspc-fa2           16459.1            66       63485.2           121

These results are largely predictable, since a hard delay optimization leads to a very large increase in transistor dimensions, and hence to large area occupancy and energy dissipation.

Moreover, another issue arises when optimizing an entire cell library: is it necessary to push every single cell to its limits? In a generic static circuit the total delay is bounded by the delay of the worst critical path, which is generally the sum of the delays of the cells along it; moreover, a single critical path may run from a primary input to a primary output of the circuit. It therefore makes sense to optimize every single cell to its best.
In a generic dynamic circuit the global delay is still bounded by the delay of the worst critical path in the circuit, which thus determines the minimum clock period. Since, for a single-phase dynamic logic (where n-gates and p-gates alternate, working on different clock phases), this critical path is contained in a single cell, the delay of the entire circuit is bounded by the delay of the worst library cell in the circuit. It thus makes no sense to optimize the basic library cells (which are present in every circuit) to their limits when the delay of a generic circuit is bounded by the worst of them. It is more useful to optimize the worst cell in the library and then reduce the delay of the other cells only down to the value obtained by that optimization. In this way the dimensions of these cells are reduced, and with them the overall energy dissipation.

The consequent idea is to optimize an entire (dynamic) cell library using a constrained optimization; the strategy is:

i) evaluate the delay of every cell at minimum width;
ii) choose the worst cell (with regard to delay) among them;
iii) optimize the delay of this cell as far as possible;
iv) optimize all the other cells to have a delay no greater than the value obtained in the previous step.
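The four steps can be sketched on a toy library. The sketch assumes a hypothetical cell model, delay(w) = a / w + b for a normalized width w between 1 and `W_MAX`; the cell names, coefficients and helper functions are illustrative only, and a real flow would obtain the delays from a circuit simulator.

```python
from typing import Dict, Tuple

# Hypothetical toy cell model: delay(w) = a / w + b, normalized width w in [1, W_MAX].
W_MAX = 10.0
CELLS: Dict[str, Tuple[float, float]] = {
    "cell-a": (200.0, 50.0),
    "cell-b": (400.0, 80.0),
    "cell-c": (150.0, 60.0),
}


def delay(cell: str, width: float) -> float:
    a, b = CELLS[cell]
    return a / width + b


def smallest_width_meeting(cell: str, target: float) -> float:
    """Smallest width in [1, W_MAX] whose delay does not exceed target."""
    a, b = CELLS[cell]
    if delay(cell, 1.0) <= target:
        return 1.0
    if target <= b:
        return W_MAX  # target unreachable in this toy model: use the widest cell
    return min(W_MAX, a / (target - b))


def constrained_library_optimization() -> Tuple[str, float, Dict[str, float]]:
    # i) delay of every cell at minimum width
    initial = {cell: delay(cell, 1.0) for cell in CELLS}
    # ii) worst cell with regard to delay
    worst = max(initial, key=initial.get)
    # iii) optimize the worst cell as far as possible (widest allowed width here)
    target = delay(worst, W_MAX)
    # iv) size every other cell just enough to meet that delay
    widths = {cell: smallest_width_meeting(cell, target) for cell in CELLS}
    widths[worst] = W_MAX
    return worst, target, widths


worst, target, widths = constrained_library_optimization()
for cell, width in widths.items():
    print(f"{cell}: width={width:.2f} delay={delay(cell, width):.1f}")
```

Only the worst cell ends up at the maximum width; the others stop at the smallest width that meets the common delay target, which is where the saving in area and energy comes from.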

As an example, the constrained optimization of dynamic $0.25\,\mathrm{\mu m}$ gates is reported in table 7.5: the optimization was performed with the constraint that no gate should have a delay greater than 125 ps. This value was obtained from an unconstrained optimization of the worst cell (with respect to delay), the TSPC type-p ``or'' gate (cf. table 7.2). After this optimization the delay of this gate was 121.2 ps, so the value chosen for the optimization of all the other gates was 125 ps.


Table 7.5: Constrained delay optimization of a few $0.25\,\mathrm{\mu m}$ gates.

Gate                Delay pre-opt. [ps]   Delay post-opt. [ps]
and-n                           315.800                100.500
and-p                           482.500                111.900
or-n                            299.700                114.900
or-p                            482.500                121.200
latch-n                         293.300                 88.080
latch-p                         482.500                118.600

Average delay                    392.72                 109.20
Standard deviation                36.65                   3.83

Table 7.5 shows that the delays after the optimization have a standard deviation (3.83) far smaller than the one before the optimization (36.65). This means that all the cells have nearly the same delay after the optimization, and that this value is an ``optimal'' one, since it minimizes the delay of a block made up of these cells while reducing the power dissipation and area occupancy with respect to a solution in which all the cells are optimized independently.
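As a small worked check, the averages of table 7.5 and the 125 ps constraint can be recomputed from the tabulated delays. The dictionary below simply copies the table values; the variable names are illustrative.

```python
# Delays in ps copied from table 7.5: gate -> (pre-optimization, post-optimization).
delays = {
    "and-n": (315.8, 100.5),
    "and-p": (482.5, 111.9),
    "or-n": (299.7, 114.9),
    "or-p": (482.5, 121.2),
    "latch-n": (293.3, 88.08),
    "latch-p": (482.5, 118.6),
}

CONSTRAINT_PS = 125.0  # delay bound applied to every gate

mean_pre = sum(pre for pre, _ in delays.values()) / len(delays)
mean_post = sum(post for _, post in delays.values()) / len(delays)

# Delay reduction factor of each gate (pre-optimization over post-optimization).
speedups = {gate: pre / post for gate, (pre, post) in delays.items()}

# Every optimized gate should respect the constraint.
all_within = all(post <= CONSTRAINT_PS for _, post in delays.values())

print(f"mean pre-opt: {mean_pre:.2f} ps, mean post-opt: {mean_post:.2f} ps")
print(f"all gates within {CONSTRAINT_PS} ps: {all_within}")
```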

The constrained-optimization procedure is useful only when we want to constrain a single target to a precise value. It is not useful when we want to constrain more than one target at the same time, for example delay and power together: such optimizations are not feasible because, first, they would require a prior evaluation of the quantities to be constrained (in order to know whether the constraints are reasonable) and, second, the optimizer might not be able to satisfy all the constraints.

A far more useful policy for explicitly taking more than one target into account is multi-objective optimization.

Figures 7.5 and 7.7 show four different multi-objective optimizations for the $0.7\,\mathrm{\mu m}$ and $0.25\,\mathrm{\mu m}$ technologies respectively (figures 7.6 and 7.8 are zooms of figures 7.5(b) and 7.7(b) respectively). The four optimizations performed are:

i) a full delay optimization, indicated with ``Delay=100% Power=0%'';
ii) a delay optimization taking the power consumption slightly into account, indicated with ``Delay=80% Power=20%'';
iii) a delay-power optimization taking the power dissipation into account in equal measure, indicated with ``Delay=50% Power=50%'';
iv) a delay optimization taking the power consumption strongly into account, indicated with ``Delay=20% Power=80%''.

Figure 7.5: Several delay-power optimization policies of $0.7\,\mathrm{\mu m}$ gates. (a) Delay variation; (b) energy-dissipation variation.

Figure 7.6: Energy-dissipation variation (zoom of figure 7.5(b)).

Figure 7.7: Several delay-power optimization policies of $0.25\,\mathrm{\mu m}$ gates. (a) Delay variation; (b) energy-dissipation variation.

Figure 7.8: Energy-dissipation variation (zoom of figure 7.7(b)).

The percentages reported after delay and power are also the coefficients $\alpha$ and $\beta$ of equation 5.5, used as the cost function in the optimization algorithm.
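The effect of the two weights can be sketched numerically. Equation 5.5 is not reproduced in this section, so the sketch assumes a normalized weighted sum, cost = alpha * D/D0 + beta * P/P0, where D0 and P0 are the delay and power at minimum width; this form, the toy `delay` and `power` models and all names are assumptions made for illustration only.

```python
from typing import List

# Candidate normalized widths from 1.0 to 10.0 in steps of 0.1.
WIDTHS: List[float] = [1.0 + i / 10 for i in range(91)]


def delay(width: float) -> float:
    """Toy delay model (assumption): decreases as the transistor widens."""
    return 1.0 / width + 0.2


def power(width: float) -> float:
    """Toy power model (assumption): grows linearly with the width."""
    return 1.0 + 0.05 * width


def cost(width: float, alpha: float, beta: float) -> float:
    """Assumed weighted sum of delay and power, each normalized to minimum width."""
    return alpha * delay(width) / delay(1.0) + beta * power(width) / power(1.0)


def best_width(alpha: float, beta: float) -> float:
    """Width on the grid that minimizes the weighted cost."""
    return min(WIDTHS, key=lambda w: cost(w, alpha, beta))


policies = [(1.0, 0.0), (0.8, 0.2), (0.5, 0.5), (0.2, 0.8)]
chosen = {policy: best_width(*policy) for policy in policies}
for (alpha, beta), width in chosen.items():
    print(f"Delay={alpha:.0%} Power={beta:.0%}: width={width:.1f} "
          f"delay={delay(width):.3f} power={power(width):.3f}")
```

With these toy models the selected width, and with it the power, shrinks steadily as the power weight grows, while the delay rises, which is the qualitative behaviour the four policies are meant to explore.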

Figure 7.9: Delay-power optimization ($50\%$-$50\%$) comparison of $0.7\,\mathrm{\mu m}$ and $0.25\,\mathrm{\mu m}$ gates. (a) Delay variation; (b) energy-dissipation variation.

These figures show that the delay decreases more and more as its relative weight increases, while the increase in power dissipation is limited as the relative weight of the power increases.

Of all the optimization policies, the one that gives the most useful results is the optimization of delay and power with equal weights, indicated with ``Delay=50% Power=50%'' in the previous figures. These results are also reported separately in figure 7.9.
This is probably the most useful optimization, since it still reduces the delay considerably while containing the increase in power dissipation to a more acceptable value.

Figure 7.10: Delay and power trajectory during 4 different multi-objective optimizations for the and-or gate of figure 5.14. (a) $0.25\,\mathrm{\mu m}$; (b) $0.7\,\mathrm{\mu m}$.

Figure 7.11: Delay and power trajectory during 4 different multi-objective optimizations for the parity gate of figure 5.15. (a) $0.25\,\mathrm{\mu m}$; (b) $0.7\,\mathrm{\mu m}$.

Figure 7.12: Delay and power trajectory during 4 different multi-objective optimizations for the static full-adder of figure 5.16. (a) $0.7\,\mathrm{\mu m}$; (b) $0.25\,\mathrm{\mu m}$.

Figure 7.13: Delay and power trajectory during 4 different multi-objective optimizations for the dynamic full-adder of figure 5.17. (a) $0.25\,\mathrm{\mu m}$; (b) $0.7\,\mathrm{\mu m}$.

Figures 7.10, 7.11, 7.12 and 7.13 show the same four optimizations as trajectories in the delay-power space during the optimization process. In these figures each marked point is a step of the optimization process. It is thus possible to see how increasing the relative weight of the delay in the cost function (and thus reducing the relative weight of the energy) leads the optimizer further along the trajectory, reducing the delay and increasing the energy dissipation.


Table 7.6: Delay worsening and energy-dissipation improvement between a full delay optimization and a delay-power optimization.

Technology    $0.7\,\mathrm{\mu m}$                              $0.25\,\mathrm{\mu m}$
Gate          $\Delta$Delay $\Delta$Energy   $\Delta$Area  $\Delta$Delay $\Delta$Energy   $\Delta$Area
inv                   39.3%         -20.1%         -40.9%          15.7%         -10.4%         -21.1%
and-n                 27.8%         -92.2%         -87.4%           6.3%         -36.3%         -42.1%
and-p                 48.4%         -81.0%         -80.4%           1.1%         -39.4%         -49.2%
or-n                  33.1%         -67.2%         -76.2%          46.9%         -77.5%         -66.9%
or-p                  33.1%         -77.8%         -69.9%          11.8%         -35.5%         -21.6%
latch-n               31.5%         -71.1%         -84.7%          41.3%         -22.0%         -46.4%
latch-p               28.3%         -75.2%         -76.1%          14.6%         -69.5%         -72.1%
and-or                29.5%         -91.2%         -89.2%           6.7%         -81.2%         -79.1%
and-static            28.7%         -67.3%         -79.3%          14.4%         -42.1%         -53.3%
or-static             21.4%         -30.4%         -28.8%          -3.4%          18.4%         -12.8%
parity                 7.7%         -78.1%         -81.2%           2.5%         -50.3%         -51.0%
static-fa             33.3%         -87.2%         -86.7%           5.9%         -81.9%         -82.4%
tspc-fa1              11.0%         -29.3%         -27.4%          15.4%         -48.1%         -48.3%
tspc-fa2              12.3%         -72.5%         -71.2%           8.6%         -41.9%         -44.1%

average              +27.5%         -67.2%         -69.9%         +13.4%         -44.1%         -49.3%

These figures again show clearly that the multi-objective optimization ``Delay=50% Power=50%'' gives the best results, both in optimizing the delay and in containing the energy dissipation within reasonable values. The results are summarized in table 7.6, which shows the percent variation of delay, energy dissipation and area between the values obtained after a full delay optimization and those obtained after a delay-power optimization. The average worsening in the delay (i.e. the difference between the delay after a full delay optimization and the delay after a delay-power optimization) is $+27.5\%$ for the $0.7\,\mathrm{\mu m}$ technology and just $+13.4\%$ for the $0.25\,\mathrm{\mu m}$ technology. Despite this small worsening, the average energy-dissipation reduction is $-67.2\%$ for the $0.7\,\mathrm{\mu m}$ technology and $-44.1\%$ for the $0.25\,\mathrm{\mu m}$ technology, while the area occupancy reductions are, respectively, $-69.9\%$ and $-49.3\%$. This means that accepting a slight degradation in the delay leads to a large reduction in the overall energy dissipation and area occupancy.
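The average rows of table 7.6 can be recomputed from the per-gate entries as a quick arithmetic check. The lists below copy the delay and energy columns of the table in row order (inv to tspc-fa2); the variable and function names are illustrative.

```python
from typing import List

# Percent variations copied from table 7.6, in row order (inv ... tspc-fa2).
delta_delay_07: List[float] = [39.3, 27.8, 48.4, 33.1, 33.1, 31.5, 28.3,
                               29.5, 28.7, 21.4, 7.7, 33.3, 11.0, 12.3]
delta_delay_025: List[float] = [15.7, 6.3, 1.1, 46.9, 11.8, 41.3, 14.6,
                                6.7, 14.4, -3.4, 2.5, 5.9, 15.4, 8.6]
delta_energy_07: List[float] = [-20.1, -92.2, -81.0, -67.2, -77.8, -71.1, -75.2,
                                -91.2, -67.3, -30.4, -78.1, -87.2, -29.3, -72.5]
delta_energy_025: List[float] = [-10.4, -36.3, -39.4, -77.5, -35.5, -22.0, -69.5,
                                 -81.2, -42.1, 18.4, -50.3, -81.9, -48.1, -41.9]


def mean(values: List[float]) -> float:
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


print(f"0.7 um:  delay {mean(delta_delay_07):+.1f}%, energy {mean(delta_energy_07):+.1f}%")
print(f"0.25 um: delay {mean(delta_delay_025):+.1f}%, energy {mean(delta_energy_025):+.1f}%")
```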


marco+site@equars.com