)
means to optimize (in our case always
to decrease) a single objective, i.e. a well-defined target, to the
detriment of all the other possible targets.
[Figure: (a) Delay variation; (b) Energy-dissipation variation]
The very first optimization policy applied to CMOS circuits
was delay optimization. Figures 7.2 and 7.3
sketch the delay optimization of the gates of table 7.1
in the two technology implementations considered (one figure per technology),
with arrows representing the delay and energy
variation. The arrows start from the initial values (i.e. either
the delay or the energy measured at the minimum technology width) and end
at the values obtained after the optimization.
As expected, the delay improves noticeably (i.e. it decreases,
figures 7.2(a), 7.3(a)),
while the energy dissipation increases very considerably (figures
7.2(b), 7.3(b)): to decrease the
delay the optimizer increases the transistor widths, thereby increasing the
overall power dissipation. Table 7.3 and
figure 7.4 report the relative
variation of delay and power (as minimum, maximum and mean values)
for both technologies. For the
technology the delay
is, on average, reduced by a factor of
, while for the
technology it is reduced by a factor of
(figure
7.4(a)). Conversely, the energy dissipation is increased by a factor of
in
and by a factor of
in
(figure 7.4(b)).
Table 7.4 shows the total elapsed time taken by the optimization of each gate, together with the total number of function evaluations, i.e. the number of times the circuit simulator (in this case HSPICE) has been invoked. These numbers are quite reasonable in themselves; moreover, a cell library needs to be optimized only once, before it is reused. Furthermore, for very large circuits, the modular architecture of the optimizer makes it possible to switch from one simulator to another on the fly: a very fast simulator (such as FAST) can be used in the early steps of the optimization, and a more accurate but slower one (such as HSPICE) in the later stages.
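The simulator-switching idea can be illustrated with a small Python sketch. It is not the optimizer described here: the two "simulators" are toy cost functions (a cheap, slightly biased one standing in for a fast simulator and a reference one standing in for an accurate simulator), the `CountingSimulator` wrapper and the 1-D `pattern_search` routine are hypothetical, and the only point shown is the two-stage flow: coarse exploration with the cheap model, then refinement with the accurate one starting from the coarse result.

```python
from typing import Callable


class CountingSimulator:
    """Hypothetical wrapper: counts how often a cost function is evaluated."""

    def __init__(self, name: str, fn: Callable[[float], float]) -> None:
        self.name = name
        self.fn = fn
        self.calls = 0

    def __call__(self, w: float) -> float:
        self.calls += 1
        return self.fn(w)


def pattern_search(cost: Callable[[float], float], w: float,
                   step: float, min_step: float) -> float:
    """1-D pattern search: move to w -/+ step if that improves, else halve step."""
    best = cost(w)
    while step >= min_step:
        for cand in (w - step, w + step):
            c = cost(cand)
            if c < best:
                w, best = cand, c
                break
        else:  # no improving neighbour at this step size
            step /= 2.0
    return w


# Toy stand-ins for the two simulators (assumptions, not FAST/HSPICE models).
fast = CountingSimulator("fast", lambda w: (w - 2.1) ** 2 + 0.5)
accurate = CountingSimulator("accurate", lambda w: (w - 2.0) ** 2 + 0.5)

# Stage 1: coarse exploration with the cheap simulator.
w_coarse = pattern_search(fast, w=1.0, step=1.0, min_step=0.05)
# Stage 2: refinement with the accurate simulator, starting from stage 1.
w_final = pattern_search(accurate, w=w_coarse, step=0.1, min_step=1e-3)
print(f"coarse w={w_coarse:.3f} ({fast.calls} fast runs), "
      f"final w={w_final:.4f} ({accurate.calls} accurate runs)")
```

Because stage 2 starts close to the optimum with a small step, most of the exploration is paid for with cheap evaluations and only the final refinement uses the expensive model.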
| Gate | El. time [s] (1st technology) | Fun. eval. (1st technology) | El. time [s] (2nd technology) | Fun. eval. (2nd technology) |
|---|---|---|---|---|
| inv | 332.6 | 12 | 212.4 | 13 |
| and-n | 1338.3 | 34 | 1675.6 | 36 |
| and-p | 1426.6 | 34 | 2449.5 | 41 |
| or-n | 1408.3 | 32 | 1950.0 | 34 |
| or-p | 1259.5 | 31 | 1355.5 | 27 |
| latch-n | 1286.5 | 32 | 1466.7 | 32 |
| latch-p | 1307.1 | 33 | 1574.7 | 31 |
| and-or | 5830.9 | 73 | 9280.3 | 91 |
| and-static | 786.5 | 25 | 729.6 | 31 |
| or-static | 651.6 | 21 | 626.1 | 24 |
| parity | 64098.2 | 159 | 35274.3 | 178 |
| static-fa | 27034.8 | 239 | 23794.1 | 180 |
| tspc-fa1 | 2413.3 | 69 | 2881.2 | 70 |
| tspc-fa2 | 16459.1 | 66 | 63485.2 | 121 |
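A quick calculation on the figures of table 7.4 shows where the time goes. The Python sketch below divides the elapsed time by the number of function evaluations; since the elapsed time also includes the optimizer's own overhead, the ratio is an upper bound on the average cost of one simulator run. It also sums the elapsed times to give the total cost of optimizing the whole library once in each technology (roughly 35 and 41 hours).

```python
# Table 7.4: gate -> ((elapsed s, evaluations) 1st tech., (elapsed s, evaluations) 2nd tech.)
TABLE_7_4 = {
    "inv": ((332.6, 12), (212.4, 13)),
    "and-n": ((1338.3, 34), (1675.6, 36)),
    "and-p": ((1426.6, 34), (2449.5, 41)),
    "or-n": ((1408.3, 32), (1950.0, 34)),
    "or-p": ((1259.5, 31), (1355.5, 27)),
    "latch-n": ((1286.5, 32), (1466.7, 32)),
    "latch-p": ((1307.1, 33), (1574.7, 31)),
    "and-or": ((5830.9, 73), (9280.3, 91)),
    "and-static": ((786.5, 25), (729.6, 31)),
    "or-static": ((651.6, 21), (626.1, 24)),
    "parity": ((64098.2, 159), (35274.3, 178)),
    "static-fa": ((27034.8, 239), (23794.1, 180)),
    "tspc-fa1": ((2413.3, 69), (2881.2, 70)),
    "tspc-fa2": ((16459.1, 66), (63485.2, 121)),
}


def per_evaluation_seconds(table, tech):
    """Elapsed time divided by evaluations, per gate (tech = 0 or 1)."""
    return {gate: runs[tech][0] / runs[tech][1] for gate, runs in table.items()}


def total_hours(table, tech):
    """Time needed to optimize the whole library once, in hours."""
    return sum(runs[tech][0] for runs in table.values()) / 3600.0


for tech in (0, 1):
    per_eval = per_evaluation_seconds(TABLE_7_4, tech)
    slowest = max(per_eval, key=per_eval.get)
    print(f"technology {tech + 1}: {total_hours(TABLE_7_4, tech):.1f} h in total, "
          f"slowest per evaluation: {slowest} ({per_eval[slowest]:.0f} s)")
```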
These results are largely predictable, since an aggressive delay optimization leads to a very large increase in transistor sizes, and hence to a large area occupancy and energy dissipation.
Moreover, another issue arises when optimizing an entire cell library:
is it necessary to push every single cell to its limits? In a generic
static circuit the total delay is generally the sum of the delays of the cells
along the worst critical path, and a single critical path⁵ can run through many
cells from a primary input to a primary output of the circuit. Since any cell may
lie on that path, it makes some sense to optimize every single cell as much as possible.
In a generic dynamic circuit, the global delay is still bounded by the delay of the worst
critical path in the circuit, which therefore determines the minimum clock period. In
single-phase dynamic logic (where n-gates and p-gates alternate, working on different
clock phases), however, this critical path is contained in a single cell, so the delay of the
entire circuit is bounded by the delay of the worst library cell it contains. It therefore makes
no sense to optimize the basic library cells (which are present in every circuit)
to their limits, when the delay of a generic circuit is bounded by the worst of them.
It is more useful to optimize the worst cell in the library, and then to reduce the delay
of the other cells only down to the value obtained by that optimization. In this way the
sizes of these cells are reduced, which in turn reduces the overall
energy dissipation.
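The difference between the two cases can be made concrete with a toy block of four cells, sketched in Python below. The cell names, the connectivity and the delays (in ps) are hypothetical. For static logic the block delay is computed as the longest input-to-output path, summing cell delays in topological order; for single-phase dynamic logic it is modelled, following the argument above, as the delay of the slowest cell. Speeding up a cell that is not the slowest one improves only the static figure.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def static_delay(delays: dict, fanin: dict) -> float:
    """Static logic: worst input-to-output path, summing the cell delays."""
    arrival = {}
    for cell in TopologicalSorter(fanin).static_order():
        latest_input = max((arrival[p] for p in fanin.get(cell, ())), default=0.0)
        arrival[cell] = latest_input + delays[cell]
    return max(arrival.values())


def dynamic_delay(delays: dict) -> float:
    """Single-phase dynamic logic: the critical path lies inside one cell,
    so the bound is set by the slowest cell alone."""
    return max(delays.values())


# Hypothetical block: a and b feed c, c feeds d (delays in ps).
fanin = {"c": ["a", "b"], "d": ["c"]}
delays = {"a": 40.0, "b": 60.0, "c": 125.0, "d": 50.0}
print(static_delay(delays, fanin), dynamic_delay(delays))  # 235.0 125.0

# Speed up a cell that is not the slowest one: only the static bound improves.
faster = dict(delays, d=30.0)
print(static_delay(faster, fanin), dynamic_delay(faster))  # 215.0 125.0
```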
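The natural consequence is to optimize an entire (dynamic) cell library by means of a constrained optimization⁶: first the worst cell of the library is optimized for delay without constraints, and then all the other cells are optimized under a delay constraint derived from the delay obtained for that cell.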
As an example, the constrained optimization of the dynamic
gates is reported in table 7.5: this optimization was
performed by constraining every gate to a delay no
greater than 125 ps. This value was obtained from an
unconstrained optimization of the worst cell (with respect to delay),
the TSPC type-p ``or'' gate (cf. table 7.2). After this optimization the
delay of this gate was 121.2 ps, so the constraint chosen for the optimization
of all the other gates was 125 ps, i.e. slightly above it.
Table 7.5 shows that the delays after the optimization have a standard deviation⁷ (3.83) far smaller than the one before the optimization (36.65). This means that all the cells have nearly the same delay after the optimization, and that this value is an ``optimal'' one: it keeps the delay of a block built from these cells at the minimum allowed by the worst cell, while reducing power dissipation and area occupancy with respect to a solution in which all the cells are optimized independently.
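The constrained strategy can be sketched in Python under strong simplifying assumptions. Each cell is described by a hypothetical analytic model, delay(w) = a + b/w + c·w (the c·w self-loading term gives a finite delay-optimal width w* = sqrt(b/c)), with energy growing with the width w; the cell names and coefficients are made up. The sketch also assumes that, once the delay constraint is met, the goal is the smallest (lowest-energy) width. Step 1 finds the worst cell after unconstrained delay optimization, step 2 sets the constraint about 3% above its delay (as 125 ps is to 121.2 ps), and step 3 sizes every other cell by bisection to the smallest width meeting the constraint.

```python
from statistics import pstdev

# Hypothetical cell models: delay(w) = a + b/w + c*w [ps], energy ~ w.
CELLS = {
    "cell_a": (20.0, 200.0, 2.0),
    "cell_b": (30.0, 400.0, 1.0),
    "worst": (50.0, 900.0, 1.0),
}
W_MIN = 1.0


def delay(params, w):
    a, b, c = params
    return a + b / w + c * w


def delay_optimal_width(params):
    _, b, c = params
    return max(W_MIN, (b / c) ** 0.5)


def min_width_meeting(params, d_max, iters=60):
    """Smallest width (hence lowest energy) with delay <= d_max.
    Delay decreases on [W_MIN, w*], so bisection on that interval works."""
    lo, hi = W_MIN, delay_optimal_width(params)
    if delay(params, lo) <= d_max:
        return lo
    if delay(params, hi) > d_max:
        raise ValueError("constraint cannot be met")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if delay(params, mid) <= d_max:
            hi = mid
        else:
            lo = mid
    return hi


# Step 1: unconstrained delay optimization; find the worst cell.
w_opt = {n: delay_optimal_width(p) for n, p in CELLS.items()}
d_opt = {n: delay(CELLS[n], w) for n, w in w_opt.items()}
worst = max(d_opt, key=d_opt.get)

# Step 2: constraint slightly above the worst optimized delay.
d_max = 1.03 * d_opt[worst]

# Step 3: every other cell gets the smallest width meeting the constraint.
w_con = {n: (w_opt[n] if n == worst else min_width_meeting(p, d_max))
         for n, p in CELLS.items()}
d_con = {n: delay(CELLS[n], w) for n, w in w_con.items()}

print(f"constraint {d_max:.1f} ps")
print("delay spread:", round(pstdev(list(d_opt.values())), 2),
      "->", round(pstdev(list(d_con.values())), 2))
print("total width :", round(sum(w_opt.values()), 2),
      "->", round(sum(w_con.values()), 2))
```

In this toy setting the delays end up almost equal (small spread) and the total width, a proxy for area and energy, drops well below that of the independently delay-optimized cells, which mirrors the qualitative behaviour reported for table 7.5.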
Constrained optimization is useful only when a single target has to be constrained to a precise value. It is not useful when more than one target has to be constrained at the same time, for example delay and power together: such optimizations are impractical because, first, they would require a preliminary evaluation of the quantities to be constrained (in order to know whether the constraints are reasonable) and, second, the optimizer might be unable to satisfy all the constraints at once.
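Why two simultaneous constraints can be unsatisfiable is easy to see on a single gate. In the Python sketch below, a hypothetical model (delay(w) = A + B/w + C·w, energy(w) = E_LOAD + E_W·w, all coefficients made up) is used: a delay bound forces the width above some value, an energy bound forces it below another, and when the two bounds cross no sizing satisfies both. The search is restricted to widths up to the delay-optimal width W_STAR, since beyond it both delay and energy get worse.

```python
# Hypothetical gate model: delay falls, energy grows with the width w.
A, B, C = 50.0, 900.0, 1.0      # delay coefficients [ps]
E_LOAD, E_W = 100.0, 2.0        # energy coefficients (arbitrary units)
W_MIN = 1.0
W_STAR = (B / C) ** 0.5         # delay-optimal width


def delay(w):
    return A + B / w + C * w


def energy(w):
    return E_LOAD + E_W * w


def feasible_interval(d_max, e_max):
    """Widths in [W_MIN, W_STAR] meeting delay <= d_max and energy <= e_max.
    Returns (lo, hi), or None when the two constraints conflict."""
    # Energy bound: w <= (e_max - E_LOAD) / E_W.
    hi = min(W_STAR, (e_max - E_LOAD) / E_W)
    # Delay bound: C*w^2 + (A - d_max)*w + B <= 0, i.e. w >= smaller root.
    disc = (d_max - A) ** 2 - 4.0 * B * C
    if disc < 0:
        return None             # d_max is below the best achievable delay
    lo = max(W_MIN, ((d_max - A) - disc ** 0.5) / (2.0 * C))
    return (lo, hi) if lo <= hi else None


print(feasible_interval(d_max=200.0, e_max=120.0))  # both constraints met
print(feasible_interval(d_max=150.0, e_max=110.0))  # conflicting constraints
```

Knowing in advance which pair of bounds is reasonable requires exactly the preliminary evaluation mentioned above, which a real circuit does not offer in closed form.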
A much more useful policy for taking more than one target into account is multi-objective optimization.
Figures 7.5 and 7.7 show four different
multi-objective optimizations for the
and
technology, respectively (figures 7.6 and
7.8 are zooms of figures 7.5(b) and 7.7(b), respectively). Each of the
four optimizations is labelled with the relative weights assigned to delay
and power, e.g. ``Delay=50% Power=50%''.
[Figure: (a) Delay variation; (b) Energy-dissipation variation]
[Figure: (a) Delay variation; (b) Energy-dissipation variation]
The percentages⁸
reported after delay and power are also the
coefficients
and
of equation 5.5, used
as the cost function in the optimization algorithm.
[Figure: (a) Delay variation; (b) Energy-dissipation variation]
These figures show that the delay decreases more and more as its relative weight increases, while the increase in power dissipation is increasingly contained as the relative weight of power increases.
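The effect of the weights can be reproduced qualitatively with a Python sketch. Equation 5.5 is not reproduced here; the sketch assumes a normalized weighted sum α·D/D₀ + β·E/E₀ with β = 1 − α, where D₀ and E₀ are the values of the initial minimum-width design, and uses a hypothetical single-gate model (delay(w) = A + B/w + C·w, energy(w) = E_LOAD + E_W·w, coefficients made up). A plain grid search over the width stands in for the real optimizer.

```python
A, B, C = 50.0, 900.0, 1.0        # hypothetical delay coefficients [ps]
E_LOAD, E_W = 100.0, 2.0          # hypothetical energy coefficients
W_MIN, W_MAX = 1.0, 50.0          # width range (arbitrary units)


def delay(w):
    return A + B / w + C * w


def energy(w):
    return E_LOAD + E_W * w


# Assumed normalization: both targets relative to the minimum-width design.
D0, E0 = delay(W_MIN), energy(W_MIN)


def weighted_optimum(alpha, step=0.01):
    """Width minimizing alpha*D/D0 + (1 - alpha)*E/E0, found by grid search."""
    n = int(round((W_MAX - W_MIN) / step))
    grid = [W_MIN + i * step for i in range(n + 1)]

    def cost(w):
        return alpha * delay(w) / D0 + (1.0 - alpha) * energy(w) / E0

    return min(grid, key=cost)


results = {}
for alpha in (0.25, 0.50, 0.75, 1.00):
    w = weighted_optimum(alpha)
    results[alpha] = (delay(w), energy(w))
    print(f"Delay={alpha:.0%} Power={1 - alpha:.0%}: w={w:5.2f}, "
          f"delay={delay(w):6.1f} ps, energy={energy(w):6.1f}")
```

As the delay weight grows, the optimum moves to larger widths: the delay keeps falling while the energy rises, and the 50/50 point already removes most of the initial delay at a small fraction of the energy cost of the pure delay optimum.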
Of all the optimization policies, the one that gives the most
useful results is the optimization of delay and power with equal
weights, labelled ``Delay=50% Power=50%'' in the
previous figures; these results are also reported separately in
figure 7.9. This optimization still reduces the delay considerably,
while keeping the increase in power dissipation to a more acceptable value.
Figures 7.10, 7.11, 7.12 and 7.13 show the same four optimizations as trajectories in the delay-power plane during the optimization process; each marked point is a step of the optimization. They show how increasing the relative weight of the delay in the cost function (and thus reducing the relative weight of the energy) leads the optimizer further along the trajectory, reducing the delay and increasing the energy dissipation.
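A trajectory of this kind can be generated by recording the (delay, energy) point reached after every accepted step of an optimizer. The Python sketch below does so for a simple pattern search on the width of one hypothetical gate, starting from the minimum-width design. The gate model, the normalized weighted-sum cost (assumed, not taken from equation 5.5) and the search routine are all illustrative assumptions; the optimizer actually used is not reproduced here.

```python
# Hypothetical single-gate model standing in for the simulator.
A, B, C = 50.0, 900.0, 1.0        # delay(w) = A + B/w + C*w [ps]
E_LOAD, E_W = 100.0, 2.0          # energy(w) = E_LOAD + E_W*w
W_MIN, W_MAX = 1.0, 50.0


def delay(w):
    return A + B / w + C * w


def energy(w):
    return E_LOAD + E_W * w


D0, E0 = delay(W_MIN), energy(W_MIN)   # initial (minimum-width) design


def trajectory(alpha, step=0.5, min_step=0.01):
    """Pattern search from W_MIN on the assumed cost
    alpha*D/D0 + (1 - alpha)*E/E0, recording every accepted point."""
    def cost(w):
        return alpha * delay(w) / D0 + (1.0 - alpha) * energy(w) / E0

    w, best = W_MIN, cost(W_MIN)
    points = [(delay(w), energy(w))]
    while step >= min_step:
        for cand in (w + step, w - step):
            if W_MIN <= cand <= W_MAX and cost(cand) < best:
                w, best = cand, cost(cand)
                points.append((delay(w), energy(w)))
                break
        else:  # no improvement at this step size
            step /= 2.0
    return points


for alpha in (0.25, 0.50, 0.75, 1.00):
    pts = trajectory(alpha)
    d_end, e_end = pts[-1]
    print(f"Delay={alpha:.0%}: {len(pts)} points, ends at "
          f"delay={d_end:.1f} ps, energy={e_end:.1f}")
```

Plotting each list of points would give one trajectory per weighting; a larger delay weight makes the search travel further towards low delay and high energy before it stops.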
| Gate | Delay (1st tech.) | Energy (1st tech.) | Area (1st tech.) | Delay (2nd tech.) | Energy (2nd tech.) | Area (2nd tech.) |
|---|---|---|---|---|---|---|
| inv | 39.3% | -20.1% | -40.9% | 15.7% | -10.4% | -21.1% |
| and-n | 27.8% | -92.2% | -87.4% | 6.3% | -36.3% | -42.1% |
| and-p | 48.4% | -81.0% | -80.4% | 1.1% | -39.4% | -49.2% |
| or-n | 33.1% | -67.2% | -76.2% | 46.9% | -77.5% | -66.9% |
| or-p | 33.1% | -77.8% | -69.9% | 11.8% | -35.5% | -21.6% |
| latch-n | 31.5% | -71.1% | -84.7% | 41.3% | -22.0% | -46.4% |
| latch-p | 28.3% | -75.2% | -76.1% | 14.6% | -69.5% | -72.1% |
| and-or | 29.5% | -91.2% | -89.2% | 6.7% | -81.2% | -79.1% |
| and-static | 28.7% | -67.3% | -79.3% | 14.4% | -42.1% | -53.3% |
| or-static | 21.4% | -30.4% | -28.8% | -3.4% | 18.4% | -12.8% |
| parity | 7.7% | -78.1% | -81.2% | 2.5% | -50.3% | -51.0% |
| static-fa | 33.3% | -87.2% | -86.7% | 5.9% | -81.9% | -82.4% |
| tspc-fa1 | 11.0% | -29.3% | -27.4% | 15.4% | -48.1% | -48.3% |
| tspc-fa2 | 12.3% | -72.5% | -71.2% | 8.6% | -41.9% | -44.1% |
| average | +27.5% | -67.2% | -69.9% | +13.4% | -44.1% | -49.3% |
From these figures it can again be clearly seen that the multi-objective
optimization ``Delay=50% Power=50%'' gives the best results both in terms
of delay reduction and in keeping the energy dissipation within reasonable
values. These results are summarized in table 7.6, which shows the percent
variation of delay, energy dissipation and area between the values obtained
after a full delay optimization and those obtained after a
delay-power optimization. The average worsening in the delay
(i.e. the relative increase of the delay obtained after a delay-power
optimization with respect to the delay obtained after a full delay optimization)
is 27.5% for the
technology and just 13.4% for the
technology. Despite this modest worsening, the average
energy-dissipation reduction is
for the
technology
and
for the
technology, while the area occupancy
reductions are
and
, respectively. This means that accepting a
slight degradation in delay leads to a large reduction of the
overall energy dissipation and area occupancy.
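The ``average'' row of table 7.6 is consistent with a plain arithmetic mean over the fourteen gates, as the Python sketch below checks. The rows are copied from the table (first technology in the first three columns, second technology in the last three); `percent_variation` spells out the relative change used for the delay worsening, with the full-delay-optimization value as reference. The computed means agree with the reported averages to within 0.1 percentage points, the small differences presumably coming from rounding of the per-gate values.

```python
from statistics import mean

# Table 7.6, per gate: three columns for the 1st technology, then three for the 2nd.
TABLE_7_6 = {
    "inv": (39.3, -20.1, -40.9, 15.7, -10.4, -21.1),
    "and-n": (27.8, -92.2, -87.4, 6.3, -36.3, -42.1),
    "and-p": (48.4, -81.0, -80.4, 1.1, -39.4, -49.2),
    "or-n": (33.1, -67.2, -76.2, 46.9, -77.5, -66.9),
    "or-p": (33.1, -77.8, -69.9, 11.8, -35.5, -21.6),
    "latch-n": (31.5, -71.1, -84.7, 41.3, -22.0, -46.4),
    "latch-p": (28.3, -75.2, -76.1, 14.6, -69.5, -72.1),
    "and-or": (29.5, -91.2, -89.2, 6.7, -81.2, -79.1),
    "and-static": (28.7, -67.3, -79.3, 14.4, -42.1, -53.3),
    "or-static": (21.4, -30.4, -28.8, -3.4, 18.4, -12.8),
    "parity": (7.7, -78.1, -81.2, 2.5, -50.3, -51.0),
    "static-fa": (33.3, -87.2, -86.7, 5.9, -81.9, -82.4),
    "tspc-fa1": (11.0, -29.3, -27.4, 15.4, -48.1, -48.3),
    "tspc-fa2": (12.3, -72.5, -71.2, 8.6, -41.9, -44.1),
}
REPORTED_AVERAGE = (27.5, -67.2, -69.9, 13.4, -44.1, -49.3)


def percent_variation(value, reference):
    """Relative change of `value` with respect to `reference`, in percent
    (e.g. delay after delay-power vs. after full delay optimization)."""
    return 100.0 * (value - reference) / reference


def column_means(table):
    """Arithmetic mean of each column over all gates."""
    return tuple(mean(col) for col in zip(*table.values()))


for got, reported in zip(column_means(TABLE_7_6), REPORTED_AVERAGE):
    print(f"computed {got:+7.2f}%   reported {reported:+6.1f}%")
```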