Article information

Title: Hybrid architecture for a single-precision arithmetic processor.
Authors: Jurca, Lucian; Gontean, Aurel; Alexa, Florin et al.
Journal: Annals of DAAAM & Proceedings
Print ISSN: 1726-9679
Publication year: 2008
Issue: January
Language: English
Publisher: DAAAM International Vienna
Abstract: Increasing integration density has permitted the development of logarithmic-number-system (LNS) processors, among them those of (Coleman et al., 2000) and (Arnold, 2001), but in these the main difficulty was implementing addition and subtraction. This disadvantage can be avoided, while keeping the qualities of both floating-point (FP) and LNS, by designing a hybrid unit that combines the attributes of the FP processor with logarithmic arithmetic. A solution in this direction had already been proposed in (Lai, 1993), where addition and subtraction were performed in FP, and multiplication, division, square root and all the other operations in LNS. However, the FP-LNS and LNS-FP format conversions were slow because only non-redundant adders were used in the linear-interpolation algorithm.
Keywords: Algorithms

1. INTRODUCTION

Increasing integration density has permitted the development of logarithmic-number-system (LNS) processors, among them those of (Coleman et al., 2000) and (Arnold, 2001), but in these the main difficulty was implementing addition and subtraction. This disadvantage can be avoided, while keeping the qualities of both floating-point (FP) and LNS, by designing a hybrid unit that combines the attributes of the FP processor with logarithmic arithmetic. A solution in this direction had already been proposed in (Lai, 1993), where addition and subtraction were performed in FP, and multiplication, division, square root and all the other operations in LNS. However, the FP-LNS and LNS-FP format conversions were slow because only non-redundant adders were used in the linear-interpolation algorithm.

To improve the format conversions, a new architecture was proposed in (Jurca et al., 2007). Because a multiplication and a series of additions occur in sequence in the conversion algorithm, the method of redundant summation of the partial products together with the other inputs was applied. Moreover, corrections of one or two LSBs were applied to the content of some memory locations of the log and antilog look-up tables. In this way, the conversions became twice as accurate as in (Lai, 1993): the conversion error was kept under $1.5 \times 10^{-7}$, while the FP single-precision error is $1.2 \times 10^{-7}$.

The aim of this paper is to offer an alternative to classical FP units because, for some particular applications and with new improvements, logarithmic and hybrid units can run faster than floating-point ones.

Thus, in section 2 we present the pipelined architecture of the logarithmic subunit and the conversion algorithms FP-LNS and LNS-FP.

To implement floating-point addition/subtraction, a classical 3-stage pipelined subunit, synchronized with the logarithmic one (4 ns per stage), was designed. This means that in our case we do not need a high level of parallelism in the data paths, and thus we can save area. These aspects are discussed in section 3. To facilitate comparison with related work, the gate propagation delays adopted in our design correspond to 0.5-µm CMOS technology, and all the PSpice models in our digital simulations were set accordingly.

In section 4 we give an example of how a simple DSP algorithm can be implemented on a hybrid processor, outline the main directions of our future work and conclude the paper.

2. LOGARITHMIC SUBUNIT

The stages of the logarithmic subunit and the conversion algorithms will be briefly presented in this section because a more detailed description can be found in (Jurca et al., 2007).

With the same hardware, the 6-stage pipelined structure presented in Fig.1 can perform either multiplication or division for two FP operands, A and B, depending on the bit-line SOP that acts on the third stage, as we can see in equations (1):

$$A \times B = \operatorname{antilog}(\log A + \log B)$$

$$A / B = \operatorname{antilog}(\log A - \log B) \qquad (1)$$
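
As a purely numerical illustration of equations (1), the following minimal sketch multiplies or divides two positive numbers through their base-2 logarithms. It is not a model of the pipelined hardware: the function name `lns_mul_div` and the `divide` flag (standing in for the SOP line) are hypothetical, sign handling is left out, and double-precision library calls replace the table-based conversions.

```python
import math


def lns_mul_div(a: float, b: float, divide: bool = False) -> float:
    """Multiply or divide two positive numbers via base-2 logarithms.

    `divide` plays the role of the SOP control line (an assumption of this
    sketch): it turns the addition of the logarithms into a subtraction.
    """
    if a <= 0 or b <= 0:
        raise ValueError("this sketch handles positive operands only")
    log_a = math.log2(a)            # FP -> LNS conversion of operand A
    log_b = math.log2(b)            # FP -> LNS conversion of operand B
    log_r = log_a - log_b if divide else log_a + log_b  # the "ALU" step
    return 2.0 ** log_r             # LNS -> FP conversion (antilog)


if __name__ == "__main__":
    print(lns_mul_div(6.0, 1.5))                # product, close to 9.0
    print(lns_mul_div(6.0, 1.5, divide=True))   # quotient, close to 4.0
```

The point of the sketch is only that one adder/subtractor serves both operations once the operands are in logarithmic form.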

The linear interpolation algorithm for the direct conversion is based on equation (2):

$$\log(1+y) \cong y + E_y \pm \Delta E_y \times y_2 \qquad (2)$$

where $y$ is the fractional part of the 23-b significand, of which the least significant 12 bits represent $y_2$. The values $E_y$ of the function $\log(1+y) - y$ at $2^{23-N_{y2}} = 2^{N_{y1}} = 2^{11}$ points are stored in an internal ROM, and the values $\Delta E_y$ of its derivative function are stored in a second internal ROM, ROM'. For the LNS-FP conversion the algorithm is similar and only the ROM content changes.
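
The interpolation of equation (2) can be sketched in software as follows. Several things here are assumptions rather than details taken from the paper: the logarithm is taken in base 2, the second table holds the signed change of $E_y$ across each table interval (which absorbs the ± sign of equation (2)), $y_2$ is scaled to a fraction of one interval, and the tables are kept in double precision without the LSB corrections mentioned in the introduction. The names `ROM_E`, `ROM_DE` and `log2_1p_interp` are hypothetical.

```python
import math

N_Y1 = 11                  # upper fraction bits: table index y1
N_Y2 = 12                  # lower fraction bits: interpolation offset y2
FRAC_BITS = N_Y1 + N_Y2    # 23-bit fraction of a single-precision significand


def _e(i: int) -> float:
    """E_y = log2(1 + y) - y sampled at y = i / 2**N_Y1."""
    y = i / (1 << N_Y1)
    return math.log2(1.0 + y) - y


# "ROM": E_y at the 2**11 table points.
ROM_E = [_e(i) for i in range(1 << N_Y1)]
# "ROM'": signed change of E_y across each interval (assumed chord slope).
ROM_DE = [_e(i + 1) - _e(i) for i in range(1 << N_Y1)]


def log2_1p_interp(frac: int) -> float:
    """Approximate log2(1 + y), y = frac / 2**23, by linear interpolation."""
    if not 0 <= frac < (1 << FRAC_BITS):
        raise ValueError("frac must be a 23-bit unsigned integer")
    y1 = frac >> N_Y2                    # table index (upper 11 bits)
    y2 = frac & ((1 << N_Y2) - 1)        # offset inside the interval
    y = frac / (1 << FRAC_BITS)
    return y + ROM_E[y1] + ROM_DE[y1] * (y2 / (1 << N_Y2))


if __name__ == "__main__":
    frac = 0x400000                      # y = 0.5
    print(log2_1p_interp(frac), math.log2(1.5))
```

With $2^{11}$ intervals the pure interpolation error of this sketch stays well below the single-precision resolution; the hardware additionally has to cope with finite table word lengths, which the sketch ignores.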

The unit includes two FP-LNS format converters for the two operands and one LNS-FP converter, which produces the operation result in floating-point format. In fact, the direct conversions are fused with the ALU operation precisely in order to reduce the non-redundant additions to the single one performed in the ALU. The Wallace trees are built with 4:2 compressors.

[FIGURE 1 OMITTED]

Of course, at the output of the second stage the latched data are in carry-save form ($A_1$, $A_2$, $B_1$, $B_2$), and the binary logarithms are never produced explicitly within the pipelined structure. The exponents $E_A$ and $E_B$ are concatenated as the integer part of the data ($A_1$, $B_1$), which are operated on in fixed point in the ALU. The digital simulations showed that the propagation delay across each stage was 4 ns.
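
To make the carry-save form concrete, the sketch below reduces four operands to two words whose ordinary sum equals the total, which is what a 4:2 compressor row in a Wallace tree achieves. It is a word-level illustration on non-negative Python integers, assuming a 4:2 compressor built from two chained 3:2 carry-save adders; the function names are hypothetical and no claim is made about the actual cell design used in the unit.

```python
from typing import Tuple


def csa_3to2(a: int, b: int, c: int) -> Tuple[int, int]:
    """Carry-save adder: reduce three non-negative integers to (sum, carry)."""
    s = a ^ b ^ c                                  # bitwise sum, no carry propagation
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, weighted by 2
    return s, carry


def compress_4to2(a: int, b: int, c: int, d: int) -> Tuple[int, int]:
    """Reduce four operands to two using two chained 3:2 stages."""
    s1, c1 = csa_3to2(a, b, c)
    return csa_3to2(s1, c1, d)


if __name__ == "__main__":
    s, c = compress_4to2(13, 7, 250, 1023)
    # Only this final step needs a carry-propagating (non-redundant) adder.
    print(s + c == 13 + 7 + 250 + 1023)
```

Keeping intermediate results as such a pair is what allows the single carry-propagating addition to be deferred to the ALU.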

3. FLOATING-POINT SUBUNIT

The first stage of the floating-point subunit (the alignment of the mantissas) is shown in Fig. 2. The adders "ExpA-ExpB" and "ExpB-ExpA" work simultaneously, and the MSB of "ExpA-ExpB" (whose fan-out is increased by the block Mlt) selects the larger exponent and the associated significand (mantissa) at the output of the stage. It also selects the positive exponent difference, which commands the barrel shifter BS that shifts the other significand to the right.
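
The selection logic of this alignment stage can be sketched behaviourally as below. The sketch assumes integer exponents and 24-bit integer significands with the implicit 1 already attached, and it simply discards the bits shifted out (no guard, round or sticky bits), which is a simplification of this example and not a statement about the actual design. The function name `align` is hypothetical.

```python
from typing import Tuple


def align(exp_a: int, sig_a: int, exp_b: int, sig_b: int) -> Tuple[int, int, int]:
    """Return (larger exponent, its significand, other significand shifted right)."""
    diff_ab = exp_a - exp_b    # adder "ExpA-ExpB"
    diff_ba = exp_b - exp_a    # adder "ExpB-ExpA", computed in parallel in hardware
    if diff_ab < 0:
        # Sign (MSB) of ExpA-ExpB is set: B has the larger exponent,
        # and the positive difference ExpB-ExpA drives the shifter.
        return exp_b, sig_b, sig_a >> diff_ba
    # Otherwise A has the larger (or equal) exponent.
    return exp_a, sig_a, sig_b >> diff_ab


if __name__ == "__main__":
    # 1.5 * 2**3 and 1.0 * 2**0, with biased exponents 130 and 127.
    print(align(130, 0xC00000, 127, 0x800000))
```

Computing both differences at once means the shift amount is ready as soon as the sign bit is known, instead of waiting for a negation.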

The second stage is an adder/subtractor circuit (Fig. 3). Adder1 and Adder2 are 25-b adders because the sign bit and the implicit 1 are attached to the 23-b significand. In our design we used very fast 2-level hybrid adders (Jurca & Maranescu, 2004): the first level consists of five (1+2x2) 8-b carry look-ahead adders (CLA, with input carry 0 and 1 respectively) plus two 9-b CLAs in the most significant position, and the second level is a carry-select mechanism. The same type of adder was used in the third stage of the logarithmic subunit.
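
The carry-select idea behind the second level can be sketched as follows: each block produces one result for input carry 0 and one for input carry 1, and the real carry only has to choose between them. This is a behavioural sketch, not the cited adder design. The default block widths (three 8-b blocks and a 9-b block) are an assumption read off the CLA count in the text, the internal carry look-ahead logic of each block is replaced by plain integer addition, and for uniformity every block here computes both candidates, including the lowest one.

```python
from typing import Sequence, Tuple


def carry_select_add(a: int, b: int,
                     widths: Sequence[int] = (8, 8, 8, 9),
                     carry_in: int = 0) -> Tuple[int, int]:
    """Add two unsigned integers block by block; return (sum, carry_out).

    Operands are expected to fit in sum(widths) bits.
    """
    total, shift, carry = 0, 0, carry_in
    for w in widths:
        mask = (1 << w) - 1
        xa = (a >> shift) & mask
        xb = (b >> shift) & mask
        # First level: both candidates exist before the real carry is known.
        sum0 = xa + xb          # block adder with input carry 0
        sum1 = xa + xb + 1      # block adder with input carry 1
        # Second level: the incoming carry selects one candidate.
        chosen = sum1 if carry else sum0
        total |= (chosen & mask) << shift
        carry = chosen >> w     # carry out of this block
        shift += w
    return total, carry


if __name__ == "__main__":
    print(carry_select_add(0x1FFFFFF, 1))                      # carry ripples through three blocks
    print(carry_select_add(0x1FFFFFF, 1, widths=(8, 8, 9)))    # 25-b layout: sum wraps, carry out set
```

In hardware the candidates of all blocks are computed concurrently, so the critical path is one block adder plus the chain of select multiplexers.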

[FIGURE 2 OMITTED]

[FIGURE 3 OMITTED]

The normalization of the final result and the adjustment of the exponent are done in the third stage (Fig. 4). In the case of subtraction, the number of leading zeros, counted by the LZC, is subtracted from the transmitted exponent and the output of the barrel shifter BS is selected. In the case of addition, the data are either shifted one bit to the right or left unshifted.
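
A behavioural sketch of this normalization step is given below. It assumes a 24-bit significand (implicit 1 included) held as a non-negative integer, increments the exponent when an addition overflows into bit 24, and returns a zero exponent on exact cancellation; rounding, exponent overflow and underflow are not modelled. These choices, like the names `normalize` and `leading_zeros`, are assumptions of the example.

```python
from typing import Tuple

SIG_BITS = 24   # significand width including the implicit leading 1


def leading_zeros(x: int, width: int = SIG_BITS) -> int:
    """Count leading zeros of x within a field of `width` bits (the LZC)."""
    return width - x.bit_length()


def normalize(exp: int, sig: int, subtract: bool) -> Tuple[int, int]:
    """Return the normalized (exponent, significand) pair."""
    if subtract:
        if sig == 0:
            return 0, 0                 # exact cancellation (zero encoding assumed)
        lz = leading_zeros(sig)
        return exp - lz, sig << lz      # left shift by the leading-zero count
    if sig >> SIG_BITS:                 # carry out of the addition
        return exp + 1, sig >> 1        # one-bit right shift
    return exp, sig                     # already normalized


if __name__ == "__main__":
    print(normalize(130, 0x000400, subtract=True))    # 13 leading zeros removed
    print(normalize(130, 0x1800000, subtract=False))  # overflow: shift right once
```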

[FIGURE 4 OMITTED]

4. APPLICATIONS. CONCLUSIONS

A DSP algorithm can be implemented very easily in the hybrid unit, as we can see in Fig. 5, where the 20th and 29th steps of a 20-term Livermore-loop computation are presented. The total time is only 140 ns (35 cycles), but our aim is to improve the hybrid unit further by reducing its latency and by implementing 3-operand multiplication and shift capability in the logarithmic ALU, for square root or any power. A remarkable feature of this unit is that it performs a division faster than the most recent FPUs (Nikmehr et al., 2007).

[FIGURE 5 OMITTED]

5. REFERENCES

Arnold, M. (2001). A Pipelined LNS ALU, Workshop on VLSI. Proceedings, pp. 155-161, ISBN: 0-7695-1056-6, Orlando, FL., USA, April 19-20, 2001, IEEE Computer Society.

Coleman, J.N.; Chester, J.E.; Softley, C. & Kadlec, J. (2000). Arithmetic on the European Logarithmic Microprocessor, IEEE Transactions on Computers, Special Edition on Computer Arithmetic, Vol. 49, No. 7, pp. 702-715, July 2000, ISSN: 0018-9340.

Jurca, L. & Maranescu, V. (2004). A New Way to Build a Very Fast Binary Adder, Scientific Bulletin of the Politehnica University of Timisoara. Transactions on Electronics and Communications, Tom 49 (63), Fascicola 1, 2004, pp. 193-198, ISSN: 1583-3380.

Jurca, L.; Gontean, A.; Alexa, F. & Curiac, D.I. (2007). Proposal to Improve Data Format Conversions for a Hybrid Number System Processor, Proceedings of the 11th WSEAS International Conference on COMPUTERS, Agios Nikolaos, Crete Island, Greece, July 26-28, 2007, ISSN: 1790-5117, ISBN: 978-960-8457-95-9.

Lai, F. (1993). The Efficient Implementation and Analysis of a Hybrid Number System Processor, IEEE Transactions on Circuits and Systems, Part II, Vol. 40, No. 6, June 1993, pp. 382-392, ISSN: 1057-7130.

Nikmehr, H.; Phillips, B. & Lim, C.C. (2007). A Fast Radix-4 Floating-Point Divider with Quotient Digit Selection by Comparison Multiples, The Computer Journal, Vol. 50, Issue 1, pp. 81-92, Jan. 2007, Oxford University Press, ISSN: 0010-4620.