Article Information

Title: Recursive Algorithm and Systolic Architecture for the Discrete Cosine Transform
Authors: Anjali Sahu ; S S Nayak
Journal: International Journal of Electronics and Computer Science Engineering
Electronic ISSN: 2277-1956
Publication year: 2012
Volume: 1
Issue: 4
Pages: 2481-2485
Publisher: Buldanshahr : IJECSE
Abstract: A novel VLSI architecture for computing the discrete cosine transform (DCT) of variable length is proposed. Using some mathematical techniques, a DCT of any general length can be converted into a recursive equation, and this structure can be realized using software, hardware and VLSI techniques. The algorithm can be implemented using regular and parallel VLSI structures, so that the computational complexity is greatly reduced. It can also be extended to implement a two-dimensional DCT in a straightforward way.

I. INTRODUCTION

Amongst all the discrete orthogonal transforms, the discrete cosine transform (DCT) is the most favourable for the compression of speech and image data. The performance of the DCT is comparable to that of the statistically optimal Karhunen-Loeve transform (KLT) [1] in data compression, feature extraction and filtering applications. Therefore, several fast algorithms have been reported in the literature over the last two decades for computing the DCT on general-purpose computers [2-5]. Besides, several versions of the DCT, such as the even DCT (EDCT), odd DCT (ODCT), symmetric DCT (SDCT) and Hadamard-structured DCT (HDCT), have been proposed by researchers with a view to achieving optimality in performance or for the sake of computational simplicity [6-8]. Along with the growth of integrated circuit technology, high-performance application-specific dedicated processors are evolving rapidly for digital signal processing applications [9, 10]. VLSI systems yield high throughput by maximising processing concurrency, so they provide less expensive and more suitable alternatives to general-purpose computers for real-time and on-line applications.
Systolic architectures are established as the most popular and dominant class of VLSI structures due to the simplicity of their processing elements (PEs), the modularity of their structure, regular and nearest-neighbour interconnections between the PEs, a high level of pipelinability, small chip area and low power dissipation [11]. In a systolic architecture, the desired data are pumped rhythmically at regular intervals across the PEs, yielding high throughput by fully pipelined processing. The fast algorithms for general-purpose computers [2-5] are, however, not suitable for VLSI implementation because of their global communication requirements. Therefore, the development of algorithms and systolic architectures for efficient VLSI implementation of the DCT is a subject of current interest. Cho and Lee [12] have suggested implementing the DCT on the existing VLSI structures for computing the discrete Fourier transform (DFT) [13] and the prime-factor DFT [14]. Chakrabarti and JáJá [15] have presented a systolic architecture to compute the DCT from the discrete Hartley transform (DHT). Gou et al. [16] have developed an algorithm and a systolic architecture for the prime-length DCT, using input/output data permutation and the symmetry property of the cosine kernels to achieve high throughput and save hardware cost. Lee and Huang [17] have suggested systolic architectures for the prime-factor DCT which involve lower area and time complexity than those of Gou et al. In [18], Goertzel proposed an algorithm to evaluate a finite trigonometric series; it requires only N multiplications and approximately 2N additions to obtain the result. This algorithm can easily be used to compute the discrete Fourier transform, and it has also been extended to calculate the DCT [19]. The advantage of this algorithm is its regular structure and parallelism, which make it suitable for implementation using VLSI techniques.
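To make the Goertzel-style recursion mentioned above concrete, the following is a minimal Python sketch, not the authors' formulation or architecture. It assumes the unnormalized DCT-II definition X[k] = sum over n of x[n] cos((2n+1)kπ/(2N)), which this excerpt does not state, and it uses a second-order (Clenshaw-type) recurrence b[n] = x[n] + 2cos(θ) b[n+1] - b[n+2] with θ = kπ/N, followed by X[k] = cos(θ/2) (b[0] - b[1]). The function names `dct_direct` and `dct_goertzel` are hypothetical and chosen only for illustration.

```python
import math


def dct_direct(x):
    """Reference: unnormalized DCT-II by direct summation (assumed definition)."""
    n_pts = len(x)
    return [
        sum(x[n] * math.cos((2 * n + 1) * k * math.pi / (2 * n_pts))
            for n in range(n_pts))
        for k in range(n_pts)
    ]


def dct_goertzel(x):
    """Unnormalized DCT-II via a Goertzel-style second-order recurrence.

    For each k, with theta = k*pi/N, run
        b[n] = x[n] + 2*cos(theta)*b[n+1] - b[n+2],  n = N-1, ..., 0
    starting from b[N] = b[N+1] = 0, then X[k] = cos(theta/2) * (b[0] - b[1]).
    """
    n_pts = len(x)
    out = []
    for k in range(n_pts):
        theta = k * math.pi / n_pts
        coeff = 2.0 * math.cos(theta)  # the only multiplier used in the loop
        b1 = 0.0  # holds b[n+1]
        b2 = 0.0  # holds b[n+2]
        for sample in reversed(x):
            # One multiplication and two additions per input sample.
            b1, b2 = sample + coeff * b1 - b2, b1
        # After the loop, b1 holds b[0] and b2 holds b[1].
        out.append(math.cos(theta / 2.0) * (b1 - b2))
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0]
    for ref, got in zip(dct_direct(data), dct_goertzel(data)):
        print(f"{ref:+.6f}  {got:+.6f}")
```

In this sketch each output coefficient costs about N multiplications and 2N additions, in line with the operation count quoted above, and every coefficient is computed by the same independent recurrence, which is the kind of regularity that suits a parallel hardware mapping.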
Chau and Siu proposed a novel formulation for parallel computation of the DCT [20]. We present an efficient architecture that is suitable for parallel computation and requires no permutation of the input/output sequences.