MP3 decoding IMDCT hardware accelerator solution


Abstract: Adding a small dedicated circuit inside the system-on-chip to handle the IMDCT part of MP3 decoding is an effective way to speed up the IMDCT operation. Taking the characteristics of embedded systems into account, this paper introduces a new IMDCT transform algorithm and optimizes the superposition operation within the IMDCT process, providing a high-speed, low-cost solution for implementing an IMDCT hardware accelerator.
Keywords: MP3, IMDCT, hardware accelerator, decoding


MPEG-1/2 Audio Layer 3 (MP3 for short) is a lossy compression algorithm designed for music and voice data. Driven by market demand, more and more embedded systems support MP3 applications, so decoding MP3 in real time with the limited computing power of an embedded system has become a problem worth attention. The MP3 decoding flow mainly consists of frame synchronization and side-information decoding, Huffman decoding, inverse quantization, stereo decoding, anti-aliasing, IMDCT, and sub-band synthesis. Profiling of the decoding software shows that the IMDCT process accounts for 19% of the total decoding computation. We therefore propose adding a small dedicated circuit inside the system-on-chip to handle the IMDCT part of the MP3 decoding process, and we call this circuit the "IMDCT hardware accelerator". Because this part of the algorithm is implemented in hardware and runs at hardware speed, it can effectively improve the decoding performance of the system.
This paper uses a hardware/software co-design method to propose a hardware accelerator scheme for the IMDCT operation, with the aim of optimizing both speed and cost in the MP3 decoding process.


1 IMDCT operation
    In the MP3 decoding algorithm, the inverse modified discrete cosine transform (IMDCT) transforms the input data from the frequency domain to the time domain and compensates for the sub-band filtering, as follows:

$$x_i = \sum_{k=0}^{n/2-1} X_k \cos\left(\frac{\pi}{2n}\left(2i + 1 + \frac{n}{2}\right)(2k + 1)\right), \quad i = 0, 1, \ldots, n-1 \qquad (1)$$

Where: X_k are the n/2 input values and x_i the n results; n is 36 in a long-window block and 12 in a short-window block.
After the IMDCT transformation is completed, each result x_i must be multiplied by the window function W_i. The window function is selected by the block_type field in the side information. Long-window blocks are further divided into three subtypes (normal, start, and stop) according to the frame header definition, and the IMDCT transform yields 36 results. For short-window blocks, the MP3 decoder performs three IMDCT transforms, each producing 12 results; these are overlapped with one another and padded with zeros, so that 36 values are again obtained, as in the long-window case. The 36 values are superimposed on the previous result to obtain the 18 final outputs of the IMDCT operation.

The audio data is mono or two-channel and is organized into granules. Each granule holds 576 data items arranged as 32 data blocks, each of which must be IMDCT-transformed separately. Once a granule has been transformed and a relatively simple frequency inversion applied, the resulting 18 groups of 32 sub-band samples serve as the input to sub-band synthesis. We refer to the entire process from 18 input values to 18 output values (IMDCT transformation, windowing, and superposition) as the "IMDCT operation", as shown in Figure 1. The IMDCT hardware accelerator discussed in this article is a dedicated circuit that implements this functionality.

2 IMDCT transform algorithm selection
    In the ISO standard decoding code, the IMDCT transform is not optimized. With n=12 in equation (1), one IMDCT takes 72 multiplications and 66 additions; with n=36, it takes 648 multiplications and 630 additions. The IMDCT transformation therefore consumes a large share of CPU time and is one of the main performance bottlenecks in MP3 decoding. Moving the IMDCT into a hardware acceleration module also calls for a faster IMDCT algorithm to increase speed further. A new IMDCT algorithm is adopted here: with n=12 in equation (1), one IMDCT takes only 13 multiplications and 39 additions, and with n=36 it takes 47 multiplications and 165 additions (see references). In addition, a cosine lookup table replaces run-time evaluation of the cos() function, which speeds up the cosine terms for n=36 under the long window and n=12 under the short window. With the improved algorithm, the operation is simpler and the number of multiplications is greatly reduced, which improves system performance.


3 Optimization of the superposition operation
    Speed and cost are the two major elements of the design. Taking hardware cost into account, the area of the IMDCT hardware unit must be tightly controlled. An optimized algorithm for the superposition operation is proposed here that saves 2×31×18 words of storage circuitry. It is described in detail below.
The main data structure of the IMDCT operation is as follows:


It can be seen that for two-channel stereo data (stereo=2), the usual algorithm performs the IMDCT operation on the 2 granules of the previous block (2×32×18 data items in total) and saves the upper 18 items of each result, 2×32×18 words in all. It then performs the IMDCT operation on the 2×32×18 data items of the next block and adds the lower 18 items of each result to the saved upper 18 items of the previous block to obtain the output. When the superposition part of the IMDCT operation is implemented in hardware, a storage circuit of 2×32×18 words (prevbuf[2][32][18]) is therefore needed to hold the upper 18 items for the next superposition. For an embedded SoC, reducing the memory requirement means reducing chip area. The algorithm is therefore improved through hardware/software co-design: changing the order in which data is fed to the IMDCT operation reduces the amount of IMDCT output that has to be held at any one time. This effectively shrinks the memory of the hardware accelerator and reduces circuit area.
First, the decoding software completes the anti-aliasing of the 4 granules in 2 blocks and stores the results in a contiguous memory area. The specific practice is as follows:


Then the data is interleaved, so that the sub-bands that occupy the same position in each block, and therefore need to be superimposed, are passed through the IMDCT operation one after another. In this way, the prevbuf that stores the intermediate result can be reduced to 2×18 words, which greatly reduces the memory requirement and the circuit area. The specific implementation is as follows:

4 Hardware implementation
    As can be seen from Figure 1, the IMDCT operation consists of three parts: IMDCT transformation, windowing, and superposition. The IMDCT transform part multiplies the data IN produced by the anti-aliasing operation by the cos coefficients, accumulates the products, and places the final result in the register SUM; the windowing part multiplies SUM by the window coefficient Wi; and the superposition part adds together the windowed data. The overall structure is shown in Figure 2.

As can be seen from Figure 2, through careful scheduling and the use of multiplexers (MUX), the entire hardware accelerator needs only one multiplier and one adder, which greatly reduces the cost of implementing the IMDCT operation in hardware.
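The sharing of one multiplier and one adder across the three stages can be illustrated with a small behavioural model in C. This is a sketch under assumptions, not a description of the circuit in Figure 2: the `datapath` counters, the function names, and the operation sequence are hypothetical, and the model covers only an output sample that is superimposed on a saved value.

```c
#include <assert.h>

/* Counters standing in for the single shared multiplier and adder.
 * (Hypothetical model; the real control logic and MUX wiring are not shown.) */
typedef struct {
    long mul_ops;
    long add_ops;
} datapath;

static double dp_mul(datapath *dp, double a, double b)
{
    dp->mul_ops++;
    return a * b;
}

static double dp_add(datapath *dp, double a, double b)
{
    dp->add_ops++;
    return a + b;
}

/* One output sample: multiply-accumulate in[k] * cos_row[k] into SUM,
 * multiply SUM by the window coefficient w_i, then add the saved value. */
double imdct_output_sample(datapath *dp, const double *in, const double *cos_row,
                           int half_n, double w_i, double prev_i)
{
    double sum = 0.0;                                   /* register SUM */
    for (int k = 0; k < half_n; k++)
        sum = dp_add(dp, sum, dp_mul(dp, in[k], cos_row[k]));
    double windowed = dp_mul(dp, sum, w_i);             /* windowing     */
    return dp_add(dp, windowed, prev_i);                /* superposition */
}
```

Every arithmetic step goes through the same two functions, which mirrors routing the operands of all three stages to one multiplier and one adder in sequence.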


Conclusion
    This paper introduces a new IMDCT transform algorithm and optimizes the superposition operation in the IMDCT process. This speeds up the overall operation, reduces the memory requirement, and provides a high-speed, low-cost solution for implementing an IMDCT hardware accelerator.
