Analysis of AVS-M and H.264 (Baseline) Video Decoder Structure


H.264 is an international standard drafted by the Joint Video Team (JVT) that covers multiple applications and transmission environments. It specifies three profiles: Baseline, Main, and Extended. The Baseline profile uses I and P slices to support intra and inter coding, and uses context-based adaptive variable-length coding (CAVLC) for entropy coding. It mainly targets real-time video communication such as video conferencing, video telephony, and wireless communication.


AVS (Audio Video coding Standard) is an audio and video codec standard developed independently in China, with independent intellectual property rights. AVS-M (Mobile video) is Part 7 of the AVS series of standards, mobile video, and targets digital storage media, broadband video services, remote monitoring, and video telephony.


The AVS-M and H.264 (Baseline) video decoders are very similar in structure, but each has its own core ideas. This article analyzes the key technologies of the two decoders based on the JM10.2 and WM3.3 source code. There is no separate H.264 (Baseline) code base; the Baseline part is extracted from JM10.2. In what follows, H.264 refers to the Baseline profile.

Decoder structures of the two standards


Both the AVS-M and H.264 video decoders only need to handle I and P frames (for a clearer comparison, this article considers only frame pictures for H.264, not field pictures). The overall idea is to parse the header information from the bitstream and generate a prediction block. The quantized coefficients obtained by entropy decoding are inverse quantized and inverse transformed to obtain a residual block. The prediction block and the residual block are added, and the sum is passed through the loop filter to obtain the reconstructed picture. The block diagram is shown in Figure 1.

Figure 1 decoder overall framework

In practical applications, the decoders used in these two standards have different application fields. Now we analyze the differences between the two standards in the following aspects.

1 bit stream information


(1) NALU (Network Abstraction Layer Unit): The bitstreams of both standards are organized in NAL units, and each NAL unit contains one RBSP. The NALU header defines the type of the RBSP. Types generally include the sequence parameter set (SPS), picture parameter set (PPS), supplemental enhancement information (SEI), slice, and so on. SPS and PPS are parameter sets. Both standards adopt the parameter set mechanism so that the important sequence and picture parameters (decoded picture size, number of slices, number of reference frames, quantization and filter parameter flags, etc.) are separated from the other parameters and decoded first by the decoder. In addition, to improve image clarity, AVS-M adds picture header information. When NALUs are read, each NALU is preceded by the start code 0x000001. To prevent a 0x000001 sequence inside the NALU from being mistaken for a start code, the H.264 encoder inserts an extra byte 0x03 before the last byte of the sequence, so when the decoder detects such a sequence it must delete the 0x03. AVS-M only needs to recognize the start code 0x000001.
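The 0x03 removal step described above can be sketched as follows. This is a minimal illustration, not code taken from JM10.2; the function name `strip_emulation_prevention` is hypothetical, and it assumes the payload passed in is the body of a single NALU with the start code already removed.

```python
def strip_emulation_prevention(payload: bytes) -> bytes:
    """Drop every 0x03 byte that directly follows two consecutive 0x00 bytes.

    Assumption: `payload` is one NALU body without its 0x000001 start code.
    """
    out = bytearray()
    zeros = 0  # consecutive 0x00 bytes seen immediately before this byte
    for b in payload:
        if zeros >= 2 and b == 0x03:
            zeros = 0  # skip the inserted byte and restart the zero count
            continue
        out.append(b)
        zeros = zeros + 1 if b == 0x00 else 0
    return bytes(out)


escaped = bytes([0x25, 0x00, 0x00, 0x03, 0x01, 0x40])
restored = strip_emulation_prevention(escaped)
```

Here `restored` contains the original 0x000001 sequence again, because the inserted 0x03 has been dropped. A 0x03 that does not follow two zero bytes is left untouched.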


(2) Reading the macroblock type (mb type) and coded block pattern (cbp): Pictures are coded and decoded in units of macroblocks. One macroblock consists of a 16*16 luma block and the corresponding 8*8 Cb and 8*8 Cr chroma blocks.


(a) The two standards partition macroblocks differently for intra and inter prediction. In H.264, the luma block in an I slice has two modes, Intra_4*4 and Intra_16*16, and the chroma block has only the 8*8 mode; a macroblock in a P slice can be partitioned in 7 ways: 16*16, 16*8, 8*16, 8*8, 8*4, 4*8, and 4*4. In AVS-M, the luma block in an I slice has two modes, I_4*4 and I_Direct, and macroblock partitioning in a P slice is the same as in H.264.


(b) The two standards also compute the macroblock cbp differently. In H.264, the luma and chroma cbp of an Intra_16*16 macroblock are obtained directly from the mb type. For other macroblocks, luma cbp = coded_block_pattern % 16 and chroma cbp = coded_block_pattern / 16. The lowest 4 bits of the luma cbp are valid, and each bit indicates whether the residual coefficients of the corresponding 8*8 luma block are all 0. When the chroma cbp is 0, the chroma residual coefficients are all 0; when it is 1, the DC residual coefficients are non-zero and the AC coefficients are 0; when it is 2, both DC and AC residual coefficients may be non-zero. In AVS-M, when the macroblock type is not P_skip, the cbp index is read directly from the bitstream, the codenum value is obtained by table lookup with that index, and the intra or inter cbp is then obtained by table lookup with codenum. The cbp has 6 bits, and each bit indicates whether the corresponding 8*8 block of the macroblock has non-zero coefficients. When the transform coefficients are not all 0, each bit of cbp_4*4 must also be read to determine whether the coefficients of each of the four 4*4 blocks in an 8*8 block are 0.
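The H.264 split of coded_block_pattern into luma and chroma parts can be illustrated with a short sketch. The function names and the returned dictionary layout are hypothetical and chosen only for this example. The AVS-M lookup tables are not reproduced in this article, so only the H.264 case is shown.

```python
def split_h264_cbp(coded_block_pattern: int):
    """Split coded_block_pattern (non-Intra_16*16 macroblock) into luma and chroma cbp."""
    luma_cbp = coded_block_pattern % 16      # low 4 bits, one per 8*8 luma block
    chroma_cbp = coded_block_pattern // 16   # 0, 1 or 2
    return luma_cbp, chroma_cbp


def describe_h264_cbp(coded_block_pattern: int):
    """Return a readable summary of which parts of the macroblock carry residual data."""
    luma_cbp, chroma_cbp = split_h264_cbp(coded_block_pattern)
    luma_coded = [bool((luma_cbp >> i) & 1) for i in range(4)]
    chroma_meaning = {0: "no chroma residual", 1: "DC only", 2: "DC and AC"}[chroma_cbp]
    return {"luma_8x8_coded": luma_coded, "chroma": chroma_meaning}


summary = describe_h264_cbp(37)  # 37 = 2 * 16 + 5
```

For coded_block_pattern = 37, the luma cbp is 5 (8*8 blocks 0 and 2 carry coefficients) and the chroma cbp is 2.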

2 intra prediction


The Intra_16*16 luma block and the 8*8 chroma block in H.264 have 4 prediction modes (vertical, horizontal, DC, plane), while the 8*8 chroma block in AVS-M has only 3 (vertical, horizontal, DC). The 4*4 luma blocks of Intra_4*4 in H.264 and of AVS-M both have 9 prediction modes, but the mode order differs. In both standards the mode of a 4*4 luma block can be predicted from the intra modes of the neighboring blocks, but the prediction methods differ.

In H.264, the most probable mode of the current luma block is the smaller of the modes of the left block (A) and the upper block (B). If a neighboring block does not exist, its mode is set to DC. The prediction mode is selected using the flag prev_intra4*4_pred_mode in the bitstream. When the flag is 1, the most probable mode is used. When the flag is 0, the parameter rem_intra4*4_pred_mode is also required: if it is less than the most probable mode, the prediction mode is rem_intra4*4_pred_mode; otherwise it is rem_intra4*4_pred_mode+1.

In AVS-M, the prediction modes of the left block (A) and the upper block (B) (set to -1 if the block is not available) are used as indices into a table, and the most probable mode of the current block is obtained by table lookup. For I_Direct, the prediction mode is the most probable mode. For I_4*4, the flag pred_mode_flag is required. When the flag is 0, the prediction mode is the most probable mode. When the flag is 1, intra_luma_pred_mode is read from the bitstream: if it is smaller than the most probable mode, the prediction mode is intra_luma_pred_mode; otherwise it is intra_luma_pred_mode+1.

In addition, the Intra_16*16 and chroma prediction modes in H.264 are read from the bitstream, and the chroma prediction mode in AVS-M is also read from the bitstream.
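The H.264 derivation of the Intra_4*4 prediction mode described above can be sketched as follows. The function name and the use of `None` for a missing neighbor are assumptions made for this example; the mode number 2 for DC follows the H.264 Intra_4*4 mode numbering. The AVS-M case is omitted because its lookup table is not given here.

```python
DC_PRED = 2  # Intra_4*4 mode number for DC prediction in H.264


def h264_intra4x4_mode(mode_a, mode_b, prev_flag, rem_mode=None):
    """Derive the Intra_4*4 prediction mode of the current block.

    mode_a, mode_b: modes of the left and upper blocks, or None if unavailable.
    prev_flag:      value of prev_intra4*4_pred_mode (0 or 1).
    rem_mode:       value of rem_intra4*4_pred_mode, needed only when prev_flag is 0.
    """
    a = DC_PRED if mode_a is None else mode_a
    b = DC_PRED if mode_b is None else mode_b
    most_probable = min(a, b)
    if prev_flag == 1:
        return most_probable
    # rem_mode skips over the most probable mode, so values at or above it shift by one
    return rem_mode if rem_mode < most_probable else rem_mode + 1
```

Because rem_intra4*4_pred_mode never needs to signal the most probable mode itself, 8 values are enough to cover the remaining 8 of the 9 modes.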

3 inter prediction


In both standards, the motion vector of a luma block equals the predicted motion vector (MVPred) plus the motion vector difference (MVD) read from the bitstream. Since luma MV precision is 1/4 pixel and chroma MV precision is 1/8 pixel, the chroma motion vector has twice the fractional resolution of the luma motion vector. The spatial positions of the current luma block E and its neighboring blocks A, B, C, and D in AVS-M and H.264 are shown in Figures 2 and 3, respectively. The size of E can be 16*16, 16*8, 8*16, 8*8, 8*4, 4*8, or 4*4. In AVS-M, A is the block adjacent to the lower-left corner sample of E, B and D are the blocks adjacent to the upper-left corner sample of E, and C is the block adjacent to the upper-right corner sample of E. In H.264, A is the block adjacent to the upper-left corner sample of E, B and D are the blocks adjacent to the upper-left corner sample of E, and C is the block adjacent to the upper-right corner sample of E.
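The relation MV = MVPred + MVD, together with the 1/4-pixel interpretation of a luma vector component, can be sketched as below. The function names are hypothetical, and the way MVPred is derived from the neighboring blocks A, B, C, and D is deliberately left out; the predictor is simply passed in.

```python
def reconstruct_mv(mv_pred, mvd):
    """Add the decoded difference to the predictor, component by component."""
    return (mv_pred[0] + mvd[0], mv_pred[1] + mvd[1])


def split_quarter_pel(component):
    """Split a luma MV component in 1/4-pixel units into integer and fractional parts.

    The arithmetic shift rounds toward minus infinity, so the fractional part
    is always in the range 0..3, also for negative components.
    """
    return component >> 2, component & 3


mv = reconstruct_mv((6, -3), (3, 1))   # predictor plus difference
x_int, x_frac = split_quarter_pel(mv[0])
y_int, y_frac = split_quarter_pel(mv[1])
```

The integer part selects the reference sample position and the fractional part selects which interpolated sub-pixel position is needed.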

Figure 2 AVS-M prediction block neighbor block position

Figure 3 H.264 prediction block neighbor block position

4 Entropy decoding


H.264 uses context-based adaptive variable-length coding (CAVLC). It relies on the following properties: after the integer transform and quantization of a 4*4 residual block, the non-zero coefficients are concentrated mainly in the low-frequency part, the high-frequency coefficients are mostly zero, and the non-zero coefficients at high-frequency positions are mostly +1 or -1. AVS-M entropy coding also uses variable-length coding. In AVS-M, all syntax elements and residual data are mapped into the binary bitstream in the form of exponential Golomb codes.
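As an illustration of exponential Golomb decoding, the sketch below reads order-0 codes from a string of '0' and '1' characters. The function names and the bit-string input are assumptions made to keep the example self-contained; the signed mapping shown is the one used by H.264 for se(v), and whether a given AVS-M syntax element uses order 0 or a higher order is not covered here.

```python
def read_ue(bits: str, pos: int = 0):
    """Decode one unsigned order-0 Exp-Golomb code starting at bits[pos].

    Returns (value, position of the next unread bit).
    """
    leading_zeros = 0
    while bits[pos] == "0":
        leading_zeros += 1
        pos += 1
    pos += 1  # skip the '1' that terminates the zero prefix
    suffix = bits[pos:pos + leading_zeros]
    pos += leading_zeros
    value = (1 << leading_zeros) - 1 + (int(suffix, 2) if suffix else 0)
    return value, pos


def ue_to_se(code_num: int) -> int:
    """Map an unsigned code number to a signed value: 0, 1, -1, 2, -2, ..."""
    magnitude = (code_num + 1) // 2
    return magnitude if code_num % 2 == 1 else -magnitude


first, nxt = read_ue("01000101")        # decodes the code "010"
second, end = read_ue("01000101", nxt)  # decodes the code "00101"
```

Shorter codes are assigned to smaller values, so the scheme is efficient when small values are the most frequent.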

5 loop filtering


Both standards use block-based inverse transform and inverse quantization of the residual coefficients. Quantization is relatively coarse, so the transform coefficients recovered by inverse quantization inevitably contain errors. In addition, motion-compensated blocks may come from interpolated sample blocks at different positions in different frames, which makes block boundaries discontinuous. Loop filtering is therefore needed to remove the distortion caused by block-based prediction errors. In H.264, the filter strength is determined from the coding modes of the neighboring blocks, the reference indices, the motion vectors, and the coded block data. The filter strength parameter Bs ranges from 0 to 4; a 4-tap filter is used for Bs from 1 to 3, and a 6-tap filter is used for Bs equal to 4. The H.264 filter adapts at the slice, edge, and sample levels. In AVS-M, the intra or inter filter is selected according to whether the current macroblock is intra or inter coded. When the filtering condition is satisfied, a 4-tap filter is applied first to vertical edges and then to horizontal edges. Compared with H.264, the AVS-M filter modifies fewer pixels and is weaker, but it greatly reduces filtering time while still removing blocking artifacts.
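The way H.264 derives Bs from the inputs listed above can be sketched with the commonly cited rules for frame pictures in P slices. This is a simplified illustration: the function name and parameters are hypothetical, each block is assumed to have a single reference picture and a single motion vector in 1/4-pixel units, and details of the full standard are left out.

```python
def boundary_strength(p_intra, q_intra, on_mb_edge,
                      p_coded, q_coded, p_ref, q_ref, p_mv, q_mv):
    """Return Bs (0..4) for the edge between neighboring 4*4 blocks p and q.

    p_intra, q_intra: True if the block belongs to an intra macroblock.
    on_mb_edge:       True if the edge is also a macroblock edge.
    p_coded, q_coded: True if the block contains non-zero coefficients.
    p_ref, q_ref:     reference picture of each block.
    p_mv, q_mv:       (x, y) motion vectors in 1/4-pixel units.
    """
    if p_intra or q_intra:
        return 4 if on_mb_edge else 3
    if p_coded or q_coded:
        return 2
    if p_ref != q_ref:
        return 1
    if abs(p_mv[0] - q_mv[0]) >= 4 or abs(p_mv[1] - q_mv[1]) >= 4:
        return 1  # vectors differ by at least one full luma sample
    return 0  # edge is not filtered
```

A Bs of 0 means the edge is skipped entirely, which is where most of the time saving in a software decoder comes from.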


Application prospects


Direct-broadcast satellite TV and HDTV are among the fastest-growing application areas. By working with equipment manufacturers, the AVS working group can obtain feedback from real deployments and promptly revise and improve the standard, its implementation algorithms, software, IP cores, and dedicated chips, so that they truly meet the needs of industry.


The AVS encoder in the satellite TV experimental system project hosted by SVA is implemented as "transcoder + DSP". The system is highly compatible with MPEG-2: because most existing programs are in MPEG-2, AVS will go through a period of coexistence with MPEG-2 before replacing it. Other supporting systems, such as encryption, user management, billing, and editing, remain unchanged.

Summary


The analysis above shows that the two video decoder structures have much in common. There are already many methods for optimizing H.264 video decoders, porting them to hardware, and applying them, and AVS-M can adopt these as well. In this work, AVS-M was optimized in software using methods from the literature on H.264 optimization. Algorithm-level optimization mainly covers interpolation, loop filtering, and entropy decoding. For interpolation, pixels can be divided into interior pixels and boundary pixels to avoid repeated boundary checks. For loop filtering, the boundary threshold is the same along each 4*4 block edge, so the filtering of the four lines along that edge can be done together. For entropy decoding, the number of conversion steps can be reduced by rebuilding the tables. Code-level optimization mainly covers program structure, loop unrolling, data type selection, and data movement. For example, in the Decode_one_macroblock function, different functions can be called according to the macroblock type, and the temporary buffers can be simplified. In addition, functions that involve matrix operations, such as interpolation (which can be converted into matrix operations), inverse quantization, and the inverse transform, can be optimized with the MMX/SSE instruction sets.
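The interior/boundary split for interpolation can be illustrated on a single row of samples. This is a sketch of the general idea only, not code from WM3.3 or JM10.2: the function names are hypothetical, the tap list is passed in as a parameter, and edge handling by clamping the sample index is an assumption.

```python
def filter_row_naive(row, taps):
    """Reference version: clamp the sample index on every access."""
    n, half = len(row), len(taps) // 2
    out = []
    for x in range(n):
        acc = 0
        for k, t in enumerate(taps):
            idx = min(max(x + k - half, 0), n - 1)
            acc += t * row[idx]
        out.append(acc)
    return out


def filter_row_split(row, taps):
    """Same result, but only boundary positions pay for index clamping."""
    n, half = len(row), len(taps) // 2
    right = len(taps) - 1 - half          # samples needed to the right of x
    lo = min(half, n)                     # first interior position
    hi = max(lo, n - right)               # one past the last interior position

    def clamped(x):
        return sum(t * row[min(max(x + k - half, 0), n - 1)]
                   for k, t in enumerate(taps))

    out = [clamped(x) for x in range(lo)]             # left boundary
    for x in range(lo, hi):                           # interior: no checks
        window = row[x - half:x + right + 1]
        out.append(sum(t * s for t, s in zip(taps, window)))
    out.extend(clamped(x) for x in range(hi, n))      # right boundary
    return out


taps = [1, -5, 20, 20, -5, 1]
row = [10, 20, 30, 40, 50, 60, 70, 80]
same = filter_row_naive(row, taps) == filter_row_split(row, taps)
```

In a real decoder the interior loop is the part that benefits from unrolling and MMX/SSE, since it has no branches.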


H.264 is an international standard, and AVS-M is a domestically developed standard, so video decoders compatible with both standards are bound to appear. The next step of this work is to merge the two code bases on the basis of this comparative analysis: the parts that the two decoder structures share are reused, and the parts that differ are selected by switches, so that bitstreams in both formats can be identified and decoded in real time.
