Introduction to HEVC
High Efficiency Video Coding (HEVC) is a video compression standard, a successor to H.264/MPEG-4 AVC (Advanced Video Coding), currently under joint development by the ISO/IEC Moving Picture Experts Group (MPEG) and ITU-T Video Coding Experts Group (VCEG) as ISO/IEC 23008-2 MPEG-H Part 2 and ITU-T H.HEVC. MPEG and VCEG have established a Joint Collaborative Team on Video Coding (JCT-VC) to develop the HEVC standard. HEVC is said to improve video quality, to double the data compression ratio compared to H.264/MPEG-4 AVC, and to support 8K UHD at resolutions up to 8192×4320.
HEVC Coding Efficiency
(Source: Wikipedia / IEEE)
The design of most video coding standards is primarily aimed at achieving the highest coding efficiency, that is, the ability to encode video at the lowest possible bit rate while maintaining a certain level of video quality. There are two standard ways to measure the coding efficiency of a video coding standard: using an objective metric, such as peak signal-to-noise ratio (PSNR), or using subjective assessment of video quality. Subjective assessment is considered the most important measure, since humans perceive video quality subjectively.
HEVC benefits from the use of larger coding tree block (CTB) sizes. This was shown in PSNR tests with an HM-8.0 HEVC encoder that was forced to use progressively smaller CTB sizes. Across all test sequences, compared with a 64×64 CTB size, the HEVC bitrate increased by 2.2% when the encoder was forced to use a 32×32 CTB size and by 11.0% when forced to use a 16×16 CTB size. In the Class A test sequences, where the video resolution was 2560×1600, the bitrate increased by 5.7% with a 32×32 CTB size and by 28.2% with a 16×16 CTB size, showing that large CTB sizes become even more important for coding efficiency at higher resolutions. The tests also showed that decoding HEVC video encoded with a 16×16 CTB size took 60% longer than with a 64×64 CTB size. Overall, large CTB sizes increase coding efficiency while also reducing decoding time.
The HEVC Main Profile (MP) has been compared in coding efficiency to H.264/MPEG-4 AVC High Profile (HP), MPEG-4 Advanced Simple Profile (ASP), H.263 High Latency Profile (HLP), and H.262/MPEG-2 Main Profile (MP). The video was encoded for entertainment applications at twelve different bitrates for nine video test sequences, using an HM-8.0 HEVC encoder. Of the nine test sequences, five were at HD resolution and four at WVGA (800×480) resolution. The bit rate reductions for HEVC were determined based on PSNR.
HEVC MP has also been compared to H.264/MPEG-4 AVC HP in subjective video quality. The video was encoded for entertainment applications at four different bitrates for nine video test sequences, using an HM-5.0 HEVC encoder. The subjective assessment was done at an earlier date than the PSNR comparison and therefore used an earlier version of the HEVC encoder with slightly lower performance. The bit rate reductions were determined from subjective assessment using mean opinion score values. The overall subjective bitrate reduction for HEVC MP compared to H.264/MPEG-4 AVC HP was 49.3%.
École Polytechnique Fédérale de Lausanne (EPFL) conducted a study to evaluate the subjective video quality of HEVC at resolutions higher than HDTV. The study used three videos with resolutions of 3840×1744 at 24 fps, 3840×2048 at 30 fps, and 3840×2160 at 30 fps. The five-second video sequences showed people on a street, traffic, and a scene from the open source computer-animated movie Sintel. The sequences were encoded at five different bitrates using the HM-6.1.1 HEVC encoder and the JM-18.3 H.264/MPEG-4 AVC encoder. The bit rate reductions were determined from subjective assessment using mean opinion score values. The study compared HEVC MP with H.264/MPEG-4 AVC HP and showed that for HEVC MP the average bitrate reduction based on PSNR was 44.4%, while the average bitrate reduction based on subjective video quality was 66.5%.
HEVC was designed to substantially improve coding efficiency compared to H.264/MPEG-4 AVC HP, i.e. to reduce bitrate requirements by half at comparable image quality, at the expense of increased computational complexity. Depending on the application requirements, HEVC encoders can trade off computational complexity, compression rate, robustness to errors, and encoding delay time. Two of the key areas where HEVC was improved compared to H.264/MPEG-4 AVC were support for higher-resolution video and improved parallel processing methods.
HEVC is targeted at next-generation HDTV displays and content capture systems featuring progressive-scan frame rates and display resolutions from QVGA (320×240) to 4320p (8192×4320), as well as improved picture quality in terms of noise level, color gamut, and dynamic range.
HEVC video coding layer
The HEVC video coding layer uses the same “hybrid” approach found in all modern video standards since H.261: inter-/intra-picture prediction combined with 2D transform coding. An HEVC encoder first splits each picture into block-shaped regions. The first picture of a video sequence, or the first picture at a random access point, is coded using only intra-picture prediction, in which the prediction of the blocks in the picture is based solely on information within that picture. All other pictures use inter-picture prediction, in which prediction information from other pictures is used. After the prediction stages are finished and the picture has passed through the loop filters, the final picture representation is stored in the decoded picture buffer, where it can be used for the prediction of other pictures.
HEVC was designed on the assumption that progressive-scan video would be used, and no coding features are present specifically for interlaced video. Instead, HEVC signals metadata that indicates how interlaced video was sent: each field may be coded as a separate picture, or each frame may be coded as a single picture. This allows interlaced video to be sent with HEVC without adding special interlaced decoding processes to HEVC decoders.
HEVC coding tools
HEVC prediction block size
HEVC replaces macroblocks, which were used in previous standards, with a new coding scheme that uses larger block structures of up to 64×64 pixels and can better sub-partition the picture into variable-sized structures. HEVC initially divides the picture into coding tree units (CTUs), which are then divided for each luma/chroma component into coding tree blocks (CTBs). A CTB can be 64×64, 32×32, or 16×16, with a larger block size usually increasing coding efficiency. CTBs are then divided into coding units (CUs). The arrangement of CUs within a CTB is known as a quadtree, since each subdivision results in four smaller regions. CUs are then divided into prediction units (PUs) of either intra-picture or inter-picture prediction type, which can vary in size from 64×64 to 4×4 (for the smallest inter-picture PUs, restrictions apply to save memory bandwidth: 8×4 and 4×8 PUs may not use bi-predictive coding, i.e. prediction from two reference blocks). The prediction residual is then coded using transform units (TUs), which contain the coefficients for spatial block transform and quantization. A TU can be 32×32, 16×16, 8×8, or 4×4.
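The quadtree subdivision described above can be sketched as a simple recursion. This is an illustrative model only; the `should_split` predicate is a hypothetical stand-in for the rate-distortion decisions a real encoder makes.

```python
# Sketch of recursive quadtree partitioning of a CTB into CUs.
# should_split(x, y, size) decides whether a block is subdivided further.

def partition_ctb(x, y, size, min_cu, should_split):
    """Return a list of (x, y, size) coding units covering the block."""
    if size > min_cu and should_split(x, y, size):
        half = size // 2
        cus = []
        for dy in (0, half):
            for dx in (0, half):
                cus.extend(partition_ctb(x + dx, y + dy, half,
                                         min_cu, should_split))
        return cus
    return [(x, y, size)]

# Example: split anything larger than 32x32, so a 64x64 CTB
# becomes four 32x32 CUs.
cus = partition_ctb(0, 0, 64, 8, lambda x, y, s: s > 32)
print(cus)  # four (x, y, 32) tuples
```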
At the July 2012 HEVC meeting it was decided, based on proposal JCTVC-J0334, that HEVC level 5 and higher would be required to use CTB sizes of either 32×32 or 64×64. This was added to HEVC in the Draft International Standard as a level limit for the Log2MaxCtbSize variable. Log2MaxCtbSize was renamed CtbSizeY in the October 2012 HEVC draft.
HEVC internal bit depth increase
Internal bit depth increase (IBDI) allows pictures to be processed internally at a bit depth higher than the one at which they are encoded. IBDI can use up to 14 bits, and processing remains at that bit depth until the pictures are fed into the loop filters.
HEVC parallel processing tools
- Tiles divide the picture into a grid of rectangular regions that can be encoded and decoded independently. Their main purpose is to enable parallel processing, though they can also provide random access to specific regions of a picture in a video stream.
- Wavefront parallel processing (WPP) divides a slice into rows of CTUs. The first row is decoded normally, but each subsequent row can only begin once decisions have been made in the row above: the entropy encoder uses information from the preceding row of CTUs. This allows a form of parallel processing that may achieve better compression than tiles.
- Slices can for the most part be decoded independently of each other; the main purpose of slices is re-synchronization in case of data loss in the video stream. Slices can be defined as self-contained in that prediction is not made across slice boundaries, although when in-loop filtering is applied, information across slice boundaries may still be required. Slices consist of CTUs decoded in raster-scan order, and different coding types (I, P, or B) can be used for slices.
- Dependent slices allow data related to tiles or WPP to be accessed by the system more quickly than if the entire slice had to be decoded. Their main purpose is to reduce latency and thereby enable low-delay video encoding.
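The wavefront dependency pattern that WPP creates can be sketched as a small scheduling computation. This is an illustrative model, not reference-software code: each CTU waits on its left neighbour and on the CTU one position ahead in the row above, which is what staggers the rows by two CTUs and lets them run in parallel.

```python
# Sketch of the wavefront schedule WPP enables: compute the earliest
# parallel time step at which each CTU can be processed.

def wavefront_schedule(rows, cols):
    """Earliest time step per CTU under WPP-style dependencies."""
    step = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            deps = []
            if c > 0:
                deps.append(step[r][c - 1])      # left neighbour, same row
            if r > 0 and c + 1 < cols:
                deps.append(step[r - 1][c + 1])  # row above, one CTU ahead
            elif r > 0:
                deps.append(step[r - 1][c])      # last column: row above
            step[r][c] = 1 + max(deps, default=-1)
    return step

s = wavefront_schedule(3, 5)
print(s[1][0])  # 2: row 1 starts after two CTUs of row 0
```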
HEVC entropy coding
HEVC uses a context-adaptive binary arithmetic coding (CABAC) algorithm that is fundamentally similar to CABAC in H.264/MPEG-4 AVC. CABAC is the only entropy coding method allowed in HEVC, whereas H.264/MPEG-4 AVC allows two. CABAC in HEVC was designed for higher throughput: for instance, the number of context-coded bins has been reduced by 8× and the design of the CABAC bypass mode has been improved to increase throughput. Another improvement is that the dependencies between the coded data have been changed to further increase throughput. Context modeling in HEVC has also been improved so that CABAC can better select a context that increases efficiency compared to H.264/MPEG-4 AVC.
HEVC intra prediction
HEVC specifies 33 directional modes for intra prediction compared to the 8 directional modes for intra prediction specified by H.264/MPEG-4 AVC. HEVC also specifies planar and DC intra prediction modes. The intra prediction modes use data from neighboring prediction blocks that have been previously decoded.
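Of the 35 intra modes mentioned above, DC mode is the simplest to illustrate: the block is predicted as the average of the previously decoded reference samples along its top and left edges. The sketch below is a simplified illustration, not the exact specification arithmetic.

```python
# Simplified sketch of intra DC prediction: fill an NxN block with the
# rounded average of the N top and N left reference samples.

def dc_predict(top, left):
    """Predict an NxN block from N top and N left reference samples."""
    n = len(top)
    dc = (sum(top) + sum(left) + n) // (2 * n)  # integer rounding
    return [[dc] * n for _ in range(n)]

block = dc_predict(top=[100, 102, 104, 106], left=[98, 100, 102, 104])
print(block[0][0])  # 102
```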
HEVC motion compensation
For fractional luma sample positions, HEVC uses half-sample or quarter-sample precision with a 7-tap or 8-tap filter, whereas H.264/MPEG-4 AVC uses a 6-tap filter for half-sample positions. For 4:2:0 video, chroma is filtered at eighth-sample precision with a 4-tap filter, compared to the 2-tap filter of H.264/MPEG-4 AVC. Weighted prediction in HEVC can be either uni-prediction, in which a single prediction value is used, or bi-prediction, in which the prediction values from two prediction blocks are combined.
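The interpolation itself is a 1D convolution with fixed integer taps. The sketch below assumes the coefficients commonly cited for HEVC's 8-tap half-sample luma filter ([-1, 4, -11, 40, 40, -11, 4, -1], normalized by 64); clipping to the sample bit depth is omitted for brevity.

```python
# Sketch of half-sample luma interpolation with an 8-tap integer filter.
TAPS = [-1, 4, -11, 40, 40, -11, 4, -1]  # sums to 64

def half_sample(samples, i):
    """Interpolate the half-sample position between samples[i] and samples[i+1]."""
    window = samples[i - 3:i + 5]             # 8 integer-position samples
    acc = sum(c * s for c, s in zip(TAPS, window))
    return (acc + 32) >> 6                    # round, normalize by 64

row = [100] * 8
print(half_sample(row, 3))  # 100: a flat signal stays flat
```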
HEVC motion vector prediction
HEVC defines a signed 16-bit range for both horizontal and vertical motion vectors (MVs). This was added to HEVC at the July 2012 HEVC meeting with the mvLX variables. HEVC horizontal/vertical MVs have a range of -32768 to 32767, which, given the quarter-sample precision used by HEVC, allows for an MV range of -8192 to 8191.75 luma samples. By comparison, H.264/MPEG-4 AVC allows a horizontal MV range of -2048 to 2047.75 luma samples and a vertical MV range of -512 to 511.75 luma samples.
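The sample range above follows directly from the MV storage format: a signed 16-bit value counts quarter-sample steps, so dividing its endpoints by 4 gives the reach in luma samples.

```python
# Derive the MV reach in luma samples from a signed 16-bit
# quarter-sample representation.
MV_MIN, MV_MAX = -(1 << 15), (1 << 15) - 1  # -32768 .. 32767

# Quarter-sample precision: 4 MV units per luma sample.
print(MV_MIN / 4, MV_MAX / 4)  # -8192.0 8191.75
```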
HEVC allows for two MV modes which are Advanced Motion Vector Prediction (AMVP) and merge mode. AMVP uses data from the reference picture and can also use data from adjacent prediction blocks. The merge mode allows for the MVs to be inherited from neighboring prediction blocks. Merge mode in HEVC is similar to “skipped” and “direct” motion inference modes in H.264/MPEG-4 AVC but with two improvements. The first improvement is that HEVC uses index information to select one of several available candidates. The second improvement is that HEVC uses information from the reference picture list and reference picture index.
HEVC inverse transforms
HEVC specifies four transform unit (TU) sizes, 4×4, 8×8, 16×16, and 32×32, to code the prediction residual. A CTB may be recursively partitioned into 4 or more TUs. TUs use integer basis functions similar to the discrete cosine transform (DCT). In addition, 4×4 luma transform blocks that belong to an intra-coded region are transformed using an integer transform derived from the discrete sine transform (DST). This provides a 1% bit rate reduction but was restricted to 4×4 luma transform blocks due to marginal benefits in the other transform cases. Chroma uses the same TU sizes as luma, so there is no 2×2 transform for chroma.
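The idea of integer basis functions approximating the DCT can be shown in miniature. The sketch below rounds a 4-point orthonormal DCT-II basis to integers (the scale factor 64 is illustrative, not the one HEVC uses) and checks that a residual row survives a forward and inverse transform.

```python
import math

# Build an integer approximation of the 4-point DCT-II basis.
N, SCALE = 4, 64
basis = [[round(SCALE
                * (math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
                * math.cos(math.pi * k * (2 * n + 1) / (2 * N)))
          for n in range(N)] for k in range(N)]

residual = [10, -3, 4, 7]
# Forward transform: project the residual onto each basis row.
coeffs = [sum(basis[k][n] * residual[n] for n in range(N)) for k in range(N)]
# Inverse transform: weighted sum of basis rows, undoing the scale twice.
recon = [sum(basis[k][n] * coeffs[k] for k in range(N)) / SCALE**2
         for n in range(N)]
print([round(r) for r in recon])  # [10, -3, 4, 7]
```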
HEVC loop filters
HEVC specifies two loop filters that are applied in order with the deblocking filter (DBF) applied first and the sample adaptive offset (SAO) filter applied afterwards. Both loop filters operate during the inter-picture prediction loop.
HEVC deblocking filter
The DBF is similar to the one used by H.264/MPEG-4 AVC but with a simpler design and better support for parallel processing. In HEVC the DBF only applies to an 8×8 sample grid, while in H.264/MPEG-4 AVC it applies to a 4×4 sample grid. The 8×8 sample grid causes no noticeable degradation and significantly improves parallel processing, because the DBF no longer causes cascading interactions with other operations. Another change is that HEVC only allows three DBF strengths, 0 to 2. HEVC also requires that the DBF first apply horizontal filtering to the vertical edges of the picture, and only afterwards apply vertical filtering to the horizontal edges. This allows multiple parallel threads to be used for the DBF.
HEVC sample adaptive offset
The SAO filter is applied after the DBF and is designed to allow better reconstruction of the original signal amplitudes by using offsets from a transmitted lookup table. Per CTB, the SAO filter can be disabled or applied in one of two modes: edge offset mode or band offset mode. The edge offset mode operates by comparing the value of a sample to two of its eight neighbors, using one of four directional gradient patterns. Based on the comparison with these two neighbors, the sample is classified into one of five categories: minimum, two types of edges, maximum, or neither. For each of the first four categories an offset is applied. The band offset mode applies an offset based on the amplitude of a single sample. The sample is categorized by its amplitude into one of 32 bands, and offsets are specified for four consecutive bands of the 32, because in flat areas, which are prone to banding artifacts, sample amplitudes tend to be clustered in a small range. The SAO filter was designed to increase picture quality, reduce banding artifacts, and reduce ringing artifacts.
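Both SAO modes reduce to small classification rules, sketched below in simplified form. The edge classifier compares a sample with its two neighbors along one gradient pattern; the band classifier maps an amplitude to one of 32 equal bands (a right shift by bit depth minus 5). The category names are descriptive labels, not identifiers from the specification.

```python
# Sketch of SAO classification rules (simplified, not spec arithmetic).

def sao_edge_category(n0, p, n1):
    """Classify sample p against neighbours n0, n1 along one gradient pattern."""
    if p < n0 and p < n1:
        return "minimum"
    if p > n0 and p > n1:
        return "maximum"
    if (p < n0 and p == n1) or (p == n0 and p < n1):
        return "concave edge"
    if (p > n0 and p == n1) or (p == n0 and p > n1):
        return "convex edge"
    return "none"  # no offset applied

def sao_band(amplitude, bit_depth=8):
    """Band offset mode: index of the sample's band among 32 equal bands."""
    return amplitude >> (bit_depth - 5)

print(sao_edge_category(120, 110, 121))  # minimum
print(sao_band(37))                      # 4
```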