MPEG-4 shape error concealment based on watermark technology

1 Introduction
With the rapid development of network and multimedia technologies, new requirements have emerged for video applications, and the content-based interactive coding standard MPEG-4 was proposed to meet them. MPEG-4 video coding is content-oriented: video data is compressed, transmitted, edited, and retrieved in a content-based manner. Its main difference from earlier video coding standards is the concept of the object. The input is no longer treated as an array of pixels but as a set of video objects, and all the functions of traditional coding are implemented with the video object as the unit of operation. The video objects of a scene are related in space and time, but the foreground and background objects are encoded independently. As shown in Figure 1, there are two basic ways to relate video objects and scenes: video objects can be separated directly from a video sequence ((a) separating scenes), or a scene can be composed from existing video objects ((b) composing scenes). A scene may also be built using both methods together.
MPEG-4 video sequences are interpreted and processed according to video objects defined by motion information, texture information, and shape information. MPEG-4 video packets are usually encoded in data partitioning mode: shape information and motion information are independent of texture information and are transmitted separately. If the texture information is lost, texture error concealment can be performed using the correctly decoded shape and motion information. If the shape and motion information are lost, the entire video packet is discarded.
Figure 1 Separating and composing scenes: (a) separating scenes; (b) composing scenes
The shape information is represented by an alpha mask plane, defined either by a binary value (1 means opaque, 0 means transparent) or by a gray level (pixel transparency ranges from 0 to 255, where 0 means transparent and 255 means opaque). Typically a binary mask plane is used, and each pixel position of the video object is defined as completely transparent or completely opaque. Binary shape information is sensitive to errors occurring on the network and easily causes error propagation, which affects the decoding of video objects in subsequent frames. Existing texture and motion error concealment techniques all assume that the shape information has been obtained correctly [1], which shows that shape error concealment is necessary.
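To make the role of the alpha plane concrete, the following is a minimal sketch of how a decoder-side compositor could blend a foreground object over a background. The function name `composite` and its `binary` flag are hypothetical, and the sketch assumes 8-bit images and, for gray-level planes, that 0 is fully transparent and 255 fully opaque.

```python
import numpy as np

def composite(foreground, background, alpha, binary=True):
    """Blend a foreground object over a background using its alpha plane."""
    # Assumption: binary planes hold 0/1, gray-level planes hold 0..255.
    scale = 1.0 if binary else 255.0
    a = alpha.astype(np.float64) / scale
    if foreground.ndim == 3:
        a = a[..., np.newaxis]  # broadcast the plane over the colour channels
    out = a * foreground + (1.0 - a) * background
    return np.rint(out).astype(np.uint8)

# Tiny illustration: opaque pixels take the foreground, transparent ones the background.
fg = np.full((2, 2), 200, dtype=np.uint8)
bg = np.full((2, 2), 50, dtype=np.uint8)
mask = np.array([[1, 0], [0, 1]], dtype=np.uint8)
scene = composite(fg, bg, mask)
```

A single wrong bit in a binary plane flips a pixel between object and background, which is one way to see why shape errors are so visible.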
2 Review of shape error concealment techniques
The MPEG-4 coding standard provides error resilience tools such as resynchronization markers, data partitioning, and reversible variable-length coding (RVLC), but these are not sufficient for today's communication environments. As error concealment has developed, shape error concealment has gradually attracted attention, and several shape error concealment techniques have been proposed in recent years [2-7].
The proposed techniques conceal errors by exploiting the natural properties of the image, either in the spatial domain or in the temporal domain. Spatial concealment mainly targets the shape information of video objects in I frames, while temporal concealment mainly targets the shape information of video objects in P and B frames. Because temporal concealment itself relies on correctly decoded I-frame shape information, spatial concealment is the more fundamental of the two. The method in [2] estimates the lost shape with a maximum a posteriori (MAP) estimator based on an adaptive Markov random field. The Markov random field is designed for binary shape information, and its parameters are determined adaptively from the information in adjacent blocks. Experiments show that this method recovers lost shape information very accurately; compared with median filtering, it can recover 20% of the lost information and obtain better objective quality. The methods in [3, 4] use curve interpolation, which is simpler than the adaptive Markov approach: they exploit the properties of Hermite and Bézier curves to conceal erroneous boundary blocks according to the spatial continuity of the image. The methods in [5-7] use temporal and motion information for error concealment.
These methods all repair the erroneous blocks at the decoder and achieve good results, but the error rate they can handle is limited. Once very serious errors occur, such as a high packet loss rate, it is difficult to restore the correct information accurately. Moreover, if the lost region contains fine detail, curve-based methods cannot restore it very accurately. Both problems are harmful to the decoding of the video object, and if the shape information of the I frame is not restored, temporal error concealment will not give the desired results either.
3 Proposed algorithm
A novel approach based on data hiding is proposed for this problem. The method is inspired by digital watermarking, an information hiding technique that is widely used for copyright protection of images, video and audio. Digital watermarking offers transparency, robustness and provability, and is therefore increasingly used in content authentication and other areas. This paper combines the characteristics of digital watermarking with shape error concealment, which is its main innovation. The paper focuses on error concealment for the I frame of a scene video. The main idea is to generate the watermark to be embedded from the shape information and to use the background object, which receives less attention from viewers, as the embedding host.
Digital watermarks are divided into time domain/spatial domain watermarks and frequency domain/transform domain watermarks according to the domain in which they are embedded. In general, a frequency domain watermark is more robust and more transparent than a time domain watermark. This paper implements both approaches, which are described separately below.
3.1 Frequency domain watermark embedding method
The frequency domain method proposed in this paper works in the DCT domain. It is implemented as follows:
(1) First, the binary mask image is downsampled to 1/4 of its original size. According to the principle of digital watermarking, the more information is embedded, the worse the transparency, so the downsampling keeps the embedded payload small and limits the impact on the objective quality of the host image.
(2) Secondly, the host image is selected. In this paper the background object is used as the host. The background object of a typical video can be converted into its three RGB components, and research shows that the green component is highly robust to lossy compression [8]. So that the binary mask can be embedded completely, the background object separated from the scene is also interpolated using the simplest horizontal interpolation: each zero-valued pixel in a row is filled with the average of the two non-zero values adjacent to it. The filled background image is used as the final host image.
(3) Based on the previous two steps, the watermark is embedded in the frequency domain of the host image: the background image is divided into 2×2 blocks, the DCT is applied to each block, and the watermark is embedded in the mid-frequency DCT coefficient, with the watermark value directly replacing the selected coefficient.
(4) Finally, the watermark is extracted and the binary mask image is restored. Extraction is the inverse of embedding: the received background image is divided into 2×2 blocks, the DCT is applied to each block, the selected mid-frequency coefficient is read out directly, and the extracted binary image is enlarged to 4 times its size, giving the restored binary mask image.
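The four steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' Matlab implementation: all function names are hypothetical, the mask is reduced by simple decimation (every second row and column), the "mid-frequency" coefficient is taken to be the horizontal coefficient of an orthonormal 2×2 DCT, and the bit is written as `strength * bit` (with `strength` an assumed parameter) rather than as a raw 0/1 so that it survives rounding to 8 bits. Image dimensions are assumed to be even.

```python
import numpy as np

def fill_background_rows(channel):
    """Step 2: fill zero pixels (the removed object) from non-zero row neighbours."""
    filled = channel.astype(np.float64)      # astype returns a copy
    for row in filled:                       # each row is a view into `filled`
        nonzero = np.flatnonzero(row)
        if nonzero.size == 0:
            continue                         # nothing to interpolate from
        for x in np.flatnonzero(row == 0):
            left, right = nonzero[nonzero < x], nonzero[nonzero > x]
            nearest = left[-1:].tolist() + right[:1].tolist()
            row[x] = np.mean([row[i] for i in nearest])
    return filled

def dct2x2(img):
    """Orthonormal 2x2 DCT of every block; returns planes C00, C01, C10, C11."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    return (a + b + c + d) / 2, (a - b + c - d) / 2, (a + b - c - d) / 2, (a - b - c + d) / 2

def embed_dct(host, mask, strength=16.0):
    """Steps 1 and 3: decimate the mask and overwrite C01 of each host block."""
    bits = mask[0::2, 0::2] > 0              # one bit per 2x2 host block
    c00, _, c10, c11 = dct2x2(host.astype(np.float64))
    c01 = strength * bits                    # coefficient replaced, not modified
    out = np.empty(host.shape, dtype=np.float64)
    out[0::2, 0::2] = (c00 + c01 + c10 + c11) / 2   # inverse 2x2 DCT
    out[0::2, 1::2] = (c00 - c01 + c10 - c11) / 2
    out[1::2, 0::2] = (c00 + c01 - c10 - c11) / 2
    out[1::2, 1::2] = (c00 - c01 - c10 + c11) / 2
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)

def extract_dct(received, strength=16.0):
    """Step 4: read C01 back, threshold it, and enlarge the mask to full size."""
    _, c01, _, _ = dct2x2(received.astype(np.float64))
    bits = c01 > strength / 2
    return np.repeat(np.repeat(bits, 2, axis=0), 2, axis=1).astype(np.uint8)
```

Because the coefficient is replaced outright, extraction needs no knowledge of the original host, but any detail of the mask finer than one 2×2 block is lost in the decimation.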
3.2 Spatial domain watermark embedding method
The watermarking algorithm used here is based on the algorithm proposed by X. Kang et al. [9]. The shape information used as the watermark is the original binary mask image, without any change. The method is as follows:
(1) The watermark image is embedded into the host image according to equation (1).
Here mod is the modulo operation, f and f' are the host pixel values before and after embedding, and [α/4, 3α/4] is the best pair of parameters: it guarantees that 0 and 1 have equal, maximal decision ranges, and the difference between f' and f after embedding lies in [-0.5α, 0.5α]. When w(m,n)=1, f'(m,n) mod α = 3α/4; when w(m,n)=0, f'(m,n) mod α = α/4. Therefore, when extracting the watermark, if the received value f* satisfies f*(m,n) mod α > α/2, the extracted watermark value is w*(m,n)=1; otherwise it is 0.
(2) The watermark embedding algorithm used here supports blind detection: extraction does not require the original carrier image. The embedded watermark data w*(m,n) is extracted according to equation (2).
The extracted watermark data w*(m,n) is the restored binary mask image. This embedding method degrades the image quality of the background to some extent, and the degree of loss depends on the parameter α chosen at embedding. Extraction of the watermark also depends on this parameter: to recover the watermark value correctly, the absolute error between f* and f' (caused by image distortion) must be less than α/4. A proper choice of α therefore balances the transparency of the watermark against its robustness: a large α causes a large loss of image quality, while a small α weakens the robustness of the watermark. Based on simulation experiments, α is set to 20 here; experimental comparison shows that the loss of image quality at α = 20 is still tolerable.
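Equations (1) and (2) are not reproduced in the text, so the sketch below implements one quantization rule that satisfies the properties stated above: the marked pixel is the value nearest to the original whose remainder modulo α is 3α/4 for bit 1 and α/4 for bit 0, and extraction thresholds the remainder at α/2. It may differ in form from the equation in [9]. The function names are hypothetical, α is assumed to be divisible by 4, and pixels are assumed to be 8-bit.

```python
import numpy as np

def embed_spatial(host, bits, alpha=20):
    """Move each pixel to the nearest value with remainder 3a/4 (bit 1) or a/4 (bit 0)."""
    f = host.astype(np.int64)
    target = alpha // 4 + (bits > 0) * (alpha // 2)      # desired remainder per pixel
    k = np.rint((f - target) / alpha).astype(np.int64)   # nearest quantization cell
    marked = target + k * alpha                          # |marked - f| <= alpha / 2
    # Assumption: near the 8-bit limits, step one cell inward instead of clipping,
    # so the remainder (and hence the bit) is preserved.
    marked = np.where(marked < 0, marked + alpha, marked)
    marked = np.where(marked > 255, marked - alpha, marked)
    return marked.astype(np.uint8)

def extract_spatial(received, alpha=20):
    """Blind extraction: bit is 1 when the remainder modulo alpha exceeds alpha / 2."""
    remainder = np.mod(received.astype(np.int64), alpha)
    return (remainder > alpha / 2).astype(np.uint8)
```

With α = 20 the remainders are 5 and 15, so a distortion of up to 4 gray levels in either direction leaves every pixel on the correct side of the threshold at 10, which matches the α/4 error bound given in the text.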

4 Simulation results
The proposed algorithm was tested in Matlab on the 50th frame of the classic video sequence "Foreman", following the method above. The original video frame is shown in Fig. 2(a), Fig. 2(b) is the MPEG-4 encoded shape information of the frame, and Fig. 2(c) is the erroneous shape information obtained at the receiving end because of transmission errors.
As explained in Section 2, existing methods are unlikely to handle this situation well: when a very serious transmission error or a high packet loss rate occurs, the shape information is severely damaged, which prevents correct decoding of the video object. With the two algorithms proposed in this paper, the shape information embedded as a watermark is extracted from the green component of the received background. The method presupposes that the background is transmitted correctly. Figures 3(a) and 3(b) show the background object correctly decoded at the receiver after frequency domain embedding and the binary mask extracted and restored from it; Figures 3(c) and 3(d) show the corresponding background image and restored binary mask for the spatial domain method.
A similarity measure, equation (3), is used to compare the restored mask with the original mask and evaluate the proposed methods. The experimental results show that the masks recovered by both the frequency domain method and the spatial domain method are very close to the original, with the spatial domain method performing better than the frequency domain method. The background image produced by the spatial domain method is also slightly closer to the original than that of the frequency domain method: the signal-to-noise ratio of the background image after embedding is 31.14 for the frequency domain method and 34.88 for the spatial domain method. Overall, the spatial domain method proposed in this paper outperforms the frequency domain method. Viewers' quality requirements for video are often lower than for still pictures, and in object-based coding they pay less attention to background objects, so the visual loss in the background object can be tolerated.
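Equation (3) is not reproduced in the text, so the exact similarity measure is unknown. As a rough illustration of how such a comparison could be computed, the sketch below uses normalized correlation, a common choice in watermarking work, together with a peak signal-to-noise ratio for the background image. Both choices are assumptions: the text does not state that the reported values are PSNR in dB, and the function names are hypothetical.

```python
import numpy as np

def psnr(reference, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of equal shape."""
    error = reference.astype(np.float64) - test.astype(np.float64)
    mse = np.mean(error ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))

def normalized_correlation(mask_a, mask_b):
    """Similarity of two binary masks; 1.0 means the masks are identical."""
    a = mask_a.astype(np.float64).ravel()
    b = mask_b.astype(np.float64).ravel()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0
```

In an experiment of this kind, `normalized_correlation` would be applied to the original and restored masks, and `psnr` to the background image before and after embedding.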
5 Conclusion
The proposed algorithm is a novel shape error concealment method based on digital watermarking. If the shape information is seriously damaged at the receiving end, especially the shape mask of an I-VOP, the VOPs predicted from it will also be severely damaged. With the two watermark-based algorithms proposed in this paper, a mask highly similar to the original mask can be recovered, and from it an approximately lossless object can be obtained. However, the algorithm still has shortcomings: it relies on the background area being received without loss or with only a limited packet loss rate, and it causes some quality loss in the background image. Finding a more robust embedding method with better transparency is the next step in this research.