
Showing papers on "Inter frame published in 2019"


Proceedings ArticleDOI
01 Oct 2019
TL;DR: This work presents an inter-frame compression approach for neural video coding that can seamlessly build upon different existing neural image codecs, and proposes computing residuals directly in latent space instead of in pixel space so that the same image compression network can be reused for both key frames and intermediate frames.
Abstract: While there are many deep learning based approaches for single image compression, the field of end-to-end learned video coding has remained much less explored. Therefore, in this work we present an inter-frame compression approach for neural video coding that can seamlessly build upon different existing neural image codecs. Our end-to-end solution performs temporal prediction by optical flow based motion compensation in pixel space. The key insight is that we can increase both decoding efficiency and reconstruction quality by encoding the required information into a latent representation that directly decodes into motion and blending coefficients. In order to account for remaining prediction errors, residual information between the original image and the interpolated frame is needed. We propose to compute residuals directly in latent space instead of in pixel space, as this allows us to reuse the same image compression network for both key frames and intermediate frames. Our extended evaluation on different datasets and resolutions shows that the rate-distortion performance of our approach is competitive with existing state-of-the-art codecs.
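The latent-space residual idea can be summarized in a few lines. This is a minimal sketch, assuming hypothetical callables `enc`, `dec`, and `quantize` standing in for the codec's analysis transform, synthesis transform, and quantizer (names not taken from the paper):

```python
def code_intermediate_frame(enc, dec, quantize, x, x_pred):
    """Latent-space residual coding sketch: instead of coding x - x_pred in
    pixel space, code the difference of latents, so one image codec serves
    both key frames and intermediate frames."""
    y = enc(x)                # latent of the original frame
    y_pred = enc(x_pred)      # latent of the motion-compensated prediction
    r = quantize(y - y_pred)  # latent residual (entropy-coded in practice)
    x_hat = dec(y_pred + r)   # reconstruct from the corrected latent
    return r, x_hat
```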

162 citations


Journal ArticleDOI
TL;DR: This work proposes a novel and efficient video eye fixation detection model that outperforms all 11 state-of-the-art methods across a number of publicly available datasets by combining the memory information on the time axis with the motion information on the space axis while storing the saliency information of the current frame.
Abstract: Data-driven saliency detection has attracted strong interest as a result of applying convolutional neural networks to the detection of eye fixations. Although a number of image-based salient object and fixation detection models have been proposed, video fixation detection still requires more exploration. Different from image analysis, motion and temporal information is a crucial factor affecting human attention when viewing video sequences. Although existing models based on local contrast and low-level features have been extensively researched, they failed to simultaneously consider interframe motion and temporal information across neighboring video frames, leading to unsatisfactory performance when handling complex scenes. To this end, we propose a novel and efficient video eye fixation detection model to improve the saliency detection performance. By simulating the memory mechanism and visual attention mechanism of human beings when watching a video, we propose a step-gained fully convolutional network by combining the memory information on the time axis with the motion information on the space axis while storing the saliency information of the current frame. The model is obtained through hierarchical training, which ensures the accuracy of the detection. Extensive experiments in comparison with 11 state-of-the-art methods are carried out, and the results show that our proposed model outperforms all 11 methods across a number of publicly available datasets.

71 citations


Journal ArticleDOI
TL;DR: The proposed two-step forensic technique to detect frame insertion, deletion and duplication types of video forgery, based on Haralick coded frame correlation, outperforms the state-of-the-art with an average F1 score of 0.97.
Abstract: With the immensely growing rate of cyber forgery today, the integrity and authenticity of digital multimedia data are highly at stake. In this work, we deal with forensic investigation of cyber forgery in digital videos. The most common types of inter-frame forgery in digital videos are frame insertion, deletion and duplication attacks. A number of significant studies have been carried out in this direction in the past few years. In this paper, we propose a two-step forensic technique to detect frame insertion, deletion and duplication types of video forgery. In the first step, we detect outlier frames based on Haralick coded frame correlation; and in the second step, we perform a finer degree of detection to eliminate false positives, hence optimizing the forgery detection accuracy. Our experimental results prove that the proposed method outperforms the state-of-the-art with an average F1 score of 0.97 in terms of inter-frame video forgery detection accuracy.
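The first step can be illustrated with GLCM (Haralick-style) texture features and a correlation test between consecutive frames; this is only a hedged sketch using scikit-image's GLCM utilities, not the authors' exact Haralick coding:

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def haralick_vector(gray_frame):
    """A few GLCM texture features for one uint8 grayscale frame."""
    glcm = graycomatrix(gray_frame, distances=[1], angles=[0],
                        levels=256, symmetric=True, normed=True)
    return np.array([graycoprops(glcm, p)[0, 0]
                     for p in ("contrast", "homogeneity", "energy", "correlation")])

def outlier_transitions(frames, z_thresh=3.0):
    """Flag transitions where the correlation between consecutive frames'
    texture features drops abnormally (candidate insertion/deletion points)."""
    feats = np.stack([haralick_vector(f) for f in frames])
    corr = np.array([np.corrcoef(feats[i], feats[i + 1])[0, 1]
                     for i in range(len(feats) - 1)])
    z = (corr - corr.mean()) / (corr.std() + 1e-9)
    return np.where(z < -z_thresh)[0]
```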

29 citations


Journal ArticleDOI
TL;DR: This study establishes a framework that can simultaneously detect all aspects of inter-frame forgery and achieves higher accuracy and lower computational time for detecting inter-frame forgery.
Abstract: Inter-frame forgery marks a central type of forgery in surveillance videos, and involves three aspects (frame duplication, insertion, and deletion) in the temporal domain. However, this forgery type has received little attention from scholars. Most efforts have focused on detecting only a single aspect of inter-frame forgery. Furthermore, studies have confirmed that previous methods did not achieve high accuracy for all forgery types with low computational loads at the same time. In this study, the proposed method establishes a framework that can simultaneously detect all aspects of inter-frame forgeries. During the decoding process, the authors extract residue data of each frame from a video stream. Then spatial and temporal energies are exploited to illustrate data flow, and abnormal points are determined to detect forged frames. Noise ratios of forged and original frames are estimated for differentiating insertion from duplication attacks. Experimental results indicate that the proposed method achieves higher accuracy and lower computational time for detecting inter-frame forgery.
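A hedged sketch of the abnormal-point test on a per-frame residual-energy signal (the paper extracts the residue during decoding; here the energy series is taken as given):

```python
import numpy as np

def abnormal_points(residual_energy, z_thresh=3.5):
    """Insertion/deletion/duplication points show up as spikes in the
    temporal change of residual energy; flag them with a robust z-score."""
    e = np.asarray(residual_energy, dtype=float)
    d = np.abs(np.diff(e))                 # temporal energy change
    med = np.median(d)
    mad = np.median(np.abs(d - med))
    z = 0.6745 * (d - med) / (mad + 1e-9)  # robust (MAD-based) z-score
    return np.where(z > z_thresh)[0]       # candidate forged positions
```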

21 citations


Journal ArticleDOI
TL;DR: A spatiotemporally consistent embedding algorithm for holographic video watermarking is proposed that exhibits superior performance compared to several methods in the literature, especially in robustness against additive noise and compression attacks.
Abstract: The interest in a three-dimensional (3-D) holographic display is increasing daily because it can generate full depth cues without utilizing any special glasses. However, the content of 3-D holographic videos, currently without any protection, can be illegally distributed and maliciously manipulated. Therefore, there is an urgent need to protect the copyright of holographic videos against malicious copying. Since watermarking is an effective way to protect the ownership of a holographic sequence, this paper proposes a spatiotemporally consistent embedding algorithm for holographic video watermarking. The imperceptibility requirement in holographic video watermarking is more challenging compared with static holograms because of the temporal dimension existing in videos. The embedding algorithm should not only consider the spatial embedding strength for each frame of the video, but also take the temporal dimension into account in order to guarantee the visual quality of the moving object. Before embedding, to preserve the imperceptibility of the watermark on the moving holographic object, the embedding parameters are evaluated from the salient object using interframe and intraframe information. Different from previous video watermarking algorithms, in order to ensure robustness, instead of a two-dimensional watermark, 3-D watermark-converted data (QR codes) are embedded in the cellular automata (CA) domains using 3-D CA filters. Finally, the QR codes can be extracted from the watermarked holographic frames, and the final 3-D watermark can be digitally reconstructed with different depth cues using the computational integral imaging reconstruction algorithm. The experimental results demonstrate that the proposed method exhibits superior performance compared to several methods in the literature, especially in robustness against additive noise and compression attacks.

15 citations


Journal ArticleDOI
TL;DR: An intraframe pixel-row-interleaved error concealment algorithm interleaves pixel rows to generate high similarity in different parts of a frame, thereby achieving intraframe error resilience and providing reliable and robust video transmission for UAVs.
Abstract: The wireless video transmission environment of unmanned aerial vehicles (UAVs) is complex and unstable given the high mobility and changeable working conditions of UAVs, which lead to burst and consecutive errors and high error rates. A compressed video stream is extremely sensitive to transmission errors, such that even a single bit error sharply degrades the video quality. Hence, we propose an intraframe pixel-row-interleaved error concealment algorithm that interleaves pixel rows to generate high similarity in different parts of a frame, thereby achieving intraframe error resilience. Subsequently, we suggest an interframe time-field-interleaved alternative motion-compensated prediction that allows for automatic error elimination and recovers at least four consecutive frames in wireless video communications. The experiments demonstrate that the proposed algorithms recover frames with excellent subjective and objective effects. Moreover, these algorithms can provide reliable and robust video transmission for UAVs.
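The row-interleaving principle is easy to sketch; the concealment below is a simple stand-in (neighbour averaging) for the paper's algorithm, with all details assumed:

```python
import numpy as np

def interleave_rows(frame, groups=4):
    """Reorder rows so spatially adjacent rows travel far apart in the
    bitstream; a burst error then damages rows that are easy to conceal."""
    order = np.concatenate([np.arange(g, frame.shape[0], groups)
                            for g in range(groups)])
    return frame[order], order

def deinterleave_and_conceal(frame_ilv, order, lost_ilv_rows):
    """Undo the interleaving, then conceal each lost row from its spatial
    neighbours (interleaving ensures a burst does not hit adjacent rows,
    so simple averaging of intact neighbours works)."""
    frame = np.empty_like(frame_ilv)
    frame[order] = frame_ilv
    h = frame.shape[0]
    for r in np.sort(order[np.asarray(lost_ilv_rows, dtype=int)]):
        above, below = frame[max(r - 1, 0)], frame[min(r + 1, h - 1)]
        frame[r] = ((above.astype(np.float32) + below) / 2).astype(frame.dtype)
    return frame
```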

14 citations


Journal ArticleDOI
Peng Liu, Guoyu Wang, Zhibin Yu, Xinchang Guo, Weigang Lu
TL;DR: The results of experiments conducted indicate that only the proposed algorithm achieves a balance in terms of both speed and accuracy, resulting in satisfactory tracking accuracy and meeting real-time requirements.

13 citations


Journal ArticleDOI
TL;DR: A highly efficient method is proposed to store just the relevant data required for the ToF computation using the shifted inter-frame histogram; it is demonstrated by Matlab simulations and an FPGA implementation using input data from a SPAD camera prototype.
Abstract: Time-of-flight (ToF) image sensors based on single-photon detection, i.e., SPADs, require some filtering of pixel readings. Accurate depth measurements are only possible if the jitter of the detector is mitigated. Moreover, the time stamp needs to be effectively separated from uncorrelated noise, such as dark counts and background illumination. A powerful tool for this is building a histogram of a number of pixel readings. Future generations of ToF imagers seek to increase spatial and temporal resolution along with the dynamic range and frame rate. Under these circumstances, storing the complete histogram for every pixel becomes practically impossible. Considering that most of the information contained by the histogram represents noise, we propose a highly efficient method to store just the relevant data required for the ToF computation. This method makes use of the shifted inter-frame histogram. It requires a memory up to 128 times smaller than storing the complete histogram if the pixel values are coded on up to 15 bits. Moreover, a fixed memory of 2^8 words is enough to process histograms containing up to 2^15 bins. In exchange, the overall frame rate only decreases to one half. The hardware implementation of this algorithm is presented. Its remarkable robustness at low SNR of the ToF estimation is demonstrated by Matlab simulations and an FPGA implementation using input data from a SPAD camera prototype.
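One plausible reading of the shifted inter-frame histogram, consistent with the numbers above (2^8 words, 2^15 bins, halved frame rate): a coarse histogram in one frame locates the peak, and a fine histogram shifted onto that window in the next frame refines it. A hedged sketch:

```python
import numpy as np

def coarse_window(stamps, t_max=2**15, n_coarse=2**8):
    """Frame k: coarse histogram over the full range (2^8 memory words)."""
    hist, edges = np.histogram(stamps, bins=n_coarse, range=(0, t_max))
    k = int(np.argmax(hist))
    return edges[k], edges[k + 1]           # coarse bin containing the peak

def fine_tof(stamps, lo, hi, n_fine=2**7):
    """Frame k+1: fine histogram shifted onto that window; 2^8 coarse bins
    times 2^7 fine bins give the resolution of a full 2^15-bin histogram
    while never storing more than 2^8 words."""
    sel = stamps[(stamps >= lo) & (stamps < hi)]
    hist, edges = np.histogram(sel, bins=n_fine, range=(lo, hi))
    k = int(np.argmax(hist))
    return 0.5 * (edges[k] + edges[k + 1])  # refined ToF estimate
```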

13 citations



Journal ArticleDOI
TL;DR: A hardware-efficient block matching algorithm with an efficient hardware design that is able to reduce the computational complexity of motion estimation while providing a sustained and steady coding performance for high-quality video encoding is presented.
Abstract: Variable block size motion estimation has contributed greatly to achieving an optimal interframe encoding, but involves high computational complexity and huge memory access, which is the most critical bottleneck in ultra-high-definition video encoding. This article presents a hardware-efficient block matching algorithm with an efficient hardware design that is able to reduce the computational complexity of motion estimation while providing a sustained and steady coding performance for high-quality video encoding. A three-level memory organization is proposed to reduce memory bandwidth requirement while supporting a predictive common search window. By applying multiple search strategies and early termination, the proposed design provides 1.8 to 3.7 times higher hardware efficiency than other works. Furthermore, on-chip memory has been reduced by 96.5% and off-chip bandwidth requirement has been reduced by 39.4% thanks to the proposed three-level memory organization. The corresponding power consumption is only 198 mW at the highest working frequency of 500 MHz. The proposed design is attractive for high-quality video encoding in real-time applications with low power consumption.
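Two of the named ingredients, a predictive search window and early termination, can be illustrated with a toy SAD block matcher (pure software; the paper's contribution is the hardware design around this loop nest):

```python
import numpy as np

def match_block(cur, ref, bx, by, n=16, search=8, mv_pred=(0, 0)):
    """Toy SAD block matching: the search is centred on a predicted motion
    vector, and a candidate is abandoned early once its partial SAD
    exceeds the best SAD found so far."""
    blk = cur[by:by + n, bx:bx + n].astype(np.int32)
    best, best_mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + mv_pred[1] + dy, bx + mv_pred[0] + dx
            if not (0 <= y <= ref.shape[0] - n and 0 <= x <= ref.shape[1] - n):
                continue
            acc = 0
            for row in range(n):  # early termination, row by row
                acc += int(np.abs(blk[row] - ref[y + row, x:x + n]).sum())
                if acc >= best:
                    break
            if acc < best:
                best, best_mv = acc, (x - bx, y - by)
    return best_mv, best
```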

10 citations


Posted Content
TL;DR: In this article, a two-stream recurrent network named MIMAMO (Micro-Macro-Motion) Net is proposed to combine micro- and macro-motion features and improve video emotion recognition.
Abstract: Spatial-temporal feature learning is of vital importance for video emotion recognition. Previous deep network structures often focused on macro-motion, which extends over long time scales, e.g., on the order of seconds. We believe integrating structures capturing information about both micro- and macro-motion will benefit emotion prediction, because humans perceive both micro- and macro-expressions. In this paper, we propose to combine micro- and macro-motion features to improve video emotion recognition with a two-stream recurrent network, named MIMAMO (Micro-Macro-Motion) Net. Specifically, smaller and shorter micro-motions are analyzed by a two-stream network, while larger and more sustained macro-motions can be well captured by a subsequent recurrent network. Assigning specific interpretations to the roles of different parts of the network enables us to make choices of parameters based on prior knowledge: choices that turn out to be optimal. One of the important innovations in our model is the use of interframe phase differences rather than optical flow as input to the temporal stream. Compared with the optical flow, phase differences require less computation and are more robust to illumination changes. Our proposed network achieves state-of-the-art performance on two video emotion datasets, the OMG emotion dataset and the Aff-Wild dataset. The most significant gains are for arousal prediction, for which motion information is intuitively more informative. Source code is available at this https URL.
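The temporal-stream input can be illustrated with a global Fourier phase difference; the paper computes local phase differences, so this is only a sketch of the quantity involved, not the network's actual preprocessing:

```python
import numpy as np

def phase_difference(frame_a, frame_b):
    """Inter-frame phase difference as a cheap, illumination-robust motion
    cue: phase encodes position, so its change between frames reflects
    motion while being insensitive to global brightness changes."""
    Fa, Fb = np.fft.fft2(frame_a), np.fft.fft2(frame_b)
    dphi = np.angle(Fb) - np.angle(Fa)              # raw phase difference
    return np.mod(dphi + np.pi, 2 * np.pi) - np.pi  # wrap to [-pi, pi)
```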

Proceedings ArticleDOI
01 Jul 2019
TL;DR: This paper presents an extremely efficient error detection solution for CNNs based on the observation that, in the absence of errors, the differences between input frames and the differences between the detections provided by the CNN should be strictly correlated.
Abstract: Object detection, a critical feature for autonomous vehicles, is performed today using Convolutional Neural Networks (CNNs). Errors in a CNN execution can modify the way the vehicle senses the surrounding environment, potentially causing accidents or unexpected behaviors. The high computational requirements of CNNs combined with the need to perform detection in real-time allow little margin for implementing error detection. In this paper, we present an extremely efficient error detection solution for CNNs based on the observation that, in the absence of errors, the differences between the input frames and the detections provided by the CNN should be strictly correlated. In other words, if the image between two subsequent frames does not change significantly, the detection should also be very similar. Similarly, if the detection varies considerably from a frame to the next, then the input image should also have been different. Whenever input images and output detections do not correlate, we can detect an error. After formalizing and evaluating the inter-frame and output correlation thresholds, we implement and validate the detection strategy, utilizing data from previous radiation experiments. Exploiting the intrinsic efficiency in processing images of devices used to execute CNNs, we can detect up to 80% of errors while adding low overhead.
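A hedged sketch of the correlation check, with simplistic change measures and made-up thresholds standing in for the formalized ones:

```python
import numpy as np

def inputs_similar(f0, f1, img_thresh=2.0):
    """Mean absolute pixel change between two consecutive frames."""
    return np.abs(f0.astype(np.float32) - f1).mean() < img_thresh

def outputs_similar(d0, d1, det_thresh=10.0):
    """Crude detection-change measure: same box count and small mean shift
    of box coordinates (boxes assumed matched by order; a stand-in for the
    paper's correlation metric)."""
    if len(d0) != len(d1):
        return False
    if len(d0) == 0:
        return True
    return np.abs(np.asarray(d0, float) - np.asarray(d1, float)).mean() < det_thresh

def error_detected(f0, f1, d0, d1):
    """Flag an execution error whenever the two signals disagree: a nearly
    static input with jumping detections, or vice versa."""
    return inputs_similar(f0, f1) != outputs_similar(d0, d1)
```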

Journal ArticleDOI
TL;DR: A content-adaptive mode decision algorithm to reduce the SHVC complexity at the ELs is proposed that can reduce the coding time for ELs by 62%–67% with less than a 1.5% Bjontegaard rate increase compared to the original SHVC encoder.
Abstract: The scalable video coding extensions of the High Efficiency Video Coding (HEVC) standard (SHVC) have adopted a new quadtree-structured coding unit (CU). The SHVC test model (SHM) needs to test seven intermode sizes and one intramode size at depth levels of “0,” “1,” “2,” and four intermode sizes and two intramode sizes at a depth level of “3” for interframe CUs. It checks all possible depth levels and prediction modes to find the one with the lowest rate distortion cost using the Lagrange multiplier method in the mode decision procedure to achieve high coding efficiency at the expense of computational complexity. Furthermore, it utilizes the conventional approach for the base layer (BL) and enhancement layer (EL) coding to support SNR/spatial scalable coding. Both the intralayer and interlayer predictions should be performed for each EL CU. Although there is a large amount of interlayer redundancy that can be exploited to speed up the EL encoding, the mode decision procedure is independently performed for the BL and the ELs. In this paper, we propose a content-adaptive mode decision algorithm to reduce the SHVC complexity at the ELs. When the major characteristics of the CUs, such as mode complexity and motion activity, can be estimated early and used for adjusting the mode decision procedure, unnecessary mode and CU size searches can be avoided. First, an experimental analysis is performed to study the interlayer and spatiotemporal correlations in the coding information and the interlevel correlations among the quadtree structures. Based on these correlations, three parameters, including the conditional probability of a SKIP/Merge mode, motion activity, and mode complexity, are defined to describe the video content and are further utilized to adaptively adjust the EL mode decision procedure. The experimental results show that the proposed algorithm can reduce the coding time for ELs by 62%–67% with less than a 1.5% Bjontegaard rate increase compared to the original SHVC encoder.

Journal ArticleDOI
TL;DR: This paper proposes a complete video coding framework using reference clips, and investigates the problems including how to generate reference clips as either singular content clips or repetitive content clips, how to manage the clips, how to utilize the clips for inter prediction, and how to allocate bit-rate among clips, in a systematic manner.
Abstract: Inter prediction is a fundamental technology in video coding to remove the temporal redundancy between video frames. Traditionally, the reconstructed frames are directly put into a reference frame buffer to serve as references for inter prediction. Using multiple reference frames increases the accuracy of inter prediction, but also incurs a waste of buffer memory since the content of reference frames is highly similar. To address this problem, we propose to organize the references at clip level in addition to frame level, i.e. the reference buffer stores not only reference frames, but also reference clips that are cropped regions selected from the reconstructed frames. Using clip-level references, we can manage the reference content more economically, since the content of multiple reference frames is divided into the singular content of each frame as well as the repetitive content that appears in multiple frames. For the repetitive content, only one copy is stored in reference clips so as to avoid duplication. Moreover, using reference clips also facilitates the bit-rate allocation among reference content, i.e. the quality of each clip can be decided adaptively to achieve the rate-distortion optimization. In this paper, we propose a complete video coding framework using reference clips, and investigate the problems including how to generate reference clips as either singular content clips or repetitive content clips, how to manage the clips, how to utilize the clips for inter prediction, and how to allocate bit-rate among clips, in a systematic manner. The proposed video coding framework is implemented upon the state-of-the-art video coding scheme, High Efficiency Video Coding (HEVC). Experimental results show that our scheme achieves on average 5.1% and 5.0% BD-rate reduction compared to the HEVC anchor, in low-delay B and low-delay P settings, respectively. We believe that reference clips open up a new dimension for optimizing inter prediction in video coding, and thus are worthy of further study.

Patent
23 May 2019
TL;DR: In this paper, multidimensional enhanced beam refinement protocol MAC and PHY frame designs are presented that extend the MAC packet and the PPDU format with or without backwards compatibility.
Abstract: Systems and methods for multidimensional beam refinement procedures and signaling for millimeter wave WLANs. In some embodiments, there are multi-dimensional enhanced beam refinement protocol MAC and PHY frame designs that extend the MAC packet and the PPDU format with or without backwards compatibility. The multiple dimensions may be supported jointly or separately. In other embodiments, the increased data signaled in the eBRP frame designs may be more efficiently signaled with reduced BRP frame sizes, such as through a training type dependent BRP minimum duration selection procedure or use of null data packet BRP frames. In further embodiments, the maximum duration of the interframe spacing between BRP packets may be varied to improve the efficiency of BRP operation.

Journal ArticleDOI
TL;DR: A novel perceptual distortion threshold model (PDTM) is proposed to reveal the relationship between the mode selection of inter-frame prediction and coding distortion threshold and a new fast inter-frame prediction algorithm in MVC is developed aimed at minimizing computational complexity for dependent view coding.
Abstract: Multi-view video coding (MVC) utilizes hierarchical B picture prediction structure and adopts many coding techniques to remove spatiotemporal and inter-view redundancies at the cost of high computational complexity. In this paper, a novel perceptual distortion threshold model (PDTM) is proposed to reveal the relationship between the mode selection of inter-frame prediction and coding distortion threshold. Based on the proposed PDTM, a new fast inter-frame prediction algorithm in MVC is developed aimed at minimizing computational complexity for dependent view coding. Then the fast MVC algorithm is incorporated into the multi-view High Efficiency Video Coding (MV-HEVC) software to improve MVC coding efficiency. In practical coding, the mode selection for inter-frame prediction of dependent views may be early terminated based on the thresholds derived from the PDTM, thereby reducing the coding time complexity. Experimental results demonstrate that the proposed algorithm can reduce the computational complexity of the dependent views by 52.9% compared with the HTM14.1 algorithm under the coding structure of hierarchical B pictures. Moreover, the bitrate is increased by 0.9% under the same subjective quality and only increased by 1.0% under the same objective quality peak signal-to-noise ratio (PSNR). Compared with the state-of-the-art fast algorithm, the proposed algorithm can save more coding time, while the bitrate under the same PSNR increases slightly.

Journal ArticleDOI
TL;DR: Using this new encoding/decoding scheme, a frame that failed decoding can be decoded again with extra information which corrects the errors in its highly unreliable unfrozen bits, and the probability of successful decoding is improved.
Abstract: A new inter-frame correlated polar coding scheme is proposed, where two consecutive frames are correlated-encoded and assist each other during decoding. The correlation is achieved by dynamic configuration of the frozen bits. The frozen bits of the second frame partially depend on the unfrozen bits of the first frame during encoding, and the number of bits viewed as frozen by the decoder is alterable in different decoding modes. Using this new encoding/decoding scheme, a frame that failed decoding can be decoded again with extra information which corrects the errors in its highly unreliable unfrozen bits. Thus, the probability of successful decoding is improved. Simulation results show that the proposed polar codes significantly outperform their classical counterpart with negligible memory and complexity increase.
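The coupling can be sketched with a plain polar transform; which frozen positions are coupled and how the decoder retries are design choices of the paper that this sketch does not reproduce:

```python
import numpy as np

def polar_encode(u, n):
    """Polar transform x = u * F^{kron n} over GF(2), bit reversal omitted."""
    F = np.array([[1, 0], [1, 1]], dtype=np.int64)
    G = F
    for _ in range(n - 1):
        G = np.kron(G, F)
    return (u.astype(np.int64) @ G) % 2

def encode_frame_pair(info1, info2, frozen_idx, coupled_idx, n):
    """Frame 1 is encoded normally; the coupled frozen positions of frame 2
    (coupled_idx, a subset of frozen_idx) carry bits of frame 1's payload
    instead of zeros, so a successfully decoded frame 1 gives the frame-2
    decoder extra known bits on a retry. Assumes len(info1) == len(info2)
    == 2**n - len(frozen_idx) and len(coupled_idx) <= len(info1)."""
    N = 2 ** n
    info_idx = np.setdiff1d(np.arange(N), frozen_idx)
    u1 = np.zeros(N, dtype=np.int64)
    u2 = np.zeros(N, dtype=np.int64)
    u1[info_idx] = info1
    u2[info_idx] = info2
    u2[coupled_idx] = u1[info_idx][:len(coupled_idx)]  # dynamic frozen bits
    return polar_encode(u1, n), polar_encode(u2, n)
```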

Journal ArticleDOI
TL;DR: A method is proposed for generating inter-frame video images based on spatial continuity generative adversarial networks (SC-GANs), to smooth the playback of low-frame-rate videos and to sharpen the blurry image edges caused by traditional frame-rate up-conversion methods.
Abstract: This paper proposes a method for generating inter-frame video images based on spatial continuity generative adversarial networks (SC-GANs) to smooth the playback of low-frame-rate videos and to sharpen the blurry image edges caused by traditional frame-rate up-conversion methods. Firstly, the auto-encoder is used as a discriminator and Wasserstein distance is applied to represent the difference between the loss distribution of the real sample and the generated sample, instead of the typical generative adversarial network approach of directly matching the data distribution. Secondly, a hyperparameter balancing the generator and discriminator is used to stabilize the training process, which effectively prevents the model from collapsing. Finally, taking advantage of the spatial continuity of the image features of continuous video frames, an optimal value between two consecutive frames is found by Adam and then mapped to the image space to generate inter-frame images. In order to illustrate the authenticity of the generated inter-frame images, PSNR and SSIM are adopted to evaluate the inter-frame images, and the results show that the generated inter-frame images have a high degree of authenticity. The feasibility and validity of the proposed method based on SC-GAN are also verified.
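One plausible reading of the latent search, sketched with PyTorch and a hypothetical generator `G` (the actual SC-GAN losses and training schedule are not reproduced here):

```python
import torch

def interframe_image(G, z0, z1, frame0, frame1, steps=200, lr=1e-2):
    """Start from the midpoint of the two frames' latent codes and let Adam
    pull it toward a point whose generated image is close to both
    neighbours; G(z) mapped back to image space is the in-between frame."""
    z = ((z0 + z1) / 2).detach().clone().requires_grad_(True)
    opt = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        img = G(z)
        loss = (img - frame0).abs().mean() + (img - frame1).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():
        return G(z)
```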

Patent
05 Apr 2019
TL;DR: In this article, an inter frame prediction method for video images and associated products is proposed, which consists of determining an inter-frame prediction mode used for performing interframe prediction on a current image block.
Abstract: The invention discloses an inter frame prediction method for video images and associated products. The inter frame prediction method comprises that an inter frame prediction mode used for performing inter frame prediction on a current image block is determined, the inter frame prediction mode is an inter frame prediction mode in a candidate inter frame prediction mode set, and the candidate inter frame prediction mode set comprises multiple inter frame prediction modes used for non-directional motion fields and/or multiple inter frame prediction modes used for directional motion fields; and inter frame prediction is performed on the current image block according to the determined inter frame prediction mode. Embodiments of the present application further disclose a motion information prediction method based on different inter frame prediction modes. Via schemes of the embodiments of the present application, the prediction accuracy of motion information (e.g., motion vector) of image blocks is improved, a lower bitrate is needed for the same video quality, and the coding/decoding performance is improved.

Patent
18 Apr 2019
TL;DR: In this article, a point cloud encoder including an input interface to accept a dynamic point cloud including a sequence of point cloud frames of a scene is presented, where a processor encodes blocks of a current point cloud frame to produce an encoded frame.
Abstract: A point cloud encoder includes an input interface to accept a dynamic point cloud comprising a sequence of point cloud frames of a scene. A processor encodes blocks of a current point cloud frame to produce an encoded frame. For encoding a current block of the current point cloud frame, a reference block similar to the current block according to a similarity metric is selected to serve as a reference to encode the current block. Each point in the current block is paired to a point in the reference block based on the values of the paired points. The current block is encoded based on a combination of an identification of the reference block and the residuals between the values of the paired points, where the residuals are ordered according to the order of the values of the points in the reference block. A transmitter transmits the encoded frame over a communication channel.
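The pairing and residual ordering can be sketched for scalar point values; the patent's similarity metric and reference-block selection are omitted:

```python
import numpy as np

def encode_block(cur_vals, ref_vals):
    """Pair points rank-for-rank after sorting both blocks by value, and
    emit residuals in the order of the reference block's sorted values;
    the decoder can reproduce that order from the reference block alone."""
    return np.sort(cur_vals) - np.sort(ref_vals)

def decode_block(ref_vals, residuals):
    """Points in a cloud are unordered, so recovering the current block's
    values in the reference's sorted order is a faithful reconstruction."""
    return np.sort(ref_vals) + residuals
```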

Proceedings ArticleDOI
01 Mar 2019
TL;DR: A novel inter-frame feature discriminating individual gait characteristics is proposed that has better recognition accuracy in comparison with existing features.
Abstract: Research has shown that gait is unique to individuals, and human gait recognition has gained much attention in recent years. The sequence of gait silhouettes extracted from the video sequences has its own significance for gait recognition performance. In this paper, a novel inter-frame feature discriminating individual gait characteristics is proposed. Consecutive frames within a gait cycle are divided into an equal number of blocks and the corresponding block differences are calculated. This preserves the minute temporal variations of the different body parts within each block, and the cumulative difference provides a unique feature capable of discriminating individuals. To avoid synchronization problems, secondary statistical features are extracted from the primary inter-frame variations. Finally, feature-level fusion schemes are applied to these statistical features together with existing features extracted from the CEI representation. The efficiency of the proposed feature is evaluated on the widely adopted CASIA gait dataset B using subspace discriminant analysis. The experimental results show that our proposed feature has better recognition accuracy in comparison with existing features.
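A hedged sketch of the primary inter-frame block-difference feature (grid size is an assumption; the secondary statistical features and fusion are omitted):

```python
import numpy as np

def interframe_block_feature(silhouettes, grid=(8, 8)):
    """Cumulative per-block difference between consecutive silhouettes in
    one gait cycle; each block tracks the temporal variation of one body
    region, and the flattened grid is the primary feature vector."""
    gy, gx = grid
    acc = np.zeros(grid, dtype=np.float64)
    for f0, f1 in zip(silhouettes[:-1], silhouettes[1:]):
        diff = np.abs(f1.astype(np.float32) - f0)
        h, w = diff.shape
        bh, bw = h // gy, w // gx
        blocks = diff[:bh * gy, :bw * gx].reshape(gy, bh, gx, bw)
        acc += blocks.sum(axis=(1, 3))                 # one value per block
    return acc.ravel() / max(len(silhouettes) - 1, 1)  # cycle-normalised
```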

Patent
23 Apr 2019
TL;DR: In this article, the authors proposed an inter-frame prediction method and apparatus, and a storage medium, which consists of the following steps: determining at least one reference coded block that is spatially adjacent to a current prediction unit of a current coded block; determining a target AMVP from the corresponding AMVPs of the current prediction units under the reference frames; and comparing the candidate reference frame with the first reference frame, and if the candidate this article.
Abstract: The embodiment of the invention provides an inter-frame prediction method and apparatus, and a storage medium. The method comprises the following steps: determining at least one reference coded block that is spatially adjacent to a current prediction unit of a current coded block; for preset reference frames, respectively determining corresponding AMVPs of the current prediction unit under the reference frames; determining a target AMVP from the corresponding AMVPs of the current prediction unit under the reference frames, and using the reference frame corresponding to the target AMVP as a candidate reference frame; and comparing the candidate reference frame with the first reference frame, and if the candidate reference frame is different from the first reference frame, respectively performing motion estimation on the current prediction unit via the candidate reference frame and the first reference frame, and determining a target reference frame from the candidate reference frame and the first reference frame according to the coding costs corresponding to the candidate reference frame and the first reference frame in the motion estimation. By adopting the inter-frame prediction method and apparatus provided by the embodiment of the invention, the processing complexity of the target reference frame selection can be reduced, and the efficiency of video coding can be improved.

Patent
Shao Jiyang, Bi Yuxin, Sun Jian, Zhang Hao, Zi Feng 
23 Apr 2019
TL;DR: In this paper, an image frame prediction method is presented consisting of the following steps: performing inter-frame motion vector calculation on two adjacent source frames to obtain a frame motion vector of the two adjacent source frames, wherein the source frames are rendered frames; performing inter-frame motion vector prediction according to at least two frame motion vectors to obtain a predicted value of the frame motion vector; and processing the source frame nearest to the predicted value to obtain a predicted frame.
Abstract: The embodiment of the invention provides an image frame prediction method and a device as well as avatar display equipment. The image frame prediction method comprises the following steps: performing interframe motion vector calculation on two adjacent source frames, so that a frame motion vector of the two adjacent source frames is obtained, wherein the source frames are rendered frames; according to at least two frame motion vectors, performing interframe motion vector prediction, so that a predicted value of the frame motion vectors is obtained; and according to the predicted value of the frame motion vectors, processing the source frame nearest to the predicted value of the frame motion vectors, so that a predicted frame is obtained.

Proceedings ArticleDOI
01 Sep 2019
TL;DR: This work proposes an extension of the BM3D method for the scenario where multiple images of the same scene are available, which outperforms all the existing trivial and non-trivial extensions of patch-based denoising methods for multi-frame images.
Abstract: The 3D block matching (BM3D) method is among the state-of-the-art methods for denoising images corrupted with additive white Gaussian noise. With the help of a novel inter-frame connectivity strategy, we propose an extension of the BM3D method for the scenario where we have multiple images of the same scene. Our proposed extension outperforms all the existing trivial and non-trivial extensions of patch-based denoising methods for multi-frame images. We can achieve a quality difference of as high as 28% over the next best method without using any additional parameters. Our method can also be easily generalised to other similar existing patch-based methods.
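The cross-frame grouping at the heart of any such extension can be sketched as follows; the paper's specific inter-frame connectivity strategy is not reproduced:

```python
import numpy as np

def group_across_frames(frames, y, x, p=8, search=6, k=16):
    """Generic cross-frame patch grouping: gather the k patches most
    similar to the reference patch from every frame's search window; in
    BM3D the resulting stack is then jointly transformed, thresholded,
    and aggregated back."""
    ref = frames[0][y:y + p, x:x + p].astype(np.float32)
    cands = []
    for f in frames:
        for dy in range(-search, search + 1):
            for dx in range(-search, search + 1):
                yy, xx = y + dy, x + dx
                if 0 <= yy <= f.shape[0] - p and 0 <= xx <= f.shape[1] - p:
                    patch = f[yy:yy + p, xx:xx + p].astype(np.float32)
                    cands.append((np.sum((patch - ref) ** 2), patch))
    cands.sort(key=lambda t: t[0])
    return np.stack([pt for _, pt in cands[:k]])  # k x p x p similarity group
```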

Proceedings ArticleDOI
26 Mar 2019
TL;DR: An adaptive quantization parameter (QP) selection algorithm for global RDO is proposed by modeling the functions relating the change of distortion propagation (ΔD) and the change of bitrate (ΔR) to the QP change (ΔQP).
Abstract: In video coding, inter-frame motion prediction greatly reduces temporal correlation, but brings about a strong dependency characterized by inter-frame distortion propagation, which makes the currently independent rate-distortion optimization (RDO) no longer optimal. This paper proposes an adaptive quantization parameter (QP) selection algorithm for global RDO by modeling the functions relating the change of distortion propagation (ΔD) and the change of bitrate (ΔR) to the QP change (ΔQP). Experimental results show that the proposed algorithm achieves promising BD-BR performance.
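The resulting selection rule is a one-liner once the two fitted models exist; `delta_d` and `delta_r` below are hypothetical callables standing in for those models:

```python
import numpy as np

def best_delta_qp(delta_d, delta_r, lam, candidates=range(-3, 4)):
    """Evaluate the global Lagrangian cost J(dQP) = dD(dQP) + lam * dR(dQP),
    where dD includes the modeled change of propagated distortion, and keep
    the QP offset with the lowest cost."""
    costs = [delta_d(d) + lam * delta_r(d) for d in candidates]
    return list(candidates)[int(np.argmin(costs))]
```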

Journal ArticleDOI
TL;DR: An efficient template matching (TM) based similarity measure is used to recognize gestures within the inter-frame sequence of video frames, and an acceptable recognition accuracy of up to 95% is achieved under varying observation conditions on a three-class hand gesture dataset.
Abstract: A new systolic approach to real-time vision-based hand gesture recognition is proposed. Hand gesture recognition is classified into two types, static and dynamic. The proposed system can be employed for both types. The system can detect and classify either a static hand shape in one image or a static hand shape at the start, within, and at the end of a gesture in a dynamic sequence in a video stream. An efficient template matching (TM) based similarity measure is used to recognize gestures within the inter-frame sequence of video frames. The template can be used to store the static class of different hand gesture shapes, while a fast scanning approach of a window over an image frame dynamically detects or recognizes the hand gesture. A suitable hardware systolic model of TM is designed. The systolic architecture is expanded to be used with multiple windows (k templates) simultaneously. The number of required templates equals the number (k) of gestures to be recognized. They are stored in FPGA RAM blocks (BRAMs). To increase the bandwidth of the template BRAMs, the BRAMs are multi-ported. A considerable reduction of memory access operations is achieved for the parallel systolic model in comparison to sequential/parallel software models. Consequently, the hand gesture is detected, tracked and recognized within the inter-frame space of the video. An acceptable recognition accuracy of up to 95% is achieved under varying observation conditions on a three-class hand gesture dataset. A high frame rate video can be processed in real time.
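A software model of the k-template matcher (scan step and score are assumptions; the paper's contribution is the systolic hardware mapping of exactly this loop nest):

```python
import numpy as np

def recognize_gesture(frame, templates, thresh):
    """Slide every stored class template over the frame, keep the best
    normalised SAD score, and report the winning class if it passes the
    acceptance threshold."""
    best_cls, best_score = None, np.inf
    for cls, tpl in enumerate(templates):      # k templates, one per gesture
        th, tw = tpl.shape
        H, W = frame.shape
        for y in range(0, H - th + 1, 4):      # coarse scan step
            for x in range(0, W - tw + 1, 4):
                win = frame[y:y + th, x:x + tw]
                score = np.abs(win.astype(np.int32) - tpl).mean()
                if score < best_score:
                    best_cls, best_score = cls, score
    return best_cls if best_score < thresh else None
```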

Patent
09 Jul 2019
TL;DR: In this article, a PHY converting chip is built in the near-end device, the base frame includes a first number of super groups, each super group includes a second number of base groups, and each base group includes media access control (MAC) frame structure data and an interframe gap.
Abstract: A method and an apparatus for transmitting frame data between a near-end device and a remote device are provided. The method includes: generating, by the near-end device, a base frame in a user-defined frame format, wherein a PHY converting chip is built in the near-end device, the base frame includes a first number of super groups, each super group includes a second number of base groups, and each base group includes media access control (MAC) frame structure data and an interframe gap; matching duration of the MAC frame structure data and the interframe gap with an output timing sequence of the PHY converting chip; and converting the base frame into an optical fiber signal through the PHY converting chip, and sending the optical fiber signal to the remote device.

Patent
15 Mar 2019
TL;DR: In this paper, an automatic generation method for a high-precision map based on high precision positioning and lane line recognition is proposed, which consists of the following steps: enabling high- precision positioning data acquired each time to be synthesized with lane line data; establishing map frames by using synthesized data and storing the map frames into a map frame database.
Abstract: The invention discloses an automatic generation method for a high-precision map based on high precision positioning and lane line recognition. The automatic generation method comprises the following steps: enabling high precision positioning data acquired each time to be synthesized with lane line data; establishing map frames by using synthesized data and storing the map frames into a map frame database; matching newly-acquired lane line data with all established map frames; if the matching fails, establishing a new map frame; if the matching succeeds, updating information of the established map frame; carrying out interframe smooth processing on all existing map frames in the established map frames so as to obtain cubic curves subjected to smooth processing; splicing the cubic curves to generate the high-precision map. The automatic generation method disclosed by the invention is used for generating a high-precision map which is accurate to the lane-line level and can automatically splice the lane lines based on high-precision positioning; the complexity of generating the high-precision map is reduced, and the heavy labor consumption and high error rate of traditional high-precision map generation are avoided.
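The inter-frame smoothing step reduces to fitting one cubic curve per matched lane line; a minimal sketch with NumPy (coordinate conventions assumed):

```python
import numpy as np

def smooth_lane(points, step=0.5):
    """Fit one cubic y = a3*x^3 + a2*x^2 + a1*x + a0 to the accumulated
    lane-line points of all matched map frames, then resample it for
    splicing into the map."""
    pts = np.asarray(points, dtype=float)  # (N, 2) x/y in map coordinates
    coeffs = np.polyfit(pts[:, 0], pts[:, 1], deg=3)
    xs = np.arange(pts[:, 0].min(), pts[:, 0].max(), step)
    return np.column_stack([xs, np.polyval(coeffs, xs)])
```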

Patent
06 Sep 2019
TL;DR: In this article, the authors present an inter-frame prediction method and device for video coding based on motion vector acquisition and motion vectors of the co-located blocks of a current coding block.
Abstract: The invention discloses a time domain motion vector acquisition method, an inter-frame prediction method and device, and a video coding method and device. The method comprises the following steps: determining at least one co-located frame of a current coding block according to a preset method; determining at least one co-located block in the co-located frame according to the search order of the candidate location blocks of the current coding block; obtaining motion vectors of the co-located blocks; and scaling the motion vector of the co-located block, using the distance between the current frame and the reference frame of the current frame and the distance between the co-located frame and the reference frame of the co-located frame, to obtain the time domain motion vector of the current coding block. In this way, the inter-frame prediction method and device can improve the accuracy of inter-frame prediction.
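The scaling step is the standard temporal motion vector scaling (HEVC-style TMVP); a minimal sketch using picture order counts:

```python
def scale_temporal_mv(mv_col, poc_cur, poc_cur_ref, poc_col, poc_col_ref):
    """Stretch the co-located motion vector by the ratio of the current
    frame's reference distance to the co-located frame's reference
    distance, as the abstract above describes."""
    tb = poc_cur - poc_cur_ref  # current frame -> its reference
    td = poc_col - poc_col_ref  # co-located frame -> its reference
    if td == 0:
        return mv_col
    return (mv_col[0] * tb / td, mv_col[1] * tb / td)
```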

Patent
29 Nov 2019
TL;DR: In this article, an intra-frame and inter-frame joint prediction method was proposed, which consists of two steps: determining at least one joint intraframe prediction mode of a current coding block and obtaining a joint intra frame prediction value.
Abstract: The invention discloses an intra-frame and inter-frame joint prediction method, an intra-frame and inter-frame joint prediction device, a codec and a storage device. The intra-frame and inter-frame joint prediction method comprises the following steps: determining at least one joint intra-frame prediction mode of a current coding block by utilizing an intra-frame prediction mode of at least one coded block of the current frame; obtaining a joint intra-frame prediction value of the current coding block by using at least one joint intra-frame prediction mode; and obtaining at least one intra-frame and inter-frame joint prediction value by using the at least one candidate motion vector of the current coding block and the joint intra-frame prediction value. In this way, the prediction accuracy can be improved.