
Showing papers on "Inter frame published in 2020"


Journal ArticleDOI
TL;DR: An efficient method based on histogram of oriented gradients (HOG) and motion energy image (MEI) that can detect all inter-frame forgeries and achieve higher accuracy with lower execution time is proposed.
Abstract: Inter-frame forgery is a common type of video forgery used to destroy video evidence. It occurs in the temporal domain, for example frame deletion, frame insertion, frame duplication, and frame shuffling. These forms of forgery are more frequently produced in surveillance video because the camera position and the scene are relatively stable, so the tampering process is easy to perform and imperceptible. In this paper, we propose an efficient method for inter-frame forgery detection based on the histogram of oriented gradients (HOG) and the motion energy image (MEI). HOG is obtained from each image as a discriminative feature. To detect frame deletion and insertion, correlation coefficients are used and abnormal points are detected via Grubbs' test. In addition, MEI is applied to the edge images of each shot to detect frame duplication and shuffling. Experimental results show that the proposed method can detect all inter-frame forgeries and achieves higher accuracy with lower execution time.

22 citations
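The deletion/insertion step described above can be sketched as follows. This is not the authors' code: the Grubbs critical value is hardcoded as an assumed illustration (it actually depends on sample size and significance level), and the correlation values are synthetic.

```python
from statistics import mean, stdev

def grubbs_outlier(values, g_crit=2.02):
    """One-sided Grubbs' test: return the index of the most extreme
    value if its test statistic exceeds g_crit, else None.
    g_crit is an assumed illustrative value (roughly n=10, alpha=0.05)."""
    m, s = mean(values), stdev(values)
    if s == 0:
        return None  # no variation, nothing can be an outlier
    # Test statistic for each point: distance from the mean in std units.
    g, idx = max((abs(v - m) / s, i) for i, v in enumerate(values))
    return idx if g > g_crit else None

# Correlation coefficients between consecutive HOG features: a sudden
# dip (here at index 5) suggests a deleted/inserted frame boundary.
corr = [0.98, 0.97, 0.99, 0.98, 0.97, 0.42, 0.98, 0.99, 0.97, 0.98]
print(grubbs_outlier(corr))  # -> 5
```

In the paper's pipeline the tested values would be HOG correlation coefficients of adjacent frames rather than this synthetic list.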


Journal ArticleDOI
TL;DR: A CNN VPN is designed to generate a virtual reference frame (VRF), which is synthesized using previously coded frames, to improve coding efficiency and exploit the PU-wise multi-hypothesis prediction techniques in HEVC.
Abstract: In this paper, we propose a novel Convolutional Neural Network (CNN) based video coding technique using a video prediction network (VPN) to support enhanced motion prediction in High Efficiency Video Coding (HEVC). Specifically, we design a CNN VPN to generate a virtual reference frame (VRF), which is synthesized using previously coded frames, to improve coding efficiency. The proposed VPN uses two sub-VPN architectures in cascade to predict the current frame in the same time instance. The VRF is expected to have higher temporal correlation than a conventional reference frame, and, thus it is substituted for a conventional reference frame. The proposed technique is incorporated into the HEVC inter-coding framework. Particularly, the VRF is managed in a HEVC reference picture list, so that each prediction unit (PU) can choose a better prediction signal through Rate-Distortion optimization without any additional side information. Furthermore, we modify the HEVC inter-prediction mechanisms of Advanced Motion Vector Prediction and Merge modes adaptively when the current PU uses the VRF as a reference frame. In this manner, the proposed technique can exploit the PU-wise multi-hypothesis prediction techniques in HEVC. Since the proposed VPN can perform both the video interpolation and extrapolation, it can be used for Random Access (RA) and Low Delay B (LD) coding configurations. It is shown in experimental results that the proposed technique provides −2.9% and −5.7% coding gains, respectively, in RA and LD coding configurations as compared to the HEVC reference software, HM 16.6 version.

16 citations


Journal ArticleDOI
TL;DR: The proposed automatic shot detection method, by employing the fast feature descriptor of Oriented FAST and Rotated BRIEF fused with SSIM, can outperform the existing shot detection methods, including the rule-based and learning-based methods, by testing on the video sequences from the Open-video project and RAI dataset.
Abstract: Shots are the basic units for analyzing and retrieving video, and also the essential elements in creating video datasets. The traditional methods of shot detection exhibit unsatisfactory performance for being too sensitive to motion or too much time-consuming. This paper proposes an automatic shot detection method, by employing the fast feature descriptor of Oriented FAST and Rotated BRIEF (ORB) fused with Structural Similarity (SSIM). Firstly, ORB descriptor is used to preselect candidate segments with a high tolerance for rapidly extracting the features of twenty-frame intervals in video sequences. Then, the cut transition is detected by comparing ORB features, fused with SSIM, of consecutive frames in the candidate segment. Finally, the gradual transition is detected by determining the maximum amount of the continuous increasing/decreasing interframe differences in the candidate segment without cut transition. Experimental result indicates that the proposed method can achieve an F1-Score of 92.5% and five times of real-time speed with one CPU on 106049 test frames from the Open-video project, YouTube, and YOUKU. In addition, the proposed method can outperform the existing shot detection methods, including the rule-based and learning-based methods, by testing on the video sequences from the Open-video project and RAI dataset.

16 citations
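The two-stage idea above (cheap preselection over 20-frame intervals, then precise consecutive-frame comparison inside candidate segments) can be sketched with a plain inter-frame difference standing in for the ORB/SSIM features; the thresholds and synthetic frames are assumptions for illustration.

```python
def mad(a, b):
    """Mean absolute difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def find_cuts(frames, step=20, thresh=50):
    """Preselect candidate segments by comparing frames `step` apart,
    then locate the exact cut by consecutive-frame comparison."""
    cuts = []
    for start in range(0, len(frames) - 1, step):
        end = min(start + step, len(frames) - 1)
        if mad(frames[start], frames[end]) < thresh:
            continue  # segment unlikely to contain a cut
        for i in range(start, end):
            if mad(frames[i], frames[i + 1]) >= thresh:
                cuts.append(i + 1)  # cut lies between frames i and i+1
    return cuts

# Synthetic video: 30 dark frames, then 30 bright frames (cut at 30).
frames = [[10] * 16 for _ in range(30)] + [[200] * 16 for _ in range(30)]
print(find_cuts(frames))  # -> [30]
```

The real method compares ORB descriptors fused with SSIM instead of raw pixel differences, and additionally handles gradual transitions.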


Journal ArticleDOI
TL;DR: A novel rate control scheme designed for screen content video coding (SCC) is proposed in this paper, and experimental results show that the proposed rate control algorithm achieves 0.88 dB and 1.54 dB BDPSNR increases on average under the Low Delay coding structure.
Abstract: The particular characteristics of computer-generated content and the frequent scene changes in screen content videos make the existing R-λ rate-control model unsuitable for encoding them. A novel rate control scheme designed for screen content video coding (SCC) is proposed in this paper. Text blocks, screen image blocks and natural image blocks in screen content videos exhibit different rate-distortion (R-D) relationships, so a content-based rate control method is put forward at the Coding Tree Unit (CTU) level. Three independent parameter update modes are adopted for the three types of CTUs according to their corresponding R-D relationships. Furthermore, the frequent scene changes in screen content videos call for a novel bit allocation scheme for precise rate control. In view of this, the frames are classified into scene-changed and scene-unchanged frames, and different bit allocation methods are adopted for each type on the basis of inter-frame and intra-frame complexity. Besides, a region-level bit allocation algorithm considering inter-frame continuity and the numbers of the different CTU types is added between the frame-level and CTU-level bit allocation schemes. Experimental results show that our proposed rate control algorithm achieves 0.88 dB and 1.54 dB BDPSNR increases on average under the Low Delay coding structure, compared with the default rate control algorithm with hierarchical and non-hierarchical bit allocation, respectively, in HEVC-SCC.

14 citations


Proceedings ArticleDOI
01 Feb 2020
TL;DR: As shown in Fig. 14.2.1, adjacent activation frames of typical video applications are similar to each other most of the time, providing an opportunity to reduce both computing and data transmission complexity significantly.
Abstract: Convolutional Neural Networks (CNNs) have become widely used in image signal processing, such as tracking, classification and post-processing. Modern CNNs use millions of weights and activations, leading to critical challenges for both computation and data transmission. Video applications, such as autopilot and surveillance cameras, have to process a large number of sequential images/frames within limited time, making the situation even worse. As shown in Fig. 14.2.1, adjacent activation frames of typical video applications are similar to each other most of the time, providing an opportunity to reduce both computing and data transmission complexity significantly.

14 citations
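The exploitation of inter-frame activation similarity can be sketched as a simple delta scheme: transmit only the activations that changed relative to the previous frame. This is an illustrative sketch, not the paper's hardware design.

```python
def delta_encode(prev, curr, tol=0):
    """Return (index, value) pairs for activations that changed
    w.r.t. the previous frame by more than `tol`."""
    return [(i, v) for i, (p, v) in enumerate(zip(prev, curr))
            if abs(v - p) > tol]

def delta_decode(prev, changed):
    """Rebuild the current frame from the previous frame plus deltas."""
    out = list(prev)
    for i, v in changed:
        out[i] = v
    return out

prev = [0, 3, 5, 0, 7, 2, 0, 0]
curr = [0, 3, 6, 0, 7, 2, 1, 0]   # only two activations changed
changed = delta_encode(prev, curr)
print(len(changed), "of", len(curr), "values transmitted")  # 2 of 8
assert delta_decode(prev, changed) == curr
```

When adjacent frames are similar, the `changed` list is small, which is exactly the data-transmission saving the paper's motivation describes.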


Journal ArticleDOI
TL;DR: A low bit rate underwater video compression coding method is proposed in which, based on a preprocessing stage of wavelet transform and coefficient down-sampling, the visual redundancy of underwater images is removed and the computational coefficients and coding bits are reduced to meet the transmission-rate requirements of the underwater acoustic channel.
Abstract: In view of the limited bandwidth of underwater video image transmission, a low bit rate underwater video compression coding method is proposed. Based on a preprocessing stage of wavelet transform and coefficient down-sampling, the visual redundancy of underwater images is removed and the computational coefficients and coding bits are reduced. At the same time, multi-level wavelet decomposition, inter-frame motion compensation, entropy coding and other methods are combined according to the characteristics of the different frame types to reduce the number of calculations and improve coding efficiency. The experimental results show that the reconstructed image quality meets visual requirements, and the average compression ratio of the underwater video meets the transmission-rate requirements of the underwater acoustic channel.

12 citations


Journal ArticleDOI
TL;DR: A complex pseudo-spectrum (CPS) method is proposed for VF-TBD to improve the performance of weak target detection, based on the utilization of phase information during energy integration.

10 citations


Journal ArticleDOI
TL;DR: In this paper, a comprehensive review of the state-of-the-art techniques for HEVC inter-frame coding from three aspects, namely fast inter coding solutions, implementation on different hardware platforms as well as advanced inter coding techniques is provided.
Abstract: High Efficiency Video Coding (HEVC) has doubled the video compression ratio with equivalent subjective quality compared to its predecessor H.264/AVC. The significant coding efficiency improvement is attributed to many new techniques. Inter-frame coding is one of the most powerful yet complicated techniques therein; it poses a high computational burden and is thus a main obstacle in HEVC-based real-time applications. Recently, plenty of research has been done to optimize inter-frame coding, either to reduce the complexity for real-time applications, or to further enhance the encoding efficiency. In this paper, we provide a comprehensive review of the state-of-the-art techniques for HEVC inter-frame coding from three aspects, namely fast inter coding solutions, implementation on different hardware platforms, and advanced inter coding techniques. More specifically, the algorithms in each aspect are further subdivided into sub-categories and compared in terms of pros, cons, coding efficiency and coding complexity. To the best of our knowledge, this is the first such comprehensive review of recent advances in inter-frame coding for HEVC, and we hope it will aid the improvement, implementation and application of HEVC as well as the ongoing development of the next-generation video coding standard.

10 citations


Proceedings ArticleDOI
22 May 2020
TL;DR: An approach is presented that performs complex primary processing of data obtained by a group of sensors operating in the visible range and by cameras capturing data in the infrared range, using an nVidia Jetson as the computing device of an unmanned vehicle.
Abstract: When analyzing data recorded on mobile devices (including unmanned aerial vehicles and cars), additional errors arise due to the inter-frame blur effect and relative inter-frame displacement. Eliminating the noise component in such conditions is very difficult, since processing can be carried out only frame by frame, eliminating the possibility of multi-frame analysis. The paper presents an approach that allows performing complex primary processing of data obtained by a group of sensors operating in the visible range and cameras capturing data in the infrared range. An nVidia Jetson is used as the device that performs the unmanned vehicle's computations, including control, data collection and initial processing. A pair of cameras with a resolution of 1980x1080 pixels at 10 frames per second, as well as a SEAK thermal imaging camera with a resolution of 320x240 pixels, are used as data sensors. The algorithm presented in the work is based on step-by-step identification of stationary areas in a series of images, the search for their correspondence, and simplification. At the next stage, a local reduction of the noise component is performed using a method based on multicriterial smoothing. At the final stage, the jointly derived filter parameters are applied to the series of images obtained in the various electromagnetic ranges.

7 citations


Journal ArticleDOI
TL;DR: Experimental results showed the proposed ViBe video detection method can effectively remove the "ghosting" phenomenon that occurs in the traditional ViBe method and realise accurate and complete detection of moving vehicles in video.
Abstract: It is difficult with traditional methods to realise real-time and robust detection of moving vehicles under complex traffic scenes. In this paper, a moving vehicle video detection method that combines ViBe and inter-frame difference is proposed. The proposed method improves the background update efficiency of the traditional ViBe method by adding a multi-threshold comparison step to the inter-frame difference method. The improved background update strategy can judge whether a detected pixel belongs to the foreground or background, and dynamically adjusts the background update rate according to the inter-frame difference results. Experimental results showed the proposed method can effectively remove the "ghosting" phenomenon that occurs in the traditional ViBe method and realise accurate and complete detection of moving vehicles in video.

7 citations
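The multi-threshold comparison step can be sketched as a per-pixel three-way decision on the inter-frame difference. The thresholds and pixel values are assumptions for illustration; the real method feeds this decision into ViBe's sample-based background model.

```python
def classify_pixels(prev, curr, t_low=10, t_high=40):
    """Per-pixel decision via two thresholds on the inter-frame
    difference: 'bg' (static), 'maybe' (slow change), 'fg' (motion).
    A ViBe-style background model would then update its samples faster
    for 'bg' pixels and slower for 'fg' pixels."""
    labels = []
    for p, c in zip(prev, curr):
        d = abs(c - p)
        labels.append('bg' if d < t_low else 'fg' if d > t_high else 'maybe')
    return labels

prev = [100, 100, 100, 100]
curr = [102, 100, 130, 160]   # pixel 2 changes slowly, pixel 3 moves
print(classify_pixels(prev, curr))  # -> ['bg', 'bg', 'maybe', 'fg']
```

Dynamically adjusting the update rate per label is what removes "ghosting": stale background samples behind a departed object get refreshed quickly once the difference drops.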


Proceedings ArticleDOI
01 Oct 2020
TL;DR: This paper presents a lossless inter geometry coder of voxelized point clouds, extending the contexts used in the arithmetic coding to include voxels from a reference point cloud, hence the name, 4D contexts, and proposes a fast decision method to avoid encoding each slice with both contexts.
Abstract: This paper presents a lossless inter geometry coder of voxelized point clouds. We build upon our previous work, extending the contexts used in the arithmetic coding to include voxels from a reference point cloud, hence the name, 4D contexts. We show that considering both 3D and 4D contexts leads to a substantial gain compared to considering each one on its own, and we also propose a fast decision method to avoid encoding each slice with both contexts. The proposed codec has the same complexity as our previous intra codec, but it shows an equal or superior performance for all point clouds tested. Results show that the proposed method outperforms all intra and inter state-of-the-art coders on the public available datasets tested.
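The benefit of a 4D context (conditioning on the co-located voxel of a reference point cloud) can be sketched with a toy adaptive binary coder: the cost in bits of coding an occupancy sequence drops when the context function looks at the reference frame. The sequences and the Laplace estimator are illustrative assumptions, not the paper's coder.

```python
import math

def code_length(bits, ctx_of):
    """Adaptive per-context Laplace estimator: ideal cost in bits of
    coding a binary sequence, where ctx_of(i) is symbol i's context."""
    counts = {}
    total = 0.0
    for i, b in enumerate(bits):
        c0, c1 = counts.get(ctx_of(i), (1, 1))   # Laplace prior
        p = (c1 if b else c0) / (c0 + c1)
        total += -math.log2(p)                   # ideal code length
        counts[ctx_of(i)] = (c0 + (b == 0), c1 + (b == 1))
    return total

# Occupancy in the current frame tracks the reference frame closely,
# so conditioning on the reference voxel ("4D" context) should cost
# fewer bits than a single shared ("3D"-style) context.
ref  = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 0] * 4
curr = list(ref); curr[5] ^= 1                  # one changed voxel
flat   = code_length(curr, lambda i: 0)         # no reference used
four_d = code_length(curr, lambda i: ref[i])    # reference-conditioned
print(four_d < flat)  # True
```

The paper's coder mixes 3D and 4D contexts and picks per slice; this sketch only shows why the temporal context pays off when frames are similar.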

Journal ArticleDOI
TL;DR: A comparative analysis of newly developed forgery detection techniques is presented, helpful for identifying the difficulties and opportunities in the field of forgery detection.
Abstract: In recent years, video forgery detection has become a major problem in video forensics. Unauthorised changes in video frames degrade the authenticity and integrity of the original. With advances in technology, video processing tools and techniques are readily available for altering videos. Detecting modifications to a video is vital, since the video may be used in an authentication process; its credibility must therefore be verified. There are different ways in which a video can be tampered with, for example frame insertion, deletion, duplication, copy-move, and splicing. This paper presents an analysis of forgery detection techniques, namely inter-frame and intra-frame forgery detection, that can be used for video tampering detection, and provides a comparative analysis of newly developed forgery detection techniques, which is helpful for identifying the difficulties and opportunities in the field.

Journal ArticleDOI
TL;DR: A structural group sparsity model for use in the initial reconstruction phase and a weight-based group sparse optimization algorithm acting in joint domains are introduced, along with a coarse-to-fine optical flow estimation model with successive approximation for use in the inter-frame prediction stage.

Posted Content
TL;DR: This paper proposes a novel temporal feature extraction method, named Attentive Correlated Temporal Feature (ACTF), by exploring inter-frame correlation within a certain region and has the advantage of achieving performance comparable to or better than optical flow-based methods while avoiding the introduction of optical flow.
Abstract: Temporal feature extraction is an important issue in video-based action recognition. Optical flow is a popular method to extract temporal feature, which produces excellent performance thanks to its capacity of capturing pixel-level correlation information between consecutive frames. However, such a pixel-level correlation is extracted at the cost of high computational complexity and large storage resource. In this paper, we propose a novel temporal feature extraction method, named Attentive Correlated Temporal Feature (ACTF), by exploring inter-frame correlation within a certain region. The proposed ACTF exploits both bilinear and linear correlation between successive frames on the regional level. Our method has the advantage of achieving performance comparable to or better than optical flow-based methods while avoiding the introduction of optical flow. Experimental results demonstrate our proposed method achieves the state-of-the-art performances of 96.3% on UCF101 and 76.3% on HMDB51 benchmark datasets.

Journal ArticleDOI
TL;DR: The proposed random access with an inter-frame successive interference cancellation (RA-ISIC) for stationary IoT networks can effectively improve the resource efficiency and can be suitable to accommodate more IoT devices with a smaller amount of resources, compared to the conventional one.
Abstract: In 5G cellular networks, it is required to accommodate a massive number of Internet-of-Things (IoT) devices in a resource-efficient way. In this letter, we propose a random access with an inter-frame successive interference cancellation (RA-ISIC) for stationary IoT networks. In our proposed scheme, the BS does not discard the collided packets and attempts to recover them through inter-frame SIC operations. We evaluate the performance of our proposed scheme in terms of resource efficiency. Results show that a certain level of system load is required to maximize the efficiency of our proposed scheme, which can be achieved through a probabilistic retransmission (PR) policy. Consequently, our proposed scheme can effectively improve the resource efficiency and can be suitable to accommodate more IoT devices with a smaller amount of resources, compared to the conventional one.
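The inter-frame SIC idea can be sketched with a toy resolver: because devices are stationary, a packet decoded in one frame can be subtracted from a collided slot of another frame, possibly freeing further packets. Slot assignments and device names here are illustrative assumptions.

```python
def inter_frame_sic(frame1, frame2):
    """Toy inter-frame SIC. Each frame maps slot -> set of device ids.
    A slot with exactly one unresolved device is decoded; its packet is
    then subtracted from collided slots in both frames, iterating until
    no further progress. Returns the set of decoded device ids."""
    frames = [dict(frame1), dict(frame2)]
    decoded = set()
    progress = True
    while progress:
        progress = False
        for fr in frames:
            for slot, devs in fr.items():
                remaining = devs - decoded   # subtract known packets
                if len(remaining) == 1:      # singleton -> decodable
                    decoded |= remaining
                    progress = True
    return decoded

# Device A collides with B in frame 1, but A is alone in frame 2:
# SIC decodes A there, subtracts it, then recovers B and finally all.
f1 = {0: {'A', 'B'}, 1: {'C'}}
f2 = {0: {'A'}, 1: {'B', 'C'}}
print(sorted(inter_frame_sic(f1, f2)))  # -> ['A', 'B', 'C']
```

Without the inter-frame subtraction, only the singleton slots (A in frame 2, C in frame 1) would be recovered and both collided slots would be lost.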


Journal ArticleDOI
TL;DR: An adaptive QP selection algorithm for global RDO is proposed based on the modeling function between ΔD and ΔQP in this paper, which can decrease Bjøntegaard Delta BitRate (BD-BR) by about 1.62% at the random-access (RA) configuration and 1.13% at the low-delay (LD) configuration.
Abstract: Massive inter predictive modes are adopted in the latest High Efficiency Video Coding (HEVC) standard to eliminate temporal redundancies, which results in stronger inter-frame dependency among neighboring frames than in previous standards like H.264. This inter-frame dependency makes the currently independent rate-distortion optimization (RDO) no longer optimal. A quantization parameter (QP) selection algorithm that takes inter-frame dependency into consideration is expected to greatly improve RDO-based rate control. According to our research, the inter-frame dependency is reflected by the linear relationship between the QP change (ΔQP) and the resulting change of distortion (ΔD). An adaptive QP selection algorithm for global RDO is proposed based on the modeling function between ΔD and ΔQP in this paper. Firstly, based on intensive statistical analysis, three parameters (the initial QP ($\overline {QP}$), the length of the group of pictures (GOP), and the average SATD of one frame) are used to formulate the relationship between ΔD and ΔQP. Secondly, the resulting rate change ΔR relative to ΔQP is formulated similarly. Thirdly, the optimized Lagrangian multiplier (λ) is calculated with these two mathematical models. Finally, we refine QP values based on the optimized λ in terms of dependent RDO. The experimental results show that the proposed frame-level QP selection algorithm can decrease Bjøntegaard Delta BitRate (BD-BR) by about 1.62% at the random-access (RA) configuration and 1.13% at the low-delay (LD) configuration, respectively. At the same time, it does not increase complexity significantly.
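The λ-then-QP refinement can be sketched numerically. With the assumed linear models ΔD = k_d·ΔQP and ΔR = −k_r·ΔQP, the Lagrangian slope is λ = −dD/dR = k_d/k_r; the QP is then recovered with the well-known HM-style mapping QP = 4.2005·ln(λ) + 13.7122. The slope values k_d, k_r below are illustrative, not from the paper.

```python
import math

def optimized_lambda(k_d, k_r):
    """With linear models dD = k_d * dQP and dR = -k_r * dQP,
    the Lagrangian slope lambda = -dD/dR = k_d / k_r."""
    return k_d / k_r

def refine_qp(lam, c1=4.2005, c2=13.7122):
    """HM reference-software mapping QP = c1 * ln(lambda) + c2,
    clipped to the valid HEVC range [0, 51]."""
    qp = round(c1 * math.log(lam) + c2)
    return max(0, min(51, qp))

lam = optimized_lambda(k_d=2.5, k_r=0.05)   # illustrative slopes
print(lam, refine_qp(lam))  # -> 50.0 30
```

The paper's contribution is in how k_d and k_r are modeled from the initial QP, GOP length, and frame SATD; this sketch only shows the final λ-to-QP step.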


Patent
20 Feb 2020
TL;DR: In this paper, a method for multi-channel audio or speech signal processing is proposed, which includes determining a variation between a first mismatch value and a second mismatch value, and comparing the variation with a first threshold that may have a pre-determined value or may be adjusted based on a frame type or a smoothing factor.
Abstract: A method for multi-channel audio or speech signal processing includes receiving a reference channel and a target channel, determining a variation between a first mismatch value and a second mismatch value, and comparing the variation with a first threshold that may have a pre-determined value or may be adjusted based on a frame type or a smoothing factor. The method also includes adjusting a set of target samples of the target channel based on the variation and based on the comparison to generate an adjusted set of target samples. Adjusting the set of target samples includes selecting one among a first interpolation and a second interpolation based on the variation. The method further includes generating at least one encoded channel based on a set of reference samples and the adjusted set of target samples. The method also includes transmitting the at least one encoded channel to a second device.

Journal ArticleDOI
TL;DR: In this article, the authors proposed a parameterized protocol for extrema estimation in large-scale radio-frequency identification (RFID) systems, where there are thousands of tags and each tag contains a finite value.
Abstract: We consider the extrema estimation problem in large-scale radio-frequency identification (RFID) systems, where there are thousands of tags and each tag contains a finite value. The objective is to design an extrema estimation protocol with the minimum execution time. Because the standard binary search protocol wastes much time due to interframe overhead, we propose a parameterized protocol and treat the number of slots in a frame as an unknown parameter. We formulate the problem and show how to find the best parameter to minimize the worst-case execution time. Finally, we propose two rules to further reduce the execution time. The first is to find and remove redundant frames. The second is to concatenate a frame from minimum value estimation with a frame from maximum value estimation to reduce the total number of frames. Simulations show that, in a typical scenario, the proposed protocol reduces execution time by 79% compared with the standard binary search protocol.
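The frame-count saving of the parameterized protocol can be sketched by counting frames: with `slots` threshold queries per frame, each frame partitions the remaining value range (slots+1)-ways instead of 2-ways, so fewer frames (and less inter-frame overhead) are needed. The range and slot counts are illustrative assumptions.

```python
def frames_needed(value_range, slots):
    """Frames required to pin down an extremum in [0, value_range):
    each frame narrows the range by a factor of (slots + 1).
    slots=1 corresponds to the standard binary search protocol."""
    frames = 0
    span = value_range
    while span > 1:
        span = -(-span // (slots + 1))  # ceiling division
        frames += 1
    return frames

R = 1024  # tag values in [0, 1023]
print(frames_needed(R, 1))  # binary search: 10 frames
print(frames_needed(R, 3))  # 4-ary search: 5 frames
```

This ignores per-slot time, which grows with `slots`; the paper's contribution is choosing the slot count that minimizes the worst-case total of slot time plus inter-frame overhead.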

Proceedings ArticleDOI
01 Jun 2020
TL;DR: The above example search process is enhanced to allow fractional-pel positions for more accurate probability modeling and model parameters that control shapes of the Gaussian functions are numerically optimized so the resulting coding rate can be a minimum.
Abstract: We previously proposed a lossless video coding method based on intra/inter-frame example search and probability model optimization. In this method, several examples, i.e. a set of pels whose neighborhoods are similar to a local texture of the target pel to be encoded, are searched from already encoded areas of the current and previous frames with integer pel accuracy. Probability distribution of an image value at the target pel is then modeled as weighted sum of the Gaussian functions whose peaked positions are given by the individual examples. Furthermore, model parameters that control shapes of the Gaussian functions are numerically optimized so that the resulting coding rate can be a minimum. In this paper, the above example search process is enhanced to allow fractional-pel positions for more accurate probability modeling.
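The example-search step can be sketched in 1D with integer-pel accuracy: scan the already-encoded samples for positions whose preceding context matches the target's context, and collect the values found there as "examples". The signal and context length are illustrative; the paper additionally builds a Gaussian-mixture probability model on the examples and refines positions to fractional accuracy.

```python
def example_search(encoded, context, k=2):
    """Search already-encoded samples for positions whose preceding
    len(context) samples match the target's context; return up to k
    example values. A probability model for the target sample would
    then be centered on these examples."""
    n = len(context)
    examples = []
    for i in range(n, len(encoded)):
        if encoded[i - n:i] == context:
            examples.append(encoded[i])
    return examples[:k] if k else examples

# Causal (already encoded) signal; the target pel's context is [7, 8].
encoded = [1, 7, 8, 9, 2, 7, 8, 9, 5, 7, 8]
print(example_search(encoded, [7, 8]))  # -> [9, 9]
```

Both examples here agree on the value 9, so a distribution peaked at 9 would code the target pel very cheaply, which is the intuition behind the method.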

Patent
26 Mar 2020
TL;DR: In this article, an optical flow-based bidirectional prediction on the plurality of first image blocks to obtain a predicted value of each first image block, and combining the predicted values of the plurality for each image block to obtain the predicted value for the image block.
Abstract: The present application discloses an inter-frame prediction method and a device, wherein the method comprises: according to a preset image division width, a preset image division height, and a width and a height of an image block to be processed, determining a plurality of first image blocks in the image block to be processed; performing an optical flow-based bidirectional prediction on the plurality of first image blocks to obtain a predicted value of each first image block; and combining the predicted values of the plurality of first image blocks to obtain a predicted value of the image block to be processed. The device comprises a determination module, a prediction module and a combination module. The present application can reduce inter-frame prediction implementation complexity and improve processing efficiency.

Patent
31 Jan 2020
TL;DR: A video and audio credible playing method is proposed that generates an associated abstract based on intra-frame extraction, relating to the technical field of digital security.
Abstract: The invention discloses a video and audio credible playing method that generates an associated abstract based on intra-frame extraction, and relates to the technical field of digital security. The method comprises the following steps: during the transcoding service process of the video streaming media, the publishing end extracts data segments of a certain length from each video frame of a video clip at intervals, according to a set intra-frame extraction rule, to form a data set corresponding to the video frame, and generates a real-time associated abstract corresponding to the video frame by using a one-way hash algorithm; the real-time associated abstract is used to obtain a digital signature assembly; the client receives the video clip embedded with the digital signature assembly, and extracts the digital signature assembly and the original video clip; the digital signature assembly is decrypted to generate a receiving associated abstract corresponding to the video frame; and the received associated abstract is compared with the real-time associated abstract, and a player is controlled to play the video clip content according to the comparison result. With this method, the security policy ensuring credible playing is transferred from the complex intermediate network to the streaming media receiving and transmitting ends, and the engineering overhead of the security policy is reduced.
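The intra-frame extraction plus one-way hash step can be sketched with the standard library. The offsets and segment length stand in for the patent's "set intra-frame extraction rule" and are assumptions for illustration; signing/decryption of the digest is omitted.

```python
import hashlib

def frame_digest(frame_bytes, offsets, length=4):
    """Extract short data segments at fixed offsets inside a frame
    (a stand-in for the intra-frame extraction rule) and hash the
    concatenation with a one-way function."""
    segments = b''.join(frame_bytes[o:o + length] for o in offsets)
    return hashlib.sha256(segments).hexdigest()

frame = bytes(range(64))
rule = [0, 16, 32, 48]          # assumed extraction rule
sent = frame_digest(frame, rule)

# Receiver recomputes the digest; tampering inside an extracted
# segment changes it, so playback can be blocked.
tampered = bytearray(frame); tampered[17] ^= 0xFF
print(sent == frame_digest(frame, rule))            # True
print(sent == frame_digest(bytes(tampered), rule))  # False
```

In the patent the digest is additionally signed by the publisher and verified after decryption at the client; only the matching digests are shown here.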

Proceedings ArticleDOI
28 Sep 2020
TL;DR: The accuracy of inter-frame differencing improves when the proposed method using contour processing is applied, whereas inter-frame differences become small in low-illumination images when the conventional inter-frame subtraction method is applied.
Abstract: We investigate the frame subtraction method in research on a security surveillance system for elderly people based on artificial intelligence using deep neural networks. Inter-frame differences become small in low-illumination images when the conventional inter-frame differencing method is applied. Therefore, we propose an optimal method of inter-frame differencing for various illuminance environments. The accuracy of inter-frame differencing improves when the proposed method using contour processing is applied.

Journal ArticleDOI
25 Dec 2020-Sensors
TL;DR: In this article, the complex associating and mapping problem is investigated and modeled as a multilayer optimization problem to realize low drift localization and point cloud map reconstruction without the assistance of the GNSS/INS navigation systems.
Abstract: The ability to associate continuous-time laser frames is of vital importance but challenging for hand-held or backpack simultaneous localization and mapping (SLAM). In this study, the complex association and mapping problem is investigated and modeled as a multilayer optimization problem to realize low-drift localization and point cloud map reconstruction without the assistance of GNSS/INS navigation systems. 3D point clouds are aligned among consecutive frames, submaps, and closed-loop frames using the normal distributions transform (NDT) algorithm and the iterative closest point (ICP) algorithm. Ground points are extracted automatically, while non-ground points are automatically segmented into different point clusters, with some noise clusters omitted, before the 3D point clouds are aligned. Through the three levels of inter-frame association, submap matching and closed-loop optimization, the continuous-time laser frames can be accurately associated to guarantee the consistency of the 3D point cloud map. Finally, the proposed method was evaluated in different scenarios; the experimental results showed that the proposed method could not only achieve accurate mapping even in complex scenes, but also successfully handle sparse laser frames, which is critical for scanners such as the new Velodyne VLP-16.
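The inter-frame alignment at the heart of NDT/ICP pipelines can be sketched in its simplest closed form: with known point correspondences, the best translation between two 2D scans is the difference of their centroids. A full pipeline iterates correspondence search plus such an alignment step (and also estimates rotation); the point sets here are synthetic assumptions.

```python
def estimate_translation(src, dst):
    """Closed-form translation aligning corresponding 2D point sets:
    the difference of centroids. Assumes correspondences are known,
    i.e. src[i] matches dst[i]."""
    n = len(src)
    cx_s = sum(p[0] for p in src) / n; cy_s = sum(p[1] for p in src) / n
    cx_d = sum(p[0] for p in dst) / n; cy_d = sum(p[1] for p in dst) / n
    return (cx_d - cx_s, cy_d - cy_s)

scan_a = [(0, 0), (2, 0), (0, 2), (2, 2)]
# Same scene observed after the points shifted by (3, -1):
scan_b = [(x + 3, y - 1) for x, y in scan_a]
t = estimate_translation(scan_b, scan_a)  # aligns scan_b onto scan_a
print(t)  # -> (-3.0, 1.0)
```

ICP's "iterative" part exists precisely because correspondences are unknown in practice: nearest neighbors are guessed, an alignment like this is solved, and the two steps repeat until convergence.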

Proceedings ArticleDOI
04 May 2020
TL;DR: This paper focuses on making use of the rich information in the latest consecutive frames to improve the feature representation of the initial template frame: an attention map is generated from the latest frames and point-wise multiplied by the features of the first frame to obtain the updated template.
Abstract: Siamese tracking networks have shown great potential in achieving balanced accuracy and beyond real-time speed. However, most existing siamese trackers only consider appearance features of the first frame, and hardly benefit from inter-frame information. The lack of the latest temporal information degrades tracking performance under challenges such as deformation and partial occlusion. In this paper we focus on making use of the rich information in the latest consecutive frames to improve the feature representation of the initial template frame. Specifically, the latest frames after 3D convolution are used to generate an attention map, which is then point-wise multiplied by the features of the first frame to obtain the updated template. With the attention map, the template can adaptively cope with deformation and occlusion of the target. Since the first frame is always used as the basis of the template, there is no cumulative error when using the latest frames for attention. Due to the shared 2D convolution of all frames, the feature map results can be reused, so the added module has almost no time cost. This module is easily embedded into different siamese trackers. Experiments verify that the module significantly improves tracking performance with different backbones.

Patent
04 Feb 2020
TL;DR: In this article, a video encoding method and a method and device for determining an inter-frame encoding method are presented, which can reduce the memory access bandwidth by constraining the size of the coding sub-block.
Abstract: The present disclosure provides a video encoding method and a method and a device for determining an inter-frame encoding method. The method includes: obtaining a video image frame and determining a coding unit (CU) in the video image frame; dividing the CU that uses inter-prediction coding into sub-blocks based on a constraint condition, wherein the constraint condition includes: for unidirectional prediction coding and/or bidirectional prediction coding, the size of the smallest sub-block is larger than the smallest sub-block size agreed upon in the protocol; and encoding the divided sub-blocks by means of inter-frame prediction coding. This video coding method can reduce the memory access bandwidth by constraining the size of the coding sub-blocks.

Posted Content
TL;DR: In this article, a decoder-side cross-resolution synthesis (CRS) module is proposed to pursue better compression efficiency beyond the latest Versatile Video Coding (VVC): intra frames are encoded at the original high resolution (HR), inter frames are compressed at a lower resolution (LR), and decoded LR inter frames are then super-resolved with help from the preceding HR intra frame and neighboring LR inter frames.
Abstract: This paper proposes a decoder-side Cross Resolution Synthesis (CRS) module to pursue better compression efficiency beyond the latest Versatile Video Coding (VVC): we encode intra frames at the original high resolution (HR), compress inter frames at a lower resolution (LR), and then super-resolve the decoded LR inter frames with help from the preceding HR intra frame and neighboring LR inter frames. For an LR inter frame, a motion alignment and aggregation network (MAN) is devised to produce a temporally aggregated motion representation (AMR) that guarantees temporal smoothness; a texture compensation network (TCN) takes the decoded HR intra frame, the re-sampled HR intra frame, and the LR inter frame to generate a multiscale affinity map (MAM) and a multiscale texture representation (MTR) for better augmenting spatial details; finally, similarity-driven fusion synthesizes the AMR, MTR, and MAM to upscale the LR inter frame, removing compression and resolution re-sampling noise. We enhance VVC with the proposed CRS, showing average Bjontegaard Delta Rate (BD-Rate) gains of 8.76% and 11.93% against the latest VVC anchor in Random Access (RA) and Low-delay P (LDP) settings, respectively. In addition, experimental comparisons to state-of-the-art super-resolution (SR) based VVC enhancement methods and ablation studies are conducted, further reporting the superior efficiency and generalization of the proposed algorithm. All materials will be made public at this https URL for reproducible research.
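The data flow above, HR intra frames kept as-is, inter frames down-sampled before coding and re-synthesized at the decoder, can be sketched end to end. This is a minimal illustration only: nearest-neighbour resampling stands in for both the encoder-side re-sampler and the learned MAN/TCN synthesis, which the paper implements with neural networks:

```python
import numpy as np

def downscale2x(frame):
    # Nearest-neighbour 2x downscale (stand-in for encoder-side re-sampling).
    return frame[::2, ::2]

def upscale2x(frame):
    # Nearest-neighbour 2x upscale (stand-in for the learned CRS synthesis,
    # which would additionally exploit the HR intra and neighboring frames).
    return np.repeat(np.repeat(frame, 2, axis=0), 2, axis=1)

hr_intra = np.arange(64, dtype=float).reshape(8, 8)  # intra frame kept at HR
hr_inter = hr_intra + 1.0                            # inter frame, originally HR
lr_inter = downscale2x(hr_inter)                     # coded at low resolution
restored = upscale2x(lr_inter)                       # decoder-side synthesis
```

The bitrate saving comes from coding `lr_inter` at a quarter of the pixel count; the CRS networks exist to recover the detail that this naive upscale cannot.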

Patent
23 Sep 2020
TL;DR: In this article, the concept of auxiliary frames was introduced to reduce or remove the need of copying data, for reference encoding purposes, between encoders which encode different parts of an image frame.
Abstract: The present invention relates to the field of image encoding. In particular, it relates to methods and devices where the concept of auxiliary frames may be employed to reduce or remove the need to copy data, for reference-encoding purposes, between encoders that encode different parts of an image frame. This purpose is achieved by spatially modifying (S104) the original image data before encoding it (S106, S108) using the encoders, and using (S110) the encoded image data as the image data of an auxiliary frame. The auxiliary frame is referenced by an inter frame comprising motion vectors that correspond to restoring the auxiliary frame's image data to the spatial arrangement of the original image data.
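The mechanism can be illustrated with a toy spatial rearrangement: the two halves of a frame are stacked so each encoder sees a contiguous region, and the "motion vectors" of the inter frame amount to moving each half back. A hedged sketch (the vertical-stack layout and function names are assumptions, not the patent's specific arrangement):

```python
import numpy as np

def to_auxiliary(frame):
    """Spatially modify the original frame (here: stack the left and right
    halves vertically) so two encoders can each code one contiguous half
    without copying reference data between them."""
    h, w = frame.shape
    left, right = frame[:, :w // 2], frame[:, w // 2:]
    return np.vstack([left, right])

def restore_with_motion_vectors(aux):
    """The referencing inter frame's motion vectors amount to moving each
    half back to its original position; modelled here as the inverse
    spatial rearrangement."""
    h2, w2 = aux.shape
    top, bottom = aux[:h2 // 2, :], aux[h2 // 2:, :]
    return np.hstack([top, bottom])

frame = np.arange(16, dtype=float).reshape(4, 4)
restored = restore_with_motion_vectors(to_auxiliary(frame))
```

Because the restoration is a pure rearrangement, the inter frame carries only motion vectors and no residual, so the decoder reconstructs the original layout at negligible extra cost.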

Patent
Ding Zhiming
11 Feb 2020
TL;DR: In this paper, a measurement result obtained by the requester by performing time measurement with each responder includes time stamps t1, t2, t3, and t4, where t1 is the time at which the responder sends a measurement frame, t2 is the time at which the requester receives the measurement frame, t3 is the time at which the requester sends an acknowledgement frame in response to the measurement frame, and t4 is the time at which the responder receives the acknowledgement frame.
Abstract: A method includes: separately performing, by a requester, time measurement with a plurality of responders, and calculating a location of the requester based on a measurement result. A measurement result obtained by the requester by performing time measurement with each responder includes time stamps t1, t2, t3, and t4, where t1 is a time when the responder sends a measurement frame, t2 is a time at which the requester receives the measurement frame, t3 is a time when the requester sends an acknowledgement frame in response to the measurement frame, and t4 is a time when the responder receives the acknowledgement frame. The acknowledgement frame is sent by the requester after waiting for a randomly generated short interframe spacing after receiving a last symbol of the measurement frame. The randomly generated short interframe spacing is randomly generated within a specified fluctuation range of a nominal short interframe spacing.
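From the four timestamps, the one-way flight time follows by subtracting the requester's turnaround delay (t3 - t2) from the full round trip (t4 - t1) and halving the remainder. A minimal sketch (the example delay values are illustrative; in practice the randomized short interframe spacing adds jitter to t3 that averaging over multiple exchanges smooths out):

```python
C = 299_792_458.0  # speed of light in m/s

def distance_from_timestamps(t1, t2, t3, t4):
    """Round-trip time-of-flight ranging from the four timestamps in the
    abstract: t1/t4 are stamped at the responder, t2/t3 at the requester,
    so clock offset between the two cancels in the differences."""
    tof = ((t4 - t1) - (t3 - t2)) / 2.0  # one-way flight time
    return C * tof

# Example: 100 ns one-way flight, 16 us turnaround at the requester.
t1 = 0.0
t2 = t1 + 100e-9
t3 = t2 + 16e-6
t4 = t3 + 100e-9
d = distance_from_timestamps(t1, t2, t3, t4)  # about 30 m
```

Repeating this exchange with several responders at known positions gives several such distances, from which the requester's location is computed (e.g. by trilateration), as the abstract describes.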