
Showing papers on "Inter frame published in 2018"


Journal ArticleDOI
TL;DR: A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF) is proposed to encode essential expressiveness of the apex frame of a video, with a proposed technique achieving a state-of-the-art F1-score recognition performance.
Abstract: Despite recent interest and advances in facial micro-expression research, there is still plenty of room for improvement in terms of micro-expression recognition. Conventional feature extraction approaches for micro-expression video consider either the whole video sequence or a part of it, for representation. However, with the high-speed video capture of micro-expressions (100–200 fps), are all frames necessary to provide a sufficiently meaningful representation? Is the luxury of data a bane to accurate recognition? A novel proposition is presented in this paper, whereby we utilize only two images per video, namely, the apex frame and the onset frame. The apex frame of a video contains the highest intensity of expression changes among all frames, while the onset is the perfect choice of a reference frame with neutral expression. A new feature extractor, Bi-Weighted Oriented Optical Flow (Bi-WOOF) is proposed to encode essential expressiveness of the apex frame. We evaluated the proposed method on five micro-expression databases: CAS(ME)², CASME II, SMIC-HS, SMIC-NIR and SMIC-VIS. Our experiments lend credence to our hypothesis, with our proposed technique achieving state-of-the-art F1-score recognition performance of 0.61 and 0.62 in the high frame rate CASME II and SMIC-HS databases respectively.
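Bi-WOOF's central idea, weighting optical-flow orientations by their magnitudes for the flow computed between the onset (reference) and apex frames, can be illustrated with a minimal numpy sketch. The paper's actual bi-weighting scheme is richer than this single histogram; the function name and bin count below are illustrative.

```python
import numpy as np

def weighted_orientation_histogram(flow, bins=8):
    """Histogram of flow orientations, each pixel voting with its magnitude.

    flow: (H, W, 2) array of per-pixel displacements (dx, dy), e.g. the
    optical flow estimated between the onset and apex frames.
    """
    dx, dy = flow[..., 0].ravel(), flow[..., 1].ravel()
    mag = np.hypot(dx, dy)
    ang = np.arctan2(dy, dx) % (2 * np.pi)               # orientation in [0, 2*pi)
    bin_idx = np.minimum((ang / (2 * np.pi) * bins).astype(int), bins - 1)
    hist = np.zeros(bins)
    np.add.at(hist, bin_idx, mag)                        # magnitude-weighted vote
    return hist / (hist.sum() + 1e-12)

# A flow field moving uniformly to the right concentrates all mass in bin 0.
flow = np.zeros((4, 4, 2))
flow[..., 0] = 1.0
h = weighted_orientation_histogram(flow)
```

Pixels with stronger motion thus dominate the descriptor, which matches the intuition that expressive regions of the apex frame should drive recognition.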

212 citations


Journal ArticleDOI
TL;DR: The experimental results on 80 videos from two datasets indicate the superior performance of the proposed key frame extraction approach, which aims to find the representative frames of the video and filter out similar frames from the representative frame set.
Abstract: Key frame extraction is an efficient way to create the video summary which helps users obtain a quick comprehension of the video content. Generally, the key frames should be representative of the video content, meanwhile, diverse to reduce the redundancy. Based on the assumption that the video data are near a subspace of a high-dimensional space, a new approach, named as key frame extraction in the summary space, is proposed for key frame extraction in this paper. The proposed approach aims to find the representative frames of the video and filter out similar frames from the representative frame set. First of all, the video data are mapped to a high-dimensional space, named as summary space. Then, a new representation is learned for each frame by analyzing the intrinsic structure of the summary space. Specifically, the learned representation can reflect the representativeness of the frame, and is utilized to select representative frames. Next, the perceptual hash algorithm is employed to measure the similarity of representative frames. As a result, the key frame set is obtained after filtering out similar frames from the representative frame set. Finally, the video summary is constructed by assigning the key frames in temporal order. Additionally, the ground truth, created by filtering out similar frames from human-created summaries, is utilized to evaluate the quality of the video summary. Compared with several traditional approaches, the experimental results on 80 videos from two datasets indicate the superior performance of our approach.
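The similarity-filtering step depends on a perceptual hash. The abstract does not say which one, so the sketch below uses average hash (aHash), one of the simplest perceptual hashes, purely for illustration:

```python
import numpy as np

def average_hash(img, size=8):
    """Average hash: block-average down to size x size, threshold at the mean."""
    h, w = img.shape
    img = img[:h - h % size, :w - w % size]              # crop to a multiple of size
    small = img.reshape(size, img.shape[0] // size,
                        size, img.shape[1] // size).mean(axis=(1, 3))
    return (small > small.mean()).ravel()                # 64-bit boolean hash

def hamming(a, b):
    """Number of differing hash bits; a small distance marks near-duplicate frames."""
    return int(np.count_nonzero(a != b))

frame = np.arange(64, dtype=float).reshape(8, 8)
dup = frame.copy()                                       # a redundant frame
inverted = frame.max() - frame                           # a very different frame
```

Representative frames whose pairwise Hamming distance falls below a threshold would be collapsed into a single key frame.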

42 citations


Journal ArticleDOI
TL;DR: This paper mainly calculates H-S and S-V color histograms of every frame in a video shot and compares the similarity between histograms to detect and locate tampered frames in the shot and utilizes SURF feature extraction and FLANN matching to confirm the forgery types in the tampered locations.
Abstract: Frame insertion, deletion and duplication are common inter-frame tampering operations in digital videos. In this paper, based on similarity analysis, a passive-blind forensics scheme for video shots is proposed to detect inter-frame forgeries. This method is composed of two parts: HSV (Hue-Saturation-Value) color histogram comparison and SURF (Speeded Up Robust Features) feature extraction together with FLANN (Fast Library for Approximate Nearest Neighbors) matching for double-checking. We mainly calculate H-S and S-V color histograms of every frame in a video shot and compare the similarity between histograms to detect and locate tampered frames in the shot. Then we utilize SURF feature extraction and FLANN matching to further confirm the forgery types in the tampered locations. Experimental results demonstrate that the proposed detection method is efficient and accurate in terms of forgery identification and localization. In contrast to other inter-frame forgery detection methods, our scheme can detect three kinds of forgery operations and has its own superiority and applicability as a passive-blind detection method.
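The first stage of this scheme, comparing color histograms of adjacent frames, can be sketched as follows. This is a simplified single-histogram variant of the paper's H-S/S-V pair; the bin counts and the similarity measure are illustrative:

```python
import numpy as np

def hs_histogram(hsv, h_bins=16, s_bins=16):
    """Normalised 2-D Hue-Saturation histogram of one frame (channels in [0, 1])."""
    hist, _, _ = np.histogram2d(hsv[..., 0].ravel(), hsv[..., 1].ravel(),
                                bins=[h_bins, s_bins],
                                range=[[0, 1], [0, 1]])
    return hist / hist.sum()

def intersection(h1, h2):
    """Histogram intersection in [0, 1]; a dip between adjacent frames
    suggests a discontinuity such as frame insertion or deletion."""
    return float(np.minimum(h1, h2).sum())

red = np.zeros((8, 8, 3)); red[..., 1] = 0.9             # hue ~ 0, saturated
blue = np.full((8, 8, 3), 0.6)                           # different hue/saturation
```

Frames whose similarity to their neighbour drops below a threshold become candidates for the SURF + FLANN double-check described in the abstract.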

39 citations


Journal ArticleDOI
TL;DR: An adaptive frame-level QP selection algorithm is proposed for the H.265/HEVC random access coding by taking into account the inter-frame dependency, and results show that in comparison with HM-16.0, the proposed algorithm reduces the BD-rate by 3.49% with negligible increase of encoding time.
Abstract: Rate-distortion optimization (RDO) is widely applied in video coding, aiming to minimize the coding distortion at a target bitrate. Conventionally, RDO is performed independently on each individual frame to avoid high computational complexity. However, extensive use of temporal/spatial predictions results in strong coding dependencies among neighboring frames, which makes such frame-independent RDO suboptimal. To further improve video coding performance, it would be desirable to perform global RDO among a group of neighboring frames while maintaining approximately the same coding complexity. In this paper, the problem of global RDO is studied by jointly determining the quantization parameters (QPs) for a group of neighboring frames. Specifically, an adaptive frame-level QP selection algorithm is proposed for H.265/HEVC random access coding by taking into account the inter-frame dependency. To measure the inter-frame dependency, a model based on the energy of prediction residuals is first established. With the help of the model, the problem of global RDO is then analyzed for the hierarchical coding structure in H.265/HEVC. Finally, the QP and the corresponding Lagrangian multiplier for each coding frame are determined adaptively by considering the total impact of its coding distortion on that of future frames in the encoding order. Experimental results show that in comparison with HM-16.0, the proposed algorithm reduces the BD-rate by 3.49% on average with a negligible increase in encoding time. In addition, the quality fluctuation of the coded video under the proposed algorithm is lower than that under HM-16.0.
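For context, HM-style encoders couple the Lagrange multiplier in the RD cost J = D + λR to the QP roughly exponentially. The rule of thumb below is the fixed mapping that adaptive schemes like this one refine; α = 0.85 is a typical default, not the paper's adapted value:

```python
def hevc_lambda(qp, alpha=0.85):
    """Rule-of-thumb Lagrange multiplier for the RD cost J = D + lambda * R.

    The paper instead adapts lambda per frame using its inter-frame
    dependency model; this fixed exponential mapping is the baseline.
    """
    return alpha * 2.0 ** ((qp - 12) / 3.0)

# Each +3 step in QP roughly doubles lambda, shifting the RD trade-off
# further toward saving rate at the cost of distortion.
lam_12 = hevc_lambda(12)
lam_15 = hevc_lambda(15)
```

The frame-level QP selection in the paper effectively perturbs this mapping per frame according to how much a frame's distortion propagates to future frames.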

29 citations


Journal ArticleDOI
TL;DR: This study proposes the use of graph matching (GM) to enable 3D motion capture for Indian sign language recognition and demonstrates that the approach increases the accuracy of recognizing signs in continuous sentences.
Abstract: A machine cannot easily understand and interpret three-dimensional (3D) data. In this study, we propose the use of graph matching (GM) to enable 3D motion capture for Indian sign language recognition. The sign classification and recognition problem for interpreting 3D motion signs is considered an adaptive GM (AGM) problem. However, the current models for solving an AGM problem have two major drawbacks. First, spatial matching can be performed only on a fixed set of frames with a fixed number of nodes. Second, temporal matching divides the entire 3D dataset into a fixed number of pyramids. The proposed approach solves these problems by employing inter-frame GM for spatial matching and multiple intra-frame GM for temporal matching. To test the proposed model, a 3D sign language dataset is created that involves 200 continuous sentences in the sign language, captured through a motion capture setup with eight cameras. The method is also validated on the 3D motion capture benchmark action datasets HDM05 and CMU. We demonstrate that our approach increases the accuracy of recognizing signs in continuous sentences.

25 citations


Journal ArticleDOI
TL;DR: This paper proposes a method for zoom detection and incorporates it into video tampering detection; the approach is capable of differentiating various inter-frame tampering events and localizing them in the temporal domain.

20 citations


Book ChapterDOI
17 Dec 2018
TL;DR: This paper proposes a deep learning based digital forensic technique using a 3D Convolutional Neural Network (3D-CNN) for detection of the above form of video forgery, and shows that the performance efficiency of the proposed 3D-CNN model is 97% on average and that it is applicable to a wide range of video quality.
Abstract: With the present-day rapid growth in the use of low-cost yet efficient video manipulation software, it has become extremely crucial to authenticate and check the integrity of digital videos before they are used in sensitive contexts, for example, CCTV footage acting as the primary source of evidence for a crime scene. In this paper, we deal with a specific class of video forgery detection, viz., inter-frame forgery detection. We propose a deep learning based digital forensic technique using a 3D Convolutional Neural Network (3D-CNN) for detection of the above form of video forgery. In the proposed model, we introduce a difference layer in the CNN, which mainly targets the extraction of temporal information from the videos. This, in turn, helps in efficient inter-frame video forgery detection, given that temporal information constitutes the most suitable form of features for inter-frame anomaly detection. Our experimental results show that the performance efficiency of the proposed deep learning 3D-CNN model is 97% on average, and that it is applicable to a wide range of video quality.
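The difference layer at the heart of this model subtracts consecutive frames so the network sees inter-frame residuals rather than raw pixels. A standalone numpy sketch of that operation, outside any network (the function name and shapes are illustrative):

```python
import numpy as np

def difference_layer(clip):
    """Temporal difference layer: subtract consecutive frames so that
    downstream 3-D convolutions operate on inter-frame residuals.

    clip: (T, H, W) grayscale frame stack; returns (T-1, H, W) residuals.
    """
    return clip[1:].astype(int) - clip[:-1].astype(int)

# A sudden content change at frame 2 shows up as a spike in the residuals,
# which is exactly the signature of inter-frame insertion/deletion.
clip = np.zeros((4, 2, 2))
clip[2] = 10
res = difference_layer(clip)
```

In the full model these residuals feed 3-D convolutions, so the network learns which residual patterns correspond to tampering rather than natural motion.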

17 citations


Journal ArticleDOI
01 Aug 2018
TL;DR: The proposed STTFR algorithm aims to verify video integrity through the creation of a 128 bit message digest from the input video of variable length that will be unique to that video and acts as a fingerprint.
Abstract: This paper discusses a novel approach to detect inter frame and intra frame video forgery using content based signature. A novel technique called the “Spatio Temporal Triad Feature Relationship” (STTFR) is employed to generate a unique content based signature – value for any given video sequence. The proposed STTFR algorithm aims to verify video integrity through the creation of a 128 bit message digest from the input video of variable length that will be unique to that video and acts as a fingerprint. Change in the video sequence, either at the spatial or at the temporal level will result in a different fingerprint than the one obtained originally. The knowledge of the signature will not enable any person/entity to recreate the original video as the signature is generated by combining spatial and temporal fingerprints in an orderly and systematic approach. We have verified our technique with standard datasets and found accurate results.

16 citations


Journal ArticleDOI
TL;DR: This paper proposes a fusion of audio forensics detection methods for video inter-frame forgery, combining the results of the audio channel and the video frame sequence channel and using the QDCT feature to finely locate the suspected forgery positions.

14 citations


Journal ArticleDOI
TL;DR: A region-based multiple description coding scheme is proposed for robust 3-D video communication in this paper, in which two descriptions are formed by setting the left and right view as dominant in the first and second description, respectively.
Abstract: Inter-frame and inter-view predictions are widely employed in multiview video coding. These techniques improve the coding efficiency, but they also increase the vulnerability of the coded bitstream: one packet loss will affect many subsequent frames in the same view and probably in other referenced views. To address this problem, a region-based multiple description coding scheme is proposed for robust 3-D video communication in this paper, in which two descriptions are formed by setting the left and right view as dominant in the first and second description, respectively. This approach exploits the fact that most regions in the reference view can be synthesized from the base view; hence, these regions can be skipped or only coarsely encoded. In our work, the disoccluded regions, illumination-affected regions, and remaining regions are first determined and extracted. By assigning different quantization parameters to these three different regions according to the network status, an efficient multiple description scheme is formed. Experimental results demonstrate that the proposed scheme achieves considerably better performance compared with the traditional approach.

14 citations


Journal ArticleDOI
TL;DR: It is found that, in MDC streams, the best policy is to encode selective frames as I-frame instead of coding some macroblocks of frames in intra mode, and a cost function based on which intra/inter frame type is decided is developed.
Abstract: Multiple description coding (MDC) is a technique for video transmission over error prone networks where the descriptions are routed over multiple paths. Intra coding such as MDC provides error resiliency but coding in this mode must be decided with care since it degrades the compression ratio. In this paper, we present our investigation results for a new intra coding approach in MDC. We have found that, in MDC streams, the best policy is to encode selective frames as I-frame instead of coding some macroblocks of frames in intra mode. In order to find the most suitable I-frame positions within a given video stream, we developed a cost function based on which intra/inter frame type is decided. The MDC scheme with the proposed intra coding criterion, with and without redundancy optimization, is implemented in the H.264/AVC reference software, JM16.0. Based on the experimental performance evaluation, we show that our method achieves higher average PSNR compared to the other optimized MDCs found in the literature.

Proceedings ArticleDOI
25 Jul 2018
TL;DR: Aiming at the detection of moving objects in video sequences, a moving object detection algorithm based on the background difference method and the inter-frame difference method is proposed; it overcomes the false detections and holes seen in previous detection algorithms.
Abstract: Aiming at the detection of moving objects in video sequences, a moving object detection algorithm based on the background difference method and the inter-frame difference method is proposed. A new background update method is proposed to update the unchanged background areas into the background frame. Experiments show that this method overcomes the false detections and holes of previous detection algorithms. The method can meet the need for real-time detection and tracking of moving targets, with the advantages of high accuracy and fast calculation speed.
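The fusion of background subtraction and inter-frame differencing, together with an "unchanged pixels update the background" rule, can be sketched as follows. The threshold and blending rate are illustrative, not taken from the paper:

```python
import numpy as np

def detect_and_update(bg, prev, cur, thr=25, alpha=0.05):
    """Fuse background subtraction with inter-frame differencing.

    A pixel is foreground when it differs from the background model AND
    from the previous frame; static pixels slowly blend into the background,
    mirroring the 'update unchanged areas into the background frame' idea.
    """
    bg_mask = np.abs(cur.astype(int) - bg.astype(int)) > thr
    fd_mask = np.abs(cur.astype(int) - prev.astype(int)) > thr
    fg = bg_mask & fd_mask
    # Update the background model only where nothing is moving.
    new_bg = np.where(fg, bg, (1 - alpha) * bg + alpha * cur)
    return fg, new_bg

bg = np.zeros((8, 8))
prev = np.zeros((8, 8))
cur = np.zeros((8, 8))
cur[2:4, 2:4] = 255                                  # a bright object enters
fg, new_bg = detect_and_update(bg, prev, cur)
```

The AND combination is what suppresses the ghost detections that either method alone tends to produce.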

Journal ArticleDOI
TL;DR: Experimental results demonstrate that the scheme has high detection and localization accuracy and the algorithm is composed of feature extraction and abnormal point localization.
Abstract: Surveillance systems are ubiquitous in our lives, and surveillance videos are often used as significant evidence in judicial forensics. However, the authenticity of surveillance videos is difficult to guarantee, and ascertaining it is an urgent problem. Inter-frame forgery is one of the most common ways of video tampering. The forgery reduces the correlation between adjacent frames at the tampering position, so this correlation can be used to detect the tampering operation. The algorithm is composed of feature extraction and abnormal point localization. During feature extraction, we extract the 2-D phase congruency of each frame, since it is a good image characteristic, and then calculate the correlation between adjacent frames. In the second phase, abnormal points are detected using the k-means clustering algorithm, clustering the normal and abnormal points into two categories. Experimental results demonstrate that the scheme has high detection and localization accuracy.
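The two phases, adjacent-frame correlation followed by 2-means anomaly clustering, can be sketched as follows. A toy 1-D 2-means stands in for a full k-means implementation, and plain arrays stand in for the paper's phase congruency maps:

```python
import numpy as np

def adjacent_correlations(features):
    """Pearson correlation between feature maps of adjacent frames
    (the paper uses 2-D phase congruency maps as the per-frame feature)."""
    return np.array([float(np.corrcoef(a.ravel(), b.ravel())[0, 1])
                     for a, b in zip(features[:-1], features[1:])])

def flag_abnormal(corrs, iters=10):
    """Toy 1-D 2-means: separate low (suspicious) from high (normal) correlations."""
    c_lo, c_hi = corrs.min(), corrs.max()
    for _ in range(iters):
        low = np.abs(corrs - c_lo) < np.abs(corrs - c_hi)
        if low.any():
            c_lo = corrs[low].mean()
        if (~low).any():
            c_hi = corrs[~low].mean()
    return low                                       # True = candidate tampering position

feats = [np.arange(9.0).reshape(3, 3)] * 3           # three identical frames
cc = adjacent_correlations(feats)
corrs = np.array([0.99, 0.98, 0.20, 0.97])           # a correlation dip at index 2
flags = flag_abnormal(corrs)
```

A dip like the one at index 2 is exactly what frame insertion or deletion leaves behind at the splice point.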

Proceedings ArticleDOI
01 Nov 2018
TL;DR: A two stream CNN framework for video-based driving behaviour recognition, in which spatial stream CNN captures appearance information from still frames, whilst temporal streamCNN captures motion information with pre-computed optical flow displacement between a few adjacent video frames is employed.
Abstract: Abnormal driving behaviour is one of the leading causes of terrible traffic accidents endangering human life. Therefore, research on driving behaviour surveillance has become essential to traffic security and public management. In this paper, we conduct this promising research and employ a two-stream CNN framework for video-based driving behaviour recognition, in which the spatial stream CNN captures appearance information from still frames, whilst the temporal stream CNN captures motion information with pre-computed optical flow displacement between a few adjacent video frames. We investigate different spatial-temporal fusion strategies to combine the intra-frame static clues and inter-frame dynamic clues for final behaviour recognition. To validate the effectiveness of the designed spatial-temporal deep learning based model, we create a simulated driving behaviour dataset containing 1237 videos with 6 different driving behaviours for recognition. Experimental results show that our proposed method obtains noticeable performance improvements compared to the existing methods.

Journal ArticleDOI
TL;DR: Experimental results show the improvement of the proposed approach over other block matching algorithms in terms of the performance measures.
Abstract: Block matching (BM) motion estimation plays an inevitable role in video coding applications. BM approaches are used for data compression, achieved by removing the temporal redundancy in video sequences. In the BM process, each video frame is subdivided into macroblocks, and each macroblock in the current frame is compared with the previous frame. The main objective is to minimize the sum of absolute differences. In this work, some modifications have been performed on the conventional artificial bee colony algorithm to improve conventional BM systems. An initial pattern is used in the proposed algorithm to reduce the computational cost, which is represented in terms of search points and convergence time. Experimental results show the improvement of the proposed approach over other block matching algorithms in terms of the performance measures.
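As a baseline for any BM optimizer, exhaustive full search over a window minimising the sum of absolute differences looks like this. The block size and search radius are illustrative; the paper replaces this exhaustive loop with an ABC-guided search that visits far fewer candidates:

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two macroblocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def full_search(cur, ref, top, left, size=8, radius=4):
    """Exhaustive block matching: the motion vector minimising SAD.

    ABC-based matchers explore the same search window with far fewer
    SAD evaluations; full search is the reference baseline.
    """
    block = cur[top:top + size, left:left + size]
    best_cost, best_mv = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + size > ref.shape[0] or x + size > ref.shape[1]:
                continue
            cost = sad(block, ref[y:y + size, x:x + size])
            if best_cost is None or cost < best_cost:
                best_cost, best_mv = cost, (dy, dx)
    return best_mv, best_cost

ref = np.arange(24 * 24).reshape(24, 24)
cur = np.zeros_like(ref)
cur[2:, 1:] = ref[:-2, :-1]                          # content shifted down 2, right 1
mv, cost = full_search(cur, ref, top=8, left=8)
```

The search correctly recovers the motion vector (-2, -1) with zero residual, which is the cost surface any faster search strategy is trying to navigate.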

Patent
25 Sep 2018
TL;DR: In this paper, an intra-frame and inter-frame combined prediction method for P frames or B frames is proposed, which consists of self-adaptively selecting by means of a rate-distortion optimization (RDO) decision whether to use the intra frame and inter frame combined prediction or not.
Abstract: An intra-frame and inter-frame combined prediction method for P frames or B frames. The method comprises: self-adaptively selecting by means of a rate-distortion optimization (RDO) decision whether to use the intra-frame and inter-frame combined prediction or not; using a method for weighting an intra prediction block and an inter prediction block in the intra-frame and inter-frame combined prediction to obtain a final prediction block; and obtaining the weighting coefficient of the intra prediction block and the inter prediction block according to prediction distortion statistics of the prediction method. Therefore, prediction precision can be improved, and coding and decoding efficiency of the prediction blocks are improved. The advantages of intra prediction and inter prediction are fully utilized in the present invention; and the optimal prediction parts of the two methods are selected to be combined, so that to a certain extent, areas with excessive distortion can be removed out of the intra prediction block and the inter prediction block, thus obtaining a better prediction effect and achieving excellent practicality and robustness.
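The weighting idea, blending an intra and an inter prediction block and letting a rate-distortion decision choose between the candidates, can be sketched as follows. The weights, rate figures and λ are illustrative placeholders:

```python
import numpy as np

def combined_prediction(intra_pred, inter_pred, w_intra=0.5):
    """Pixel-wise weighted blend of an intra and an inter prediction block."""
    return w_intra * intra_pred + (1.0 - w_intra) * inter_pred

def rd_cost(pred, target, rate_bits, lam=1.0):
    """RDO cost J = D + lambda * R with SSD distortion."""
    return float(((pred - target) ** 2).sum()) + lam * rate_bits

# When the true block lies between the two predictions, the blend wins the
# RD decision, which is the situation the combined mode is designed for.
target = np.full((4, 4), 15.0)
intra = np.full((4, 4), 10.0)
inter = np.full((4, 4), 20.0)
j_combined = rd_cost(combined_prediction(intra, inter), target, rate_bits=8)
j_intra = rd_cost(intra, target, rate_bits=8)
```

The patent additionally derives the weighting coefficient from prediction distortion statistics rather than fixing it at 0.5 as here.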

Journal ArticleDOI
TL;DR: A novel 3D global matching algorithm to handle the challenging reconstruction of RGB-D datasets whose inter-frame overlap is small due to insufficient temporal sampling or fast camera movement, and a novel global model for alignment pruning and pose optimization is proposed.
Abstract: We present a novel 3D global matching algorithm, Sparse3D, to handle the challenging reconstruction of RGB-D datasets whose inter-frame overlap is small due to insufficient temporal sampling or fast camera movement. To support a more reliable reconstruction, two major technical components are proposed: (1) pairwise alignment using a set of complementary features, and (2) a novel global model for alignment pruning and pose optimization. We examine the effectiveness of our algorithm on multiple benchmark datasets under various inter-frame overlap, and demonstrate it better reliability over existing RGB-D reconstruction algorithms.

Proceedings ArticleDOI
14 Jun 2018
TL;DR: The experimental results exemplify that the algorithm not only provides higher security but also good video quality, and can withstand attacks.
Abstract: In this paper, a novel video steganography scheme is proposed based on random integer generation in the DCT domain. The proposed technique detects the carrier frames using scene change detection, where a scene change is identified by the inter-frame difference of the DCT coefficients. Once detected, the carrier frame is divided into sub-images. The DCT coefficients of the sub-images are estimated and the 8 least significant DCT coefficients are replaced by the threshold value, which depends on the confidential information to be hidden, either 0 or 1. The position of the confidential information depends on the random integer generated; the confidential information is shuffled based on this randomly generated integer, which increases the security. The experimental results exemplify that the algorithm not only provides higher security but also good video quality, and can withstand attacks.
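Scene-change detection via the inter-frame difference of DCT coefficients can be sketched with a small orthonormal DCT. The threshold is illustrative; a real detector would normalise it by frame size:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    m[0] /= np.sqrt(2.0)
    return m

def dct2(block):
    """2-D DCT via separable matrix multiplication."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

def is_scene_change(prev, cur, thr=1000.0):
    """Flag a candidate carrier frame when the inter-frame DCT
    coefficient difference exceeds a threshold (thr is illustrative)."""
    return float(np.abs(dct2(cur) - dct2(prev)).sum()) > thr

a = np.zeros((8, 8))                                 # two frames of one scene
b = np.full((8, 8), 255.0)                           # an abrupt content change
```

Frames flagged this way become the carriers into which the shuffled confidential bits are embedded.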

Journal ArticleDOI
Zhong Luan1, Hao Zeng1, Yuanyuan Shang1, Zhuhong Shao, Hui Ding 
TL;DR: The proposed algorithm greatly improves the efficiency of video dehazing and avoids halos and block effects; a new quad-tree method to estimate the atmospheric light is also proposed.
Abstract: To reduce the computational complexity and maintain the effect of video dehazing, a fast and accurate video dehazing method is presented. The preliminary transmission map is estimated by the minimum channel of each pixel. An adjustment parameter is designed to fix the transmission map to reduce color distortion in the sky area. We propose a new quad-tree method to estimate the atmospheric light. In the video dehazing stage, we keep the atmospheric light unchanged in the same scene by a simple but efficient parameter which describes the similarity of the inter-frame image content. By using this method, unexpected flickers are effectively eliminated. Experimental results show that the proposed algorithm greatly improves the efficiency of video dehazing and avoids halos and block effects.
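The minimum-channel transmission estimate described above can be sketched in a few lines. The omega factor and the lower clamp are conventional dark-channel-style dehazing values, not necessarily the paper's:

```python
import numpy as np

def transmission_map(img, atmos, omega=0.95, t_min=0.05):
    """Coarse per-pixel transmission from the minimum colour channel,
    dark-channel style: t = 1 - omega * min_c(I_c / A_c).

    img: (H, W, 3) hazy image in [0, 1]; atmos: (3,) atmospheric light.
    """
    min_channel = (img / atmos).min(axis=2)
    return np.clip(1.0 - omega * min_channel, t_min, 1.0)

atmos = np.array([0.9, 0.9, 0.9])
img = np.zeros((2, 2, 3))
img[0, 0] = 0.0                                      # dark, haze-free pixel
img[1, 1] = atmos                                    # pixel saturated by haze
t = transmission_map(img, atmos)
```

Keeping the atmospheric light fixed across similar adjacent frames, as the paper does, is what prevents this per-frame estimate from flickering over a video.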

Patent
29 Jun 2018
TL;DR: Wang et al. as discussed by the authors proposed a convolutional neural network fusion-based ship video detection method, which comprises the four parts of preprocessing a video; obtaining an ROI of each frame and extracting low layer features; obtaining high layer features of image by using a modified VGG16 network; and predicting a ship saliency map of the ROI and extracting a ship target.
Abstract: The invention discloses an inter-frame difference and convolutional neural network fusion-based ship video detection method. The method comprises four parts: preprocessing a video; obtaining an ROI of each frame and extracting low layer features; obtaining high layer features of each frame of image by using a modified VGG16 network; and predicting a ship saliency map of the ROI of each frame and extracting a ship target. The relationship between continuous video frames is fully utilized; the interference of the background is reduced; a moving ship is accurately located; and a ship moving region is obtained. Compared with ship image saliency detection using only the low layer features, the method not only can be directly applied to ship video detection but also reduces incomplete ship detection, has higher adaptability to complex inland river moving ship scenes, has higher detection precision, solves the problem of inaccurate inland river ship target saliency detection, and has extremely high practical application value.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: A new method to apply deep learning to bidirectional interframe prediction in video compression to create an interpolated frame by the geometric transformation matrices estimated by CNN whose inputs are temporally previous and future frames.
Abstract: In this paper, we propose a new method to apply deep learning to bidirectional inter-frame prediction in video compression. The novelty of the proposed method is to create an interpolated frame using geometric transformation matrices estimated by a CNN whose inputs are the temporally previous and future frames. The proposed method can achieve considerably higher efficiency for bidirectional prediction because the geometric transformation matrix estimated by learning can express parallel translation, zoom in/out and change of blurriness with arbitrary accuracy. Experimental results show a prediction error reduction of over 30% compared with H.265/HEVC, especially for video sequences with small motion.

Proceedings ArticleDOI
01 Nov 2018
TL;DR: This review paper provides a comprehensive review of the state-of-the-art fast ME algorithms for HEVC inter coding, for both integer-pixel and fractional-pixel ME algorithms.
Abstract: High Efficiency Video Coding (HEVC), the latest video coding standard, is becoming popular due to its excellent coding performance, in particular for high-resolution video applications. However, the significant gain in performance is achieved at the cost of substantially higher encoding complexity than its predecessor H.264/AVC, in which motion estimation (ME) is one of the most time-consuming parts that effectively removes temporal redundancy. Since the release of H.265/HEVC, plenty of fast ME algorithms have been developed to reduce motion estimation complexity for better application of HEVC in practical real-time video systems. This paper provides a comprehensive review of the state-of-the-art fast ME algorithms for HEVC inter coding, covering both integer-pixel and fractional-pixel ME. Hopefully it may provide valuable leads for the improvement, implementation and application of HEVC inter prediction, as well as for the ongoing development of the next-generation video coding standard.

Journal ArticleDOI
TL;DR: An improved moving object detection method based on the four inter-frame difference method and an optical flow algorithm is proposed, which increases the processing speed of the optical flow method and reduces the effects of environmental illumination.
Abstract: To solve the problem of multiple-target detection and tracking in complex environments, an improved moving object detection method is proposed in this paper based on the four inter-frame difference method and an optical flow algorithm. Firstly, the four inter-frame difference method is used to process the video sequences. Then objects in the video are detected accurately by the optical flow algorithm applied to the resulting video sequences. This improved method increases the processing speed of the optical flow method and reduces the effects of environmental illumination. Finally, the paper compares the proposed algorithm with the particle filter and ViBe algorithms under different scenarios with different moving targets and numbers of individuals. The improved method is shown not only to have good robustness, but also to work more quickly and accurately on target detection and tracking.
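Four inter-frame differencing combines the pairwise differences of four consecutive frames; the exact rule varies between papers and is not spelled out in the abstract, so the variant below (ANDing the unions of differences on each side of the middle frame pair) is purely illustrative:

```python
import numpy as np

def four_frame_diff(f1, f2, f3, f4, thr=25):
    """One common four-frame differencing variant.

    ANDing the unions of differences on each side of the middle pair
    suppresses the ghosting left behind by plain two-frame differencing.
    """
    d12 = np.abs(f2.astype(int) - f1.astype(int)) > thr
    d23 = np.abs(f3.astype(int) - f2.astype(int)) > thr
    d34 = np.abs(f4.astype(int) - f3.astype(int)) > thr
    return (d12 | d23) & (d23 | d34)

z = np.zeros((8, 8))
obj = np.zeros((8, 8))
obj[3:5, 3:5] = 255                                  # an object visible in f2 and f3
static = four_frame_diff(z, z, z, z)
moving = four_frame_diff(z, obj, obj, z)
```

The resulting mask would then seed the optical flow stage, which refines the detection on the flagged regions.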

Proceedings ArticleDOI
01 Sep 2018
TL;DR: Through experiments, it was demonstrated that when the object to be detected moves in the field of view of a camera, the proposed method can distinguish between the presence and absence of the object.
Abstract: In this paper, we describe a detection algorithm for detecting moving objects in a video frame. The proposed method utilizes the interframe difference and applies dynamic binarization using discriminant analysis. Through experiments, it was demonstrated that when the object to be detected moves in the field of view of a camera, the proposed method can distinguish between the presence and absence of the object. The positions of the moving object in the image are determined by observing the histograms of each frame.
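The dynamic binarization by discriminant analysis referred to here is Otsu's method; applied to an inter-frame difference image, it can be sketched as:

```python
import numpy as np

def otsu_threshold(values, bins=256):
    """Otsu's discriminant-analysis threshold: maximise the between-class
    variance of the two classes induced by each candidate threshold."""
    hist, _ = np.histogram(values, bins=bins, range=(0, bins))
    p = hist / hist.sum()
    omega = np.cumsum(p)                             # class-0 probability
    mu = np.cumsum(p * np.arange(bins))              # class-0 first moment
    mu_t = mu[-1]
    denom = omega * (1.0 - omega)
    denom[denom == 0] = np.nan                       # ignore degenerate splits
    sigma_b = (mu_t * omega - mu) ** 2 / denom
    return int(np.nanargmax(sigma_b))

# A bimodal inter-frame difference image: background ~10, moving object ~200.
diff = np.array([10] * 100 + [200] * 100)
t = otsu_threshold(diff)
mask = diff > t
```

Because the threshold is recomputed per frame from the difference histogram, the binarization adapts automatically as lighting or object contrast changes.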

Proceedings ArticleDOI
01 Dec 2018
TL;DR: An inter prediction technique based on the ICP algorithm and variable-size macroblocks that can be used to significantly reduce the number of bits required to represent a point cloud video sequence by inter coding the geometry information.
Abstract: In recent years, 3D point clouds have gained more attention with the possibility of applications such as virtual reality, autonomous vehicles and 3D mapping of historical artifacts, among others. However, raw point clouds generate very large amounts of data. Thus, compression is essential to enable emerging 3D systems for communication and storage. This paper presents an inter prediction technique based on the ICP algorithm and variable-size macroblocks that can be used to significantly reduce the number of bits required to represent a point cloud video sequence by inter coding the geometry information. Since consecutive frames in dynamic point cloud sequences are not guaranteed to fill the exact same 3D volume, a spatial alignment step before motion estimation is required to increase the likelihood of good matchings and thus generate a high number of inter-coded macroblocks. A decision step is also included in order to select the most favorable coding mode: intra-coding, inter-coding or inter-coding with macroblock subdivision. The proposed technique was tested in the PCC MPEG reference software and four MPEG test sequences, obtaining average bitrate reductions of about 8% with PSNR gains up to 1 dB.

Patent
Zhang Hao, Lei Shizhe, Wang Saibo, Mou Fan, Fu Ting 
15 Jun 2018
TL;DR: In this article, an inter-frame fast mode selection method based on a decision tree is proposed, which obtains CU information at specific locations with good correlation; carries out decision tree prediction to obtain the predicted optimum coding mode, obtaining some information in real time after current CU coding; and, by utilizing the correlation of temporal and spatial information combined with relevant information of surrounding CUs, carries out fine tuning on the number and order of inter-frame coding modes.
Abstract: The invention discloses an inter-frame fast mode selection method based on a decision tree. The method is characterized by obtaining CU information at specific locations with good correlation; carrying out decision tree prediction to obtain the predicted optimum coding mode, and obtaining some information in real time after current CU coding; and, by utilizing the correlation of temporal and spatial information, combined with relevant information of surrounding CUs, carrying out fine tuning on the number and order of inter-frame coding modes. The method can predict the inter-frame mode in advance, adjust the mode order in real time during inter-frame mode prediction, and skip unnecessary mode predictions, thereby greatly reducing inter-frame mode prediction time and coding time. The method is simple and feasible, and facilitates the industrial adoption of a new video coding standard.

Proceedings ArticleDOI
01 Oct 2018
TL;DR: Two core contributions are proposed to improve mapping performance: an adaptive baseline matching cost that exploits the information in multi-baseline observations, and a propagated depth filter that integrates the sequential depth estimates of the same physical point in a robust probabilistic manner.
Abstract: State-of-the-art monocular dense mapping methods usually divide the image sequence into several separate multi-view stereo problems and thus make limited use of the information in multi-baseline observations and sequential depth estimations. In this paper, two core contributions are proposed to better exploit this information. The first is an adaptive baseline matching cost computation that uses the sequential input images to provide each pixel with wide-baseline observations. The second is a frame-to-frame propagated depth filter which integrates the sequential depth estimates of the same physical point in a robust probabilistic manner. The two contributions are integrated into a monocular dense mapping system that generates depth maps in real time for both pinhole and fisheye cameras. Our system is fully parallelized and can run at more than 25 fps on an Nvidia Jetson TX2. We compare our work with state-of-the-art methods on a public dataset. Onboard UAV mapping and handheld experiments are also used to demonstrate the performance of our method. For the benefit of the community, we make the implementation open source: https://github.com/HKUST-Aerial-Robotics/Pinhole-Fisheye-Mapping.
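The heart of a propagated depth filter is fusing a propagated per-pixel depth estimate with a new measurement. A minimal sketch, assuming both estimates are modelled as Gaussians, is the standard product-of-Gaussians update; the paper's filter additionally models outlier measurements, which this sketch omits:

```python
def fuse_depth(mu_a, var_a, mu_b, var_b):
    """Fuse two Gaussian (inverse-)depth estimates of the same pixel
    via the product-of-Gaussians update: the result is the
    variance-weighted mean, with reduced variance.
    """
    var = var_a * var_b / (var_a + var_b)
    mu = (mu_a * var_b + mu_b * var_a) / (var_a + var_b)
    return mu, var
```

Repeating this update as each new frame arrives is what lets sequential observations of the same physical point converge to a confident depth value.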

Journal ArticleDOI
TL;DR: Experimental results show that the proposed GBRL method can achieve bitrate error reduction and peak signal to noise ratio (PSNR) improvement especially for the sequences with large motion, compared to the state-of-the-art rate control methods.
Abstract: In order to meet the emerging demands of high-fidelity video services, a new video coding standard, High Efficiency Video Coding (HEVC), was developed to improve the compression performance of high-definition (HD) videos, halving the bitrate for the same perceptual video quality compared with H.264/Advanced Video Coding (AVC). Rate control still plays a significant role in HD video transmission over communication channels. However, the R-lambda-model-based HEVC rate control algorithm does not take the relationship between encoding complexity and the Human Visual System (HVS) into account; moreover, the convergence of the Least Mean Square (LMS) algorithm it relies on is slow. In this paper, an adaptive gradient-information and Broyden-Fletcher-Goldfarb-Shanno (BFGS) based R-lambda model (GBRL) is proposed for inter-frame rate control, where the Sobel-operator gradient effectively measures frame-content complexity and the BFGS algorithm converges faster than the LMS algorithm. Experimental results show that the proposed GBRL method achieves bitrate error reduction and peak signal-to-noise ratio (PSNR) improvement, especially for sequences with large motion, compared to state-of-the-art rate control methods. In addition, if an optimal initial quantization parameter (QP) prediction model based on linear regression is incorporated into the proposed GBRL method, rate control performance can be further improved.
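A Sobel-based frame-complexity measure of the kind fed into the R-lambda model can be sketched as the mean gradient magnitude of a luma frame (the exact normalization and how it enters the model are the paper's; this particular weighting is ours):

```python
import numpy as np

def sobel_complexity(frame):
    """Mean Sobel gradient magnitude of a 2D luma frame, usable as a
    frame-content complexity proxy for rate control.
    """
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T                                  # vertical Sobel kernel
    h, w = frame.shape
    gx = np.zeros((h - 2, w - 2))
    gy = np.zeros((h - 2, w - 2))
    for i in range(3):                         # sliding 3x3 window, 'valid' region
        for j in range(3):
            patch = frame[i:i + h - 2, j:j + w - 2]
            gx += kx[i, j] * patch
            gy += ky[i, j] * patch
    return float(np.mean(np.hypot(gx, gy)))
```

Frames with large motion or detailed texture score high and can be allotted more bits, which is the intuition behind tying the complexity measure to the rate-control model.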

Patent
29 Jun 2018
TL;DR: In this article, a video object segmentation method is described, which is based on transferring the segmentation result of a reference frame to at least one other frame in the video.
Abstract: The embodiment of the invention discloses a video object segmentation method and device, electronic equipment, a storage medium and a program. The method comprises the following steps: starting from a reference frame, propagating the reference frame's object segmentation result frame by frame to obtain segmentation results for the other frames; identifying frames in which objects present in the reference frame's segmentation have been lost; taking those frames as target frames and re-segmenting the lost objects to update the target frames' segmentation results; and propagating each target frame's updated segmentation result to at least one other frame in the video. With this embodiment of the invention, the accuracy of video object segmentation results is improved.
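The lost-object check at the core of the pipeline reduces to a set difference between the object ids in the reference frame's segmentation and those surviving in a propagated frame. A minimal sketch, assuming label 0 denotes background:

```python
def lost_objects(ref_labels, frame_labels, background=0):
    """Object ids present in the reference frame's segmentation but
    missing from a propagated frame's result. Frames where this is
    non-empty become target frames for re-segmentation.
    """
    return sorted(set(ref_labels) - set(frame_labels) - {background})
```

After re-segmenting the returned objects in a target frame, the updated result is propagated onward, which is how the method recovers objects dropped during inter-frame transfer.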

Patent
05 Jun 2018
TL;DR: In this paper, the authors proposed a fast selection method for inter-frame prediction, which comprises: judging whether the current coding unit is a minimal coding unit of preset depth; if not, dividing the current coding unit into four sub-coding units; computing the rate-distortion cost of the current coding unit in Split mode and the minimal rate-distortion cost among the candidate non-split modes; and determining the best prediction mode of the current coding unit by comparing the Split-mode cost with the minimal non-split cost.
Abstract: The invention provides a fast selection method and device for the inter-frame prediction mode, and electronic equipment. The method comprises: judging whether the current coding unit is a minimal coding unit of preset depth; if not, dividing the current coding unit into four sub-coding units; computing the rate-distortion cost of the current coding unit in Split mode and the minimal rate-distortion cost among the candidate non-split modes; and determining the best prediction mode of the current coding unit from the Split-mode cost and the minimal non-split cost. Because the best mode is chosen by comparing these two costs directly, the method effectively achieves both good coding quality and high coding speed: speed can be greatly improved while quality is preserved, alleviating the difficulty existing methods have in achieving both at once.
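The final decision step amounts to comparing the Split-mode rate-distortion cost against the cheapest non-split candidate. A minimal sketch with placeholder cost values (how the costs themselves are computed is the encoder's RD machinery, not shown here):

```python
def choose_prediction_mode(split_cost, unsplit_costs):
    """Pick between subdividing the CU and the cheapest non-split mode
    by comparing rate-distortion costs, as in the described selection.
    unsplit_costs maps candidate mode names to their RD costs.
    """
    best_mode = min(unsplit_costs, key=unsplit_costs.get)
    if unsplit_costs[best_mode] <= split_cost:
        return best_mode
    return "SPLIT"
```

When "SPLIT" wins, the same decision recurses into each of the four sub-coding units until the minimal CU depth is reached.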