
Showing papers on "Motion estimation published in 2021"


Journal ArticleDOI
TL;DR: In this article, a novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels, which is fully differentiable such that both the flow and kernel estimation networks can be optimized.
Abstract: Motion estimation (ME) and motion compensation (MC) have been widely used for classical video frame interpolation systems over the past decades. Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed. However, existing learning based methods typically estimate either flow or compensation kernels, thereby limiting performance on both computational efficiency and interpolation accuracy. In this work, we propose a motion estimation and compensation driven neural network for video frame interpolation. A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable such that both the flow and kernel estimation networks can be optimized jointly. The proposed model benefits from the advantages of motion estimation and compensation methods without using hand-crafted features. Compared to existing methods, our approach is computationally efficient and able to generate more visually appealing results. Furthermore, the proposed MEMC-Net architecture can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against the state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets.
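
For illustration, below is a minimal PyTorch sketch of such an adaptive warping layer: each output pixel is a kernel-weighted sum of bilinearly sampled source pixels around the flow-displaced location, so gradients reach both the flow and the kernel estimates. The tensor layout, kernel size, and function name are assumptions of this sketch, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def adaptive_warp(src, flow, kernel, k=4):
    """Hypothetical sketch of an adaptive warping layer.

    src    : (B, C, H, W) source frame
    flow   : (B, 2, H, W) optical flow (dx, dy) in pixels
    kernel : (B, k*k, H, W) per-pixel interpolation weights
    Each output pixel is a kernel-weighted sum of source samples taken around
    the flow-displaced location; bilinear sampling keeps the operation
    differentiable w.r.t. both the flow and the kernel.
    """
    b, c, h, w = src.shape
    # Base sampling grid in pixel coordinates.
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().to(src.device)      # (2, H, W)

    out = torch.zeros_like(src)
    taps = [(dy, dx) for dy in range(k) for dx in range(k)]
    for i, (dy, dx) in enumerate(taps):
        # Offset of this kernel tap relative to the kernel centre, in (x, y) order.
        off = torch.tensor([dx - (k - 1) / 2, dy - (k - 1) / 2],
                           device=src.device).view(1, 2, 1, 1)
        coords = base.unsqueeze(0) + flow + off                      # (B, 2, H, W)
        # Normalise to [-1, 1] for grid_sample.
        gx = 2.0 * coords[:, 0] / max(w - 1, 1) - 1.0
        gy = 2.0 * coords[:, 1] / max(h - 1, 1) - 1.0
        grid = torch.stack((gx, gy), dim=-1)                         # (B, H, W, 2)
        sampled = F.grid_sample(src, grid, align_corners=True)
        out = out + kernel[:, i:i + 1] * sampled
    return out
```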

168 citations


Journal ArticleDOI
TL;DR: This paper proposes the first end-to-end deep video compression framework, which can outperform the widely used video coding standard H.264 and is even on par with the latest standard H.265.
Abstract: Traditional video compression approaches build upon the hybrid coding framework with motion-compensated prediction and residual transform coding. In this paper, we propose the first end-to-end deep video compression framework to take advantage of both the classical compression architecture and the powerful non-linear representation ability of neural networks. Our framework employs pixel-wise motion information, which is learned from an optical flow network and further compressed by an auto-encoder network to save bits. The other compression components are also implemented by the well-designed networks for high efficiency. All the modules are jointly optimized by using the rate-distortion trade-off and can collaborate with each other. More importantly, the proposed deep video compression framework is very flexible and can be easily extended by using lightweight or advanced networks for higher speed or better efficiency. We also propose to introduce the adaptive quantization layer to reduce the number of parameters for variable bitrate coding. Comprehensive experimental results demonstrate the effectiveness of the proposed framework on the benchmark datasets.
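
The joint optimization mentioned in the abstract reduces to a single rate-distortion objective. A hedged sketch follows; the variable names, the lambda value, and the way bits are estimated are placeholders rather than the paper's exact formulation.

```python
import torch

def rate_distortion_loss(x_rec, x_target, bits_motion, bits_residual, lam=1024):
    """Minimal sketch of the rate-distortion trade-off used to jointly train
    all modules of a learned video codec (names and lambda are illustrative).

    x_rec, x_target : (B, C, H, W) reconstructed and ground-truth frames
    bits_motion     : estimated bits spent on the compressed motion latents
    bits_residual   : estimated bits spent on the compressed residual latents
    """
    num_pixels = x_target.numel() / x_target.shape[1]   # B * H * W
    distortion = torch.mean((x_rec - x_target) ** 2)    # MSE distortion term
    rate = (bits_motion + bits_residual) / num_pixels   # bits per pixel
    return lam * distortion + rate
```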

123 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes a feature-space video coding network (FVC) that performs all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space.
Abstract: Learning-based video compression has attracted increasing attention in the past few years. The previous hybrid coding approaches rely on pixel-space operations to reduce spatial and temporal redundancy, which may suffer from inaccurate motion estimation or less effective motion compensation. In this work, we propose a feature-space video coding network (FVC) by performing all major operations (i.e., motion estimation, motion compression, motion compensation and residual compression) in the feature space. Specifically, in the proposed deformable compensation module, we first apply motion estimation in the feature space to produce motion information (i.e., the offset maps), which will be compressed by using the auto-encoder style network. Then we perform motion compensation by using deformable convolution and generate the predicted feature. After that, we compress the residual feature between the feature from the current frame and the predicted feature from our deformable compensation module. For better frame reconstruction, the reference features from multiple previous reconstructed frames are also fused by using the nonlocal attention mechanism in the multi-frame feature fusion module. Comprehensive experimental results demonstrate that the proposed framework achieves the state-of-the-art performance on four benchmark datasets including HEVC, UVG, VTL and MCL-JCV.
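
As an illustration of the deformable compensation step, the sketch below uses torchvision's deform_conv2d to warp a reference feature map with learned offsets. The shapes and names are illustrative; in practice the offsets and weights would come from the trained motion and compensation networks rather than random tensors.

```python
import torch
from torchvision.ops import deform_conv2d

def deformable_compensation(ref_feat, offsets, weight, k=3):
    """Hypothetical sketch of feature-space motion compensation with
    deformable convolution.

    ref_feat : (B, C, H, W) reference-frame feature
    offsets  : (B, 2*k*k, H, W) learned per-position sampling offsets
    weight   : (C, C, k, k) deformable convolution weight
    """
    pad = k // 2
    return deform_conv2d(ref_feat, offsets, weight, padding=pad)

# Illustrative shapes only.
ref = torch.randn(1, 64, 32, 32)
off = torch.randn(1, 2 * 9, 32, 32)
w = torch.randn(64, 64, 3, 3)
pred_feat = deformable_compensation(ref, off, w)   # (1, 64, 32, 32)
```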

120 citations


Proceedings ArticleDOI
04 May 2021
TL;DR: In this article, novel motion representations for animating articulated objects consisting of distinct parts are proposed, which can animate a variety of objects in a completely unsupervised manner.
Abstract: We propose novel motion representations for animating articulated objects consisting of distinct parts. In a completely unsupervised manner, our method identifies object parts, tracks them in a driving video, and infers their motions by considering their principal axes. In contrast to the previous keypoint-based works, our method extracts meaningful and consistent regions, describing locations, shape, and pose. The regions correspond to semantically relevant and distinct object parts that are more easily detected in frames of the driving video. To force decoupling of foreground from background, we model non-object related global motion with an additional affine transformation. To facilitate animation and prevent the leakage of the shape of the driving object, we disentangle shape and pose of objects in the region space. Our model can animate a variety of objects, surpassing previous methods by a large margin on existing benchmarks. We present a challenging new benchmark with high-resolution videos and show that the improvement is particularly pronounced when articulated objects are considered, reaching 96.6% user preference vs. the state of the art.
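
The principal-axis idea can be made concrete with image moments: given a soft region-assignment map, the first moment gives the region's location and the eigenvectors of the second moment give its principal axes. The sketch below is a generic moment computation under that assumption, not the authors' network code.

```python
import numpy as np

def region_pose(heatmap):
    """Illustrative sketch: recover a region's location and principal axes
    from a soft region-assignment map (not the authors' exact formulation).

    heatmap : (H, W) non-negative soft assignment of pixels to one region
    Returns the region centre (2,) and principal axes as columns of a 2x2
    matrix, obtained from the first and second spatial moments.
    """
    h, w = heatmap.shape
    ys, xs = np.mgrid[0:h, 0:w]
    p = heatmap / (heatmap.sum() + 1e-8)                 # normalise to a distribution
    mean = np.array([(p * xs).sum(), (p * ys).sum()])    # first moment = location
    dx, dy = xs - mean[0], ys - mean[1]
    cov = np.array([[(p * dx * dx).sum(), (p * dx * dy).sum()],
                    [(p * dx * dy).sum(), (p * dy * dy).sum()]])
    eigvals, eigvecs = np.linalg.eigh(cov)               # principal axes of the region
    return mean, eigvecs
```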

105 citations


Journal ArticleDOI
TL;DR: A new quadrant-based search algorithm with a zero motion prejudgment method is proposed for motion estimation (ME) in the HEVC (High Efficiency Video Coding) standard, obtaining efficient output with low motion estimation time.
Abstract: In this manuscript, a new quadrant-based search algorithm with zero motion prejudgment is proposed for motion estimation (ME) in the HEVC (High Efficiency Video Coding) standard, with the aim of obtaining efficient output with low motion estimation time. The proposed quadrant-based search algorithm is a fast block matching algorithm that obtains a better block match between the current block and the reference block. The zero motion prejudgment (ZMP) method is used to determine whether a block is moving or static, and it decreases the computational complexity (CC) of the proposed quadrant-based search algorithm. The proposed quadrant-based search algorithm with the ZMP technique for motion estimation in HEVC is implemented on an FPGA hardware platform. The entire architecture is implemented in Verilog HDL with Virtex-5 technology and integrated with Xilinx ISE Design Suite 14.5. The results are evaluated on CIF (352 × 288 pixels) and HD (1280 × 720 pixels) video input sequences. Evaluation metrics such as PSNR, motion estimation time, and sum of absolute differences (SAD) are analyzed against existing methods such as the hexagon, adaptive root pattern, and diamond search algorithms. Hardware parameters such as power consumption and maximum operating frequency are also measured. Hardware utilization is reduced, and the power consumption of the proposed model is lowered to 0.143 W. The maximum operating frequency of the proposed model is 440.470 MHz. The experimental outcomes demonstrate that the proposed motion estimation approach in HEVC is more effective than existing algorithms.
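
To make the role of zero motion prejudgment concrete, here is a toy software block-matching routine with a ZMP early exit. The paper's actual contribution is the quadrant-based search pattern and its Verilog/FPGA realization, neither of which is reproduced here; the full search and the threshold below are purely illustrative.

```python
import numpy as np

def sad(block_a, block_b):
    """Sum of absolute differences between two equally sized blocks."""
    return np.abs(block_a.astype(np.int32) - block_b.astype(np.int32)).sum()

def match_block(cur, ref, y, x, bsize=16, search=8, zmp_thresh=512):
    """Toy sketch of block matching with zero-motion prejudgment (ZMP).

    cur, ref : current and reference frames, (H, W) grayscale arrays
    y, x     : top-left corner of the current block
    Returns the motion vector (dy, dx) that minimises the SAD cost.
    """
    cur_blk = cur[y:y + bsize, x:x + bsize]
    # ZMP: if the zero-motion SAD is already small, treat the block as static
    # and skip the search; this is where the complexity saving comes from.
    best = sad(cur_blk, ref[y:y + bsize, x:x + bsize])
    if best < zmp_thresh:
        return (0, 0)
    best_mv = (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + bsize > ref.shape[0] or xx + bsize > ref.shape[1]:
                continue
            cost = sad(cur_blk, ref[yy:yy + bsize, xx:xx + bsize])
            if cost < best:
                best, best_mv = cost, (dy, dx)
    return best_mv
```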

104 citations


Proceedings ArticleDOI
TL;DR: In this article, Parametric Continuous Convolutional Neural Networks (PCNNs) are proposed to exploit parameterized kernel functions that span the full continuous vector space, allowing them to learn over arbitrary data structures as long as their support relationship is computable.
Abstract: Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks. This limits their applicability to many real-world applications. In this paper we propose Parametric Continuous Convolution, a new learnable operator that operates over non-grid structured data. The key idea is to exploit parameterized kernel functions that span the full continuous vector space. This generalization allows us to learn over arbitrary data structures as long as their support relationship is computable. Our experiments show significant improvement over the state-of-the-art in point cloud segmentation of indoor and outdoor scenes, and lidar motion estimation of driving scenes.
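
The core idea, a kernel generated by an MLP over continuous relative coordinates and summed over each point's support neighbors, can be sketched as follows. This is a simplified PyTorch reading of the operator; layer sizes, the averaging normalization, and the neighbor-index interface are assumptions.

```python
import torch
import torch.nn as nn

class ContinuousConv(nn.Module):
    """Minimal sketch of a parametric continuous convolution: the kernel is a
    small MLP over continuous relative coordinates, so the operator applies to
    non-grid data such as point clouds (details differ from the paper)."""

    def __init__(self, in_ch, out_ch, dim=3, hidden=32):
        super().__init__()
        # The MLP maps a relative offset to a full (out_ch x in_ch) kernel slice.
        self.kernel_mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, in_ch * out_ch))
        self.in_ch, self.out_ch = in_ch, out_ch

    def forward(self, xyz, feats, neighbor_idx):
        """xyz: (N, dim) point coordinates, feats: (N, in_ch) point features,
        neighbor_idx: (N, K) long tensor of the K supporting neighbors per point."""
        n, k = neighbor_idx.shape
        rel = xyz[neighbor_idx] - xyz[:, None, :]                  # (N, K, dim)
        w = self.kernel_mlp(rel).view(n, k, self.out_ch, self.in_ch)
        f = feats[neighbor_idx]                                    # (N, K, in_ch)
        # Weighted sum over the continuous support region.
        return torch.einsum("nkoi,nki->no", w, f) / k
```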

74 citations


Journal ArticleDOI
TL;DR: A novel progressive fusion network for video SR is proposed, in which frames are processed by progressive separation and fusion for thorough utilization of spatio-temporal information, and which incorporates multi-scale structure and hybrid convolutions to capture a wide range of dependencies.
Abstract: How to effectively fuse temporal information from consecutive frames remains a non-trivial problem in video super-resolution (SR), since most existing fusion strategies (direct fusion, slow fusion or 3D convolution) either fail to make full use of temporal information or incur too much computation. To this end, we propose a novel progressive fusion network for video SR, in which frames are processed by progressive separation and fusion for the thorough utilization of spatio-temporal information. We particularly incorporate multi-scale structure and hybrid convolutions into the network to capture a wide range of dependencies. We further propose a non-local operation to extract long-range spatio-temporal correlations directly, taking the place of traditional motion estimation and motion compensation (ME&MC). This design relieves the complicated ME&MC algorithms, but enjoys better performance than various ME&MC schemes. Finally, we improve generative adversarial training for video SR to avoid temporal artifacts such as flickering and ghosting. In particular, we propose a frame variation loss with a single-sequence training method to generate more realistic and temporally consistent videos. Extensive experiments on public datasets show the superiority of our method over state-of-the-art methods in terms of performance and complexity. Our code is available at https://github.com/psychopa4/MSHPFNL.
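
A rough sketch of the kind of non-local operation that replaces explicit ME&MC is shown below: every spatio-temporal position attends to every other one across the stacked frame features. This is a generic non-local block, not the paper's exact multi-scale design, and the full attention matrix is memory-hungry, which is why such blocks are typically applied to downscaled features.

```python
import torch
import torch.nn as nn

class NonLocalFusion(nn.Module):
    """Generic non-local block over stacked frame features (B, C, T, H, W),
    standing in for explicit motion estimation and compensation."""

    def __init__(self, ch):
        super().__init__()
        self.theta = nn.Conv3d(ch, ch // 2, 1)
        self.phi = nn.Conv3d(ch, ch // 2, 1)
        self.g = nn.Conv3d(ch, ch // 2, 1)
        self.out = nn.Conv3d(ch // 2, ch, 1)

    def forward(self, x):                      # x: (B, C, T, H, W)
        b, c, t, h, w = x.shape
        q = self.theta(x).flatten(2)           # (B, C/2, THW)
        k = self.phi(x).flatten(2)
        v = self.g(x).flatten(2)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)   # (B, THW, THW)
        y = (v @ attn.transpose(1, 2)).view(b, c // 2, t, h, w)
        return x + self.out(y)                 # residual connection
```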

57 citations


Journal ArticleDOI
TL;DR: This paper proposes blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames; for motion compensation, instead of aligning frames by warping with estimated motions, a pixel volume containing candidate sharp pixels is used to resolve motion estimation errors.
Abstract: For the success of video deblurring, it is essential to utilize information from neighboring frames. Most state-of-the-art video deblurring methods adopt motion compensation between video frames to aggregate information from multiple frames that can help deblur a target frame. However, the motion compensation methods adopted by previous deblurring methods are not blur-invariant, and consequently, their accuracy is limited for blurry frames with different blur amounts. To alleviate this problem, we propose two novel approaches to deblur videos by effectively aggregating information from multiple video frames. First, we present blur-invariant motion estimation learning to improve motion estimation accuracy between blurry frames. Second, for motion compensation, instead of aligning frames by warping with estimated motions, we use a pixel volume that contains candidate sharp pixels to resolve motion estimation errors. We combine these two processes to propose an effective recurrent video deblurring network that fully exploits deblurred previous frames. Experiments show that our method achieves the state-of-the-art performance both quantitatively and qualitatively compared to recent methods that use deep learning.

40 citations


Proceedings Article
01 Jan 2021
TL;DR: This paper proposes a video frame interpolation algorithm based on asymmetric bilateral motion estimation (ABME), which synthesizes an intermediate frame between two input frames, and develops a new synthesis network that generates a set of dynamic filters and a residual frame using local and global information.
Abstract: We propose a novel video frame interpolation algorithm based on asymmetric bilateral motion estimation (ABME), which synthesizes an intermediate frame between two input frames. First, we predict symmetric bilateral motion fields to interpolate an anchor frame. Second, we estimate asymmetric bilateral motion fields from the anchor frame to the input frames. Third, we use the asymmetric fields to warp the input frames backward and reconstruct the intermediate frame. Last, to refine the intermediate frame, we develop a new synthesis network that generates a set of dynamic filters and a residual frame using local and global information. Experimental results show that the proposed algorithm achieves excellent performance on various datasets. The source codes and pretrained models are available at this https URL.

39 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: In this paper, a self-supervised learning framework is proposed to estimate motion from point clouds and paired camera images, using probabilistic motion masking and cross-sensor motion regularization.
Abstract: Autonomous driving can benefit from motion behavior comprehension when interacting with diverse traffic participants in highly dynamic environments. Recently, there has been a growing interest in estimating class-agnostic motion directly from point clouds. Current motion estimation methods usually require vast amount of annotated training data from self-driving scenes. However, manually labeling point clouds is notoriously difficult, error-prone and time-consuming. In this paper, we seek to answer the research question of whether the abundant unlabeled data collections can be utilized for accurate and efficient motion learning. To this end, we propose a learning framework that leverages free supervisory signals from point clouds and paired camera images to estimate motion purely via self-supervision. Our model involves a point cloud based structural consistency augmented with probabilistic motion masking as well as a cross-sensor motion regularization to realize the desired self-supervision. Experiments reveal that our approach performs competitively to supervised methods, and achieves the state-of-the-art result when combining our self-supervised model with supervised fine-tuning.

37 citations


Proceedings ArticleDOI
01 Jun 2021
TL;DR: MultiBodySync is an end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds that incorporates spectral synchronization into an iterative deep declarative network.
Abstract: We present MultiBodySync, a novel, end-to-end trainable multi-body motion segmentation and rigid registration framework for multiple input 3D point clouds. The two non-trivial challenges posed by this multi-scan multibody setting that we investigate are: (i) guaranteeing correspondence and segmentation consistency across multiple input point clouds capturing different spatial arrangements of bodies or body parts; and (ii) obtaining robust motion-based rigid body segmentation applicable to novel object categories. We propose an approach to address these issues that incorporates spectral synchronization into an iterative deep declarative network, so as to simultaneously recover consistent correspondences as well as motion segmentation. At the same time, by explicitly disentangling the correspondence and motion segmentation estimation modules, we achieve strong generalizability across different object categories. Our extensive evaluations demonstrate that our method is effective on various datasets ranging from rigid parts in articulated objects to individually moving objects in a 3D scene, be it single-view or full point clouds. Code at https://github.com/huangjh-pub/multibody-sync.

Journal ArticleDOI
TL;DR: In this paper, a deep learning based framework for motion estimation in echocardiography was proposed, which achieved an average end point error of (0.06±0.04) mm per frame using simulated data from an open access database, on par or better compared to previously reported state of the art.
Abstract: Deformation imaging in echocardiography has been shown to have better diagnostic and prognostic value than conventional anatomical measures such as ejection fraction. However, despite clinical availability and demonstrated efficacy, everyday clinical use remains limited at many hospitals. The reasons are complex, but practical robustness has been questioned, and a large inter-vendor variability has been demonstrated. In this work, we propose a novel deep learning based framework for motion estimation in echocardiography, and use this to fully automate myocardial function imaging. A motion estimator was developed based on a PWC-Net architecture, which achieved an average end point error of (0.06±0.04) mm per frame using simulated data from an open access database, on par or better compared to previously reported state of the art. We further demonstrate unique adaptability to image artifacts such as signal dropouts, made possible using trained models that incorporate relevant image augmentations. Further, a fully automatic pipeline consisting of cardiac view classification, event detection, myocardial segmentation and motion estimation was developed and used to estimate left ventricular longitudinal strain in vivo. The method showed promise by achieving a mean deviation of (−0.7±1.6)% compared to a semi-automatic commercial solution for N=30 patients with relevant disease, within the expected limits of agreement. We thus believe that learning-based motion estimation can facilitate extended use of strain imaging in clinical practice.
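
The quoted accuracy is an average end-point error. For reference, a minimal implementation of that metric is given below; the (H, W, 2) array layout and units of mm per frame are assumptions of this sketch.

```python
import numpy as np

def end_point_error(flow_pred, flow_gt):
    """Average end-point error (EPE): mean Euclidean distance between the
    predicted and reference motion fields, each stored as an (H, W, 2) array.
    """
    return float(np.mean(np.linalg.norm(flow_pred - flow_gt, axis=-1)))
```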

Journal ArticleDOI
TL;DR: The proposed RespME-net has achieved similar motion-corrected CMRA image quality to the conventional registration method regarding coronary artery length and sharpness, and can predict 3D non-rigid motion fields with subpixel accuracy within ~10 seconds, being ~20 times faster than a GPU-implemented state-of-the-art non-rigid registration method.
Abstract: Non-rigid motion-corrected reconstruction has been proposed to account for the complex motion of the heart in free-breathing 3D coronary magnetic resonance angiography (CMRA). This reconstruction framework requires efficient and accurate estimation of non-rigid motion fields from undersampled images at different respiratory positions (or bins). However, state-of-the-art registration methods can be time-consuming. This article presents a novel unsupervised deep learning-based strategy for fast estimation of inter-bin 3D non-rigid respiratory motion fields for motion-corrected free-breathing CMRA. The proposed 3D respiratory motion estimation network (RespME-net) is trained as a deep encoder-decoder network, taking pairs of 3D image patches extracted from CMRA volumes as input and outputting the motion field between image patches. Using image warping by the estimated motion field, a loss function that imposes image similarity and motion smoothness is adopted to enable training without ground truth motion field. RespME-net is trained patch-wise to circumvent the challenges of training a 3D network volume-wise which requires large amounts of GPU memory and 3D datasets. We perform 5-fold cross-validation with 45 CMRA datasets and demonstrate that RespME-net can predict 3D non-rigid motion fields with subpixel accuracy (0.44 ± 0.38 mm) within ~10 seconds, being ~20 times faster than a GPU-implemented state-of-the-art non-rigid registration method. Moreover, we perform non-rigid motion-compensated CMRA reconstruction for 9 additional patients. The proposed RespME-net has achieved similar motion-corrected CMRA image quality to the conventional registration method regarding coronary artery length and sharpness.
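
The training signal described, image similarity plus motion smoothness instead of a ground-truth motion field, can be sketched as below. The similarity measure, smoothness weight, and tensor layout are placeholders rather than RespME-net's exact choices.

```python
import torch

def unsupervised_registration_loss(moved, fixed, motion, smooth_weight=0.01):
    """Sketch of a loss for training a registration network without ground
    truth: image similarity on the warped image plus a smoothness penalty on
    the motion field (the weight is illustrative).

    moved  : (B, 1, D, H, W) moving image warped by the predicted motion field
    fixed  : (B, 1, D, H, W) target (fixed) image
    motion : (B, 3, D, H, W) predicted 3D displacement field
    """
    similarity = torch.mean((moved - fixed) ** 2)
    # First-order finite differences approximate the spatial gradient of the field.
    dz = motion[:, :, 1:, :, :] - motion[:, :, :-1, :, :]
    dy = motion[:, :, :, 1:, :] - motion[:, :, :, :-1, :]
    dx = motion[:, :, :, :, 1:] - motion[:, :, :, :, :-1]
    smoothness = dz.pow(2).mean() + dy.pow(2).mean() + dx.pow(2).mean()
    return similarity + smooth_weight * smoothness
```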

Proceedings ArticleDOI
11 Jan 2021
TL;DR: This work proposes a modular network, whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field, and achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel.
Abstract: Appearance-based detectors achieve remarkable performance on common scenes, benefiting from high-capacity models and massive annotated data, but tend to fail for scenarios that lack training data. Geometric motion segmentation algorithms, however, generalize to novel scenes, but have yet to achieve comparable performance to appearance-based ones, due to noisy motion estimations and degenerate motion configurations. To combine the best of both worlds, we propose a modular network, whose architecture is motivated by a geometric analysis of what independent object motions can be recovered from an egomotion field. It takes two consecutive frames as input and predicts segmentation masks for the background and multiple rigidly moving objects, which are then parameterized by 3D rigid transformations. Our method achieves state-of-the-art performance for rigid motion segmentation on KITTI and Sintel. The inferred rigid motions lead to a significant improvement for depth and scene flow estimation.

Journal ArticleDOI
TL;DR: A plane-edge-SLAM system using an RGB-D sensor and an adaptive weighting algorithm is developed to address the seamless fusion of planes and edges, which benefits the performance of motion estimation.
Abstract: Planes and edges are attractive features for simultaneous localization and mapping (SLAM) in indoor environments because they can be reliably extracted and are robust to illumination changes. However, it remains a challenging problem to seamlessly fuse two different kinds of features to avoid degeneracy and accurately estimate the camera motion. In this article, a plane-edge-SLAM system using an RGB-D sensor is developed to address the seamless fusion of planes and edges. Constraint analysis is first performed to obtain a quantitative measure of how the planes constrain the camera motion estimation. Then, using the results of the constraint analysis, an adaptive weighting algorithm is elaborately designed to achieve seamless fusion. Through the fusion of planes and edges, the solution to motion estimation is fully constrained, and the problem remains well-posed in all circumstances. In addition, a probabilistic plane fitting algorithm is proposed to fit a plane model to the noisy 3-D points. By exploiting the error model of the depth sensor, the proposed plane fitting is adaptive to various measurement noises corresponding to different depth measurements. As a result, the estimated plane parameters are more accurate and robust to the points with large uncertainties. Compared with the existing plane fitting methods, the proposed method definitely benefits the performance of motion estimation. The results of extensive experiments on public data sets and in real-world indoor scenes demonstrate that the plane-edge-SLAM system can achieve high accuracy and robustness. Note to Practitioners —This article is motivated by the robust localization and mapping for mobile robots. We suggest a novel simultaneous localization and mapping (SLAM) approach fusing the plane and edge features in indoor scenes (plane-edge-SLAM). This newly proposed approach works well in the textureless or dark scenes and is robust to the sensor noise. The experiments are carried out in various indoor scenes for mobile robots, and the results demonstrate the robustness and effectiveness of the proposed framework. In future work, we will address the fusion of other high-level features (for example, 3-D lines) and the active exploration of the environments.
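
As a simplified illustration of noise-adaptive plane fitting, the sketch below weights each 3-D point by the inverse variance of its depth measurement and takes the eigenvector of the smallest weighted scatter as the plane normal; the paper's probabilistic formulation and sensor error model are more detailed than this.

```python
import numpy as np

def weighted_plane_fit(points, depth_sigma):
    """Sketch of uncertainty-aware plane fitting: points with larger depth
    noise get smaller weights (the inverse-variance weighting is illustrative).

    points      : (N, 3) 3D points from the depth sensor
    depth_sigma : (N,) per-point depth noise standard deviation
    Returns (normal, d) of the plane n.x + d = 0.
    """
    w = 1.0 / (depth_sigma ** 2 + 1e-9)
    centroid = (w[:, None] * points).sum(0) / w.sum()
    centered = points - centroid
    cov = (w[:, None] * centered).T @ centered / w.sum()
    eigvals, eigvecs = np.linalg.eigh(cov)
    normal = eigvecs[:, 0]                 # direction of smallest weighted scatter
    d = -normal @ centroid
    return normal, d
```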

Proceedings ArticleDOI
19 Sep 2021
TL;DR: HOME uses a simple architecture with classic convolution networks coupled with an attention mechanism for agent interactions, and outputs an unconstrained 2D top-view representation of the agent's possible future.
Abstract: In this paper, we propose HOME, a framework tackling the motion forecasting problem with an image output representing the probability distribution of the agent's future location. This method allows for a simple architecture with classic convolution networks coupled with an attention mechanism for agent interactions, and outputs an unconstrained 2D top-view representation of the agent's possible future. Based on this output, we design two methods to sample a finite set of the agent's future locations. These methods allow us to control the optimization trade-off between miss rate and final displacement error for multiple modalities without having to retrain any part of the model. We apply our method to the Argoverse Motion Forecasting Benchmark and achieve 1st place on the online leaderboard.
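
The sampling step can be illustrated with a simple greedy procedure over the predicted heatmap. The paper defines its own two sampling methods and trade-off controls, so the routine below (number of picks, suppression radius) is only a stand-in.

```python
import numpy as np

def sample_endpoints(heatmap, k=6, radius=4):
    """Illustrative greedy sampling of k candidate future locations from a
    probability heatmap, suppressing a small neighbourhood around each pick.
    """
    probs = heatmap.astype(float)          # working copy so the input is untouched
    picks = []
    for _ in range(k):
        idx = np.unravel_index(np.argmax(probs), probs.shape)
        picks.append(idx)
        y, x = idx
        y0, y1 = max(0, y - radius), min(probs.shape[0], y + radius + 1)
        x0, x1 = max(0, x - radius), min(probs.shape[1], x + radius + 1)
        probs[y0:y1, x0:x1] = -np.inf      # suppress the neighbourhood to diversify picks
    return picks
```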

Journal ArticleDOI
TL;DR: In this article, an accurate imaging and motion estimation method based on multiple-input multiple-output (MIMO) radar is presented, in which a preprocessing strategy based on space-time adaptive processing (STAP) theory suppresses clutter signals by constructing a Doppler spectrum model.
Abstract: The image deterioration problem occurs in radar imaging of ship targets, resulting from the complex time-varying motions of the ship, the noise in channels, and the clutter on the sea surface. It is hard to solve effectively due to the coherent accumulation sampling time and the high-dimensional parametric model. Hence, an accurate imaging and motion estimation method based on multiple-input multiple-output (MIMO) radar is presented. First, the multidimensional signal model is built to characterize target features accurately. To reduce the interference from sea clutter, a preprocessing strategy based on the space-time adaptive processing (STAP) theory is applied, and clutter signals can be suppressed effectively by constructing a Doppler spectrum model. Then, for accurate imaging and motion estimation, a combined trace norm minimization problem is deduced based on the relaxation of tensor rank, where the noise in sea environments is also considered. Meanwhile, a generalized tensor total variation constraint is developed to ensure stable estimation and smooth imaging results when separating the noise term. Accordingly, an effective decomposition criterion is formulated based on the alternating direction method of multipliers (ADMM) strategy, and motion parameters can be precisely calculated based on the least squares (LS) method. Finally, theoretical analysis and simulation results demonstrate the accuracy of the proposed method.

Journal ArticleDOI
TL;DR: This paper proposes a novel multi-scale plane fitting based visual flow algorithm that is robust to the aperture problem and also computationally fast and efficient.
Abstract: Optical flow is a crucial component of the feature space for early visual processing of dynamic scenes especially in new applications such as self-driving vehicles, drones and autonomous robots. The dynamic vision sensors are well suited for such applications because of their asynchronous, sparse and temporally precise representation of the visual dynamics. Many algorithms proposed for computing visual flow for these sensors suffer from the aperture problem as the direction of the estimated flow is governed by the curvature of the object rather than the true motion direction. Some methods that do overcome this problem by temporal windowing under-utilize the true precise temporal nature of the dynamic sensors. In this paper, we propose a novel multi-scale plane fitting based visual flow algorithm that is robust to the aperture problem and also computationally fast and efficient. Our algorithm performs well in many scenarios ranging from fixed camera recording simple geometric shapes to real world scenarios such as camera mounted on a moving car and can successfully perform event-by-event motion estimation of objects in the scene to allow for predictions of up to 500 ms i.e. equivalent to 10 to 25 frames with traditional cameras.
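
The underlying plane-fitting principle can be shown in a few lines: fit t = a*x + b*y + c to a local spatio-temporal neighbourhood of events and read the velocity off the plane gradient. The multi-scale aggregation that makes the paper's method robust to the aperture problem is not reproduced in this sketch.

```python
import numpy as np

def local_flow_from_events(xs, ys, ts):
    """Toy sketch of plane-fitting event-based flow.

    xs, ys : pixel coordinates of the events in a small neighbourhood
    ts     : event timestamps (seconds)
    Returns the estimated (vx, vy) in pixels per second.
    """
    A = np.column_stack([xs, ys, np.ones_like(xs, dtype=float)])
    (a, b, c), *_ = np.linalg.lstsq(A, ts, rcond=None)   # fit t = a*x + b*y + c
    g2 = a * a + b * b
    if g2 < 1e-12:
        return 0.0, 0.0                    # no measurable motion in this patch
    return a / g2, b / g2                  # velocity along the timestamp gradient
```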

Proceedings ArticleDOI
04 Mar 2021
TL;DR: This paper proposes a novel deep learning-based, fully unsupervised method for in vivo motion tracking on t-MRI images, which estimates the motion field (INF) between any two consecutive t-MRI frames with a bi-directional generative diffeomorphic registration neural network and then estimates the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer.
Abstract: Cardiac tagging magnetic resonance imaging (t-MRI) is the gold standard for regional myocardium deformation and cardiac strain estimation. However, this technique has not been widely used in clinical diagnosis, as a result of the difficulty of motion tracking encountered with t-MRI images. In this paper, we propose a novel deep learning-based fully unsupervised method for in vivo motion tracking on t-MRI images. We first estimate the motion field (INF) between any two consecutive t-MRI frames by a bi-directional generative diffeomorphic registration neural network. Using this result, we then estimate the Lagrangian motion field between the reference frame and any other frame through a differentiable composition layer. By utilizing temporal information to perform reasonable estimations on spatiotemporal motion fields, this novel method provides a useful solution for motion tracking and image registration in dynamic medical imaging. Our method has been validated on a representative clinical t-MRI dataset; the experimental results show that our method is superior to conventional motion tracking methods in terms of landmark tracking accuracy and inference efficiency. Project page is at: https://github.com/DeepTag/cardiac_tagging_motion_estimation.
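
The step from inter-frame (INF) motion to Lagrangian motion is a composition of displacement fields. A non-differentiable NumPy sketch of that composition is given below; the paper implements it as a differentiable layer inside the network.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def compose_displacements(u_prev, u_inf):
    """Sketch of accumulating inter-frame motion into Lagrangian motion:
    u_lag(x) = u_prev(x) + u_inf(x + u_prev(x)), with linear interpolation.

    u_prev : (2, H, W) displacement (dy, dx) from the reference frame to frame t-1
    u_inf  : (2, H, W) displacement (dy, dx) from frame t-1 to frame t
    """
    h, w = u_prev.shape[1:]
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    yq = ys + u_prev[0]                    # where each reference pixel has moved to
    xq = xs + u_prev[1]
    inc_y = map_coordinates(u_inf[0], [yq, xq], order=1, mode="nearest")
    inc_x = map_coordinates(u_inf[1], [yq, xq], order=1, mode="nearest")
    return np.stack([u_prev[0] + inc_y, u_prev[1] + inc_x])
```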

Journal ArticleDOI
TL;DR: A new continuous-time IMU motion integration model, based on a switched linear system with a closed-form discrete formulation, is proposed for visual-inertial odometry, improving motion estimation accuracy by up to 22.71% on the EuRoC dataset and 38.15% in indoor experiments.
Abstract: Accurate motion estimation plays a crucial role in state estimation of an unmanned aerial vehicle (UAV). This is usually carried out by fusing the kinematics of an inertial measurement unit (IMU) with the video output of a camera. However, the accuracy of existing approaches is hindered by the discretization effect of the model even at a high IMU sampling rate. In order to improve the accuracy, we propose a new IMU motion integration model for the IMU kinematics in continuous time. The kinematics are modeled using a switched linear system. A closed-form discrete formulation is derived to compute the mean measurement, the covariance matrix, and the Jacobian matrix. Thus, it is more accurate and more efficient for online estimation of visual-inertial odometry (VIO), particularly when there is a high dynamic change in the agent's motion or the agent travels with high speed. The proposed IMU factor framework is evaluated using both real public datasets and indoor environment under different scenarios of motion capture. Our evaluation shows that the proposed framework outperforms the state-of-the-art VIO approach by up to 22.71% accuracy improvement on the EuRoc dataset and 38.15% accuracy improvement for motion estimation under the indoor environment.

Journal ArticleDOI
TL;DR: This work introduces reference frame alignment as a key technique for deep network-based frame extrapolation, and proposes to align the reference frames, e.g. using block-based motion estimation and motion compensation, and extrapolate from the aligned frames by a trained deep network.
Abstract: Frame extrapolation is to predict future frames from the past (reference) frames, which has been studied intensively in the computer vision research and has great potential in video coding. Recently, a number of studies have been devoted to the use of deep networks for frame extrapolation, which achieves certain success. However, due to the complex and diverse motion patterns in natural video, it is still difficult to extrapolate frames with high fidelity directly from reference frames. To address this problem, we introduce reference frame alignment as a key technique for deep network-based frame extrapolation. We propose to align the reference frames, e.g. using block-based motion estimation and motion compensation, and then to extrapolate from the aligned frames by a trained deep network. Since the alignment, a preprocessing step, effectively reduces the diversity of network input, we observe that the network is easier to train and the extrapolated frames are of higher quality. We verify the proposed technique in video coding, using the extrapolated frame for inter prediction in High Efficiency Video Coding (HEVC) and Versatile Video Coding (VVC). We investigate different schemes, including whether to align between the target frame and the reference frames, and whether to perform motion estimation on the extrapolated frame. We conduct a comprehensive set of experiments to study the efficiency of the proposed method and to compare different schemes. Experimental results show that our proposal achieves on average 5.3% and 2.8% BD-rate reduction in Y component compared to HEVC, under low-delay P and low-delay B configurations, respectively. Our proposal performs much better than the frame extrapolation without reference frame alignment.

Journal ArticleDOI
TL;DR: A novel dynamic MRI reconstruction approach called MODRN and an end-to-end improved version called MODRN(e2e), both of which enhance the reconstruction quality by infusing motion information into the modeling process with deep neural networks, are proposed.

Journal ArticleDOI
Bo Zhou, Yu-Jung Tsai, Xiongchao Chen, James S. Duncan, Chi Liu
TL;DR: This paper proposes a Temporal Siamese Pyramid Network (TSP-Net) with basic units made up of (1) a Siamese Pyramid Network (SP-Net) and (2) a recurrent layer for motion estimation among the gates.
Abstract: In positron emission tomography (PET), gating is commonly utilized to reduce respiratory motion blurring and to facilitate motion correction methods. In applications where low-dose gated PET is useful, reducing the injection dose causes increased noise levels in gated images that could corrupt motion estimation and subsequent corrections, leading to inferior image quality. To address these issues, we propose MDPET, a unified motion correction and denoising adversarial network for generating motion-compensated low-noise images from low-dose gated PET data. Specifically, we propose a Temporal Siamese Pyramid Network (TSP-Net) with basic units made up of (1) a Siamese Pyramid Network (SP-Net) and (2) a recurrent layer for motion estimation among the gates. The denoising network is unified with our motion estimation network to simultaneously correct the motion and predict a motion-compensated denoised PET reconstruction. The experimental results on human data demonstrated that our MDPET can generate accurate motion estimation directly from low-dose gated images and produce high-quality motion-compensated low-noise reconstructions. Comparative studies with previous methods also show that our MDPET is able to generate superior motion estimation and denoising performance. Our code is available at https://github.com/bbbbbbzhou/MDPET.

Journal ArticleDOI
TL;DR: A fully automated PET motion correction method, MR-guided MAF, based on the co-registration of multicontrast MR images, is introduced; it can reduce artefacts introduced by head motion and improve the image sharpness and quantitative accuracy of PET images acquired using simultaneous MR-PET scanners.
Abstract: Head motion is a major source of image artefacts in neuroimaging studies and can lead to degradation of the quantitative accuracy of reconstructed PET images. Simultaneous magnetic resonance-positron emission tomography (MR-PET) makes it possible to estimate head motion information from high-resolution MR images and then correct motion artefacts in PET images. In this article, we introduce a fully automated PET motion correction method, MR-guided MAF, based on the co-registration of multicontrast MR images. The performance of the MR-guided MAF method was evaluated using MR-PET data acquired from a cohort of ten healthy participants who received a slow infusion of fluorodeoxyglucose ([18-F]FDG). Compared with conventional methods, MR-guided PET image reconstruction can reduce head motion introduced artefacts and improve the image sharpness and quantitative accuracy of PET images acquired using simultaneous MR-PET scanners. The fully automated motion estimation method has been implemented as a publicly available web-service.

Proceedings Article
18 May 2021
TL;DR: In this article, an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without supervision is presented.
Abstract: We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion, and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we design a unified instance-aware photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we introduce a general-purpose auto-annotation scheme using any off-the-shelf instance segmentation and optical flow models to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI and Cityscapes dataset, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are publicly available.

Journal ArticleDOI
TL;DR: This work proposes a framework for dynamic MRI reconstruction framed under a new multi-task optimisation model called Compressed Sensing Plus Motion (CS + M), and shows that the proposed scheme reduces blurring artefacts, and preserves the target shape and fine details in the reconstruction.

Journal ArticleDOI
TL;DR: In this article, the authors propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding and predictive transform coding, assuming a Gaussian Markov Random Field model with respect to a spatio-temporal graph underlying the attributes of dynamic point clouds.
Abstract: As 3D scanning devices and depth sensors advance, dynamic point clouds have attracted increasing attention as a format for 3D objects in motion, with applications in various fields such as immersive telepresence, navigation for autonomous driving and gaming. Nevertheless, the tremendous amount of data in dynamic point clouds significantly burdens transmission and storage. To this end, we propose a complete compression framework for attributes of 3D dynamic point clouds, focusing on optimal inter-coding. Firstly, we derive the optimal inter-prediction and predictive transform coding assuming the Gaussian Markov Random Field model with respect to a spatio-temporal graph underlying the attributes of dynamic point clouds. The optimal predictive transform proves to be the Generalized Graph Fourier Transform in terms of spatio-temporal decorrelation. Secondly, we propose refined motion estimation via efficient registration prior to inter-prediction, which searches the temporal correspondence between adjacent frames of irregular point clouds. Finally, we present a complete framework based on the optimal inter-coding and our previously proposed intra-coding, where we determine the optimal coding mode from rate-distortion optimization with the proposed offline-trained λ-Q model. Experimental results show that we achieve around 17% bit rate reduction on average over competitive dynamic point cloud compression methods.
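
As background for the transform step, a plain graph Fourier transform over a point-cloud graph can be computed from the Laplacian eigenvectors as sketched below; the paper derives a generalized GFT over a spatio-temporal graph, which this simplified version does not capture.

```python
import numpy as np

def graph_fourier_transform(adjacency, signal):
    """Plain graph Fourier transform as a simplified stand-in for the
    generalized GFT derived in the paper.

    adjacency : (N, N) symmetric non-negative edge-weight matrix over the points
    signal    : (N,) attribute values (e.g. one color channel) on the points
    Returns (coefficients, basis) such that signal == basis @ coefficients.
    """
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, basis = np.linalg.eigh(laplacian)   # columns ordered by "graph frequency"
    coeffs = basis.T @ signal                    # decorrelated transform coefficients
    return coeffs, basis
```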

Journal ArticleDOI
TL;DR: A novel, computationally efficient, and robust light detection and ranging (LiDAR)-only odometry framework based on truncated least squares, termed T-LOAM, is proposed; it focuses on alleviating the impact of outliers to allow robust navigation in sparse, noisy, or cluttered scenarios where degeneration occurs.
Abstract: We propose a novel, computationally efficient, and robust light detection and ranging (LiDAR)-only odometry framework based on truncated least squares termed T-LOAM. Our method focuses on alleviating the impact of outliers to allow robust navigation in sparse, noisy, or cluttered scenarios where degeneration occurs. As preprocessing, the multiregion ground extraction and dynamic curved-voxel clustering methods are proposed to accomplish the segmentation of 3D point clouds and filter out unstable objects. A novel feature extraction module is tailored to discriminate four peculiar features: edge features, sphere features, planar features, and ground features. As frontend, a hierarchical feature-based LiDAR-only odometry performs precise motion estimates through the truncated least squares method for directly processing various features. The preprocessing model and motion estimation precision have been evaluated on the KITTI odometry benchmark as well as various campus scenarios. The experimental results have demonstrated the real-time capability and superior precision of the proposed T-LOAM over other state-of-the-art algorithms.
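
The robustness mechanism is the truncated least-squares cost, which caps the influence of any single residual. A minimal version, with an illustrative truncation bound, is:

```python
import numpy as np

def truncated_least_squares_cost(residuals, c=0.5):
    """Truncated least-squares robust cost: inliers are penalised quadratically
    while residuals beyond the truncation bound c contribute a constant, so
    gross outliers stop influencing the motion estimate (c is illustrative).
    """
    r2 = np.square(residuals)
    return np.minimum(r2, c * c).sum()
```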

Proceedings ArticleDOI
01 Jun 2021
TL;DR: This paper proposes spatiotemporal registration as a compelling technique for event-based rotational motion estimation, which produces feature tracks as a by-product that directly support an efficient visual odometry pipeline with graph-based optimisation for motion averaging.
Abstract: A useful application of event sensing is visual odometry, especially in settings that require high-temporal resolution. The state-of-the-art method of contrast maximisation recovers the motion from a batch of events by maximising the contrast of the image of warped events. However, the cost scales with image resolution and the temporal resolution can be limited by the need for large batch sizes to yield sufficient structure in the contrast image. In this work, we propose spatiotemporal registration as a compelling technique for event-based rotational motion estimation. We theoretically justify the approach and establish its fundamental and practical advantages over contrast maximisation. In particular, spatiotemporal registration also produces feature tracks as a by-product, which directly supports an efficient visual odometry pipeline with graph-based optimisation for motion averaging. The simplicity of our visual odometry pipeline allows it to process more than 1 M events/second. We also contribute a new event dataset for visual odometry, where motion sequences with large velocity variations were acquired using a high-precision robot arm.
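
For context, the contrast-maximisation baseline the paper compares against can be sketched as below: events are warped to a common time under a candidate rotation and the candidate is scored by the variance of the resulting image. Only in-plane rotation about the principal point is modelled in this simplification; the actual methods operate with full 3-DoF rotational motion.

```python
import numpy as np

def contrast_of_warped_events(xs, ys, ts, omega, center, img_shape):
    """Simplified illustration of contrast maximisation for event cameras.

    xs, ys, ts : event coordinates (pixels) and timestamps (seconds)
    omega      : candidate angular velocity (rad/s) about the optical axis
    center     : (cx, cy) principal point
    img_shape  : (H, W) of the accumulation image
    """
    h, w = img_shape
    ang = -omega * (ts - ts[0])            # rotate each event back to the first timestamp
    cx, cy = center
    xr = cx + np.cos(ang) * (xs - cx) - np.sin(ang) * (ys - cy)
    yr = cy + np.sin(ang) * (xs - cx) + np.cos(ang) * (ys - cy)
    img = np.zeros((h, w))
    xi = np.clip(np.round(xr).astype(int), 0, w - 1)
    yi = np.clip(np.round(yr).astype(int), 0, h - 1)
    np.add.at(img, (yi, xi), 1.0)          # accumulate the image of warped events
    return img.var()                       # higher contrast = better motion hypothesis
```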