
Showing papers on "Motion estimation" published in 2018


Proceedings ArticleDOI
29 Mar 2018
TL;DR: A recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people, and outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
Abstract: Understanding human motion behavior is critical for autonomous moving platforms (like self-driving cars and social robots) if they are to navigate human-centric environments. This is challenging because human motion is inherently multimodal: given a history of human motion paths, there are many socially plausible ways that people could move in the future. We tackle this problem by combining tools from sequence prediction and generative adversarial networks: a recurrent sequence-to-sequence model observes motion histories and predicts future behavior, using a novel pooling mechanism to aggregate information across people. We predict socially plausible futures by training adversarially against a recurrent discriminator, and encourage diverse predictions with a novel variety loss. Through experiments on several datasets we demonstrate that our approach outperforms prior work in terms of accuracy, variety, collision avoidance, and computational complexity.
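
To make the "variety loss" idea above concrete, here is a minimal numpy sketch of a best-of-k trajectory loss: only the sampled future closest to the ground truth incurs the penalty, which encourages diverse predictions. The array shapes, the number of samples, and the toy data are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def variety_loss(pred_samples, ground_truth):
    """Best-of-k trajectory loss: keep only the sample closest to the ground truth.

    pred_samples : array of shape (k, T, 2) -- k sampled future trajectories
    ground_truth : array of shape (T, 2)    -- observed future trajectory
    Returns the minimum mean squared L2 error over the k samples.
    """
    errors = np.mean(np.sum((pred_samples - ground_truth) ** 2, axis=-1), axis=-1)  # (k,)
    return errors.min()

# Toy usage: 5 hypothetical samples of a 12-step future path.
rng = np.random.default_rng(0)
gt = np.cumsum(rng.normal(size=(12, 2)), axis=0)
samples = gt + rng.normal(scale=0.5, size=(5, 12, 2))
print(variety_loss(samples, gt))
```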

1,461 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: A novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, which are computed depending on the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation is proposed.
Abstract: Video super-resolution (VSR) has become even more important recently to provide high resolution (HR) contents for ultra high definition displays. While many deep learning based VSR methods have been proposed, most of them rely heavily on the accuracy of motion estimation and compensation. We introduce a fundamentally different framework for VSR in this paper. We propose a novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, which are computed depending on the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation. With our approach, an HR image is reconstructed directly from the input image using the dynamic upsampling filters, and the fine details are added through the computed residual. Our network with the help of a new data augmentation technique can generate much sharper HR videos with temporal consistency, compared with the previous methods. We also provide analysis of our network through extensive experiments to show how the network deals with motions implicitly.
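
The sketch below illustrates how per-pixel dynamic upsampling filters could be applied to a low-resolution frame. In the paper the filters and the residual are produced by the network from a spatio-temporal neighbourhood; here a single frame, a 5x5 filter size, a scale factor of 4, and random filter values are simplifying assumptions for illustration only.

```python
import numpy as np

def apply_dynamic_upsampling(lr, filters, scale=4, k=5):
    """Apply per-pixel dynamic upsampling filters to a low-resolution frame.

    lr      : (H, W) low-resolution luminance frame
    filters : (H, W, scale*scale, k, k) one k x k filter per LR pixel and HR sub-position
    Returns an (H*scale, W*scale) high-resolution frame (before adding any residual image).
    """
    H, W = lr.shape
    pad = k // 2
    lr_pad = np.pad(lr, pad, mode='edge')
    hr = np.zeros((H * scale, W * scale), dtype=lr.dtype)
    for i in range(H):
        for j in range(W):
            patch = lr_pad[i:i + k, j:j + k]                 # local LR neighbourhood
            for s in range(scale * scale):
                di, dj = divmod(s, scale)                    # HR sub-position inside the pixel
                hr[i * scale + di, j * scale + dj] = np.sum(filters[i, j, s] * patch)
    return hr

# Toy usage with random filters standing in for the network's predictions.
rng = np.random.default_rng(0)
lr = rng.random((8, 8))
f = rng.random((8, 8, 16, 5, 5))
f /= f.sum(axis=(-1, -2), keepdims=True)                     # normalise each filter
print(apply_dynamic_upsampling(lr, f).shape)                 # (32, 32)
```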

503 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: The key idea is to exploit parameterized kernel functions that span the full continuous vector space, which allows us to learn over arbitrary data structures as long as their support relationship is computable.
Abstract: Standard convolutional neural networks assume a grid structured input is available and exploit discrete convolutions as their fundamental building blocks. This limits their applicability to many real-world applications. In this paper we propose Parametric Continuous Convolution, a new learnable operator that operates over non-grid structured data. The key idea is to exploit parameterized kernel functions that span the full continuous vector space. This generalization allows us to learn over arbitrary data structures as long as their support relationship is computable. Our experiments show significant improvement over the state-of-the-art in point cloud segmentation of indoor and outdoor scenes, and lidar motion estimation of driving scenes.
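
The following plain-numpy sketch illustrates the continuous-convolution idea: the kernel weight for each neighbour is produced by a small MLP evaluated on the continuous offset between points, so no grid is required. The MLP size, the radius-based neighbourhood, and the random weights are illustrative assumptions rather than the paper's architecture.

```python
import numpy as np

def parametric_continuous_conv(points, feats, W1, b1, W2, b2, radius=1.0):
    """Continuous convolution over an unstructured point set.

    The kernel for a neighbour offset d (a 3-vector) is given by a small MLP
    g(d) = W2 @ relu(W1 @ d + b1) + b2, so the kernel is defined on the full
    continuous space rather than on a fixed grid.

    points : (N, 3) point coordinates
    feats  : (N, C_in) input features
    W1, b1, W2, b2 : MLP parameters mapping a 3-D offset to a (C_out, C_in) kernel
    Returns (N, C_out) output features.
    """
    N, C_in = feats.shape
    C_out = W2.shape[0] // C_in
    out = np.zeros((N, C_out))
    for i in range(N):
        d = points - points[i]                               # offsets to all points
        nbr = np.where(np.linalg.norm(d, axis=1) < radius)[0]
        for j in nbr:
            h = np.maximum(W1 @ d[j] + b1, 0.0)              # hidden layer, ReLU
            K = (W2 @ h + b2).reshape(C_out, C_in)           # predicted kernel for this offset
            out[i] += K @ feats[j]
    return out

# Toy usage: 64 random points with 4-channel features, 8 output channels.
rng = np.random.default_rng(0)
C_in, C_out, hidden = 4, 8, 16
pts = rng.random((64, 3))
fts = rng.random((64, C_in))
W1, b1 = rng.normal(size=(hidden, 3)), np.zeros(hidden)
W2, b2 = rng.normal(size=(C_out * C_in, hidden)), np.zeros(C_out * C_in)
print(parametric_continuous_conv(pts, fts, W1, b1, W2, b2, radius=0.5).shape)  # (64, 8)
```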

392 citations


Book ChapterDOI
08 Sep 2018
TL;DR: The Deep Virtual Stereo Odometry incorporates deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements and designs a novel deep network that refines predicted depth from a single image in a two-stage process.
Abstract: Monocular visual odometry approaches that purely rely on geometric cues are prone to scale drift and require sufficient motion parallax in successive frames for motion estimation and 3D reconstruction. In this paper, we propose to leverage deep monocular depth prediction to overcome limitations of geometry-based monocular visual odometry. To this end, we incorporate deep depth predictions into Direct Sparse Odometry (DSO) as direct virtual stereo measurements. For depth prediction, we design a novel deep network that refines predicted depth from a single image in a two-stage process. We train our network in a semi-supervised way on photoconsistency in stereo images and on consistency with accurate sparse depth reconstructions from Stereo DSO. Our depth predictions outperform state-of-the-art approaches for monocular depth on the KITTI benchmark. Moreover, our Deep Virtual Stereo Odometry clearly exceeds previous monocular and deep-learning-based methods in accuracy. It even achieves comparable performance to the state-of-the-art stereo methods, while only relying on a single camera.

357 citations


Book ChapterDOI
08 Sep 2018
TL;DR: In this paper, the authors proposed an end-to-end system for video-based measurement of heart and breathing rate using a deep convolutional network and an attention mechanism using appearance information to guide motion estimation.
Abstract: Non-contact video-based physiological measurement has many applications in health care and human-computer interaction. Practical applications require measurements to be accurate even in the presence of large head rotations. We propose the first end-to-end system for video-based measurement of heart and breathing rate using a deep convolutional network. The system features a new motion representation based on a skin reflection model and a new attention mechanism using appearance information to guide motion estimation, both of which enable robust measurement under heterogeneous lighting and major motions. Our approach significantly outperforms all current state-of-the-art methods on both RGB and infrared video datasets. Furthermore, it allows spatial-temporal distributions of physiological signals to be visualized via the attention mechanism.

276 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: A novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which the model learns a probability density function of future human poses conditioned on previous poses.
Abstract: Predicting and understanding human motion dynamics has many applications, such as motion synthesis, augmented reality, security, and autonomous vehicles. Due to the recent success of generative adversarial networks (GAN), there has been much interest in probabilistic estimation and synthetic data generation using deep neural network architectures and learning algorithms. We propose a novel sequence-to-sequence model for probabilistic human motion prediction, trained with a modified version of improved Wasserstein generative adversarial networks (WGAN-GP), in which we use a custom loss function designed for human motion prediction. Our model, which we call HP-GAN, learns a probability density function of future human poses conditioned on previous poses. It predicts multiple sequences of possible future human poses, each from the same input sequence but a different vector z drawn from a random distribution. Furthermore, to quantify the quality of the non-deterministic predictions, we simultaneously train a motion-quality-assessment model that learns the probability that a given skeleton sequence is a real human motion. We test our algorithm on two of the largest skeleton datasets: NTU RGB+D and Human3.6M. We train our model on both single and multiple action types. Its predictive power for long-term motion estimation is demonstrated by generating multiple plausible futures of more than 30 frames from just 10 frames of input. We show that most sequences generated from the same input are judged to be a real human sequence with a probability of more than 50%. All the code used in this paper is published at https://github.com/ebarsoum/hpgan.

231 citations


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion is proposed for video object detection, which steadily pushes forward the performance envelope (speed-accuracy tradeoff) towards high-performance video object detection.
Abstract: There has been significant progress in image object detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios. Built upon the recent works [37, 36], this work proposes a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion. Our approach extends prior works with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff) towards high-performance video object detection.

217 citations


Proceedings ArticleDOI
21 May 2018
TL;DR: A method for robust dense RGB-D SLAM in dynamic environments which detects moving objects and simultaneously reconstructs the background structure and achieves similar performance in static environments and improved accuracy and robustness in dynamic scenes is proposed.
Abstract: Dynamic environments are challenging for visual SLAM as moving objects can impair camera pose tracking and cause corruptions to be integrated into the map. In this paper, we propose a method for robust dense RGB-D SLAM in dynamic environments which detects moving objects and simultaneously reconstructs the background structure. While most methods employ implicit robust penalisers or outlier filtering techniques in order to handle moving objects, our approach is to simultaneously estimate the camera motion as well as a probabilistic static/dynamic segmentation of the current RGB-D image pair. This segmentation is then used for weighted dense RGB-D fusion to estimate a 3D model of only the static parts of the environment. By leveraging the 3D model for frame-to-model alignment, as well as static/dynamic segmentation, camera motion estimation has reduced overall drift and is more robust to the presence of dynamics in the scene. Demonstrations are presented which compare the proposed method to related state-of-the-art approaches using both static and dynamic sequences. The proposed method achieves similar performance in static environments and improved accuracy and robustness in dynamic scenes.

178 citations


Posted Content
TL;DR: A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels and is fully differentiable such that both the flow and kernel estimation networks can be optimized jointly.
Abstract: Motion estimation (ME) and motion compensation (MC) have been widely used for classical video frame interpolation systems over the past decades. Recently, a number of data-driven frame interpolation methods based on convolutional neural networks have been proposed. However, existing learning based methods typically estimate either flow or compensation kernels, thereby limiting performance on both computational efficiency and interpolation accuracy. In this work, we propose a motion estimation and compensation driven neural network for video frame interpolation. A novel adaptive warping layer is developed to integrate both optical flow and interpolation kernels to synthesize target frame pixels. This layer is fully differentiable such that both the flow and kernel estimation networks can be optimized jointly. The proposed model benefits from the advantages of motion estimation and compensation methods without using hand-crafted features. Compared to existing methods, our approach is computationally efficient and able to generate more visually appealing results. Furthermore, the proposed MEMC-Net can be seamlessly adapted to several video enhancement tasks, e.g., super-resolution, denoising, and deblocking. Extensive quantitative and qualitative evaluations demonstrate that the proposed method performs favorably against the state-of-the-art video frame interpolation and enhancement algorithms on a wide range of datasets.
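
A rough numpy sketch of what an adaptive warping layer computes is given below: each target pixel blends a small source neighbourhood around its flow-displaced location with a per-pixel interpolation kernel, so the flow and kernel estimates act together in one sampling step. The kernel size, the nearest-integer handling of sub-pixel flow, and the random inputs are simplifying assumptions; the actual layer is differentiable and trained end to end.

```python
import numpy as np

def adaptive_warp(frame, flow, kernels, k=4):
    """Warp a frame with per-pixel optical flow and per-pixel interpolation kernels.

    frame   : (H, W) source frame
    flow    : (H, W, 2) optical flow (dx, dy) pointing from the target into the source
    kernels : (H, W, k, k) interpolation kernel predicted for every target pixel
    For each target pixel, the k x k source neighbourhood around the flow-displaced
    location is blended with that pixel's kernel.
    """
    H, W = frame.shape
    pad = k // 2
    src = np.pad(frame, pad, mode='edge')
    out = np.zeros_like(frame)
    for i in range(H):
        for j in range(W):
            x = int(np.floor(j + flow[i, j, 0]))             # integer part of target location
            y = int(np.floor(i + flow[i, j, 1]))
            x = np.clip(x, 0, W - 1)
            y = np.clip(y, 0, H - 1)
            patch = src[y:y + k, x:x + k]                    # k x k neighbourhood (padded coords)
            out[i, j] = np.sum(kernels[i, j] * patch)
    return out

# Toy usage with a random frame, small random flow, and normalised random kernels.
rng = np.random.default_rng(0)
frame = rng.random((16, 16))
flow = rng.normal(scale=1.0, size=(16, 16, 2))
ker = rng.random((16, 16, 4, 4))
ker /= ker.sum(axis=(-1, -2), keepdims=True)
print(adaptive_warp(frame, flow, ker).shape)                 # (16, 16)
```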

170 citations


Journal ArticleDOI
TL;DR: The subtle motions from recorded video are extracted by means of Phase-based Motion Estimation (PME) and the extracted information is used to conduct damage identification on a 2.3-m long Skystream® wind turbine blade (WTB).
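
Since this entry relies on phase-based motion estimation, here is a minimal 1-D illustration of the principle: the local phase of a complex Gabor filter shifts in proportion to tiny displacements, so the phase difference between two frames divided by the filter frequency recovers sub-sample motion. The filter parameters and the synthetic sinusoid are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def gabor_phase(signal, omega0=0.5, sigma=8.0):
    """Local phase of a 1-D signal from a complex Gabor filter centred at omega0."""
    t = np.arange(-4 * sigma, 4 * sigma + 1)
    gabor = np.exp(-t**2 / (2 * sigma**2)) * np.exp(1j * omega0 * t)
    response = np.convolve(signal, gabor, mode='same')
    return np.angle(response)

# A sinusoid and a copy shifted by a sub-sample amount.
true_shift = 0.3
omega0 = 0.5
x = np.arange(256)
f0 = np.sin(omega0 * x)
f1 = np.sin(omega0 * (x - true_shift))

# Phase change between the frames, rewrapped to (-pi, pi], divided by the filter
# frequency gives the displacement in samples.
dphi = np.angle(np.exp(1j * (gabor_phase(f0, omega0) - gabor_phase(f1, omega0))))
estimate = np.median(dphi[64:192]) / omega0          # use the centre to avoid border effects
print(estimate)                                       # close to 0.3
```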

163 citations


Journal ArticleDOI
TL;DR: This model has low computational complexity and runs orders of magnitude faster than other multi-frame SR methods, and with its powerful temporal dependency modeling it can super-resolve videos with complex motions and achieve good performance.
Abstract: Super resolving a low-resolution video, namely video super-resolution (SR), is usually handled by either single-image SR or multi-frame SR. Single-image SR deals with each video frame independently and ignores the intrinsic temporal dependency of video frames, which actually plays a very important role in video SR. Multi-frame SR generally extracts motion information, e.g., optical flow, to model the temporal dependency, but often shows high computational cost. Considering that recurrent neural networks (RNNs) can model long-term temporal dependency of video sequences well, we propose a fully convolutional RNN named bidirectional recurrent convolutional network for efficient multi-frame SR. Different from vanilla RNNs, 1) the commonly used full feedforward and recurrent connections are replaced with weight-sharing convolutional connections, which greatly reduces the number of network parameters and models the temporal dependency at a finer level, i.e., patch-based rather than frame-based, and 2) connections from input layers at previous timesteps to the current hidden layer are added via 3D feedforward convolutions, which aim to capture discriminative spatio-temporal patterns for short-term fast-varying motions in local adjacent frames. Due to the cheap convolutional operations, our model has low computational complexity and runs orders of magnitude faster than other multi-frame SR methods. With the powerful temporal dependency modeling, our model can super-resolve videos with complex motions and achieve good performance.

Proceedings ArticleDOI
21 May 2018
TL;DR: This paper presents a reliable and accurate radar-only motion estimation algorithm for mobile autonomous systems, which uses a frequency-modulated continuous-wave scanning radar to extract landmarks and performs scan matching by greedily adding point correspondences based on unary descriptors and pairwise compatibility scores.
Abstract: In contrast to cameras, lidars, GPS, and proprioceptive sensors, radars are affordable and efficient systems that operate well under variable weather and lighting conditions, require no external infrastructure, and detect long-range objects. In this paper, we present a reliable and accurate radar-only motion estimation algorithm for mobile autonomous systems. Using a frequency-modulated continuous-wave (FMCW) scanning radar, we first extract landmarks with an algorithm that accounts for unwanted effects in radar returns. To estimate relative motion, we then perform scan matching by greedily adding point correspondences based on unary descriptors and pairwise compatibility scores. Our radar odometry results are robust under a variety of conditions, including those under which visual odometry and GPS/INS fail.
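
The sketch below illustrates, under simplifying assumptions, the kind of scan-matching step the abstract describes: candidate landmark correspondences are kept greedily only if they are pairwise compatible (inter-landmark distances preserved), and the surviving matches yield a least-squares rigid 2-D transform via the standard SVD (Kabsch) solution. The tolerance, toy landmarks, and identity candidate list are illustrative; they are not the paper's descriptors or scoring.

```python
import numpy as np

def greedy_compatible_matches(cands, P, Q, tol=0.5):
    """Greedily keep candidate matches whose pairwise distances are preserved.

    cands : list of (i, j) candidate correspondences, assumed sorted by descriptor score
    P, Q  : (N, 2) and (M, 2) landmark positions in the two scans
    A match is accepted only if, for every match already kept, the landmark distance
    in scan P agrees with the distance in scan Q within `tol` metres.
    """
    kept = []
    for i, j in cands:
        ok = all(abs(np.linalg.norm(P[i] - P[a]) - np.linalg.norm(Q[j] - Q[b])) < tol
                 for a, b in kept)
        if ok:
            kept.append((i, j))
    return kept

def rigid_transform(P, Q):
    """Least-squares 2-D rotation R and translation t with Q ~ P @ R.T + t (Kabsch/SVD)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    D = np.diag([1.0, np.sign(np.linalg.det(Vt.T @ U.T))])   # guard against reflections
    R = Vt.T @ D @ U.T
    t = Q.mean(0) - P.mean(0) @ R.T
    return R, t

# Toy usage: a second scan generated by rotating and translating the first.
rng = np.random.default_rng(0)
P = rng.random((10, 2)) * 50
theta = np.deg2rad(5.0)
R_true = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
Q = P @ R_true.T + np.array([2.0, -1.0])
matches = greedy_compatible_matches([(k, k) for k in range(10)], P, Q)
R, t = rigid_transform(P[[i for i, _ in matches]], Q[[j for _, j in matches]])
print(np.round(t, 3))                                        # close to [ 2. -1.]
```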

Journal ArticleDOI
TL;DR: A methodology is introduced for the reconstruction of multi-shot, multi-slice magnetic resonance imaging able to cope with both within-plane and through-plane rigid motion, and its application in structural brain imaging is described.
Abstract: Purpose To introduce a methodology for the reconstruction of multi-shot, multi-slice magnetic resonance imaging able to cope with both within-plane and through-plane rigid motion and to describe its application in structural brain imaging. Theory and Methods The method alternates between motion estimation and reconstruction using a common objective function for both. Estimates of three-dimensional motion states for each shot and slice are gradually refined by improving on the fit of current reconstructions to the partial k-space information from multiple coils. Overlapped slices and super-resolution allow recovery of through-plane motion and outlier rejection discards artifacted shots. The method is applied to T2 and T1 brain scans acquired in different views. Results The procedure has greatly diminished artifacts in a database of 1883 neonatal image volumes, as assessed by image quality metrics and visual inspection. Examples showing the ability to correct for motion and robustness against damaged shots are provided. Combination of motion corrected reconstructions for different views has shown further artifact suppression and resolution recovery. Conclusion The proposed method addresses the problem of rigid motion in multi-shot multi-slice anatomical brain scans. Tests on a large collection of potentially corrupted datasets have shown a remarkable image quality improvement.

Book ChapterDOI
16 Sep 2018
TL;DR: In this article, a Siamese-style recurrent spatial transformer network is used for joint estimation of motion and segmentation from cardiac MR image sequences, and a joint multi-scale feature encoder is learned by optimizing the segmentation branch and the motion estimation branch simultaneously, enabling the weakly-supervised segmentation by taking advantage of features that are unsupervisedly learned in the motion estimator from a large amount of unannotated data.
Abstract: Cardiac motion estimation and segmentation play important roles in quantitatively assessing cardiac function and diagnosing cardiovascular diseases. In this paper, we propose a novel deep learning method for joint estimation of motion and segmentation from cardiac MR image sequences. The proposed network consists of two branches: a cardiac motion estimation branch which is built on a novel unsupervised Siamese style recurrent spatial transformer network, and a cardiac segmentation branch that is based on a fully convolutional network. In particular, a joint multi-scale feature encoder is learned by optimizing the segmentation branch and the motion estimation branch simultaneously. This enables the weakly-supervised segmentation by taking advantage of features that are unsupervisedly learned in the motion estimation branch from a large amount of unannotated data. Experimental results using cardiac MRI images from 220 subjects show that the joint learning of both tasks is complementary and the proposed models outperform the competing methods significantly in terms of accuracy and speed.

Journal ArticleDOI
TL;DR: This paper investigates the feasibility of a two-stage motion estimation method, which is a combination of affine and nonrigid estimation, for SR US imaging and reduces the width of the motion-blurred microvessels to approximately 1.5-fold.
Abstract: The structure of microvasculature cannot be resolved using conventional ultrasound (US) imaging due to the fundamental diffraction limit at clinical US frequencies. It is possible to overcome this resolution limitation by localizing individual microbubbles through multiple frames and forming a superresolved image, which usually requires seconds to minutes of acquisition. Over this time interval, motion is inevitable and tissue movement is typically a combination of large- and small-scale tissue translation and deformation. Therefore, super-resolution (SR) imaging is prone to motion artifacts as other imaging modalities based on multiple acquisitions are. This paper investigates the feasibility of a two-stage motion estimation method, which is a combination of affine and nonrigid estimation, for SR US imaging. First, the motion correction accuracy of the proposed method is evaluated using simulations with increasing complexity of motion. A mean absolute error of 12.2 μm was achieved in simulations for the worst-case scenario. The motion correction algorithm was then applied to a clinical data set to demonstrate its potential to enable in vivo SR US imaging in the presence of patient motion. The size of the identified microvessels from the clinical SR images was measured to assess the feasibility of the two-stage motion correction method, which reduced the width of the motion-blurred microvessels to approximately 1.5-fold.

Posted Content
TL;DR: DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation and composes a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture.
Abstract: We propose DeepV2D, an end-to-end deep learning architecture for predicting depth from video. DeepV2D combines the representation ability of neural networks with the geometric principles governing image formation. We compose a collection of classical geometric algorithms, which are converted into trainable modules and combined into an end-to-end differentiable architecture. DeepV2D interleaves two stages: motion estimation and depth estimation. During inference, motion and depth estimation are alternated and converge to accurate depth. Code is available at this https URL.

Journal ArticleDOI
TL;DR: A novel model for video salient object detection called spatiotemporal constrained optimization model (SCOM), which exploits spatial and temporal cues, as well as a local constraint, to achieve a global saliency optimization.
Abstract: This paper presents a novel model for video salient object detection called spatiotemporal constrained optimization model (SCOM), which exploits spatial and temporal cues, as well as a local constraint, to achieve a global saliency optimization. For a robust motion estimation of salient objects, we propose a novel approach to modeling the motion cues from the optical flow field, the saliency map of the prior video frame and the motion history of change detection, which is able to distinguish the moving salient objects from diverse changing background regions. Furthermore, an effective objectness measure is proposed with an intuitive geometrical interpretation to extract some reliable object and background regions, which are provided as the basis to define the foreground potential, background potential, and the constraint to support saliency propagation. These potentials and the constraint are formulated into the proposed SCOM framework to generate an optimal saliency map for each frame in a video. The proposed model is extensively evaluated on the widely used challenging benchmark data sets. Experiments demonstrate that our proposed SCOM substantially outperforms the state-of-the-art saliency models.

Journal ArticleDOI
TL;DR: A novel approach is introduced that relies on statistical analysis rather than physical models and uses a convolutional neural network to directly estimate the motion of successive ultrasound frames in an end-to-end fashion, yielding unprecedentedly accurate reconstructions.

Journal ArticleDOI
TL;DR: A simplified affine motion model-based coding framework is studied to overcome the limitations of the translational motion model while maintaining low computational complexity.
Abstract: In this paper, we study a simplified affine motion model-based coding framework to overcome the limitations of the translational motion model while maintaining low computational complexity. The proposed framework mainly has three key contributions. First, we propose to reduce the number of affine motion parameters from 6 to 4. The proposed four-parameter affine motion model can not only handle most of the complex motions in natural videos, but also save the bits for two parameters. Second, to efficiently encode the affine motion parameters, we propose two motion prediction modes, i.e., an advanced affine motion vector prediction scheme combined with a gradient-based fast affine motion estimation algorithm and an affine model merge scheme, where the latter attempts to reuse the affine motion parameters (instead of the motion vectors) of neighboring blocks. Third, we propose two fast affine motion compensation algorithms. One is the one-step sub-pixel interpolation that reduces the computation required for each interpolation. The other is the interpolation-precision-based adaptive block size motion compensation that performs motion compensation at the block level rather than the pixel level to reduce the number of interpolations. Our proposed techniques have been implemented based on the state-of-the-art high-efficiency video coding standard, and the experimental results show that the proposed techniques altogether achieve, on average, 11.1% and 19.3% bit savings for random access and low-delay configurations, respectively, on typical video sequences that have rich rotation or zooming motions. Meanwhile, the computational complexity increases of both the encoder and the decoder are within an acceptable range.
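
To show what a four-parameter affine motion model looks like in practice, the sketch below derives a sub-block motion-vector field from two control-point motion vectors (top-left and top-right of the block), covering translation, rotation, and zoom, in the way such simplified models are commonly formulated. The 4x4 sub-block size and the toy vectors are illustrative assumptions, not the exact scheme evaluated in the paper.

```python
import numpy as np

def affine_4param_mv_field(mv0, mv1, width, height, sub=4):
    """Sub-block motion vectors of a four-parameter (zoom + rotation + translation) affine model.

    mv0, mv1 : (x, y) motion vectors at the block's top-left and top-right control points
    width, height : block dimensions in pixels
    sub : sub-block size at which motion compensation is performed instead of per pixel
    Returns an array of shape (height//sub, width//sub, 2) of motion vectors.
    """
    a = (mv1[0] - mv0[0]) / width       # zoom component
    b = (mv1[1] - mv0[1]) / width       # rotation component
    ys, xs = np.mgrid[0:height:sub, 0:width:sub]
    cx, cy = xs + sub / 2.0, ys + sub / 2.0          # sub-block centres
    mvx = a * cx - b * cy + mv0[0]
    mvy = b * cx + a * cy + mv0[1]
    return np.stack([mvx, mvy], axis=-1)

# Toy usage: a 16x16 block rotating slightly while translating.
field = affine_4param_mv_field(mv0=(1.0, 0.5), mv1=(1.0, 1.5), width=16, height=16)
print(field.shape)      # (4, 4, 2)
print(field[0, 0])      # motion vector of the top-left 4x4 sub-block
```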

Journal ArticleDOI
TL;DR: A new patch-based empirical Bayesian video denoising algorithm that builds a Bayesian model for each group of similar space-time patches and estimates the eigenvalues of the a priori covariance as simple corrections of the eigenvalues of the sample covariance matrix, with these estimators shown empirically to lead to better empirical Wiener filters.
Abstract: In this paper we present a new patch-based empirical Bayesian video denoising algorithm. The method builds a Bayesian model for each group of similar space-time patches. These patches are not motion-compensated, and therefore avoid the risk of inaccuracies caused by motion estimation errors. The high dimensionality of spatiotemporal patches together with a limited number of available samples poses challenges when estimating the statistics needed for an empirical Bayesian method. We therefore assume that groups of similar patches have a low intrinsic dimensionality, leading to a spiked covariance model. Based on theoretical results about the estimation of spiked covariance matrices, we propose estimators of the eigenvalues of the a priori covariance in high-dimensional spaces as simple corrections of the eigenvalues of the sample covariance matrix. We demonstrate empirically that these estimators lead to better empirical Wiener filters. A comparison on classic benchmark videos demonstrates improved visual quality and an increased PSNR with respect to state-of-the-art video denoising methods.

Proceedings ArticleDOI
21 May 2018
TL;DR: A framework is described for direct visual simultaneous localization and mapping (SLAM) that combines a monocular camera with sparse depth information from Light Detection and Ranging (LiDAR), with strict pose marginalization for accurate pose-graph SLAM and depth-integrated frame matching for large-scale mapping.
Abstract: This paper describes a framework for direct visual simultaneous localization and mapping (SLAM) combining a monocular camera with sparse depth information from Light Detection and Ranging (LiDAR). To ensure real-time performance while maintaining high accuracy in motion estimation, we present (i) a sliding window-based tracking method, (ii) strict pose marginalization for accurate pose-graph SLAM and (iii) depth-integrated frame matching for large-scale mapping. Unlike conventional feature-based visual and LiDAR mapping, the proposed approach is direct, eliminating the visual feature in the objective function. We evaluated results using our portable camera-LiDAR system as well as KITTI odometry benchmark datasets. The experimental results show that the complementary characteristics of the two sensors are very effective in improving real-time performance and accuracy. Via validation, we achieved a low drift error of 0.98% in the KITTI benchmark including various environments such as a highway and residential areas.

Posted ContentDOI
07 Jun 2018-bioRxiv
TL;DR: It is shown unequivocally that respiration contaminates movement estimates in functional MRI and generates apparent head motion not associated with degraded quality of functional MRI, and a novel approach using a band-stop filter that accurately removes these respiratory effects is developed.
Abstract: Head motion represents one of the greatest technical obstacles for brain MRI. Accurate detection of artifacts induced by head motion requires precise estimation of movement. However, this estimation may be corrupted by factitious effects owing to main field fluctuations generated by body motion. In the current report, we examine head motion estimation in multiband resting state functional connectivity MRI (rs-fcMRI) data from the Adolescent Brain Cognitive Development (ABCD) Study and a comparison 'single-shot' dataset from Oregon Health & Science University. We show unequivocally that respirations contaminate movement estimates in functional MRI and that respiration generates apparent head motion not associated with degraded quality of functional MRI. We have developed a novel approach using a band-stop filter that accurately removes these respiratory effects. Subsequently, we demonstrate that utilizing this filter improves post-processing data quality. Lastly, we demonstrate the real-time implementation of motion estimate filtering in our FIRMM (Framewise Integrated Real-Time MRI Monitoring) software package.
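
A minimal sketch of the band-stop filtering idea is given below. The stop band (0.2-0.5 Hz, covering typical respiratory rates), the repetition time of 0.8 s, and the zero-phase Butterworth design are illustrative assumptions; the actual FIRMM implementation and the study's choice of respiratory band may differ.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandstop_motion_params(motion, tr=0.8, stop_band=(0.2, 0.5), order=2):
    """Notch out the respiratory band from framewise motion parameter traces.

    motion    : (T, 6) array of rigid-body motion estimates (3 translations, 3 rotations)
    tr        : repetition time in seconds (sampling interval of the traces)
    stop_band : frequency band (Hz) to suppress, chosen here to cover respiration
    Returns the filtered traces, applied forward and backward to avoid phase shift.
    """
    fs = 1.0 / tr
    b, a = butter(order, stop_band, btype='bandstop', fs=fs)
    return filtfilt(b, a, motion, axis=0)

# Toy usage: a slow drift plus a 0.3 Hz "respiratory" oscillation on one parameter.
t = np.arange(0, 300, 0.8)
motion = np.zeros((t.size, 6))
motion[:, 2] = 0.001 * t + 0.2 * np.sin(2 * np.pi * 0.3 * t)
filtered = bandstop_motion_params(motion)
print(np.std(motion[:, 2] - filtered[:, 2]))   # roughly the size of the removed 0.3 Hz component
```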

Book ChapterDOI
Liangliang Ren, Xin Yuan, Jiwen Lu, Ming Yang, Jie Zhou
08 Sep 2018
TL;DR: A deep reinforcement learning with iterative shift (DRL-IS) method for single object tracking, where an actor-critic network is introduced to predict the iterative shifts of object bounding boxes, and evaluate the shifts to take actions on whether to update object models or re-initialize tracking.
Abstract: Visual tracking is confronted by the dilemma of locating a target both accurately and efficiently while making decisions online about whether and how to adapt the appearance model or even restart tracking. In this paper, we propose a deep reinforcement learning with iterative shift (DRL-IS) method for single object tracking, where an actor-critic network is introduced to predict the iterative shifts of object bounding boxes and evaluate the shifts to take actions on whether to update object models or re-initialize tracking. Since locating an object is achieved by an iterative shift process, rather than online classification on many sampled locations, the proposed method is robust to large deformations and abrupt motion, and computationally efficient, since finding a target takes up to 10 shifts. In offline training, the critic network guides the learning of how to make decisions jointly on motion estimation and tracking status in an end-to-end manner. Experimental results on the OTB benchmarks show that, on sequences with large deformation, the proposed method improves the tracking precision by 1.7% and runs about 5 times faster than the competing state-of-the-art methods.

Proceedings ArticleDOI
13 Dec 2018
TL;DR: This work builds upon the recent developments in deep Convolutional Neural Networks (CNN) and automatically estimates the intrinsic parameters of the camera from a single input image, using the great amount of omnidirectional images available on the Internet to generate a large-scale dataset.
Abstract: Calibration of wide field-of-view cameras is a fundamental step for numerous visual media production applications, such as 3D reconstruction, image undistortion, augmented reality and camera motion estimation. However, existing calibration methods require multiple images of a calibration pattern (typically a checkerboard), assume the presence of lines, require manual interaction and/or need an image sequence. In contrast, we present a novel fully automatic deep learning-based approach that overcomes all these limitations and works with a single image of general scenes. Our approach builds upon the recent developments in deep Convolutional Neural Networks (CNN): our network automatically estimates the intrinsic parameters of the camera (focal length and distortion parameter) from a single input image. In order to train the CNN, we leverage the great amount of omnidirectional images available on the Internet to automatically generate a large-scale dataset composed of millions of wide field-of-view images with ground truth intrinsic parameters. Experiments successfully demonstrated the quality of our results, both quantitatively and qualitatively.

Journal ArticleDOI
TL;DR: This paper converts the segmentation of motion capture data into a temporal subspace clustering problem, and proposes a new segmentation method, which is robust to non-Gaussian noise, since correntropy is a localized similarity measure.
Abstract: Studies on human motion have attracted a lot of attention. Human motion capture data, which much more precisely records human motion than videos do, has been widely used in many areas. Motion segmentation is an indispensable step for many related applications, but current segmentation methods for motion capture data do not effectively model some important characteristics of motion capture data, such as the Riemannian manifold structure and the presence of non-Gaussian noise. In this paper, we convert the segmentation of motion capture data into a temporal subspace clustering problem. Under the framework of sparse subspace clustering, we propose to use the geodesic exponential kernel to model the Riemannian manifold structure, use correntropy to measure the reconstruction error, use the triangle constraint to guarantee temporal continuity in each cluster and use multi-view reconstruction to extract the relations between different joints. Therefore, exploiting some special characteristics of motion capture data, we propose a new segmentation method, which is robust to non-Gaussian noise, since correntropy is a localized similarity measure. We also develop an efficient optimization algorithm based on the block coordinate descent method to solve the proposed model. Our optimization algorithm has a linear complexity while sparse subspace clustering is originally a quadratic problem. Extensive experimental results on both simulated and real noisy data sets demonstrate the advantage of the proposed method.

Journal ArticleDOI
07 Feb 2018
TL;DR: In this paper, an autoencoder network is used to find a nonlinear representation of the optical flow manifold, and this latent space is learned jointly with the camera ego-motion estimation task, yielding the latent space visual odometry (LS-VO) architecture.
Abstract: This work proposes a novel deep network architecture to solve the camera ego-motion estimation problem. A motion estimation network generally learns features similar to optical flow (OF) fields starting from sequences of images. This OF can be described by a lower dimensional latent space. Previous research has shown how to find linear approximations of this space. We propose to use an autoencoder network to find a nonlinear representation of the OF manifold. In addition, we propose to learn the latent space jointly with the estimation task, so that the learned OF features become a more robust description of the OF input. We call this novel architecture latent space visual odometry (LS-VO). The experiments show that LS-VO achieves a considerable increase in performance with respect to baselines, while the number of parameters of the estimation network only slightly increases.

Journal ArticleDOI
TL;DR: An adaptive fractional-pixel ME skipping scheme is proposed for low-complexity HEVC ME, which reduces ME encoding time by an average of 63.22% while encoding efficiency is maintained.
Abstract: High-Efficiency Video Coding (HEVC) efficiently addresses the storage and transmission problems of high-definition videos, especially 4K videos. Variable-size Prediction Unit (PU)-based Motion Estimation (ME) contributes a significant compression rate to the HEVC encoder and also generates a huge computational load. Meanwhile, the high encoding complexity prevents widespread adoption of the HEVC encoder in multimedia systems. In this article, an adaptive fractional-pixel ME skipping scheme is proposed for low-complexity HEVC ME. First, based on the properties of the variable-size PU-based ME process and the video content partition relationship among variable-size PUs, all inter-PU modes during a coding unit encoding process are classified into a root-type PU mode and children-type PU modes. Then, according to the ME result of the root-type PU mode, the fractional-pixel ME of its children-type PU modes is adaptively skipped. Simulation results show that, compared to the original ME in the HEVC reference software, the proposed algorithm reduces ME encoding time by an average of 63.22% while encoding efficiency is maintained.
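
For context, the sketch below shows plain integer-pixel full-search block matching with a sum-of-absolute-differences criterion, the stage after which fractional-pixel refinement would normally run and which the proposed scheme decides, per children-type PU, whether to skip. The block size, search range, and toy frames are illustrative assumptions, not the HEVC reference implementation.

```python
import numpy as np

def block_match_sad(cur, ref, block=(16, 16), search=8):
    """Integer-pixel full-search block matching with the sum of absolute differences (SAD).

    cur, ref : (H, W) current and reference frames
    block    : block size (rows, cols) of each prediction unit
    search   : search range in pixels around the co-located block
    Returns an (H//bh, W//bw, 2) array of integer motion vectors (dy, dx). Fractional-pixel
    refinement would interpolate the reference around each of these integer results.
    """
    H, W = cur.shape
    bh, bw = block
    mvs = np.zeros((H // bh, W // bw, 2), dtype=int)
    for bi in range(H // bh):
        for bj in range(W // bw):
            y0, x0 = bi * bh, bj * bw
            blk = cur[y0:y0 + bh, x0:x0 + bw]
            best, best_mv = np.inf, (0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    y, x = y0 + dy, x0 + dx
                    if y < 0 or x < 0 or y + bh > H or x + bw > W:
                        continue
                    sad = np.abs(blk - ref[y:y + bh, x:x + bw]).sum()
                    if sad < best:
                        best, best_mv = sad, (dy, dx)
            mvs[bi, bj] = best_mv
    return mvs

# Toy usage: the reference is the current frame shifted by (2, -3) pixels.
rng = np.random.default_rng(0)
cur = rng.random((64, 64))
ref = np.roll(cur, shift=(2, -3), axis=(0, 1))
print(block_match_sad(cur, ref)[1, 1])   # interior block -> close to [ 2 -3]
```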

Journal ArticleDOI
TL;DR: A complete procedure for the automatic estimation of maritime target motion parameters by evaluating the generated Kelvin waves detected in synthetic aperture radar (SAR) images by evaluating a dual-stage low-rank plus sparse decomposition (LRSD) assisted by Radon transform (RT) for clutter reduction, sparse object detection, precise wake inclination estimation, and Kelvin wave spectral analysis.
Abstract: The problem in obtaining stable motion estimation of maritime targets is that sea clutter makes wake structure detection and reconnaissance difficult. This letter presents a complete procedure for the automatic estimation of maritime target motion parameters by evaluating the generated Kelvin waves detected in synthetic aperture radar (SAR) images. The algorithm consists in evaluating a dual-stage low-rank plus sparse decomposition (LRSD) assisted by Radon transform (RT) for clutter reduction, sparse object detection, precise wake inclination estimation, and Kelvin wave spectral analysis. The algorithm is based on the robust principal component analysis (RPCA) implemented by convex programming. The LRSD algorithm permits the extrapolation of sparse objects of interest consisting of the maritime targets and the Kelvin pattern from the unchanging low-rank background. This dual-stage RPCA and RT applied to SAR surveillance permits fast detection and enhanced motion parameter estimation of maritime targets.
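
As a small worked example of how a detected Kelvin wake relates to target motion, the snippet below applies the deep-water dispersion relation for the transverse Kelvin wave, lambda = 2*pi*V^2/g, to convert a wavelength measured in a SAR spectrum into a ship speed. The 100 m wavelength is an illustrative assumption; the paper's full procedure involves RPCA, the Radon transform, and spectral analysis of the wake.

```python
import math

def ship_speed_from_kelvin_wavelength(wavelength_m, g=9.81):
    """Deep-water dispersion relation for the transverse Kelvin wake:
    lambda = 2*pi*V**2 / g, hence V = sqrt(g * lambda / (2*pi))."""
    return math.sqrt(g * wavelength_m / (2.0 * math.pi))

# A transverse wavelength of 100 m measured in the SAR spectrum implies ~12.5 m/s (~24 kn).
v = ship_speed_from_kelvin_wavelength(100.0)
print(round(v, 2), "m/s", round(v * 1.94384, 1), "knots")
```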

Posted Content
TL;DR: A new algorithm, activation motion compensation, detects changes in the visual input and incrementally updates a previously computed activation, applying well-known motion estimation techniques to adapt to visual changes and avoid unnecessary computation on most frames.
Abstract: Hardware support for deep convolutional neural networks (CNNs) is critical to advanced computer vision in mobile and embedded devices. Current designs, however, accelerate generic CNNs; they do not exploit the unique characteristics of real-time vision. We propose to use the temporal redundancy in natural video to avoid unnecessary computation on most frames. A new algorithm, activation motion compensation, detects changes in the visual input and incrementally updates a previously-computed output. The technique takes inspiration from video compression and applies well-known motion estimation techniques to adapt to visual changes. We use an adaptive key frame rate to control the trade-off between efficiency and vision quality as the input changes. We implement the technique in hardware as an extension to existing state-of-the-art CNN accelerator designs. The new unit reduces the average energy per frame by 54.2%, 61.7%, and 87.6% for three CNNs with less than 1% loss in vision accuracy.
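
A toy numpy sketch of the activation-motion-compensation idea is given below: the activation computed on a key frame is warped towards the current frame using block motion vectors estimated on the input and rescaled to the activation's resolution, so the full CNN need not be re-run on every frame. The nearest-neighbour gather, the stride value, and the uniform toy motion field are illustrative assumptions about how such an update could look, not the hardware design described in the paper.

```python
import numpy as np

def motion_compensate_activation(act, mv_input, stride):
    """Approximate the current frame's activation by warping a cached key-frame activation.

    act      : (C, h, w) activation computed on the key frame
    mv_input : (h, w, 2) motion vectors (dy, dx) in input-pixel units, one per activation cell,
               pointing from the current frame back to the key frame
    stride   : cumulative downsampling factor of the layer, used to rescale the vectors
    Returns a (C, h, w) motion-compensated activation (nearest-neighbour gather).
    """
    C, h, w = act.shape
    mv = np.rint(mv_input / float(stride)).astype(int)        # vectors in activation cells
    ys, xs = np.mgrid[0:h, 0:w]
    src_y = np.clip(ys + mv[..., 0], 0, h - 1)
    src_x = np.clip(xs + mv[..., 1], 0, w - 1)
    return act[:, src_y, src_x]

# Toy usage: a uniform 16-pixel shift at the input maps to a 1-cell shift at stride 16.
rng = np.random.default_rng(0)
act = rng.random((8, 14, 14))
mv = np.full((14, 14, 2), fill_value=(16.0, 0.0))
warped = motion_compensate_activation(act, mv, stride=16)
print(np.allclose(warped[:, :-1, :], act[:, 1:, :]))          # True
```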