
Showing papers by "Andrew Rabinovich published in 2018"


Proceedings ArticleDOI
18 Jun 2018
TL;DR: In this paper, a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision is presented, which operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass.
Abstract: This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection repeatability and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to state-of-the-art homography estimation results on HPatches when compared to LIFT, SIFT and ORB.
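As a non-authoritative illustration of the Homographic Adaptation idea described in the abstract, the sketch below (Python, assuming OpenCV and NumPy) averages a detector's response maps over randomly sampled homographies of the same image; detect_heatmap is a placeholder for any interest point detector that returns a per-pixel score map, and the corner-jitter sampling is an assumption, not the paper's exact homography distribution.

import numpy as np
import cv2

def random_homography(h, w, max_shift=0.15):
    # Jitter the four image corners to sample a plausible random homography.
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = (np.random.rand(4, 2) - 0.5) * 2 * max_shift * np.float32([w, h])
    dst = (src + jitter).astype(np.float32)
    return cv2.getPerspectiveTransform(src, dst)

def homographic_adaptation(image, detect_heatmap, num_homographies=100):
    # Aggregate detector responses over warped copies, mapped back to the original frame.
    h, w = image.shape[:2]
    accum = detect_heatmap(image).astype(np.float32)
    count = np.ones((h, w), dtype=np.float32)
    for _ in range(num_homographies):
        H = random_homography(h, w)
        warped = cv2.warpPerspective(image, H, (w, h))
        heat = detect_heatmap(warped).astype(np.float32)
        # Warp the response and a validity mask back into the original image frame.
        accum += cv2.warpPerspective(heat, np.linalg.inv(H), (w, h))
        count += cv2.warpPerspective(np.ones((h, w), np.float32), np.linalg.inv(H), (w, h))
    return accum / np.maximum(count, 1e-6)  # aggregated interest point heatmap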

982 citations


Proceedings Article
03 Jul 2018
TL;DR: Gradient normalization (GradNorm) as discussed by the authors automatically balances training in deep multitask models by dynamically tuning gradient magnitudes, which has been shown to improve accuracy and reduce overfitting across multiple tasks when compared to single-task networks.
Abstract: Deep multitask networks, in which one neural network produces multiple predictive outputs, can offer better speed and performance than their single-task counterparts but are challenging to train properly. We present a gradient normalization (GradNorm) algorithm that automatically balances training in deep multitask models by dynamically tuning gradient magnitudes. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting across multiple tasks when compared to single-task networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process that incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we will demonstrate that gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning.
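A minimal PyTorch sketch of the per-step loss-weight update that GradNorm is described as performing; the learning rate, the value of alpha, and the choice of shared parameter tensor are illustrative assumptions, and the total training loss would still be the w-weighted sum of task losses, optimized separately by the main optimizer.

import torch

def gradnorm_step(task_losses, initial_losses, w, shared_param, alpha=1.5, lr_w=0.025):
    # One GradNorm update of the per-task loss weights w (one learnable scalar per task).
    g_norms = []
    for i, L in enumerate(task_losses):
        # Gradient norm of each weighted task loss w.r.t. the shared parameters.
        g = torch.autograd.grad(w[i] * L, shared_param, retain_graph=True, create_graph=True)[0]
        g_norms.append(g.norm())
    g_norms = torch.stack(g_norms)
    mean_norm = g_norms.mean().detach()

    # Relative inverse training rates: tasks that have made less progress get larger targets.
    loss_ratios = torch.stack([L.detach() / L0 for L, L0 in zip(task_losses, initial_losses)])
    inv_rate = loss_ratios / loss_ratios.mean()
    target = mean_norm * inv_rate.pow(alpha)   # treated as a constant w.r.t. w

    gradnorm_loss = (g_norms - target).abs().sum()
    w_grad = torch.autograd.grad(gradnorm_loss, w)[0]
    with torch.no_grad():
        w -= lr_w * w_grad
        w *= len(task_losses) / w.sum()        # renormalize so the weights sum to the task count
    return w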

644 citations


Book ChapterDOI
TL;DR: A deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels is presented, and it is demonstrated that it would indeed be possible to efficiently transform sparse depth measurements obtained using e.g. lower-power depth sensors or SLAM systems into high-quality dense depth maps.
Abstract: We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works simultaneously for both indoor/outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets. We surpass the state-of-the-art for monocular depth estimation even with depth values for only 1 out of every ~10000 image pixels, and we outperform other sparse-to-dense depth methods at all sparsity levels. With depth values for 1/256 of the image pixels, we achieve a mean absolute error of less than 1% of actual depth on indoor scenes, comparable to the performance of consumer-grade depth sensor hardware. Our experiments demonstrate that it would indeed be possible to efficiently transform sparse depth measurements obtained using e.g. lower-power depth sensors or SLAM systems into high-quality dense depth maps.
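To make the sparsity figures concrete: at NYUv2's 640x480 resolution, 1/256 of the pixels is roughly 1,200 depth samples, and 1 in ~10,000 is about 30. Below is a minimal sketch (not the authors' code) of forming such a sparse depth channel and stacking it with the RGB image, assuming a dense ground-truth depth map is available to sample from.

import numpy as np

def make_sparse_depth(depth, keep_fraction=1.0 / 256):
    # Keep a random subset of depth pixels; everywhere else the depth channel is zero.
    sparse = np.zeros_like(depth)
    mask = np.random.rand(*depth.shape) < keep_fraction
    sparse[mask] = depth[mask]
    return sparse

def make_rgbd_input(rgb, depth, keep_fraction=1.0 / 256):
    # Stack RGB (H, W, 3) and the sparse depth map (H, W) into a 4-channel network input.
    sparse = make_sparse_depth(depth, keep_fraction)
    return np.concatenate([rgb.astype(np.float32), sparse[..., None]], axis=-1)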

92 citations


Book ChapterDOI
08 Sep 2018
TL;DR: In this article, the authors presented a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels, and achieved state-of-the-art performance on both the NYUv2 and KITTI datasets.
Abstract: We present a deep model that can accurately produce dense depth maps given an RGB image with known depth at a very sparse set of pixels. The model works simultaneously for both indoor/outdoor scenes and produces state-of-the-art dense depth maps at nearly real-time speeds on both the NYUv2 and KITTI datasets. We surpass the state-of-the-art for monocular depth estimation even with depth values for only 1 out of every ~10000 image pixels, and we outperform other sparse-to-dense depth methods at all sparsity levels. With depth values for 1/256 of the image pixels, we achieve a mean absolute error of less than 1% of actual depth on indoor scenes, comparable to the performance of consumer-grade depth sensor hardware. Our experiments demonstrate that it would indeed be possible to efficiently transform sparse depth measurements obtained using e.g. lower-power depth sensors or SLAM systems into high-quality dense depth maps.

68 citations


Posted Content
TL;DR: A self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry frontend, a network which computes pointwise data associations across images.
Abstract: We propose a self-supervised learning framework that uses unlabeled monocular video sequences to generate large-scale supervision for training a Visual Odometry (VO) frontend, a network which computes pointwise data associations across images. Our self-improving method enables a VO frontend to learn over time, unlike other VO and SLAM systems which require time-consuming hand-tuning or expensive data collection to adapt to new environments. Our proposed frontend operates on monocular images and consists of a single multi-task convolutional neural network which outputs 2D keypoints locations, keypoint descriptors, and a novel point stability score. We use the output of VO to create a self-supervised dataset of point correspondences to retrain the frontend. When trained using VO at scale on 2.5 million monocular images from ScanNet, the stability classifier automatically discovers a ranking for keypoints that are not likely to help in VO, such as t-junctions across depth discontinuities, features on shadows and highlights, and dynamic objects like people. The resulting frontend outperforms both traditional methods (SIFT, ORB, AKAZE) and deep learning methods (SuperPoint and LF-Net) in a 3D-to-2D pose estimation task on ScanNet.
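One way the VO output could be turned into self-supervised correspondence labels, sketched here under stated assumptions (the pixel threshold and helper names are illustrative, not taken from the paper): 3D points estimated by VO are reprojected into another frame, and keypoints whose reprojection error is small are kept as positive training correspondences.

import numpy as np

def project(points_3d, R, t, K):
    # Project Nx3 world points into a camera with rotation R, translation t, intrinsics K.
    cam = points_3d @ R.T + t            # world -> camera coordinates
    uv = cam @ K.T                       # pinhole projection
    return uv[:, :2] / uv[:, 2:3]

def label_correspondences(points_3d, keypoints_2d, R, t, K, max_err_px=2.0):
    # Mark a keypoint as a usable correspondence if the VO reprojection lands nearby.
    reproj = project(points_3d, R, t, K)
    err = np.linalg.norm(reproj - keypoints_2d, axis=1)
    return err < max_err_px              # boolean supervision mask for retraining the frontend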

35 citations


Patent
16 Mar 2018
TL;DR: In this patent, a convolutional neural network is used to estimate the layout of a room from two-dimensional ordered keypoints associated with a room type; the resulting layout can be used in augmented or mixed reality, robotics, autonomous indoor navigation, etc.
Abstract: Systems and methods for estimating a layout of a room are disclosed. The room layout can comprise the location of a floor, one or more walls, and a ceiling. In one aspect, a neural network can analyze an image of a portion of a room to determine the room layout. The neural network can comprise a convolutional neural network having an encoder sub-network, a decoder sub-network, and a side sub-network. The neural network can determine a three-dimensional room layout using two-dimensional ordered keypoints associated with a room type. The room layout can be used in applications such as augmented or mixed reality, robotics, autonomous indoor navigation, etc.
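A minimal sketch of how ordered layout keypoints could be read off per-keypoint score maps for a predicted room type; the function and argument names are illustrative, not the patent's API.

import numpy as np

def decode_layout_keypoints(heatmaps, num_keypoints):
    # heatmaps: (K, H, W) score maps, one channel per ordered keypoint of the room type.
    keypoints = []
    for k in range(num_keypoints):
        flat_idx = np.argmax(heatmaps[k])
        y, x = np.unravel_index(flat_idx, heatmaps[k].shape)
        keypoints.append((int(x), int(y)))   # ordered 2D keypoints defining the layout
    return keypoints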

24 citations


Patent
09 Nov 2018
TL;DR: In this patent, a meta-learning approach is proposed that learns a task-level curriculum and adapts task-loss balancing weights during training, yielding better multi-task performance on real-world datasets than static weights determined by expensive random searches or heuristics.
Abstract: Methods and systems for meta-learning are described for automating learning of child tasks with a single neural network. The order in which tasks are learned by the neural network can affect performance of the network, and the meta-learning approach can use a task-level curriculum for multi-task training. The task-level curriculum can be learned by monitoring a trajectory of loss functions during training. The meta-learning approach can learn to adapt task loss balancing weights in the course of training to get improved performance on multiple tasks on real world datasets. Advantageously, learning to dynamically balance weights among different task losses can lead to superior performance over the use of static weights determined by expensive random searches or heuristics. Embodiments of the meta-learning approach can be used for computer vision tasks or natural language processing tasks, and the trained neural networks can be used by augmented or virtual reality devices.
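As a rough, non-authoritative illustration of a task-level curriculum driven by loss trajectories, the sketch below emphasizes the task whose loss has recently improved the least; the window size and selection rule are assumptions, not taken from the patent.

import numpy as np

def pick_next_task(loss_histories, window=50):
    # loss_histories: one list of recorded loss values per task.
    # Returns the index of the task whose loss curve has the flattest (least negative)
    # recent slope, i.e. the task that currently needs the most training.
    slopes = []
    for history in loss_histories:
        recent = np.asarray(history[-window:], dtype=np.float64)
        steps = np.arange(len(recent))
        slope = np.polyfit(steps, recent, 1)[0] if len(recent) > 1 else 0.0
        slopes.append(slope)
    return int(np.argmax(slopes))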

7 citations


Patent
29 Oct 2018
TL;DR: In this patent, an improved approach to structure learning of neural networks is presented that exploits correlations in the problem/data the network aims to solve, using a greedy search for bottlenecks of information gain from the lower convolutional layers up to the fully connected layers and adding computation only where it is needed.
Abstract: The present disclosure provides an improved approach for implementing structure learning of neural networks by exploiting correlations in the problem/data that the network aims to solve. A greedy approach is described that discovers bottlenecks of information gain from the lower convolutional layers all the way up to the fully connected layers. Rather than simply making the architecture deeper, additional computation and capacitance is added only where it is needed.
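A heavily hedged sketch of the greedy growth loop the abstract describes; score_layers and widen_layer are placeholders for the method's actual information-gain measurement and capacity-expansion steps, and the bottleneck selection rule here is an assumption.

def grow_network(model, score_layers, widen_layer, rounds=5, min_gain=0.01):
    # Greedily add capacity at the current information-gain bottleneck instead of
    # simply making the whole architecture deeper.
    for _ in range(rounds):
        gains = score_layers(model)               # hypothetical: information gain per layer
        bottleneck = min(gains, key=gains.get)    # layer with the least information gain
        if gains[bottleneck] >= min_gain:
            break                                 # no layer looks like a bottleneck any more
        model = widen_layer(model, bottleneck)    # add computation only where it is needed
    return model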