
Showing papers on "Orientation (computer vision) published in 2020"


Posted Content
TL;DR: The framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity, and refines these estimates using additional point features on the object.
Abstract: Three-dimensional objects are commonly represented as 3D boxes in a point-cloud. This representation mimics the well-studied image-based 2D bounding-box detection but comes with additional challenges. Objects in a 3D world do not follow any particular orientation, and box-based detectors have difficulties enumerating all orientations or fitting an axis-aligned bounding box to rotated objects. In this paper, we instead propose to represent, detect, and track 3D objects as points. Our framework, CenterPoint, first detects centers of objects using a keypoint detector and regresses to other attributes, including 3D size, 3D orientation, and velocity. In a second stage, it refines these estimates using additional point features on the object. In CenterPoint, 3D object tracking simplifies to greedy closest-point matching. The resulting detection and tracking algorithm is simple, efficient, and effective. CenterPoint achieved state-of-the-art performance on the nuScenes benchmark for both 3D detection and tracking, with 65.5 NDS and 63.8 AMOTA for a single model. On the Waymo Open Dataset, CenterPoint outperforms all previous single-model methods by a large margin and ranks first among all Lidar-only submissions. The code and pretrained models are available at this https URL.
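
The tracking step described above reduces to associating current detections with previous-frame centers by distance. A minimal sketch of such greedy closest-point matching, assuming hypothetical NumPy arrays of centers and regressed velocities (the distance threshold is illustrative, not the paper's setting):

```python
import numpy as np

def greedy_center_matching(prev_centers, curr_centers, curr_velocities, dt, max_dist=2.0):
    """Greedily match current detections to previous tracks by closest center.

    prev_centers: (M, 3) object centers from the previous frame.
    curr_centers: (N, 3) detected centers in the current frame.
    curr_velocities: (N, 3) regressed velocities, used to back-project current
        centers to the previous timestamp before matching.
    Returns (curr_idx, prev_idx) pairs; unmatched detections would start new tracks.
    """
    projected = curr_centers - curr_velocities * dt          # back-project with velocity
    dists = np.linalg.norm(projected[:, None, :] - prev_centers[None, :, :], axis=-1)

    matches, used_curr, used_prev = [], set(), set()
    # Visit detection/track pairs in order of increasing distance (greedy).
    for curr_idx, prev_idx in zip(*np.unravel_index(np.argsort(dists, axis=None), dists.shape)):
        if curr_idx in used_curr or prev_idx in used_prev:
            continue
        if dists[curr_idx, prev_idx] > max_dist:
            break  # remaining pairs are even farther apart
        matches.append((int(curr_idx), int(prev_idx)))
        used_curr.add(curr_idx)
        used_prev.add(prev_idx)
    return matches
```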

397 citations


Book ChapterDOI
Yi Li1, Gu Wang1, Xiangyang Ji1, Yu Xiang2, Dieter Fox2 
01 Mar 2020
TL;DR: A novel deep neural network for 6D pose matching named DeepIM is proposed that is able to iteratively refine the pose by matching the rendered image against the observed image.
Abstract: Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the input image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
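
The core of the approach is a render-and-compare loop: render the object at the current pose estimate, let a network predict a relative transform, apply it, and repeat. A minimal sketch of that loop, with the renderer and the pose-update network left as hypothetical callables and poses represented as 4x4 homogeneous matrices:

```python
import numpy as np

def refine_pose(observed_img, init_pose, render_fn, predict_delta_fn, num_iters=4):
    """Generic render-and-compare refinement in the spirit of DeepIM.

    init_pose: (4, 4) homogeneous object-to-camera transform (initial estimate).
    render_fn(pose) -> rendered image of the object at that pose (placeholder).
    predict_delta_fn(observed, rendered) -> (4, 4) relative transform predicted
        by a network (placeholder); identity once the rendering matches.
    """
    pose = np.array(init_pose, dtype=np.float64)
    for _ in range(num_iters):
        rendered = render_fn(pose)                        # synthesize current hypothesis
        delta = predict_delta_fn(observed_img, rendered)  # relative SE(3) update
        pose = delta @ pose                               # apply update, then iterate
    return pose
```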

340 citations


Posted Content
TL;DR: This work proposes an efficient and accurate monocular 3D detection framework in single shot that achieves state-of-the-art performance on the KITTI benchmark and predicts the nine perspective keypoints of a 3D bounding box in image space, and utilizes the geometric relationship of 3D and 2D perspectives.
Abstract: In this work, we propose an efficient and accurate monocular 3D detection framework in single shot. Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component. Four edges of a 2D box provide only four constraints, and the performance deteriorates dramatically with even a small error of the 2D detector. Different from these approaches, our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship of 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space. In this method, the properties of the object can be predicted stably even when the estimation of keypoints is very noisy, which enables us to obtain fast detection speed with a small architecture. Training our method only uses the 3D properties of the object without the need for external networks or supervision data. Our method is the first real-time system for monocular image 3D detection while achieving state-of-the-art performance on the KITTI benchmark. Code will be released at this https URL.

167 citations


Journal ArticleDOI
Kun Fu1, Zhonghan Chang1, Yue Zhang1, Guangluan Xu1, Keshu Zhang1, Xian Sun1 
TL;DR: A unified framework upon the region-based convolutional neural network for arbitrary-oriented and multi-scale object detection in remote sensing images is built and achieves state-of-the-art performance, which demonstrates the effectiveness of the proposed methods.
Abstract: Object detection plays an important role in the field of remote sensing imagery analysis. The most challenging issues in advancing this task are the large variation in object scales and the arbitrary orientation of objects. In this paper, we build a unified framework upon the region-based convolutional neural network for arbitrary-oriented and multi-scale object detection in remote sensing images. To handle the problem of multi-scale object detection, a feature-fusion architecture is proposed to generate a multi-scale feature hierarchy, which augments the features of shallow layers with semantic representations via a top-down pathway and combines the feature maps of top layers with low-level information by a bottom-up pathway. By combining features of different levels, we can form a powerful feature representation for multi-scale objects. Most previous methods locate objects with arbitrary orientations and dense spatial distributions via axis-aligned boxes, which may cover adjacent instances and background areas. We build a rotation-aware object detector that uses oriented boxes to localize objects in remote sensing images. The region proposal network augments the anchors with multiple default angles to cover oriented objects. It utilizes oriented proposal boxes to enclose objects rather than horizontal proposals that coarsely locate oriented objects. The orientation RoI pooling operation is introduced to extract the feature maps of oriented proposals for the following R-CNN subnetwork. We conduct comprehensive experiments on a public dataset for oriented object detection in remote sensing images. Our method achieves state-of-the-art performance, which demonstrates the effectiveness of the proposed methods.

165 citations


Posted Content
TL;DR: This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors, by resorting to deformable convolutional networks to densely estimate and apply local transformation in ASLFeat.
Abstract: This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while shape-awareness is crucial for acquiring stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformation. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.
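
The "peakiness" idea is to score a location highly when its response stands out both against other channels and against its spatial neighborhood. A simplified NumPy illustration of such a score (not the paper's exact formulation; the window size and softplus choice are assumptions):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def peakiness_score(feat, window=3):
    """Simplified keypoint score over a dense feature map.

    feat: (C, H, W) feature responses. A response gets a high score when it
    exceeds both its channel-wise mean and its local spatial average.
    """
    channel_mean = feat.mean(axis=0, keepdims=True)                # (1, H, W)
    spatial_mean = uniform_filter(feat, size=(1, window, window))  # (C, H, W) local average
    alpha = np.logaddexp(0, feat - spatial_mean)   # softplus: spatial peakiness
    beta = np.logaddexp(0, feat - channel_mean)    # softplus: channel-wise peakiness
    return (alpha * beta).max(axis=0)              # (H, W) detection score map
```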

125 citations


Journal ArticleDOI
TL;DR: In this article, a deep neural network architecture for human activity recognition based on multiple sensor data is proposed, which encodes the time series of sensor data as images and leverages these transformed images to retain the necessary features for activity recognition.

123 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: ASLFeat as mentioned in this paper uses deformable convolutional networks to densely estimate and apply local transformations, and takes advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization.
Abstract: This work focuses on mitigating two limitations in the joint learning of local feature detectors and descriptors. First, the ability to estimate the local shape (scale, orientation, etc.) of feature points is often neglected during dense feature extraction, while shape-awareness is crucial for acquiring stronger geometric invariance. Second, the localization accuracy of detected keypoints is not sufficient to reliably recover camera geometry, which has become the bottleneck in tasks such as 3D reconstruction. In this paper, we present ASLFeat, with three light-weight yet effective modifications to mitigate the above issues. First, we resort to deformable convolutional networks to densely estimate and apply local transformation. Second, we take advantage of the inherent feature hierarchy to restore spatial resolution and low-level details for accurate keypoint localization. Finally, we use a peakiness measurement to relate feature responses and derive more indicative detection scores. The effect of each modification is thoroughly studied, and the evaluation is extensively conducted across a variety of practical scenarios. State-of-the-art results are reported that demonstrate the superiority of our methods.

118 citations


Journal ArticleDOI
TL;DR: This novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization and achieves state-of-the-art performance on the T-LESS dataset both in the RGB and RGB-D domain.
Abstract: We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization. This so-called Augmented Autoencoder has several advantages over existing methods: It does not require real, pose-annotated training data, generalizes to various test sensors and inherently handles object and view symmetries. Instead of learning an explicit mapping from input images to object poses, it provides an implicit representation of object orientations defined by samples in a latent space. Our pipeline achieves state-of-the-art performance on the T-LESS dataset in both the RGB and RGB-D domains. We also evaluate on the LineMOD dataset where we can compete with other synthetically trained approaches. We further increase performance by correcting 3D orientation estimates to account for perspective errors when the object deviates from the image center and show extended results. Our code is available here https://github.com/DLR-RM/AugmentedAutoencoder.
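
Because orientation is represented implicitly by samples in a latent space, pose lookup amounts to comparing the encoding of the observed crop against a codebook of encodings rendered at known rotations. A minimal sketch of that lookup, assuming hypothetical codebook arrays:

```python
import numpy as np

def estimate_orientation(query_latent, codebook_latents, codebook_rotations):
    """Codebook-style orientation lookup in the spirit of the Augmented Autoencoder.

    query_latent: (D,) encoding of the detected object crop.
    codebook_latents: (K, D) encodings of synthetic views at known rotations.
    codebook_rotations: (K, 3, 3) rotation matrices used to render those views.
    Returns the rotation whose latent code is most similar (cosine similarity).
    """
    q = query_latent / np.linalg.norm(query_latent)
    z = codebook_latents / np.linalg.norm(codebook_latents, axis=1, keepdims=True)
    sims = z @ q                          # cosine similarity to every codebook entry
    return codebook_rotations[np.argmax(sims)]
```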

111 citations


Proceedings ArticleDOI
14 Jun 2020
TL;DR: A Dynamic Similarity Matching network is designed to estimate cross-view orientation alignment during localization and improves state-of-the-art performance on large-scale geo-localization datasets.
Abstract: Cross-view geo-localization is the problem of estimating the position and orientation (latitude, longitude and azimuth angle) of a camera at ground level given a large-scale database of geo-tagged aerial (e.g., satellite) images. Existing approaches treat the task as a pure location estimation problem by learning discriminative feature descriptors, but neglect orientation alignment. It is well-recognized that knowing the orientation between ground and aerial images can significantly reduce matching ambiguity between these two views, especially when the ground-level images have a limited Field of View (FoV) instead of a full field-of-view panorama. Therefore, we design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization. In particular, we address the cross-view domain gap by applying a polar transform to the aerial images to approximately align the images up to an unknown azimuth angle. Then, a two-stream convolutional network is used to learn deep features from the ground and polar-transformed aerial images. Finally, we obtain the orientation by computing the correlation between cross-view features, which also provides a more accurate measure of feature similarity, improving location recall. Experiments on standard datasets demonstrate that our method significantly improves state-of-the-art performance. Remarkably, we improve the top-1 location recall rate on the CVUSA dataset by a factor of 1.5x for panoramas with known orientation, by a factor of 3.3x for panoramas with unknown orientation, and by a factor of 6x for 180-degree FoV images with unknown orientation.
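
Once both views share a polar representation, the orientation can be read off as the horizontal shift that maximizes the correlation between the two feature maps. A simplified NumPy analogue of that similarity matching, with hypothetical (C, H, W) feature maps whose width spans 360 degrees:

```python
import numpy as np

def estimate_azimuth_shift(ground_feat, aerial_feat):
    """Find the azimuth shift that best aligns a ground-view feature map with a
    polar-transformed aerial feature map via circular cross-correlation.

    ground_feat: (C, H, Wg) features of the ground image (Wg <= W for limited FoV).
    aerial_feat: (C, H, W) features of the polar-transformed aerial image.
    Returns the estimated azimuth in degrees and the full correlation profile.
    """
    C, H, W = aerial_feat.shape
    scores = np.empty(W)
    for shift in range(W):
        rolled = np.roll(aerial_feat, shift, axis=2)              # rotate aerial features in azimuth
        scores[shift] = np.sum(ground_feat * rolled[:, :, :ground_feat.shape[2]])
    best = int(np.argmax(scores))
    return best * 360.0 / W, scores
```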

104 citations


Journal ArticleDOI
TL;DR: A systematic survey of the state-of-the-art for match pair selection from both ordered and unordered datasets, for outlier removal of initial matches dominated by outliers, and for efficiency improvement of BA is given, and an experimental evaluation of six well-known SfM-based software packages on UAV image orientation is conducted.
Abstract: Unmanned aerial vehicle (UAV) images have gained extensive attention in varying fields, and the Structure from Motion (SfM) technique has become the gold standard for aerial triangulation of UAV images. With increasing data volume caused by the use of multi-view and high-resolution imaging systems and the enhanced endurance of UAV platforms, the capability to orient large-scale UAV image sets is becoming a prominent and necessary feature for SfM-based solutions. A classical SfM pipeline consists of three major steps, i.e., (i) feature extraction for an individual image, (ii) feature matching for each image pair, and (iii) parameter solving based on iterative bundle adjustment. Most of the time costs are consumed in the second and third steps. This can be explained from three main aspects. First, for feature matching, the large number of images and high overlapping degrees cause high combinatorial complexity of match pairs. Second, the efficiency of commonly utilized techniques for outlier removal is seriously degraded by the high outlier ratios of initial matches. Third, for parameter solving of camera poses and scene structures, the iterative execution of bundle adjustment (BA) leads to high computational costs in the incremental SfM workflow. Thus, this paper gives a systematic survey of the state-of-the-art for match pair selection from both ordered and unordered datasets, for outlier removal of initial matches dominated by outliers, and for efficiency improvement of BA, and conducts an experimental evaluation of six well-known SfM-based software packages on UAV image orientation.

102 citations


Journal ArticleDOI
TL;DR: A content-based image retrieval (CBIR) system has been proposed to extract a feature vector from an image and to effectively retrieve images based on their content.
Abstract: The volume of content in digital images keeps expanding, and locating a specific image based on its content in a huge database is often difficult. In this paper, a content-based image retrieval (CBIR) system is proposed to extract a feature vector from an image and to effectively retrieve images based on that content. Two types of image feature descriptor extraction methods are considered, namely Oriented FAST and Rotated BRIEF (ORB) and the scale-invariant feature transform (SIFT). The ORB detector uses FAST keypoints together with a BRIEF descriptor, while SIFT analyzes images across various orientations and scales. A k-means clustering algorithm is applied to both sets of descriptors, from which the mean of every cluster is obtained. A locality-preserving projection dimensionality-reduction algorithm is used to reduce the dimensions of the image feature vector. At retrieval time, the image feature vectors stored in the image database are matched against the feature vector of the test data for CBIR. The performance of the proposed work is assessed using decision tree, random forest, and MLP classifiers. Two public databases, namely the Wang database and the Corel database, are considered for the experiments. The combination of ORB and SIFT feature vectors is tested on images from both databases and achieves the highest precision rates of 99.53% and 86.20% for the Corel database and the Wang database, respectively.
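
A rough sketch of the described pipeline using OpenCV and scikit-learn: extract ORB and SIFT descriptors, cluster each set with k-means, and concatenate the cluster centers into one vector. The cluster count and fusion step are illustrative choices; the LPP reduction and classifier stages described above are omitted here.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

def cluster_center_vector(descriptors, k=32):
    """Cluster local descriptors with k-means and flatten the cluster centres
    into one vector (the per-cluster means mentioned in the abstract)."""
    descriptors = np.float32(descriptors)
    km = KMeans(n_clusters=min(k, len(descriptors)), n_init=10).fit(descriptors)
    return km.cluster_centers_.flatten()

def image_feature_vector(image_path, k=32):
    """ORB + SIFT feature vector for one image (hypothetical path argument)."""
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    _, orb_desc = cv2.ORB_create().detectAndCompute(gray, None)    # binary descriptors
    _, sift_desc = cv2.SIFT_create().detectAndCompute(gray, None)  # 128-D float descriptors
    return np.concatenate([cluster_center_vector(orb_desc, k),
                           cluster_center_vector(sift_desc, k)])
```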

Book ChapterDOI
10 Jan 2020
TL;DR: In this paper, the nine perspective keypoints of a 3D bounding box in image space are predicted and the geometric relationship of 3D and 2D perspectives is utilized to recover the dimension, location, and orientation in 3D space.
Abstract: In this work, we propose an efficient and accurate monocular 3D detection framework in single shot. Most successful 3D detectors take the projection constraint from the 3D bounding box to the 2D box as an important component. Four edges of a 2D box provide only four constraints, and the performance deteriorates dramatically with even a small error of the 2D detector. Different from these approaches, our method predicts the nine perspective keypoints of a 3D bounding box in image space, and then utilizes the geometric relationship of 3D and 2D perspectives to recover the dimension, location, and orientation in 3D space. In this method, the properties of the object can be predicted stably even when the estimation of keypoints is very noisy, which enables us to obtain fast detection speed with a small architecture. Training our method only uses the 3D properties of the object without any extra annotations, category-specific 3D shape priors, or depth maps. Our method is the first real-time system (FPS > 24) for monocular image 3D detection while achieving state-of-the-art performance on the KITTI benchmark.

Posted Content
TL;DR: This paper explores a relatively less-studied methodology based on classification for rotation detection, and proposes new techniques to inherently dismiss the boundary discontinuity issue as encountered by the regression-based detectors.
Abstract: Rotation detection serves as a fundamental building block in many visual applications involving aerial images, scene text, faces, etc. Differing from the dominant regression-based approaches for orientation estimation, this paper explores a relatively less-studied methodology based on classification. The hope is to inherently dismiss the boundary discontinuity issue encountered by regression-based detectors. We propose new techniques to push its frontier in two aspects: i) new encoding mechanism: the design of two Densely Coded Labels (DCL) for angle classification, to replace the Sparsely Coded Label (SCL) in existing classification-based detectors, leading to a three-times training speed increase, as empirically observed across benchmarks, together with a notable improvement in detection accuracy; ii) loss re-weighting: we propose Angle Distance and Aspect Ratio Sensitive Weighting (ADARSW), which improves the detection accuracy, especially for square-like objects, by making DCL-based detectors sensitive to angular distance and the object's aspect ratio. Extensive experiments and visual analysis on large-scale public datasets for aerial images, i.e., DOTA, UCAS-AOD, and HRSC2016, as well as the scene text datasets ICDAR2015 and MLT, show the effectiveness of our approach. The source code is available at this https URL and is also integrated in our open source rotation detection benchmark: this https URL.
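
The classification view replaces a one-hot ("sparsely coded") angle label with a short dense binary code over discretized angle bins. One plausible way to encode and decode such labels, using a Gray code so that neighboring bins differ in few bits; the bin width, bit count, and Gray-code choice are assumptions for illustration, not the paper's exact coding.

```python
import numpy as np

def angle_to_dense_code(angle_deg, omega=1.0, n_bits=8, gray=True):
    """Encode an orientation angle as a dense binary label: the angle range is
    discretized into 2**n_bits bins of width `omega` degrees, and the bin index
    is coded with n_bits outputs instead of a one-hot vector. Illustrative only."""
    idx = int(angle_deg / omega) % (2 ** n_bits)
    if gray:
        idx ^= idx >> 1                               # binary-reflected Gray code
    return np.array([(idx >> b) & 1 for b in reversed(range(n_bits))], dtype=np.float32)

def dense_code_to_angle(code, omega=1.0, gray=True):
    """Decode binarized network outputs back to an angle (inverse of the above)."""
    idx = int("".join(str(int(round(b))) for b in code), 2)
    if gray:
        mask = idx >> 1
        while mask:                                   # invert the Gray code
            idx ^= mask
            mask >>= 1
    return idx * omega
```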

Posted Content
TL;DR: A learning-based system that estimates the camera position and orientation from a single input image relative to a known environment using a deep neural network and fully differentiable pose optimization achieves state-of-the-art accuracy on various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D-based re-localization.
Abstract: We describe a learning-based system that estimates the camera position and orientation from a single input image relative to a known environment. The system is flexible w.r.t. the amount of information available at test and at training time, catering to different applications. Input images can be RGB-D or RGB, and a 3D model of the environment can be utilized for training but is not necessary. In the minimal case, our system requires only RGB images and ground truth poses at training time, and it requires only a single RGB image at test time. The framework consists of a deep neural network and fully differentiable pose optimization. The neural network predicts so-called scene coordinates, i.e. dense correspondences between the input image and 3D scene space of the environment. The pose optimization implements robust fitting of pose parameters using differentiable RANSAC (DSAC) to facilitate end-to-end training. The system, an extension of DSAC++ and referred to as DSAC*, achieves state-of-the-art accuracy on various public datasets for RGB-based re-localization, and competitive accuracy for RGB-D-based re-localization.
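
At test time, dense scene-coordinate predictions plus camera intrinsics are enough to recover the pose with a robust PnP solver. The sketch below uses OpenCV's standard (non-differentiable) PnP + RANSAC as a conventional stand-in for the differentiable DSAC optimization described above; arrays and parameters are hypothetical.

```python
import cv2
import numpy as np

def pose_from_scene_coordinates(scene_coords, camera_K, reproj_err=3.0):
    """Recover the camera pose from per-pixel scene-coordinate predictions.

    scene_coords: (H, W, 3) predicted 3D scene point for every pixel.
    camera_K: (3, 3) intrinsic matrix.
    In practice a subsample of pixels would be used; here all pixels are passed.
    """
    H, W, _ = scene_coords.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    img_pts = np.stack([u, v], axis=-1).reshape(-1, 2).astype(np.float64)
    obj_pts = scene_coords.reshape(-1, 3).astype(np.float64)

    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        obj_pts, img_pts, camera_K.astype(np.float64), None,
        reprojectionError=reproj_err, flags=cv2.SOLVEPNP_ITERATIVE)
    R, _ = cv2.Rodrigues(rvec)            # world-to-camera rotation (orientation)
    return R, tvec, inliers
```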

Journal ArticleDOI
TL;DR: A novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet, is proposed, which can obtain the rotation bounding box of objects with a shape mask predicted by the mask branch, and is shown to outperform existing leading object detectors in the remote sensing field.
Abstract: Object detection has made significant progress in many real-world scenes. Despite this remarkable progress, the common use case of detection in remote sensing images remains challenging even for leading object detectors, due to the complex background, objects with arbitrary orientation, and large difference in scale of objects. In this paper, we propose a novel rotation detector for remote sensing images, mainly inspired by Mask R-CNN, namely RADet. RADet can obtain the rotation bounding box of objects with the shape mask predicted by the mask branch, which is a novel, simple and effective way to get the rotation bounding box of objects. Specifically, a refine feature pyramid network is devised with an improved building block constructing top-down feature maps, to solve the problem of large difference in scales. Meanwhile, the position attention network and the channel attention network are jointly explored by modeling the spatial position dependence between global pixels and highlighting the object feature, for detecting small objects surrounded by complex background. Extensive experiments on two remote sensing public datasets, DOTA and NWPU VHR-10, show our method to outperform existing leading object detectors in the remote sensing field.
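
The step of turning a predicted instance mask into an oriented box can be illustrated with OpenCV's minimum-area rotated rectangle; this is a generic stand-in for the mask-to-rotated-box conversion described above, not the paper's exact implementation.

```python
import cv2
import numpy as np

def rotated_box_from_mask(mask):
    """Fit an oriented bounding box to the foreground pixels of a binary mask.

    mask: (H, W) boolean or 0/1 array from a mask branch.
    Returns (cx, cy, w, h, angle) in OpenCV's minAreaRect convention, or None
    if the mask is empty.
    """
    mask_u8 = mask.astype(np.uint8) * 255
    contours, _ = cv2.findContours(mask_u8, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    pts = np.concatenate(contours).reshape(-1, 2)     # all contour points
    (cx, cy), (w, h), angle = cv2.minAreaRect(pts)    # oriented box parameters
    return cx, cy, w, h, angle
```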

Book ChapterDOI
Xu Chen1, Zijian Dong1, Jie Song1, Andreas Geiger2, Otmar Hilliges1 
23 Aug 2020
TL;DR: This paper combines a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering the need for explicit CAD models per object instance unnecessary.
Abstract: Many object pose estimation algorithms rely on the analysis-by-synthesis framework which requires explicit representations of individual object instances. In this paper we combine a gradient-based fitting procedure with a parametric neural image synthesis module that is capable of implicitly representing the appearance, shape and pose of entire object categories, thus rendering the need for explicit CAD models per object instance unnecessary. The image synthesis network is designed to efficiently span the pose configuration space so that model capacity can be used to capture the shape and local appearance (i.e., texture) variations jointly. At inference time the synthesized images are compared to the target via an appearance based loss and the error signal is backpropagated through the network to the input parameters. Keeping the network parameters fixed, this allows for iterative optimization of the object pose, shape and appearance in a joint manner and we experimentally show that the method can recover orientation of objects with high accuracy from 2D images alone. When provided with depth measurements, to overcome scale ambiguities, the method can accurately recover the full 6DOF pose successfully.

Journal ArticleDOI
06 May 2020-Sensors
TL;DR: An overview of the computer vision based indoor localization domain is offered, presenting application areas, commercial tools, existing benchmarks, and other reviews, and proposing a new classification based on the configuration stage (use of known environment data), sensing devices, type of detected elements, and localization method.
Abstract: Computer vision based indoor localization methods use either an infrastructure of static cameras to track mobile entities (e.g., people, robots) or cameras attached to the mobile entities. Methods in the first category employ object tracking, while the others map images from mobile cameras with images acquired during a configuration stage or extracted from 3D reconstructed models of the space. This paper offers an overview of the computer vision based indoor localization domain, presenting application areas, commercial tools, existing benchmarks, and other reviews. It provides a survey of indoor localization research solutions, proposing a new classification based on the configuration stage (use of known environment data), sensing devices, type of detected elements, and localization method. It groups 70 of the most recent and relevant image based indoor localization methods according to the proposed classification and discusses their advantages and drawbacks. It highlights localization methods that also offer orientation information, as this is required by an increasing number of applications of indoor localization (e.g., augmented reality).

Journal ArticleDOI
TL;DR: A novel method is proposed for measuring the full-field displacement response of a vibrating continuous edge of a structural component from its video; the method shows high correspondence between the actual motion of the cable and the traced motion of its edge over time.

Book ChapterDOI
Fangyun Wei1, Xiao Sun1, Hongyang Li2, Jingdong Wang1, Stephen Lin1 
23 Aug 2020
TL;DR: Point-Set Anchors as discussed by the authors proposes to regress from a set of points placed at more advantageous positions, which are arranged to reflect a good initialization for the given task such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression.
Abstract: A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation matches the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks.

Journal ArticleDOI
TL;DR: This paper proposes fully convolutional neural networks to perform automatic background subtraction for leaf images captured in mobile applications and reports state-of-the-art leaf image segmentation performance.

Journal ArticleDOI
TL;DR: A new panorama-oriented model, to predict head and eye movements, is proposed, and the spherical harmonics are employed to extract features at different frequency bands and orientations to estimate the saliency.
Abstract: By recording the whole scene around the capturer, virtual reality (VR) techniques can provide viewers the sense of presence. To provide a satisfactory quality of experience, there should be at least 60 pixels per degree, so the resolution of panoramas should reach 21600 × 10800. The huge amount of data will put great demands on data processing and transmission. However, when exploring in the virtual environment, viewers only perceive the content in the current field of view (FOV). Therefore if we can predict the head and eye movements which are important behaviors of viewer, more processing resources can be allocated to the active FOV. But conventional saliency prediction methods are not fully adequate for panoramic images. In this paper, a new panorama-oriented model, to predict head and eye movements, is proposed. Due to the superiority of computation in the spherical domain, the spherical harmonics are employed to extract features at different frequency bands and orientations. Related low- and high-level features including the rare components in the frequency domain and color domain, the difference between center vision and peripheral vision, visual equilibrium, person and car detection, and equator bias are extracted to estimate the saliency. To predict head movements, visual mechanisms including visual uncertainty and equilibrium are incorporated, and the graphical model and functional representation for the switch of head orientation are established. Extensive experimental results on the publicly available database demonstrate the effectiveness of our methods.

Journal ArticleDOI
TL;DR: A new one-stage anchor-free method to detect orientated objects in per-pixel prediction fashion with less computational complexity is proposed and a new aspect-ratio-aware orientation centerness method is proposed to better weigh positive pixel points, in order to guide the network to learn discriminative features from a complex background, which brings improvements for large aspect ratio object detection.
Abstract: Orientated object detection in aerial images is still a challenging task due to the bird’s eye view and the various scales and arbitrary angles of objects in aerial images. Most current methods for orientated object detection are anchor-based, which require considerable pre-defined anchors and are time consuming. In this article, we propose a new one-stage anchor-free method to detect orientated objects in per-pixel prediction fashion with less computational complexity. Arbitrary orientated objects are detected by predicting the axis of the object, which is the line connecting the head and tail of the object, and the width of the object is vertical to the axis. By predicting objects at the pixel level of feature maps directly, the method avoids setting a number of hyperparameters related to anchor and is computationally efficient. Besides, a new aspect-ratio-aware orientation centerness method is proposed to better weigh positive pixel points, in order to guide the network to learn discriminative features from a complex background, which brings improvements for large aspect ratio object detection. The method is tested on two common aerial image datasets, achieving better performance compared with most one-stage orientated methods and many two-stage anchor-based methods with a simpler procedure and lower computational complexity.

Journal ArticleDOI
TL;DR: DSF-CNN as mentioned in this paper uses group convolutions with multiple rotated copies of each filter in a densely connected framework, enabling exact rotation and decreasing the number of trainable parameters compared to standard filters.
Abstract: Histology images are inherently symmetric under rotation, where each orientation is equally as likely to appear. However, this rotational symmetry is not widely utilised as prior knowledge in modern Convolutional Neural Networks (CNNs), resulting in data hungry models that learn independent features at each orientation. Allowing CNNs to be rotation-equivariant removes the necessity to learn this set of transformations from the data and instead frees up model capacity, allowing more discriminative features to be learned. This reduction in the number of required parameters also reduces the risk of overfitting. In this paper, we propose Dense Steerable Filter CNNs (DSF-CNNs) that use group convolutions with multiple rotated copies of each filter in a densely connected framework. Each filter is defined as a linear combination of steerable basis filters, enabling exact rotation and decreasing the number of trainable parameters compared to standard filters. We also provide the first in-depth comparison of different rotation-equivariant CNNs for histology image analysis and demonstrate the advantage of encoding rotational symmetry into modern architectures. We show that DSF-CNNs achieve state-of-the-art performance, with significantly fewer parameters, when applied to three different tasks in the area of computational pathology: breast tumour classification, colon gland segmentation and multi-tissue nuclear segmentation.
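
The group-convolution idea can be illustrated by correlating an image with several rotated copies of one filter, producing one response per orientation; pooling over those orientations at the end gives a rotation-invariant output. A bare-bones SciPy sketch (the paper's steerable-basis parameterization and dense connectivity are not reproduced here):

```python
import numpy as np
from scipy.ndimage import correlate, rotate

def orientation_pooled_response(image, base_filter, n_orientations=8):
    """Correlate an image with rotated copies of one 2D filter and pool the
    resulting orientation channels.

    image: (H, W) array; base_filter: (k, k) array; n_orientations: number of
    rotated filter copies (illustrative value).
    """
    responses = []
    for i in range(n_orientations):
        angle = i * 360.0 / n_orientations
        f = rotate(base_filter, angle=angle, reshape=False)       # rotated filter copy
        responses.append(correlate(image, f, mode="nearest"))     # one orientation channel
    return np.max(np.stack(responses), axis=0)                    # orientation pooling
```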

Posted Content
Fangyun Wei1, Xiao Sun1, Hongyang Li2, Jingdong Wang1, Stephen Lin1 
TL;DR: This work argues that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation, and proposes to perform regression from a set of points placed at more advantageous positions to facilitate inference.
Abstract: A recent approach for object detection and human pose estimation is to regress bounding boxes or human keypoints from a central point on the object or person. While this center-point regression is simple and efficient, we argue that the image features extracted at a central point contain limited information for predicting distant keypoints or bounding box boundaries, due to object deformation and scale/orientation variation. To facilitate inference, we propose to instead perform regression from a set of points placed at more advantageous positions. This point set is arranged to reflect a good initialization for the given task, such as modes in the training data for pose estimation, which lie closer to the ground truth than the central point and provide more informative features for regression. As the utility of a point set depends on how well its scale, aspect ratio and rotation matches the target, we adopt the anchor box technique of sampling these transformations to generate additional point-set candidates. We apply this proposed framework, called Point-Set Anchors, to object detection, instance segmentation, and human pose estimation. Our results show that this general-purpose approach can achieve performance competitive with state-of-the-art methods for each of these tasks. Code is available at this https URL.

Posted Content
TL;DR: This work proposes a data-driven optimization approach for long-term, 6D pose tracking, which aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object’s model.
Abstract: Tracking the 6D pose of objects in video sequences is important for robot manipulation. This task, however, introduces multiple challenges: (i) robot manipulation involves significant occlusions; (ii) data and annotations are troublesome and difficult to collect for 6D poses, which complicates machine learning solutions, and (iii) incremental error drift often accumulates in long-term tracking, necessitating re-initialization of the object's pose. This work proposes a data-driven optimization approach for long-term, 6D pose tracking. It aims to identify the optimal relative pose given the current RGB-D observation and a synthetic image conditioned on the previous best estimate and the object's model. The key contribution in this context is a novel neural network architecture, which appropriately disentangles the feature encoding to help reduce domain shift, and an effective 3D orientation representation via Lie Algebra. Consequently, even when the network is trained only with synthetic data, it can work effectively over real images. Comprehensive experiments over benchmarks - existing ones as well as a new dataset with significant occlusions related to object manipulation - show that the proposed approach achieves consistently robust estimates and outperforms alternatives, even though they have been trained with real images. The approach is also the most computationally efficient among the alternatives and achieves a tracking frequency of 90.9Hz.
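
Representing orientation updates in the Lie algebra so(3) means regressing a 3-vector and mapping it to a rotation matrix with the exponential map (Rodrigues' formula), which avoids Euler-angle discontinuities. A minimal sketch of that map; the paper's network and training details are not reproduced here.

```python
import numpy as np

def so3_exp(omega):
    """Map an axis-angle vector omega in so(3) to a rotation matrix (Rodrigues' formula).

    omega: (3,) vector whose direction is the rotation axis and whose norm is the
    rotation angle in radians.
    """
    theta = np.linalg.norm(omega)
    if theta < 1e-8:
        return np.eye(3)                      # near-zero rotation
    k = omega / theta
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])          # skew-symmetric cross-product matrix
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
```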

Journal Article
Linhao Li1, Zhiqiang Zhou1, Bo Wang1, Lingjuan Miao1, Hua Zong 
TL;DR: A novel CNN-based ship detection method that is able to predict the orientation and other variables independently, and yet more effectively, with a novel dual-branch regression network, based on the observation that the ship targets are nearly rotation-invariant in remote sensing images.
Abstract: Currently, reliable and accurate ship detection in optical remote sensing images is still challenging. Even the state-of-the-art convolutional neural network (CNN) based methods cannot obtain very satisfactory results. To more accurately locate the ships in diverse orientations, some recent methods conduct the detection via the rotated bounding box. However, it further increases the difficulty of detection, because an additional variable of ship orientation must be accurately predicted in the algorithm. In this paper, a novel CNN-based ship detection method is proposed, by overcoming some common deficiencies of current CNN-based methods in ship detection. Specifically, to generate rotated region proposals, current methods have to predefine multi-oriented anchors, and predict all unknown variables together in one regression process, limiting the quality of overall prediction. By contrast, we are able to predict the orientation and other variables independently, and yet more effectively, with a novel dual-branch regression network, based on the observation that the ship targets are nearly rotation-invariant in remote sensing images. Next, a shape-adaptive pooling method is proposed, to overcome the limitation of typical regular ROI-pooling in extracting the features of the ships with various aspect ratios. Furthermore, we propose to incorporate multilevel features via the spatially-variant adaptive pooling. This novel approach, called multilevel adaptive pooling, leads to a compact feature representation more qualified for the simultaneous ship classification and localization. Finally, detailed ablation study performed on the proposed approaches is provided, along with some useful insights. Experimental results demonstrate the great superiority of the proposed method in ship detection.

Journal ArticleDOI
05 Nov 2020-Sensors
TL;DR: A framework of detection, quantification, and localization of damage on a civil infrastructure using the proposed framework can directly be used in the prognosis of the structure’s ability to withstand service loads.
Abstract: Immediate assessment of structural integrity of important civil infrastructures, like bridges, hospitals, or dams, is of utmost importance after natural disasters. Currently, inspection is performed manually by engineers who look for local damages and their extent on significant locations of the structure to understand its implication on its global stability. However, the whole process is time-consuming and prone to human errors. Due to their size and extent, some regions of civil structures are hard to access for manual inspection. In such situations, a vision-based system of Unmanned Aerial Vehicles (UAVs) programmed with Artificial Intelligence algorithms may be an effective alternative to carry out a health assessment of civil infrastructures in a timely manner. This paper proposes a framework for achieving the above-mentioned goal using computer vision and deep learning algorithms for detection of cracks on the concrete surface from its image by carrying out image segmentation, i.e., classification of the pixels in an image of the concrete surface according to whether they belong to cracks or not. The image segmentation, or dense pixel-level classification, is carried out using a deep neural network architecture named U-Net. Further, morphological operations on the segmented images result in dense measurements of crack geometry, like length, width, area, and crack orientation for individual cracks present in the image. The efficacy and robustness of the proposed method as a viable real-life application were validated by carrying out a laboratory experiment of a four-point bending test on an 8-foot-long concrete beam, the video of which was recorded using both a UAV-mounted camera and a still ground-based video camera. Detection, quantification, and localization of damage on a civil infrastructure using the proposed framework can directly be used in the prognosis of the structure's ability to withstand service loads.
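
The morphological post-processing stage, measuring length, width, area, and orientation per crack from the binary segmentation, can be sketched with scikit-image as below; the pixel-to-millimeter scale is a hypothetical calibration value, and the exact operations used in the paper may differ.

```python
import numpy as np
from skimage.measure import label, regionprops
from skimage.morphology import skeletonize

def crack_metrics(mask, mm_per_px=1.0):
    """Per-crack geometry from a binary segmentation mask.

    Length is taken from the skeleton (centreline), mean width as area / length,
    and orientation from the region's principal axis.
    """
    results = []
    for region in regionprops(label(mask.astype(np.uint8))):
        crack = np.zeros_like(mask, dtype=bool)
        crack[tuple(region.coords.T)] = True           # isolate this connected crack
        length_px = skeletonize(crack).sum()           # centreline length in pixels
        width_px = region.area / max(length_px, 1)     # mean width = area / length
        results.append({
            "length_mm": length_px * mm_per_px,
            "mean_width_mm": width_px * mm_per_px,
            "area_mm2": region.area * mm_per_px ** 2,
            "orientation_deg": np.degrees(region.orientation),  # major-axis angle
        })
    return results
```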

Journal ArticleDOI
TL;DR: A novel IPN method based on a shoe-mounted micro-electro-mechanical systems inertial measurement unit and ultra-wideband is proposed; it is able to obtain high-precision position and orientation estimates at low cost while reducing system complexity.
Abstract: In the field of indoor pedestrian navigation (IPN), the orientation information of a pedestrian is often obtained by means of a strap-down inertial navigation system (SINS). To deal with the problem of divergence in SINS-based orientation estimates, additional orientation sensors, such as a camera, are needed to provide external orientation observations, resulting in increased cost and complexity of the system. Although a low-cost magnetometer (or compass) can be used, it is significantly affected by geomagnetic disturbances indoors. Besides, the magnetometer can only give the heading observation, which is insufficient to correct orientation errors in all three directions. In this paper, we propose a novel IPN method based on a shoe-mounted micro-electro-mechanical systems inertial measurement unit and ultra-wideband. The biggest advantage of this method is that it is able to obtain high-precision position and orientation estimates at low cost. In addition, in the proposed method, the data fusion is implemented by a quaternion Kalman filter which does not involve any complex linearization, and hence the system complexity is reduced. Experimental results show that a decimeter-level position accuracy is achieved and the orientation drifts can be limited to 0.066 radians in indoor environments.
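
A quaternion filter avoids Euler-angle linearization by propagating the orientation quaternion directly from gyroscope rates. A minimal sketch of that prediction step (the full Kalman update with UWB measurements is omitted; first-order integration and the [w, x, y, z] convention are assumptions for illustration):

```python
import numpy as np

def propagate_quaternion(q, gyro, dt):
    """Propagate an orientation quaternion with one gyroscope reading.

    q: (4,) unit quaternion [w, x, y, z].
    gyro: (3,) angular rate in rad/s in the body frame.
    Implements first-order integration of q_dot = 0.5 * Omega(gyro) * q.
    """
    wx, wy, wz = gyro
    omega = np.array([[0, -wx, -wy, -wz],
                      [wx,  0,  wz, -wy],
                      [wy, -wz,  0,  wx],
                      [wz,  wy, -wx,  0]])
    q = q + 0.5 * dt * omega @ q
    return q / np.linalg.norm(q)          # renormalize to keep a unit quaternion
```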

Book ChapterDOI
23 Aug 2020
TL;DR: The AIM 2020 challenge on virtual image relighting and illumination estimation as discussed by the authors focused on one-to-one relighting, where the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation.
Abstract: We review the AIM 2020 challenge on virtual image relighting and illumination estimation. This paper presents the novel VIDIT dataset used in the challenge and the different proposed solutions and final evaluation results over the 3 challenge tracks. The first track considered one-to-one relighting; the objective was to relight an input photo of a scene with a different color temperature and illuminant orientation (i.e., light source position). The goal of the second track was to estimate illumination settings, namely the color temperature and orientation, from a given image. Lastly, the third track dealt with any-to-any relighting, thus a generalization of the first track. The target color temperature and orientation, rather than being pre-determined, are instead given by a guide image. Participants were allowed to make use of their track 1 and 2 solutions for track 3. The tracks had 94, 52, and 56 registered participants, respectively, leading to 20 confirmed submissions in the final competition stage.

Journal ArticleDOI
TL;DR: Two structure from motion strategies are provided, which use trajectory information provided by an onboard survey-grade global navigation satellite system/inertial navigation system (GNSS/INS) and system calibration parameters to generate denser and more accurate 3D point clouds as well as orthophotos without any gaps.
Abstract: Imagery acquired by unmanned aerial vehicles (UAVs) has been widely used for three-dimensional (3D) reconstruction/modeling in various digital agriculture applications, such as phenotyping, crop monitoring, and yield prediction. 3D reconstruction from well-textured UAV-based images has matured and the user community has access to several commercial and open-source tools that provide accurate products at a high level of automation. However, in some applications, such as digital agriculture, due to repetitive image patterns, these approaches are not always able to produce reliable/complete products. The main limitation of these techniques is their inability to establish a sufficient number of correctly matched features among overlapping images, causing incomplete and/or inaccurate 3D reconstruction. This paper provides two structure from motion (SfM) strategies, which use trajectory information provided by an onboard survey-grade global navigation satellite system/inertial navigation system (GNSS/INS) and system calibration parameters. The main difference between the proposed strategies is that the first one—denoted as partially GNSS/INS-assisted SfM—implements the four stages of an automated triangulation procedure, namely, image matching, relative orientation parameters (ROPs) estimation, exterior orientation parameters (EOPs) recovery, and bundle adjustment (BA). The second strategy—denoted as fully GNSS/INS-assisted SfM—removes the EOPs estimation step while introducing a random sample consensus (RANSAC)-based strategy for removing matching outliers before the BA stage. Both strategies modify the image matching by restricting the search space for conjugate points. They also implement a linear procedure for ROP refinement. Finally, they use the GNSS/INS information in modified collinearity equations for a simpler BA procedure that could be used for refining system calibration parameters. Eight datasets over six agricultural fields are used to evaluate the performance of the developed strategies. In comparison with a traditional SfM framework and Pix4D Mapper Pro, the proposed strategies are able to generate denser and more accurate 3D point clouds as well as orthophotos without any gaps.