
Showing papers on "Orientation (computer vision) published in 2018"


Journal ArticleDOI
06 Oct 2018-Sensors
TL;DR: An improved sparse convolution method for Voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference and introduces a new form of angle loss regression to improve the orientation estimation performance.
Abstract: LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
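
A minimal sketch of a sine-based angle regression loss in the spirit of what the abstract describes is shown below, assuming PyTorch and a smooth-L1 penalty on the sine of the angle difference; the exact formulation in the paper (including any direction classifier) may differ.

```python
import torch
import torch.nn.functional as F

def angle_regression_loss(pred_angle: torch.Tensor, gt_angle: torch.Tensor) -> torch.Tensor:
    """Sine-based angle regression loss (illustrative sketch).

    Penalizing sin(pred - gt) instead of the raw difference makes the loss
    periodic, so orientations that differ by roughly pi are not over-penalized.
    """
    return F.smooth_l1_loss(torch.sin(pred_angle - gt_angle),
                            torch.zeros_like(pred_angle))

# Example: the second prediction is off by about pi, yet incurs almost no loss.
pred = torch.tensor([0.10, 3.00])
gt = torch.tensor([0.00, -0.14])
print(angle_regression_loss(pred, gt))
```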

1,624 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: The Dataset for Object Detection in Aerial Images (DOTA) as discussed by the authors is a large-scale dataset of aerial images collected from different sensors and platforms and contains objects exhibiting a wide variety of scales, orientations, and shapes.
Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to carry over to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000 × 4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.
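
As a hedged illustration of working with the 8 d.o.f. quadrilateral annotations mentioned above, the sketch below parses one annotation and fits a minimal rotated rectangle with OpenCV; the assumed line layout (eight coordinates followed by a category label and a difficulty flag) is an approximation, not a guaranteed reproduction of the released annotation format.

```python
import numpy as np
import cv2

def quad_to_rotated_box(line: str):
    """Parse one annotation line 'x1 y1 ... x4 y4 category difficulty' (assumed layout)
    and return the minimal enclosing rotated rectangle ((cx, cy), (w, h), angle)."""
    parts = line.split()
    quad = np.array(parts[:8], dtype=np.float32).reshape(4, 2)
    category = parts[8] if len(parts) > 8 else None
    rect = cv2.minAreaRect(quad)   # tightest rotated rectangle around the quadrilateral
    return category, rect

line = "100 100 220 110 210 180 95 170 plane 0"
print(quad_to_rotated_box(line))
```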

1,502 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This work proposes a real-time RGB-based pipeline for object detection and 6D pose estimation based on a variant of the Denoising Autoencoder trained on simulated views of a 3D model using Domain Randomization.
Abstract: We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.
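
The orientation estimator above is described as a Denoising-Autoencoder variant trained on domain-randomized renderings. A minimal PyTorch sketch of such a training step is shown below, assuming 128 × 128 RGB inputs and a placeholder augmentation function standing in for the domain randomization; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Tiny convolutional autoencoder (illustrative sketch, assumes 3x128x128 inputs)."""
    def __init__(self, code_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 32 * 32, code_dim))
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64 * 32 * 32), nn.ReLU(),
            nn.Unflatten(1, (64, 32, 32)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, clean_render, augment):
    """Reconstruct the clean rendering from a randomly augmented (domain-randomized) copy."""
    noisy = augment(clean_render)                  # placeholder augmentation
    recon = model(noisy)
    loss = nn.functional.mse_loss(recon, clean_render)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```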

549 citations


Journal ArticleDOI
Yi Li1, Gu Wang1, Xiangyang Ji1, Yu Xiang2, Dieter Fox2 
TL;DR: A novel deep neural network for 6D pose matching named DeepIM is proposed, trained to predict a relative pose transformation using a disentangled representation of 3D location and 3D orientation and an iterative training process.
Abstract: Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
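
At a high level, the iterative matching described above alternates between rendering at the current pose estimate and predicting a relative transform. The sketch below illustrates that loop with placeholder render and network functions; it is not the authors' implementation, and the pose update shown (rotation composed with the current orientation, translation added in camera coordinates) is a simplification of DeepIM's untangled parameterization.

```python
import numpy as np

def refine_pose(R, t, observed_img, render, predict_delta, num_iters=4):
    """Iteratively refine a 6D pose (R: 3x3 rotation matrix, t: 3-vector).

    render(R, t) -> synthetic image of the object at the given pose (placeholder).
    predict_delta(rendered, observed) -> (dR, dt), a relative transform
    estimated by a trained network (placeholder).
    """
    for _ in range(num_iters):
        rendered = render(R, t)
        dR, dt = predict_delta(rendered, observed_img)
        R = dR @ R          # update orientation
        t = t + dt          # update translation (simplified coupling)
    return R, t
```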

220 citations


Book ChapterDOI
02 Dec 2018
TL;DR: A new method is proposed consisting of a joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features, which are fed into rotation-based region proposal and region-of-interest networks to produce object detections.
Abstract: Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.
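
The rotational non-maximum suppression step mentioned above needs an overlap measure between oriented boxes. A hedged sketch is given below, computing the IoU of rotated boxes as polygon intersections with Shapely; the paper's actual implementation may differ.

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_box_to_polygon(cx, cy, w, h, angle_deg):
    """Corners of an oriented box (center, size, angle in degrees) as a Shapely polygon."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    corners = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                        [w / 2, h / 2], [-w / 2, h / 2]]) @ R.T + [cx, cy]
    return Polygon(corners)

def rotated_nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS over oriented boxes given as (cx, cy, w, h, angle_deg) tuples."""
    order = np.argsort(scores)[::-1]
    polys = [rotated_box_to_polygon(*b) for b in boxes]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        remaining = []
        for j in order[1:]:
            inter = polys[i].intersection(polys[j]).area
            union = polys[i].area + polys[j].area - inter
            if union == 0 or inter / union <= iou_thresh:
                remaining.append(j)
        order = np.array(remaining, dtype=int)
    return keep
```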

158 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed OS-SIFT algorithm gives a robust registration result for optical-to-SAR images and outperforms other state-of-the-art algorithms in terms of registration accuracy.
Abstract: Although the scale-invariant feature transform (SIFT) algorithm has been successfully applied to both optical image registration and synthetic aperture radar (SAR) image registration, SIFT-like algorithms have failed to register high-resolution (HR) optical and SAR images due to large geometric differences and intensity differences. In this paper, to perform optical-to-SAR (OS) image registration, we propose an advanced SIFT-like algorithm (OS-SIFT) that consists of three main modules: keypoint detection in two Harris scale spaces, orientation assignment and descriptor extraction, and keypoint matching. Considering the inherent properties of SAR images and optical images, the multiscale ratio of exponentially weighted averages and multiscale Sobel operators are used to calculate consistent gradients for the SAR and optical images, respectively, from which two Harris scale spaces are then constructed. Keypoints are detected by finding the local maxima in the scale space followed by a localization refinement method based on the spatial relationship of the keypoints. Moreover, gradient location orientation histogram-like descriptors are extracted using multiple image patches to increase the distinctiveness. The experimental results on simulated images and several HR satellite images show that the proposed OS-SIFT algorithm gives a robust registration result for optical-to-SAR images and outperforms other state-of-the-art algorithms in terms of registration accuracy.
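
For the optical branch, the abstract describes multiscale Sobel gradients feeding a Harris scale space. The sketch below shows one plausible realization of a single scale of that idea with OpenCV; the smoothing sigma and the Harris constant k are illustrative assumptions, and the SAR branch (ratio of exponentially weighted averages) is not shown.

```python
import cv2
import numpy as np

def harris_response(gray, sigma=1.6, k=0.04):
    """Harris response from Sobel gradients at one scale (sketch).

    gray: float32 image with values in [0, 1].
    """
    ksize = int(2 * round(3 * sigma) + 1)            # odd Gaussian kernel size
    blurred = cv2.GaussianBlur(gray, (ksize, ksize), sigma)
    gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3)
    # Structure-tensor components, smoothed over a local window
    ixx = cv2.GaussianBlur(gx * gx, (ksize, ksize), sigma)
    iyy = cv2.GaussianBlur(gy * gy, (ksize, ksize), sigma)
    ixy = cv2.GaussianBlur(gx * gy, (ksize, ksize), sigma)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace

# Keypoints would be local maxima of this response across the scale space (not shown).
```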

145 citations


Journal ArticleDOI
TL;DR: This architecture represents a substantive advancement over prior approaches, with implications both for biomedical image segmentation and for convolutional neural network architectures more generally.

111 citations


Posted Content
TL;DR: This paper proposes ToDayGAN – a modified image-translation model to alter nighttime driving images to a more useful daytime representation, and improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
Abstract: Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query's pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN - a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
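
Retrieval-based localization as described above reduces to finding the nearest day-time database descriptor for the (translated) night query and inheriting that image's pose. Below is a minimal sketch of this lookup step with placeholder functions for the image-translation model and the global descriptor; it is illustrative only and does not reproduce ToDayGAN itself.

```python
import numpy as np

def localize_night_image(night_img, db_descriptors, db_poses,
                         translate_to_day, describe):
    """Return the 6-DOF pose of the most similar day-time database image.

    translate_to_day: placeholder for the night-to-day translation model.
    describe: placeholder mapping an image to a global descriptor vector.
    db_descriptors: (N, D) array of L2-normalized descriptors of geo-tagged day images.
    db_poses: list of the N known 6-DOF poses.
    """
    day_like = translate_to_day(night_img)
    q = describe(day_like)
    q = q / np.linalg.norm(q)
    sims = db_descriptors @ q            # cosine similarity with each database image
    best = int(np.argmax(sims))
    return db_poses[best], float(sims[best])
```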

102 citations


Journal ArticleDOI
TL;DR: This paper presents a thorough review focusing on the recent applications of borehole image logs for sedimentological and structural description and interpretation, and aims to establish image-log facies which can provide guidelines in sedimentary reservoir interpretation.

101 citations


Posted Content
TL;DR: This work presents the first method to capture the 3D total motion of a target person from a monocular view input, and leverages a 3D deformable human model to reconstruct total body pose from the CNN outputs with the aid of the pose and shape prior in the model.
Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network (FCN), along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs by exploiting the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations including comparison with the existing body-specific and hand-specific methods, and performance analysis on camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos. Our code and newly collected human motion dataset will be publicly shared.
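
A 3D Part Orientation Field stores, at 2D pixels belonging to a body part, the 3D unit direction of that part. The sketch below rasterizes one bone's orientation into such a field; it is a simplified, hedged illustration and does not reproduce the paper's exact weighting or support region around the 2D bone.

```python
import numpy as np

def rasterize_pof(joint3d_a, joint3d_b, joint2d_a, joint2d_b,
                  height, width, radius=3, n_samples=50):
    """Write the 3D unit direction of bone a->b into pixels near its 2D projection.

    Returns a (3, height, width) field; pixels away from the bone stay zero.
    """
    pof = np.zeros((3, height, width), dtype=np.float32)
    direction = np.asarray(joint3d_b, dtype=np.float32) - np.asarray(joint3d_a, dtype=np.float32)
    direction /= (np.linalg.norm(direction) + 1e-8)
    for s in np.linspace(0.0, 1.0, n_samples):
        p = (1 - s) * np.asarray(joint2d_a) + s * np.asarray(joint2d_b)
        x, y = int(round(p[0])), int(round(p[1]))
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        pof[:, y0:y1, x0:x1] = direction[:, None, None]
    return pof
```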

100 citations


Journal ArticleDOI
TL;DR: This study presents a new image-based indoor localization method using building information modeling (BIM) and convolutional neural networks (CNNs) that constructs a dataset with rendered BIM images and searches the dataset for images most similar to indoor photographs, thereby estimating the indoor position and orientation of the photograph.

Journal ArticleDOI
TL;DR: A vision-based approach is used to build a dynamic hand gesture recognition system and it is concluded that classifier fusion provides satisfactory results compared to other individual classifiers.
Abstract: In this work, a vision-based approach is used to build a dynamic hand gesture recognition system. Various challenges such as complicated background, change in illumination and occlusion make the detection and tracking of the hand difficult in any vision-based approach. To overcome such challenges, a hand detection technique is developed by combining three-frame differencing and skin filtering. The three-frame differencing is performed for both colored and grayscale frames. The hand is then tracked using a modified Kanade-Lucas-Tomasi feature tracker where the features were selected using the compact criteria. Velocity and orientation information were added to remove the redundant feature points. Finally, color cue information is used to locate the final hand region in the tracked region. During the feature extraction, 44 features were selected from the existing literature. Using all the features could lead to overfitting, information redundancy and the curse of dimensionality. Thus, a system with optimal features was selected using analysis of variance combined with incremental feature selection. These selected features were then fed as an input to the ANN, SVM and kNN models. These individual classifiers were combined to produce a classifier fusion model. Fivefold cross-validation has been used to evaluate the performance of the proposed model. Based on the experimental results, it may be concluded that classifier fusion provides satisfactory results (92.23%) compared to other individual classifiers. A one-way analysis of variance test, Friedman's test and the Kruskal-Wallis test have also been conducted to validate the statistical significance of the results.
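
The hand-detection stage combines three-frame differencing with skin filtering. A minimal OpenCV sketch of that combination is shown below; the YCrCb skin thresholds are commonly used illustrative values, not the ones tuned in the paper.

```python
import cv2
import numpy as np

def detect_hand_mask(prev_frame, curr_frame, next_frame):
    """Motion mask from three-frame differencing intersected with a skin-color mask."""
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (prev_frame, curr_frame, next_frame)]
    d1 = cv2.absdiff(g[1], g[0])
    d2 = cv2.absdiff(g[2], g[1])
    motion = cv2.bitwise_and(
        cv2.threshold(d1, 25, 255, cv2.THRESH_BINARY)[1],
        cv2.threshold(d2, 25, 255, cv2.THRESH_BINARY)[1])

    # Skin filter in YCrCb space (illustrative thresholds)
    ycrcb = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))

    mask = cv2.bitwise_and(motion, skin)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask
```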

Journal ArticleDOI
TL;DR: A data-driven, learning-based approach trained on a very large dataset that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination is presented.
Abstract: In this paper, we present a method that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination. We follow a data-driven, learning-based approach trained on a very large dataset, but in contrast to earlier work we do not assume one or more components (shape, reflectance, or illumination) to be known. We propose a two-step approach, where we first estimate the object’s reflectance map, and then further decompose it into reflectance and illumination. For the first step, we introduce a Convolutional Neural Network (CNN) that directly predicts a reflectance map from the input image itself, as well as an indirect scheme that uses additional supervision, first estimating surface orientation and afterwards inferring the reflectance map using a learning-based sparse data interpolation technique. For the second step, we suggest a CNN architecture to reconstruct both Phong reflectance parameters and high-resolution spherical illumination maps from the reflectance map. We also propose new datasets to train these CNNs. We demonstrate the effectiveness of our approach for both steps by extensive quantitative and qualitative evaluation on both synthetic and real data, as well as through numerous applications that show improvements over the state-of-the-art.

Journal ArticleDOI
TL;DR: This paper proposes the first method in the literature able to extract the coordinates of the pores from touch-based, touchless, and latent fingerprint images, and uses specifically designed and trained Convolutional Neural Networks to estimate and refine the centroid of each pore.

Posted Content
TL;DR: This work proposes a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features that can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations.
Abstract: Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.

Journal ArticleDOI
TL;DR: It is shown that the CNN solution is able to automatically learn location patterns, thus significantly lowering the workforce burden of designing a localization system, and achieves an accuracy of about 1 m under different smartphone orientations, users, and use patterns.
Abstract: Wi-Fi and magnetic field fingerprinting have been a hot topic in indoor positioning research because of their ubiquity and location-related features. Wi-Fi signals can provide rough initial positions, and magnetic fields can further improve the positioning accuracies; therefore, many researchers have tried to combine the two signals for high-accuracy indoor localization. Currently, state-of-the-art solutions design separate algorithms to process different indoor signals. Outputs of these algorithms are generally used as inputs of data fusion strategies. These methods rely on computationally expensive particle filters, labor-intensive feature analysis, and time-consuming parameter tuning to achieve better accuracies. Besides, particle filters need to estimate the moving directions of particles, requiring the smartphone orientation to remain stable and aligned with the user's moving direction. In this paper, we adopted a convolutional neural network (CNN) to implement an accurate and orientation-free positioning system. Inspired by state-of-the-art image classification methods, we design a novel hybrid location image using Wi-Fi and magnetic field fingerprints, and then a CNN is employed to classify the locations of the fingerprint images. In order to prevent the overfitting problem of the positioning CNN on limited training datasets, we also propose to divide the learning process into two steps to adopt proper learning strategies for different network branches. We show that the CNN solution is able to automatically learn location patterns, thus significantly lowering the workforce burden of designing a localization system. Our experimental results convincingly reveal that the proposed positioning method achieves an accuracy of about 1 m under different smartphone orientations, users, and use patterns.
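
The key idea above is to pack Wi-Fi RSSI and magnetic-field readings into a single "location image" that a CNN can classify. Below is a hedged sketch of one way to build such an image and a tiny classifier; the channel layout, the normalization constants, and the network shape are illustrative assumptions, not the paper's design.

```python
import numpy as np
import torch
import torch.nn as nn

def build_location_image(rssi, mag_window, side=16):
    """Stack a Wi-Fi fingerprint and a magnetic-field window into a 2-channel image.

    rssi: 1D sequence of RSSI values for known access points (padded/truncated to side*side).
    mag_window: (side, side) array of recent magnetic-field magnitudes (microtesla, assumed).
    """
    wifi = np.full(side * side, -100.0, dtype=np.float32)   # -100 dBm = "not heard"
    wifi[:min(len(rssi), side * side)] = rssi[:side * side]
    wifi = (wifi + 100.0) / 100.0                            # roughly normalize to [0, 1]
    mag = (np.asarray(mag_window, dtype=np.float32) - 25.0) / 50.0
    return np.stack([wifi.reshape(side, side), mag])         # shape (2, side, side)

class LocationCNN(nn.Module):
    """Tiny CNN that classifies fingerprint images into discrete location labels."""
    def __init__(self, num_locations: int, side: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * side * side, num_locations))

    def forward(self, x):
        return self.net(x)
```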

Journal ArticleDOI
29 Aug 2018-Sensors
TL;DR: A semantic aggregation method is developed which fuses features in a top-down way and can provide abundant location and semantic information, which is helpful for classification and localization.
Abstract: Ship detection and angle estimation in SAR images play an important role in marine surveillance. Previous works have detected ships first and estimated their orientations second. This is time-consuming and tedious. In order to solve the problems above, we attempt to combine these two tasks using a convolutional neural network so that ships may be detected and their orientations estimated simultaneously. The proposed method is based on the original SSD (Single Shot Detector), but using a rotatable bounding box. This method can learn and predict the class, location, and angle information of ships using only one forward computation. The generated oriented bounding box is much tighter than the traditional bounding box and is robust to background disturbances. We develop a semantic aggregation method which fuses features in a top-down way. This method can provide abundant location and semantic information, which is helpful for classification and location. We adopt the attention module for the six prediction layers. It can adaptively select meaningful features and neglect weak ones. This is helpful for detecting small ships. Multi-orientation anchors are designed with different sizes, aspect ratios, and orientations. These can consider both speed and accuracy. Angular regression is embedded into the existing bounding box regression module, and thus the angle prediction is output with the position and score, without requiring too many extra computations. The loss function with angular regression is used for optimizing the model. AAP (average angle precision) is used for evaluating the performance. The experiments on the dataset demonstrate the effectiveness of our method.
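
Embedding angular regression into the box regression module amounts to adding an angle offset to the usual anchor-relative targets. The sketch below shows one common encoding of (x, y, w, h, θ) regression targets relative to an oriented anchor; it is an illustrative convention, not necessarily the exact one used in the paper.

```python
import numpy as np

def encode_rotated_box(anchor, gt):
    """Regression targets for an oriented box relative to an oriented anchor.

    anchor, gt: (cx, cy, w, h, theta) with theta in radians.
    """
    ax, ay, aw, ah, atheta = anchor
    gx, gy, gw, gh, gtheta = gt
    return np.array([
        (gx - ax) / aw,        # center offsets, normalized by anchor size
        (gy - ay) / ah,
        np.log(gw / aw),       # log-scale size ratios
        np.log(gh / ah),
        gtheta - atheta,       # angle offset
    ], dtype=np.float32)

# Example: axis-aligned anchor, target rotated by about 15 degrees
print(encode_rotated_box((50, 50, 40, 20, 0.0), (54, 48, 44, 22, 0.26)))
```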

Journal ArticleDOI
Fuxi Jia, Cunzhao Shi, Kun He, Chunheng Wang, Baihua Xiao
TL;DR: Structural symmetric pixels (SSPs) are utilized to calculate the local threshold in a neighborhood, the voting result of multiple thresholds determines whether a pixel belongs to the foreground or not, and an adaptive global threshold selection algorithm is proposed.

Journal ArticleDOI
TL;DR: In this paper, a real-time G2 continuous local smoothing method was proposed by replacing the corners of tool position and tool orientation paths with cubic B-splines, which can be implemented on-line and integrated into a self-developed open-architecture CNC system.
Abstract: Five-axis linear toolpaths (or G01 blocks) are widely used in CNC machine tools. The tangential and curvature discontinuities at the corners of linear toolpaths result in feed fluctuation and deteriorate the machining efficiency and quality. Several methods have been proposed to locally smooth the corners for three-axis toolpaths, but local smoothing for five-axis toolpaths is still challenging due to two main difficulties: control of the tool orientation smoothing error and parameter synchronization between tool position and tool orientation. This paper proposes a real-time G2 continuous local smoothing method by replacing the corners of tool position and tool orientation paths with cubic B-splines. The two difficulties in five-axis local smoothing are both resolved in simple and analytical ways. With a two-step method, the tool orientation smoothing error is directly and analytically constrained. By converting the remaining linear segments into B-splines, the C1 continuity of the smooth tool position and smooth tool orientation paths is achieved. The parameter synchronization is realized by sharing the parameter of the tool orientation with that of the tool position. Compared with the existing analytical methods, the proposed method has a higher computation efficiency and a tighter tolerance in control of the orientation smoothing error. We have developed an open-source benchmark which validates the computation efficiency and error control ability of the proposed method. After smoothing and synchronization, the inserted B-spline of tool position is traversed with constant feedrate and the feedrate of the remaining linear segment is planned with a jerk-bounded trajectory profile. The proposed smoothing method can be implemented on-line and has been integrated into a self-developed open-architecture CNC system. Its effectiveness for on-line generation of smooth motion has been validated via simulations and experiments.
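
The corner-smoothing idea above replaces the sharp corner of two adjacent linear segments with a cubic B-spline. The sketch below builds one such clamped cubic B-spline from five control points placed symmetrically around the corner; the control-point placement rule and the smoothing length d are simplified assumptions and do not reproduce the paper's error-constrained construction.

```python
import numpy as np
from scipy.interpolate import BSpline

def smooth_corner(p_prev, p_corner, p_next, d=2.0):
    """Cubic B-spline replacing the corner at p_corner (illustrative sketch).

    d controls how far from the corner the spline departs from the linear segments.
    Returns a callable spline over the parameter u in [0, 1].
    """
    p_prev, p_corner, p_next = map(np.asarray, (p_prev, p_corner, p_next))
    u_in = (p_corner - p_prev) / np.linalg.norm(p_corner - p_prev)
    u_out = (p_next - p_corner) / np.linalg.norm(p_next - p_corner)
    ctrl = np.array([
        p_corner - 2 * d * u_in,
        p_corner - d * u_in,
        p_corner,
        p_corner + d * u_out,
        p_corner + 2 * d * u_out,
    ])
    # Clamped cubic knot vector for 5 control points
    knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
    return BSpline(knots, ctrl, k=3)

spline = smooth_corner([0, 0], [10, 0], [10, 10], d=1.5)
print(spline(np.linspace(0, 1, 5)))   # sample points along the smoothed corner
```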

Journal ArticleDOI
TL;DR: A material-based salient object detection method is presented which can effectively distinguish objects with similar perceived color but different spectral responses, and which outperforms several existing hyperspectral salient object detection approaches and the state-of-the-art methods proposed for RGB images.

Journal ArticleDOI
TL;DR: Multi-orientation EPIs and optimal orientation selection are proved to be effective in detecting and excluding occlusions, and the resulting method outperforms state-of-the-art depth estimation methods, especially near occlusion boundaries.

Journal ArticleDOI
TL;DR: Experiments and extensive comparisons show the effectiveness and overall superiority of the proposed LoVS descriptor and LoVS-based point cloud registration algorithm for low-quality data, e.g., with noise and varying data resolutions.

Journal ArticleDOI
TL;DR: An accurate and robust algorithm is developed that quantitatively compares the similarity of the observed CT artifacts with calculated artifact patterns based on the lead’s orientation marker and a geometric model of the segmented electrodes, and provides highly accurate results for the orientation of the segmented electrodes for all angular constellations that typically occur in clinical cases.
Abstract: Background Directional deep brain stimulation (DBS) allows steering the stimulation in an axial direction which offers greater flexibility in programming. However, accurate anatomical visualization of the lead orientation is required for interpreting the observed stimulation effects and to guide programming. Objectives In this study we aimed to develop and test an accurate and robust algorithm for determining the orientation of segmented electrodes based on standard postoperative CT imaging used in DBS. Methods Orientation angles of directional leads (Cartesia™; Boston Scientific, Marlborough, MA, USA) were determined using CT imaging. Therefore, a sequential algorithm was developed that quantitatively compares the similarity of the observed CT artifacts with calculated artifact patterns based on the lead's orientation marker and a geometric model of the segmented electrodes. Measurements of seven ground truth phantoms and three leads with 60 different configurations of lead implantation and orientation angles were analyzed for validation. Results The accuracy of the determined electrode orientation angles was -0.6 ± 1.5° (range: -5.4 to 4.2°). This accuracy proved to be sufficiently high to resolve even subtle differences between individual leads. Conclusions The presented algorithm is user independent and provides highly accurate results for the orientation of the segmented electrodes for all angular constellations that typically occur in clinical cases.

Journal ArticleDOI
TL;DR: An appearance-based method for pedestrian head-pose and full-body orientation prediction is presented, employing a deep-learning mechanism, and the comparison with existing state-of-the-art approaches demonstrates the effectiveness of the presented approach.

Journal ArticleDOI
TL;DR: This article introduces a new, non-linear operator, called RORPO (Ranking the Orientation Responses of Path Operators), inspired by the multidirectional paradigm currently used in linear filtering for thin structure analysis and built upon the notion of path operator from mathematical morphology.
Abstract: The analysis of thin curvilinear objects in 3D images is a complex and challenging task. In this article, we introduce a new, non-linear operator, called RORPO (Ranking the Orientation Responses of Path Operators). Inspired by the multidirectional paradigm currently used in linear filtering for thin structure analysis, RORPO is built upon the notion of path operator from mathematical morphology. This operator, unlike most operators commonly used for 3D curvilinear structure analysis, is discrete, non-linear and non-local. From this new operator, two main curvilinear structure characteristics can be estimated: an intensity feature, that can be assimilated to a quantitative measure of curvilinearity; and a directional feature, providing a quantitative measure of the structure's orientation. We provide a full description of the structural and algorithmic details for computing these two features from RORPO, and we discuss computational issues. We experimentally assess RORPO by comparison with three of the most popular curvilinear structure analysis filters, namely Frangi Vesselness, Optimally Oriented Flux, and Hybrid Diffusion with Continuous Switch. In particular, we show that our method provides up to 8 percent more true positives and 50 percent fewer false positives than the next best method, on synthetic and real 3D images.

Patent
25 Sep 2018
TL;DR: The disclosed subject matter is directed to employing machine learning models configured to predict 3D data from 2D images, using deep learning techniques and 3D-from-2D neural network models to derive three-dimensional data for the two-dimensional images.
Abstract: The disclosed subject matter is directed to employing machine learning models configured to predict 3D data from 2D images using deep learning techniques to derive 3D data for the 2D images. In some embodiments, a method is provided that comprises receiving, by a system operatively coupled to a processor, a two-dimensional image, and determining, by the system, auxiliary data for the two-dimensional image, wherein the auxiliary data comprises orientation information regarding a capture orientation of the two-dimensional image. The method further comprises deriving, by the system, three-dimensional information for the two-dimensional image using one or more neural network models configured to infer the three-dimensional information based on the two-dimensional image and the auxiliary data.

Journal ArticleDOI
TL;DR: The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive patterns and significant image distortions.
Abstract: Accurate 3D reconstruction/modelling from unmanned aerial vehicle (UAV)-based imagery has become the key prerequisite in various applications. Although current commercial software has automated the process of image-based reconstruction, a transparent system, which can be incorporated with different user-defined constraints, is still preferred by the photogrammetric research community. In this regard, this paper presents a transparent framework for the automated aerial triangulation of UAV images. The proposed framework is conducted in three steps. In the first step, two approaches, which take advantage of prior information regarding the flight trajectory, are implemented for reliable relative orientation recovery. Then, initial recovery of image exterior orientation parameters (EOPs) is achieved through either an incremental or global approach. Finally, a global bundle adjustment involving Ground Control Points (GCPs) and check points is carried out to refine all estimated parameters in the defined mapping coordinate system. Four real image datasets, which are acquired by two different UAV platforms, have been utilized to evaluate the feasibility of the proposed framework. In addition, a comparative analysis between the proposed framework and the existing commercial software is performed. The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive patterns and significant image distortions.
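
Relative orientation recovery between two overlapping UAV images can be sketched with OpenCV's essential-matrix routines, as below; this is a generic two-view illustration under assumed calibrated conditions, not the trajectory-aware approach proposed in the paper.

```python
import cv2
import numpy as np

def relative_orientation(pts1, pts2, K):
    """Estimate the relative rotation and (unit-norm) translation between two views.

    pts1, pts2: (N, 2) float arrays of matched image points; K: 3x3 camera matrix.
    """
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # recoverPose resolves the four-fold ambiguity of the essential matrix decomposition
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t, inliers
```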

Journal ArticleDOI
TL;DR: A novel technique called Weber Local Binary Image Cosine Transform (WLBI-CT) extracts and integrates the frequency components of images obtained through the Weber local descriptor and the local binary descriptor; these frequency components help in accurate classification of various facial expressions in the challenging domain of multi-scale and multi-orientation facial images.
Abstract: Accurate recognition of facial expression is a challenging problem, especially from multi-scale and multi-orientation face images. In this article, we propose a novel technique called Weber Local Binary Image Cosine Transform (WLBI-CT). WLBI-CT extracts and integrates the frequency components of images obtained through the Weber local descriptor and the local binary descriptor. These frequency components help in accurate classification of various facial expressions in the challenging domain of multi-scale and multi-orientation facial images. Identification of a significant feature set plays a vital role in the success of any facial expression recognition system. The effect of multiple feature sets with varying block sizes has been investigated using different multi-scale images taken from the well-known JAFFE, MMI and CK+ datasets. Extensive experimentation has been performed to demonstrate that the proposed technique outperforms contemporary techniques in terms of recognition rate and computational time.

Proceedings ArticleDOI
Yang Yang, Shi Jin, Ruiyang Liu, Sing Bing Kang, Jingyi Yu
18 Jun 2018
TL;DR: A system that automatically extracts 3D geometry of an indoor scene from a single 2D panorama and uses the recovered layout to guide shape estimation of the remaining objects using their normal information is described.
Abstract: We describe a system that automatically extracts 3D geometry of an indoor scene from a single 2D panorama. Our system recovers the spatial layout by finding the floor, walls, and ceiling; it also recovers shapes of typical indoor objects such as furniture. Using sampled perspective sub-views, we extract geometric cues (lines, vanishing points, orientation map, and surface normals) and semantic cues (saliency and object detection information). These cues are used for ground plane estimation and occlusion reasoning. The global spatial layout is inferred through a constraint graph on line segments and planar superpixels. The recovered layout is then used to guide shape estimation of the remaining objects using their normal information. Experiments on synthetic and real datasets show that our approach is state-of-the-art in both accuracy and efficiency. Our system can handle cluttered scenes with complex geometry that are challenging to existing techniques.
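
Among the geometric cues listed above, vanishing points can be estimated by intersecting detected line segments and voting. The sketch below shows a simple version of that idea with OpenCV's probabilistic Hough transform and homogeneous-coordinate intersections; it is a generic illustration, not the cue extraction used in the paper.

```python
import cv2
import numpy as np
from itertools import combinations

def estimate_vanishing_point(gray):
    """Crude vanishing-point estimate: intersect line pairs, keep the point
    supported by the most detected lines."""
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                           minLineLength=40, maxLineGap=5)
    if segs is None:
        return None
    # Homogeneous line for each segment: l = p1 x p2 (cap the count for speed)
    lines = [np.cross([x1, y1, 1.0], [x2, y2, 1.0]) for x1, y1, x2, y2 in segs[:60, 0]]

    best_point, best_votes = None, -1
    for l1, l2 in combinations(lines, 2):
        p = np.cross(l1, l2)
        if abs(p[2]) < 1e-6:
            continue                      # near-parallel lines, no finite intersection
        p = p / p[2]
        # A line votes for p if p lies within ~2 px of it (normalized incidence)
        votes = sum(abs(np.dot(l / np.linalg.norm(l[:2]), p)) < 2.0 for l in lines)
        if votes > best_votes:
            best_point, best_votes = p[:2], votes
    return best_point
```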

Book ChapterDOI
16 Sep 2018
TL;DR: A multi-scale reinforcement learning (RL) agent framework is employed to find standardized view planes in 3D image acquisitions; it can be used to mimic experienced operators and achieves accuracies of 1.53 mm, 1.98 mm and 4.84 mm on the evaluated brain and cardiac MRI planes.
Abstract: We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53 mm, 1.98 mm and 4.84 mm, respectively.