
Showing papers on "Orientation (computer vision)" published in 2019


Journal ArticleDOI
TL;DR: A high-level overview of the features of the MRtrix3 framework and of the general-purpose image processing applications shipped with the software.

1,228 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The authors use 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space; the POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps.
Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs with the aid of the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations including comparison with the existing body-specific and hand-specific methods, and performance analysis on camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos.
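As a rough, self-contained sketch of the POF idea (not the authors' code; the joint coordinates, image size, and distance threshold below are invented for illustration), the orientation field of a single limb could be rasterized as follows:

```python
import numpy as np

def encode_pof(joint3d_a, joint3d_b, joint2d_a, joint2d_b, height, width, sigma=7.0):
    """Toy encoding of a 3D Part Orientation Field for one limb.

    Pixels close to the 2D projection of the limb store the unit 3D
    direction from joint A to joint B; all other pixels stay zero.
    """
    pof = np.zeros((height, width, 3), dtype=np.float32)
    direction = np.asarray(joint3d_b, np.float32) - np.asarray(joint3d_a, np.float32)
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return pof
    direction /= norm

    ys, xs = np.mgrid[0:height, 0:width]
    a = np.asarray(joint2d_a, np.float32)
    b = np.asarray(joint2d_b, np.float32)
    ab = b - a
    ab_len2 = max(float(ab @ ab), 1e-6)
    # Distance of every pixel to the 2D limb segment.
    t = np.clip(((xs - a[0]) * ab[0] + (ys - a[1]) * ab[1]) / ab_len2, 0.0, 1.0)
    px, py = a[0] + t * ab[0], a[1] + t * ab[1]
    dist = np.sqrt((xs - px) ** 2 + (ys - py) ** 2)
    pof[dist < sigma] = direction
    return pof

pof = encode_pof([0, 0, 2.0], [0.3, -0.5, 2.2], [40, 60], [70, 90], 128, 128)
print(pof.shape, np.count_nonzero(pof.any(axis=-1)))
```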

294 citations


Journal ArticleDOI
TL;DR: In this paper, an orientation-based random waypoint (ORWP) mobility model is proposed by considering the random orientation of the UE during the user's movement, and the performance of ORWP is assessed on the handover rate.
Abstract: Light-fidelity (LiFi) is a networked optical wireless communication (OWC) solution for high-speed indoor connectivity for fixed and mobile optical communications. Unlike conventional radio frequency wireless systems, the OWC channel is not isotropic, meaning that the device orientation affects the channel gain significantly, particularly for mobile users. However, due to the lack of a proper model for device orientation, many studies have assumed that the receiver is vertically upward and fixed. In this paper, a novel model for device orientation based on experimental measurements of 40 participants has been proposed. It is shown that the probability density function (PDF) of the polar angle can be modeled either based on a Laplace (for static users) or a Gaussian (for mobile users) distribution. In addition, a closed-form expression is obtained for the PDF of the cosine of the incidence angle based on which the line-of-sight (LOS) channel gain is described in OWC channels. An approximation of this PDF based on the truncated Laplace is proposed and the accuracy of this approximation is confirmed by the Kolmogorov–Smirnov distance. Moreover, the statistics of the LOS channel gain are calculated and the random orientation of a user equipment (UE) is modeled as a random process. The influence of the random orientation on signal-to-noise-ratio performance of OWC systems has been evaluated. Finally, an orientation-based random waypoint (ORWP) mobility model is proposed by considering the random orientation of the UE during the user’s movement. The performance of ORWP is assessed on the handover rate and it is shown that it is important to take the random orientation into account.
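To make the channel model concrete, here is a minimal numerical sketch assuming a standard Lambertian LOS gain and a Gaussian polar angle for mobile users, as the abstract suggests; the mean, standard deviation, receiver area, and field of view below are illustrative guesses, not the paper's measured values:

```python
import numpy as np

rng = np.random.default_rng(0)

def los_channel_gain(distance, cos_irradiance, cos_incidence, area=1e-4,
                     lambertian_order=1.0, fov_deg=90.0):
    """Standard Lambertian LOS gain; zero outside the receiver field of view."""
    cos_fov = np.cos(np.radians(fov_deg))
    gain = ((lambertian_order + 1) * area / (2 * np.pi * distance ** 2)
            * cos_irradiance ** lambertian_order * cos_incidence)
    return np.where(cos_incidence >= cos_fov, gain, 0.0)

# Polar angle of the device modelled as Gaussian for mobile users, as the
# paper proposes; the mean/std here are placeholders, not the measured fit.
theta = np.radians(rng.normal(loc=40.0, scale=8.0, size=100_000))
# Toy geometry: LED directly above the user at 2 m; incidence angle equals theta.
h = los_channel_gain(distance=2.0, cos_irradiance=1.0, cos_incidence=np.cos(theta))
print("mean gain:", h.mean(), " outage fraction:", np.mean(h == 0))
```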

130 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: OriCNN as discussed by the authors proposes a Siamese network to explicitly encode the orientation (i.e., spherical directions) of each pixel of the images, which significantly boosts the discriminative power of the learned deep features, leading to much higher recall and precision outperforming all previous methods.
Abstract: This paper studies the image-based geo-localization (IBL) problem using ground-to-aerial cross-view matching. The goal is to predict the spatial location of a ground-level query image by matching it to a large geotagged aerial image database (e.g., satellite imagery). This is a challenging task due to the drastic differences in their viewpoints and visual appearances. Existing deep learning methods for this problem have focused on maximizing feature similarity between spatially close-by image pairs, while minimizing similarity between image pairs which are far apart. They do so by deep feature embedding based on visual appearance in those ground-and-aerial images. However, in everyday life, humans commonly use orientation information as an important cue for the task of spatial localization. Inspired by this insight, this paper proposes a novel method which endows deep neural networks with the `commonsense' of orientation. Given a ground-level spherical panoramic image as query input (and a large georeferenced satellite image database), we design a Siamese network which explicitly encodes the orientation (i.e., spherical directions) of each pixel of the images. Our method significantly boosts the discriminative power of the learned deep features, leading to much higher recall and precision, outperforming all previous methods. Our network is also more compact, using only 1/5th the number of parameters of the previously best-performing network. To evaluate the generalization of our method, we also created a large-scale cross-view localization benchmark containing 100K geotagged ground-aerial pairs covering a city. Our codes and datasets are available at https://github.com/Liumouliu/OriCNN.
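A minimal sketch of what "encoding the orientation of each pixel" could look like for an equirectangular panorama, assuming two extra azimuth/elevation channels concatenated with the RGB input (the exact encoding used by OriCNN may differ):

```python
import numpy as np

def panorama_orientation_maps(height, width):
    """Per-pixel azimuth/elevation maps for an equirectangular panorama.

    A rough stand-in for a per-pixel orientation encoding: two extra
    channels in [-1, 1] that are concatenated with the RGB input.
    """
    azimuth = np.linspace(-np.pi, np.pi, width, endpoint=False)
    elevation = np.linspace(np.pi / 2, -np.pi / 2, height)
    az_map = np.tile(azimuth / np.pi, (height, 1))
    el_map = np.tile((elevation / (np.pi / 2))[:, None], (1, width))
    return np.stack([az_map, el_map], axis=-1).astype(np.float32)

rgb = np.zeros((256, 512, 3), dtype=np.float32)     # placeholder panorama
ori = panorama_orientation_maps(256, 512)
net_input = np.concatenate([rgb, ori], axis=-1)      # H x W x 5 tensor
print(net_input.shape)
```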

124 citations


Proceedings ArticleDOI
Lijie Liu1, Jiwen Lu1, Chunjing Xu2, Qi Tian2, Jie Zhou1 
01 Jun 2019
TL;DR: Liu et al. propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score the fitting degree between proposals and the object conclusively.
Abstract: In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score fitting degree between proposals and object conclusively. Different from most existing monocular frameworks which use tight constraint to get 3D location, our approach achieves high-precision localization through measuring the visual fitting degree between the projected 3D proposals and the object. We first regress the dimension and orientation of the object using an anchor-based method so that a suitable 3D proposal can be constructed. We propose FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues. Therefore, during the detection process, we sample a large number of candidates in the 3D space and project these 3D bounding boxes on 2D image individually. The best candidate can be picked out by simply exploring the spatial overlap between proposals and the object, in the form of the output 3D IoU score of FQNet. Experiments on the KITTI dataset demonstrate the effectiveness of our framework.
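The candidate sampling and projection step can be illustrated with a short sketch; the intrinsics, object dimensions, and sampling grid below are placeholder values, and the learned 3D IoU scoring network is only referenced in a comment:

```python
import numpy as np

def box3d_corners(location, dimensions, yaw):
    """Eight corners of a 3D box (KITTI-style h, w, l; yaw about the vertical axis)."""
    h, w, l = dimensions
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h])
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    rot = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                    [ 0,           1, 0          ],
                    [-np.sin(yaw), 0, np.cos(yaw)]])
    return rot @ np.vstack([x, y, z]) + np.asarray(location).reshape(3, 1)

def project(points3d, K):
    """Project 3 x N camera-frame points with intrinsics K."""
    uvw = K @ points3d
    return uvw[:2] / uvw[2]

K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # illustrative
# Densely sample candidate locations/yaws around an initial guess (toy values).
candidates = [([4 + dx, 1.6, 20 + dz], (1.5, 1.6, 3.9), yaw)
              for dx in np.linspace(-1, 1, 5)
              for dz in np.linspace(-2, 2, 5)
              for yaw in np.linspace(-0.3, 0.3, 5)]
for loc, dim, yaw in candidates[:3]:
    uv = project(box3d_corners(loc, dim, yaw), K)
    # In FQNet each projected box would be scored by the learned 3D IoU network.
    print(uv.min(axis=1), uv.max(axis=1))
```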

122 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: ToDayGAN as mentioned in this paper uses a modified image-translation model to alter nighttime driving images to a more useful daytime representation, and then compares the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image.
Abstract: Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query’s pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN – a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
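A toy version of the retrieval pipeline around the translation model, with a stub standing in for the night-to-day generator and a hand-rolled global descriptor (a real system would use learned image features):

```python
import numpy as np

rng = np.random.default_rng(0)

def global_descriptor(image):
    """Toy global descriptor (mean color per coarse cell); a real system
    would use a learned embedding such as DenseVLAD or a CNN feature."""
    h, w, _ = image.shape
    cells = image.reshape(4, h // 4, 4, w // 4, 3).mean(axis=(1, 3))
    d = cells.reshape(-1)
    return d / (np.linalg.norm(d) + 1e-8)

# Geo-tagged daytime database: descriptor plus known 6-DOF pose per image.
db_images = [rng.random((64, 64, 3)) for _ in range(100)]
db_poses = [rng.random(6) for _ in range(100)]
db_desc = np.stack([global_descriptor(im) for im in db_images])

night_image = rng.random((64, 64, 3))
translated = night_image            # stand-in for the night-to-day generator
query = global_descriptor(translated)

best = int(np.argmax(db_desc @ query))       # cosine-similarity retrieval
estimated_pose = db_poses[best]              # query inherits the retrieved pose
print("retrieved index:", best, "pose:", np.round(estimated_pose, 3))
```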

114 citations


Journal ArticleDOI
TL;DR: An end-to-end multiscale visual attention networks (MS-VANs) method that outperforms several state-of-the-art approaches in remote sensing applications and uses a skip-connected encoder–decoder model to extract multiscale features from a full-size image.
Abstract: Object detection plays an active role in remote sensing applications. Recently, deep convolutional neural network models have been applied to automatically extract features, generate region proposals, and predict the corresponding object class. However, these models face new challenges in VHR remote sensing images due to the orientation and scale variations and the cluttered background. In this letter, we propose an end-to-end multiscale visual attention networks (MS-VANs) method. We use a skip-connected encoder–decoder model to extract multiscale features from a full-size image. For feature maps in each scale, we learn a visual attention network, which is followed by a classification branch and a regression branch, so as to highlight the features from the object region and suppress the cluttered background. We train the MS-VANs model by a hybrid loss function which is a weighted sum of attention loss, classification loss, and regression loss. Experiments on a combined data set consisting of the Dataset for Object Detection in Aerial Images and NWPU VHR-10 show that the proposed method outperforms several state-of-the-art approaches.
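The hybrid loss is straightforward to express; the sketch below assumes binary cross-entropy for the attention term and smooth L1 for regression, with placeholder weights (the abstract does not specify these):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(att_pred, att_target, cls_logits, cls_target,
                box_pred, box_target, w_att=1.0, w_cls=1.0, w_reg=1.0):
    """Weighted sum of attention, classification, and regression terms.

    The individual loss forms and weights are illustrative assumptions,
    not the paper's published values.
    """
    l_att = F.binary_cross_entropy_with_logits(att_pred, att_target)
    l_cls = F.cross_entropy(cls_logits, cls_target)
    l_reg = F.smooth_l1_loss(box_pred, box_target)
    return w_att * l_att + w_cls * l_cls + w_reg * l_reg

loss = hybrid_loss(torch.randn(2, 1, 32, 32), torch.rand(2, 1, 32, 32),
                   torch.randn(8, 5), torch.randint(0, 5, (8,)),
                   torch.randn(8, 4), torch.randn(8, 4))
print(float(loss))
```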

107 citations


Proceedings ArticleDOI
05 Apr 2019
TL;DR: In this paper, a U-Net is used to reconstruct color images of the scene from a 3D point cloud using color and SIFT descriptors, which can reveal scene appearance and compromise privacy.
Abstract: Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and optionally color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature inversion methods, we deal with highly sparse and irregular 2D point distributions and inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor image source, and the 3D point visibility. We evaluate our attack algorithm on public datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.
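A simplified sketch of how such a sparse multichannel input might be rendered from an SfM point cloud (depth plus color only; real inputs also carry SIFT descriptors and keep the nearest point per pixel):

```python
import numpy as np

def render_sparse_input(points, colors, K, height, width):
    """Splat SfM points into a sparse multichannel image (depth + RGB).

    A toy version of the attack's U-Net input; occlusion handling
    (nearest point per pixel) is deliberately omitted here.
    """
    img = np.zeros((height, width, 4), dtype=np.float32)
    uvw = K @ points.T
    z = uvw[2]
    valid = z > 0
    u = np.round(uvw[0, valid] / z[valid]).astype(int)
    v = np.round(uvw[1, valid] / z[valid]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    img[v[inside], u[inside], 0] = z[valid][inside]           # depth channel
    img[v[inside], u[inside], 1:] = colors[valid][inside]     # color channels
    return img

rng = np.random.default_rng(0)
pts = rng.uniform([-2, -2, 2], [2, 2, 8], size=(5000, 3))
cols = rng.random((5000, 3))
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])   # illustrative intrinsics
sparse = render_sparse_input(pts, cols, K, 480, 640)
print("occupied pixels:", int((sparse[..., 0] > 0).sum()))
```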

106 citations


Posted Content
Lijie Liu1, Jiwen Lu1, Chunjing Xu1, Qi Tian2, Jie Zhou2 
TL;DR: A deep fitting degree scoring network for monocular 3D object detection, which aims to score the fitting degree between proposals and the object conclusively, and proposes FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues.
Abstract: In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score fitting degree between proposals and object conclusively. Different from most existing monocular frameworks which use tight constraint to get 3D location, our approach achieves high-precision localization through measuring the visual fitting degree between the projected 3D proposals and the object. We first regress the dimension and orientation of the object using an anchor-based method so that a suitable 3D proposal can be constructed. We propose FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues. Therefore, during the detection process, we sample a large number of candidates in the 3D space and project these 3D bounding boxes on 2D image individually. The best candidate can be picked out by simply exploring the spatial overlap between proposals and the object, in the form of the output 3D IoU score of FQNet. Experiments on the KITTI dataset demonstrate the effectiveness of our framework.

87 citations


Journal ArticleDOI
TL;DR: The orientation of a robot is directly estimated from the direction of the vanishing point using a forward-viewing monocular vision sensor, making the method applicable in real time on a low-cost embedded system for indoor service robots.
Abstract: This paper presents a new implementation method for efficient simultaneous localization and mapping using a forward-viewing monocular vision sensor. The method is developed to be applicable in real time on a low-cost embedded system for indoor service robots. In this paper, the orientation of a robot is directly estimated using the direction of the vanishing point. Then, the estimation models for the robot position and the line landmark are derived as simple linear equations. Using these models, the camera poses and landmark positions are efficiently corrected by a local map correction method. The performance of the proposed method is demonstrated under various challenging environments using dataset-based experiments using a desktop computer and real-time experiments using a low-cost embedded system. The experimental environments include a real home-like setting. These conditions contain low-textured areas, moving people, or changing environments. The proposed method is also tested using the Robotics Advancement through Web-publishing of Sensorial and Elaborated Extensive Data Sets (Rawseeds) benchmark dataset.
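The core geometric step, recovering heading from a vanishing point, can be sketched in a few lines; the intrinsics and sign conventions below are illustrative assumptions:

```python
import numpy as np

def heading_from_vanishing_point(vp_u, vp_v, K):
    """Camera yaw and pitch relative to the dominant corridor direction,
    recovered from the vanishing point of lines parallel to that direction.

    The vanishing point is the image of the direction vector, so
    back-projecting it with K^-1 gives that direction in camera coordinates.
    """
    direction = np.linalg.inv(K) @ np.array([vp_u, vp_v, 1.0])
    direction /= np.linalg.norm(direction)
    yaw = np.arctan2(direction[0], direction[2])    # rotation about the vertical axis
    pitch = np.arcsin(-direction[1])                # small for a level-mounted camera
    return np.degrees(yaw), np.degrees(pitch)

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])  # placeholder intrinsics
print(heading_from_vanishing_point(360.0, 235.0, K))
```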

87 citations


Journal ArticleDOI
TL;DR: TractSeg as discussed by the authors combines tract orientation mapping (TOM) with accurate segmentations of the tract outline and its start and end regions, which enables automatic creation of bundle-specific tractograms with previously unseen accuracy.

Proceedings ArticleDOI
01 May 2019
TL;DR: A static hand gesture recognition method deploying a CNN is proposed; the model with augmented data achieved an accuracy of 97.12%, which is nearly 4% higher than the model without augmentation.
Abstract: Computers are part and parcel of our day-to-day life and are used in various fields. The interaction between human and computer is accomplished by conventional input devices like the mouse, keyboard, etc. Hand gestures can be a useful medium of human-computer interaction and can make the interaction easier. Gestures vary in orientation and shape from person to person, so non-linearity exists in this problem. Recent research has proved the supremacy of the Convolutional Neural Network (CNN) for image representation and classification. Since a CNN can learn complex and non-linear relationships among images, in this paper, a static hand gesture recognition method deploying a CNN is proposed. Data augmentation such as re-scaling, zooming, shearing, rotation, and width and height shifting was applied to the dataset. The model was trained on 8000 images and tested on 1600 images which were divided into 10 classes. The model with augmented data achieved an accuracy of 97.12%, which is nearly 4% higher than that of the model without augmentation (92.87%).
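A hedged sketch of the described augmentation pipeline using Keras' ImageDataGenerator; the magnitude of each transform is our own guess, not the paper's setting:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation mirroring the operations listed in the abstract (re-scaling,
# zooming, shearing, rotation, width/height shifting); exact magnitudes are
# illustrative assumptions.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    shear_range=0.2,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# Stand-in gesture images (the paper trains on 8000 images in 10 classes).
x = np.random.randint(0, 256, size=(32, 64, 64, 3)).astype("float32")
y = np.random.randint(0, 10, size=(32,))

batch_x, batch_y = next(augmenter.flow(x, y, batch_size=8))
print(batch_x.shape, batch_x.min(), batch_x.max())
```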

Proceedings ArticleDOI
06 Mar 2019
TL;DR: The proposed method performs better than state-of-the-art methods in terms of body-parts detection accuracy and pose estimation.
Abstract: The main purpose of human body part detection is to estimate the size, orientation or position of the human body parts within the digital scene information. Estimation of the various body parts of a human from an image is a critical step for several model-based systems and body-parts tracking. In this paper, body parts detection for pose estimation is implemented. During foreground silhouette detection, the proposed method uses segmentation techniques to obtain salient region areas and skin tone detection. After successful silhouette extraction, body parts estimation is applied using a body-parts model. Five basic body key points were determined, and in addition seven more body sub-key points were estimated with the help of the five basic body key points. The estimated key points of the body are then represented using circular marks on the original image. The experimental results over two challenging video datasets, the KTH multiview football and UCF sports action datasets, showed significant accuracies of 90.01% and 86.67%, respectively. The proposed method performs better than state-of-the-art methods in terms of body-parts detection accuracy.

Journal ArticleDOI
TL;DR: This study investigates three of the available commonly used open-source solutions, namely COLMAP, OpenMVG+OpenMVS and AliceVision, evaluating their results under diverse large scale scenarios and comparing them with respect to the corresponding ground truth data.
Abstract: State-of-the-art automated image orientation (Structure from Motion) and dense image matching (Multiple View Stereo) methods commonly used to produce 3D information from 2D images can generate 3D results – such as point clouds or meshes – of varying geometric and visual quality. Pipelines are generally robust and reliable enough, mostly capable of processing even large sets of unordered images, yet the final results often lack completeness and accuracy, especially while dealing with real-world cases where objects are typically characterized by complex geometries and textureless surfaces and obstacles or occluded areas may also occur. In this study we investigate three of the available, commonly used open-source solutions, namely COLMAP, OpenMVG+OpenMVS and AliceVision, evaluating their results under diverse large scale scenarios. Comparisons and critical evaluation of the image orientation and dense point cloud generation algorithms are performed with respect to the corresponding ground truth data. The presented FBK-3DOM datasets are available for research purposes.

Journal ArticleDOI
TL;DR: The proposed method fully exploits multiscale and structured prior information to conduct both accurate and efficient detection and achieves competitive performance compared with state-of-the-art methods.
Abstract: Power line detection plays an important role in an automated UAV-based electricity inspection system, which is crucial for real-time motion planning and navigation along power lines. Previous methods which adopt traditional filters and gradients may fail to capture complete power lines due to noisy backgrounds. To overcome this, we develop an accurate power line detection method using convolutional and structured features. Specifically, we first build a convolutional neural network to obtain hierarchical responses from each layer. Simultaneously, the rich feature maps are integrated to produce a fusion output, then we extract the structured information including length, width, orientation and area from the coarsest feature map. Finally, we combine the fusion output with structured information to get a result with clear background. The proposed method fully exploits multiscale and structured prior information to conduct both accurate and efficient detection. In addition, we release two power line datasets due to the scarcity in the public domain. The method is evaluated on the well-annotated power line datasets and achieves competitive performance compared with state-of-the-art methods.
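One plausible way to extract such structured information (length, width, orientation, area) from a coarse line mask, using minimum-area rectangles; this is an illustration, not the authors' implementation:

```python
import cv2
import numpy as np

def structured_line_features(binary_mask, min_area=50):
    """Length, width, orientation, and area of connected components in a
    coarse power-line mask, via minimum-area rectangles (an approximation
    of the 'structured information' the method fuses with CNN outputs)."""
    mask = (binary_mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    feats = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        feats.append({"length": max(w, h), "width": min(w, h),
                      "orientation": angle, "area": area})
    return feats

mask = np.zeros((200, 300), np.uint8)
cv2.line(mask, (10, 40), (290, 60), 255, 3)       # synthetic 'power line'
print(structured_line_features(mask))
```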

Journal ArticleDOI
01 Feb 2019
TL;DR: This article uses global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual three-dimensional reconstruction, and tests the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it.
Abstract: We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual three-dimensional reconstruction. We test the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the power of gravity as a supervisory signal.
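A minimal sketch of a gravity-alignment prior expressed as a loss on predicted surface normals; the cosine-based form and the class masks below are assumptions, not necessarily the paper's exact formulation:

```python
import torch

def gravity_prior_loss(normals, gravity, mask_vertical, mask_horizontal):
    """Penalty encouraging normals of selected pixels to align with gravity
    (horizontal surfaces) or to be orthogonal to it (vertical surfaces).

    A plain cosine-based surrogate; the paper's prior may take another form.
    """
    normals = torch.nn.functional.normalize(normals, dim=1)      # B x 3 x H x W
    g = torch.nn.functional.normalize(gravity, dim=0).view(1, 3, 1, 1)
    cos = (normals * g).sum(dim=1)                                # B x H x W
    loss_h = ((1.0 - cos.abs()) * mask_horizontal).sum() / mask_horizontal.sum().clamp(min=1)
    loss_v = (cos.abs() * mask_vertical).sum() / mask_vertical.sum().clamp(min=1)
    return loss_h + loss_v

normals = torch.randn(2, 3, 48, 64)
gravity = torch.tensor([0.0, -1.0, 0.0])        # direction from the inertial sensor
mask_v = (torch.rand(2, 48, 64) > 0.7).float()  # e.g. pixels of wall-like classes
mask_h = (torch.rand(2, 48, 64) > 0.7).float()  # e.g. floor / table-top pixels
print(float(gravity_prior_loss(normals, gravity, mask_v, mask_h)))
```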

Journal ArticleDOI
TL;DR: The main aim of this paper is to provide a method to compute a signal that explains most of the variability of the data contained in each ROI before computing, for instance, time-varying connectivity.
Abstract: In the last decade, the use of high-density electrode arrays for EEG recordings combined with the improvements of source reconstruction algorithms has allowed the investigation of brain networks dynamics at a sub-second scale. One powerful tool for investigating large-scale functional brain networks with EEG is time-varying effective connectivity applied to source signals obtained from electric source imaging. Due to computational and interpretation limitations, the brain is usually parcelled into a limited number of regions of interests (ROIs) before computing EEG connectivity. One specific need and still open problem is how to represent the time- and frequency-content carried by hundreds of dipoles with diverging orientation in each ROI with one unique representative time-series. The main aim of this paper is to provide a method to compute a signal that explains most of the variability of the data contained in each ROI before computing, for instance, time-varying connectivity. As the representative time-series for a ROI, we propose to use the first singular vector computed by a singular-value decomposition of all dipoles belonging to the same ROI. We applied this method to two real datasets (visual evoked potentials and epileptic spikes) and evaluated the time-course and the frequency content of the obtained signals. For each ROI, both the time-course and the frequency content of the proposed method reflected the expected time-course and the scalp-EEG frequency content, representing most of the variability of the sources (~ 80%) and improving connectivity results in comparison to other procedures used so far. We also confirm these results in a simulated dataset with a known ground truth.
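The proposed ROI signal is essentially the leading component of an SVD; a minimal sketch follows (array shapes and the scaling choice are assumptions):

```python
import numpy as np

def roi_representative_signal(dipole_timecourses):
    """First singular vector of a ROI's dipole time-courses, scaled by its
    singular value, as the ROI representative signal.

    dipole_timecourses: array of shape (n_dipoles, n_samples).
    """
    centered = dipole_timecourses - dipole_timecourses.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    representative = s[0] * vt[0]     # temporal pattern shared by most dipoles
    return representative, explained

rng = np.random.default_rng(0)
common = np.sin(np.linspace(0, 20 * np.pi, 1000))
dipoles = np.outer(rng.normal(size=50), common) + 0.3 * rng.normal(size=(50, 1000))
sig, var = roi_representative_signal(dipoles)
print("variance explained by first component: %.2f" % var)
```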

Journal ArticleDOI
TL;DR: In this article, the authors presented the development of a novel technique used to enhance the detection of micro cracks in solar cells using binary and discrete Fourier transform (DFT) image processing models.

Posted Content
TL;DR: A one-stage anchor-free detector for orientational objects in aerial images, built upon a per-pixel prediction detector, which develops a branch interacting module with a self-attention mechanism to fuse features from the classification and box regression branches.
Abstract: Object detection in aerial images is a challenging task due to the lack of visible features and variant orientation of objects. Significant progress has been made recently for predicting targets from aerial images with horizontal bounding boxes (HBBs) and oriented bounding boxes (OBBs) using two-stage detectors with region based convolutional neural networks (R-CNN), involving object localization in one stage and object classification in the other. However, the computational complexity in two-stage detectors is often high, especially for orientational object detection, due to anchor matching and using regions of interest (RoI) pooling for feature extraction. In this paper, we propose a one-stage anchor free detector for orientational object detection, namely, an interactive embranchment network (IENet), which is built upon a detector with prediction in per-pixel fashion. First, a novel geometric transformation is employed to better represent the oriented object in angle prediction, then a branch interactive module with a self-attention mechanism is developed to fuse features from classification and box regression branches. Finally, we introduce an enhanced intersection over union (IoU) loss for OBB detection, which is computationally more efficient than regular polygon IoU. Experiments conducted demonstrate the effectiveness and the superiority of our proposed method, as compared with state-of-the-art detectors.
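For reference, the exact polygon IoU between oriented boxes, which the proposed enhanced IoU loss aims to replace with something cheaper, can be written with shapely as a baseline (the box parameterization below is an assumption):

```python
import numpy as np
from shapely.geometry import Polygon

def obb_corners(cx, cy, w, h, angle):
    """Corners of an oriented box given centre, size, and rotation (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    offsets = np.array([[ w,  h], [ w, -h], [-w, -h], [-w,  h]]) / 2.0
    rot = np.array([[c, -s], [s, c]])
    return offsets @ rot.T + np.array([cx, cy])

def obb_iou(box_a, box_b):
    """Exact polygon IoU between two oriented boxes (the slow baseline)."""
    pa, pb = Polygon(obb_corners(*box_a)), Polygon(obb_corners(*box_b))
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

print(obb_iou((50, 50, 40, 20, 0.0), (52, 49, 40, 20, np.radians(15))))
```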

Journal ArticleDOI
TL;DR: This work proposes a calibration method based on a calibration plate with protruding cylinders that can be applied to multiple cameras simultaneously on devices with limited computational power and produces a calibration error of only 0.027 mm.

Journal ArticleDOI
TL;DR: This work proposes an oriented object detection framework that is based on the single shot detector, namely, single shot anchor refinement network (S2ARN), which obtains the accurate detection results by performing two consecutive regressions.
Abstract: Object detection is a challenging task in the field of remote sensing applications due to the complex backgrounds and uncertain orientation of targets. Compared with the horizontal bounding box, the oriented bounding box can provide orientation information while retaining the true size. Most existing oriented object detection methods are based on Faster-RCNN, while the other, one-stage methods can achieve real-time speed but have shortcomings in localization and detection accuracy. To further enhance the performance of one-stage methods, we propose an oriented object detection framework that is based on the single shot detector, namely, the single shot anchor refinement network (S2ARN). The S2ARN obtains accurate detection results by performing two consecutive regressions. More precisely, the multilevel features of the backbone are used to regress the coordinate offsets between the predefined rotated anchors and the ground-truth boxes to generate the refined anchors. The classification and regression subnetworks assigned to the output features are used to perform the second regression to determine the class labels and further adjust the location of the refined anchors. In addition, receptive field amplification modules (RFAMs) are inserted to enlarge the receptive field and extract more discriminative features. Furthermore, in the anchor matching step, angle-related Intersection over Union (ArIoU) is used to calculate the Intersection over Union (IoU) score instead of the traditional method. Benefiting from the multiple regressions and the insensitivity of the ArIoU score to the angle deviation, the angle sampling interval of the rotated anchor can be reduced. The experimental results for the two public datasets, HRSC2016 and UCAS-AOD, demonstrate the effectiveness of the proposed network.

Journal ArticleDOI
10 Jul 2019
TL;DR: This letter presents a method of dimensionality reduction of an optical-based tactile sensor image output using a convolutional neural network encoder structure that allows simultaneous online deployment of multiple simple perception algorithms on a common set of black-box features.
Abstract: A common approach in the field of tactile robotics is the development of a new perception algorithm for each new application of existing hardware solutions. In this letter, we present a method of dimensionality reduction of an optical-based tactile sensor image output using a convolutional neural network encoder structure. Instead of using various complex perception algorithms, and/or manually choosing task-specific data features, this unsupervised feature extraction method allows simultaneous online deployment of multiple simple perception algorithms on a common set of black-box features. The method is validated on a set of benchmarking use cases. Contact object shape, edge position, orientation, and indentation depth are estimated using shallow neural networks and machine learning models. Furthermore, a contact force estimator is trained, affirming that the extracted features contain sufficient information on both spatial and mechanical characteristics of the manipulated object.
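A minimal sketch of such a convolutional encoder producing low-dimensional "black-box" features; the layer sizes and latent dimension are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Small convolutional encoder compressing a tactile-sensor image into a
    low-dimensional feature vector; all sizes here are placeholders."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return self.fc(z)

encoder = TactileEncoder()
tactile_image = torch.rand(4, 1, 64, 64)   # batch of grayscale tactile frames
features = encoder(tactile_image)           # shared features for downstream
print(features.shape)                       # shallow regressors/classifiers
```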

Journal ArticleDOI
15 Jul 2019
TL;DR: SilhoNet as discussed by the authors uses a convolutional neural network pipeline that takes in region of interest proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector.
Abstract: Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this letter, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a convolutional neural network pipeline that takes in region of interest proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two networks for 6D pose estimation from monocular image input.

Journal ArticleDOI
TL;DR: A new feature descriptor named Histogram of Optical flow Magnitude and Orientation (HOMO) is introduced and used to train an SVM classifier for violence detection in intelligent video surveillance systems.
Abstract: Violence detection is one of the substantial and challenging topics in intelligent video surveillance systems. As there is a growing demand on video surveillance systems with the capability of automatic violence detection, we focus on existing violence detection methods to improve them. In this paper, we introduce a new feature descriptor named Histogram of Optical flow Magnitude and Orientation (HOMO). First, the proposed method converts input frames to the grayscale format. Next, it computes the optical flow between two consecutive frames. Then, the optical flow magnitude and orientation of each pixel in each frame are compared separately with its predecessor frame to obtain meaningful changes of magnitude and orientation. Subsequently, different threshold values are applied to the magnitude and orientation changes for obtaining six binary indicators. Finally, these binary indicators are analyzed to get the HOMO descriptor which is used to train an SVM classifier. The system has been implemented using MATLAB. To evaluate the proposed method, two benchmark datasets have been used. The comparison of HOMO and other descriptors on benchmark datasets demonstrates satisfactory performance.
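A loose sketch of the descriptor computation with OpenCV's Farneback optical flow; the thresholds and the histogram layout are placeholders rather than the paper's exact choices:

```python
import cv2
import numpy as np

def homo_like_descriptor(frames, mag_thresholds=(1.0, 2.0, 4.0),
                         ori_thresholds=(0.5, 1.0, 2.0)):
    """HOMO-style sketch: per-pixel changes of optical flow magnitude and
    orientation between consecutive frames are thresholded into binary
    indicators and summarised as small histograms."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    prev_mag = prev_ori = None
    hists = []
    for a, b in zip(grays[:-1], grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ori = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        if prev_mag is not None:
            d_mag, d_ori = np.abs(mag - prev_mag), np.abs(ori - prev_ori)
            indicators = [(d_mag > t) for t in mag_thresholds] + \
                         [(d_ori > t) for t in ori_thresholds]
            # Fraction of 'active' pixels per binary indicator, per frame pair.
            hists.append(np.array([ind.mean() for ind in indicators], np.float32))
        prev_mag, prev_ori = mag, ori
    return np.concatenate(hists)

frames = [np.random.randint(0, 255, (120, 160, 3), np.uint8) for _ in range(4)]
print(homo_like_descriptor(frames).shape)      # (n_frame_pairs * 6,)
```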

Journal ArticleDOI
TL;DR: A novel coarse-to-fine graph spectral processing approach that exploits the fact that the sharp features reside in a low-dimensional structure hidden in the noisy 3D dataset, and that outperforms state-of-the-art schemes in terms of reconstruction quality and computational complexity.
Abstract: The increasing interest for reliable generation of large scale scenes and objects has facilitated several real-time applications. Although the resolution of new-generation geometry scanners is constantly improving, the output models are inevitably noisy, requiring sophisticated approaches that remove noise while preserving sharp features. Moreover, we no longer deal exclusively with individual shapes, but with entire scenes resulting in a sequence of 3D surfaces that are affected by noise with different characteristics due to variable environmental factors (e.g., lighting conditions, orientation of the scanning device). In this work, we introduce a novel coarse-to-fine graph spectral processing approach that exploits the fact that the sharp features reside in a low dimensional structure hidden in the noisy 3D dataset. In the coarse step, the mesh is processed in parts, using a model based Bayesian learning method that identifies the noise level in each part and the subspace where the features lie. In the feature-aware fine step, we iteratively smooth face normals and vertices, while preserving geometric features. Extensive evaluation studies carried out under a broad set of complex noise patterns verify the superiority of our approach as compared to the state-of-the-art schemes, in terms of reconstruction quality and computational complexity.

Posted Content
TL;DR: This paper proposes a novel keypoint-based approach for 3D object detection and localization from a single RGB image, building a multi-branch model around 2D keypoint detection in images and complement it with a conceptually simple geometric reasoning method.
Abstract: Monocular 3D object detection is well-known to be a challenging vision task due to the loss of depth information; attempts to recover depth using separate image-only approaches lead to unstable and noisy depth estimates, harming 3D detections. In this paper, we propose a novel keypoint-based approach for 3D object detection and localization from a single RGB image. We build our multi-branch model around 2D keypoint detection in images and complement it with a conceptually simple geometric reasoning method. Our network performs in an end-to-end manner, simultaneously and interdependently estimating 2D characteristics, such as 2D bounding boxes, keypoints, and orientation, along with full 3D pose in the scene. We fuse the outputs of distinct branches, applying a reprojection consistency loss during training. The experimental evaluation on the challenging KITTI dataset benchmark demonstrates that our network achieves state-of-the-art results among other monocular 3D detectors.
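A simplified version of a reprojection consistency term, comparing projected 3D keypoints with the 2D keypoint branch; the smooth L1 form and the intrinsics below are assumptions, not the paper's exact loss:

```python
import torch

def reprojection_consistency_loss(kp3d, kp2d_pred, K):
    """Penalise disagreement between projected 3D keypoints and the 2D
    keypoint branch (a simplified reprojection consistency term).

    kp3d: B x N x 3 camera-frame points; kp2d_pred: B x N x 2 pixels; K: 3 x 3.
    """
    proj = torch.einsum('ij,bnj->bni', K, kp3d)
    uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
    return torch.nn.functional.smooth_l1_loss(uv, kp2d_pred)

K = torch.tensor([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
kp3d = torch.rand(2, 9, 3) + torch.tensor([0.0, 0.0, 5.0])
kp2d = torch.rand(2, 9, 2) * 100
print(float(reprojection_consistency_loss(kp3d, kp2d, K)))
```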

Proceedings ArticleDOI
14 Apr 2019
TL;DR: A passively adaptive perching mechanism which allows an aerial vehicle to stably attach to a variety of surfaces including tree branches and pipelines, enabled by a compliant grapple module, which passively conforms to the surface of convex perching targets.
Abstract: The application of flying systems to practical tasks is consistently limited by the poor endurance of hovering robots. The ability to perch to fixed surfaces allows a robot to gather data and inspect structures in a low power state, while retaining the access and manoeuvrability that flight offers. In this paper we present a passively adaptive perching mechanism which allows an aerial vehicle to stably attach to a variety of surfaces including tree branches and pipelines. This is enabled by a compliant grapple module, which passively conforms to the surface of convex perching targets, ensuring reliable traction and a very high load capacity (tension tested to > 60 kg in some instances) whilst still releasing effortlessly. This is due to the mechanics of the grapple, which is designed to passively tighten and attach to a variety of branch diameters and shapes. The grapple is paired with a hybrid force-motion controller which allows the cable tension to be regulated as the vehicle achieves the desired attitude. The hybrid control approach exploits the mechanical compliance of the system to ensure reliable, stable attachment to irregular natural structures, and the addition of a winch allows the robot to stably orient itself in any position or orientation relative to the branch. This approach demonstrates tensile perching using adaptive anchors. The presented subsystems can be applied to other robots where high force authority is required.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper proposes an adaptive cropping method based on a Difficult Region Estimation Network (DREN) to enhance the detection of the difficult targets, which allows the detector to fully exploit its performance during the testing phase.
Abstract: Detecting objects in aerial images usually faces two major challenges: (1) detecting difficult targets (e.g., small objects, objects that are interfered by the background, or various orientation of the objects, etc.); (2) the imbalance problem inherent in object detection (e.g., imbalanced quantity in different categories, imbalanced sampling method, or imbalanced loss between classification and localization, etc.). Due to these challenges, detectors are often unable to perform the most effective training and testing. In this paper, we propose a simple but effective framework to address these concerns. First, we propose an adaptive cropping method based on a Difficult Region Estimation Network (DREN) to enhance the detection of the difficult targets, which allows the detector to fully exploit its performance during the testing phase. Second, we use the well-trained DREN to generate more diverse and representative training images, which is effective in enhancing the training set. Besides, in order to alleviate the impact of imbalance during training, we add a balance module in which the IoU balanced sampling method and balanced L1 loss are adopted. Finally, we evaluate our method on two aerial image datasets. Without bells and whistles, our framework achieves 8.0 points and 3.3 points higher Average Precision (AP) than the corresponding baselines on VisDrone and UAVDT, respectively.

Journal ArticleDOI
TL;DR: An intelligent system is presented for detecting regions to navigate a UAV when it requires an emergency landing due to technical causes, and experimental results on images from different situations for safe-landing detection show that the proposed system outperforms existing systems.
Abstract: As the demand increases for the use of Unmanned Aerial Vehicles (UAVs) to monitor natural disasters, protecting territories, spraying, vigilance in urban areas, etc., detecting safe landing zones becomes a new area that has gained interest. This paper presents an intelligent system for detecting regions to navigate a UAV when it requires an emergency landing due to technical causes. The proposed system explores the fact that safe regions in images have flat surfaces, which are extracted using the Gabor Transform. This results in images of different orientations. The proposed system then performs histogram operations on different Gabor-oriented images to select pixels that contribute to the highest peak, as Candidate Pixels (CP), for the respective Gabor-oriented images. Next, to group candidate pixels as one region, we explore Markov Chain Codes (MCCs), which estimate the probability of pixels being classified as candidates with neighboring pixels. This process results in Candidate Regions (CRs) detection. For each image of the respective Gabor orientation, including CRs, the proposed system finds a candidate region that has the highest area and considers it as a reference. We then estimate the degree of similarity between the reference CR with corresponding CRs in the respective Gabor-oriented images using a Chi square distance measure. Furthermore, the proposed system chooses the CR which gives the highest similarity to the reference CR to fuse with that reference, which results in the establishment of safe landing zones for the UAV. Experimental results on images from different situations for safe landing detection show that the proposed system outperforms the existing systems. Furthermore, experimental results on relative success rates for different emergency conditions of UAVs show that the proposed intelligent system is effective and useful compared to the existing UAV safe landing systems.
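A loose sketch of the first two stages (Gabor responses and histogram-peak candidate pixels) with OpenCV; the kernel parameters and histogram size are guesses, not the paper's values:

```python
import cv2
import numpy as np

def candidate_pixels_per_orientation(gray, orientations=8):
    """Gabor-filter an image at several orientations and, for each response,
    keep the pixels falling in the highest histogram peak as candidates."""
    masks = []
    for k in range(orientations):
        theta = k * np.pi / orientations
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
        response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        hist, edges = np.histogram(response, bins=64)
        peak = np.argmax(hist)
        masks.append((response >= edges[peak]) & (response < edges[peak + 1]))
    return masks

gray = np.random.randint(0, 255, (240, 320), np.uint8)   # placeholder aerial image
masks = candidate_pixels_per_orientation(gray)
print([int(m.sum()) for m in masks])
```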

Journal ArticleDOI
TL;DR: A novel vessel-oriented image representation (VOIR) is proposed that can improve the machine perception of PE through a consistent, compact, and discriminative image representation, and can also improve radiologists' diagnostic capabilities for PE assessment by serving as the backbone of an effective PE visualization system.