
Showing papers on "Orientation (computer vision)" published in 2019


Journal ArticleDOI
TL;DR: A high-level overview of the features of the MRtrix3 framework and of the general-purpose image processing applications shipped with the software.

1,228 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: The authors use 3D Part Orientation Fields (POFs) to encode the 3D orientations of all body parts in the common 2D image space; the POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps.
Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network, along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs with the aid of the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations including comparison with the existing body-specific and hand-specific methods, and performance analysis on camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos.
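As a rough, self-contained sketch of the POF idea (not the authors' code; the joint coordinates, image size, and distance threshold below are invented for illustration), the orientation field of a single limb could be rasterized as follows:

```python
import numpy as np

def encode_pof(joint3d_a, joint3d_b, joint2d_a, joint2d_b, height, width, sigma=7.0):
    """Toy encoding of a 3D Part Orientation Field for one limb.

    Pixels close to the 2D projection of the limb store the unit 3D
    direction from joint A to joint B; all other pixels stay zero.
    """
    pof = np.zeros((height, width, 3), dtype=np.float32)
    direction = np.asarray(joint3d_b, np.float32) - np.asarray(joint3d_a, np.float32)
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return pof
    direction /= norm

    ys, xs = np.mgrid[0:height, 0:width]
    a = np.asarray(joint2d_a, np.float32)
    b = np.asarray(joint2d_b, np.float32)
    ab = b - a
    ab_len2 = max(float(ab @ ab), 1e-6)
    # Distance of every pixel to the 2D limb segment.
    t = np.clip(((xs - a[0]) * ab[0] + (ys - a[1]) * ab[1]) / ab_len2, 0.0, 1.0)
    px, py = a[0] + t * ab[0], a[1] + t * ab[1]
    dist = np.sqrt((xs - px) ** 2 + (ys - py) ** 2)
    pof[dist < sigma] = direction
    return pof

pof = encode_pof([0, 0, 2.0], [0.3, -0.5, 2.2], [40, 60], [70, 90], 128, 128)
print(pof.shape, np.count_nonzero(pof.any(axis=-1)))
```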

294 citations


Journal ArticleDOI
TL;DR: In this paper, an orientation-based random waypoint (ORWP) mobility model is proposed by considering the random orientation of the UE during the user's movement, and the performance of ORWP is assessed on the handover rate.
Abstract: Light-fidelity (LiFi) is a networked optical wireless communication (OWC) solution for high-speed indoor connectivity for fixed and mobile optical communications. Unlike conventional radio frequency wireless systems, the OWC channel is not isotropic, meaning that the device orientation affects the channel gain significantly, particularly for mobile users. However, due to the lack of a proper model for device orientation, many studies have assumed that the receiver is vertically upward and fixed. In this paper, a novel model for device orientation based on experimental measurements of 40 participants has been proposed. It is shown that the probability density function (PDF) of the polar angle can be modeled either based on a Laplace (for static users) or a Gaussian (for mobile users) distribution. In addition, a closed-form expression is obtained for the PDF of the cosine of the incidence angle based on which the line-of-sight (LOS) channel gain is described in OWC channels. An approximation of this PDF based on the truncated Laplace is proposed and the accuracy of this approximation is confirmed by the Kolmogorov–Smirnov distance. Moreover, the statistics of the LOS channel gain are calculated and the random orientation of a user equipment (UE) is modeled as a random process. The influence of the random orientation on signal-to-noise-ratio performance of OWC systems has been evaluated. Finally, an orientation-based random waypoint (ORWP) mobility model is proposed by considering the random orientation of the UE during the user’s movement. The performance of ORWP is assessed on the handover rate and it is shown that it is important to take the random orientation into account.
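To make the channel model concrete, here is a minimal numerical sketch assuming a standard Lambertian LOS gain and a Gaussian polar angle for mobile users, as the abstract suggests; the mean, standard deviation, receiver area, and field of view below are illustrative guesses, not the paper's measured values:

```python
import numpy as np

rng = np.random.default_rng(0)

def los_channel_gain(distance, cos_irradiance, cos_incidence, area=1e-4,
                     lambertian_order=1.0, fov_deg=90.0):
    """Standard Lambertian LOS gain; zero outside the receiver field of view."""
    cos_fov = np.cos(np.radians(fov_deg))
    gain = ((lambertian_order + 1) * area / (2 * np.pi * distance ** 2)
            * cos_irradiance ** lambertian_order * cos_incidence)
    return np.where(cos_incidence >= cos_fov, gain, 0.0)

# Polar angle of the device modelled as Gaussian for mobile users, as the
# paper proposes; the mean/std here are placeholders, not the measured fit.
theta = np.radians(rng.normal(loc=40.0, scale=8.0, size=100_000))
# Toy geometry: LED directly above the user at 2 m; incidence angle equals theta.
h = los_channel_gain(distance=2.0, cos_irradiance=1.0, cos_incidence=np.cos(theta))
print("mean gain:", h.mean(), " outage fraction:", np.mean(h == 0))
```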

130 citations


Proceedings ArticleDOI
15 Jun 2019
TL;DR: OriCNN as discussed by the authors proposes a Siamese network to explicitly encode the orientation (i.e., spherical directions) of each pixel of the images, which significantly boosts the discriminative power of the learned deep features, leading to much higher recall and precision outperforming all previous methods.
Abstract: This paper studies the image-based geo-localization (IBL) problem using ground-to-aerial cross-view matching. The goal is to predict the spatial location of a ground-level query image by matching it to a large geotagged aerial image database (e.g., satellite imagery). This is a challenging task due to the drastic differences in their viewpoints and visual appearances. Existing deep learning methods for this problem have focused on maximizing feature similarity between spatially close-by image pairs, while minimizing similarity between image pairs which are far apart. They do so by deep feature embedding based on visual appearance in those ground-and-aerial images. However, in everyday life, humans commonly use orientation information as an important cue for the task of spatial localization. Inspired by this insight, this paper proposes a novel method which endows deep neural networks with the `commonsense' of orientation. Given a ground-level spherical panoramic image as query input (and a large georeferenced satellite image database), we design a Siamese network which explicitly encodes the orientation (i.e., spherical directions) of each pixel of the images. Our method significantly boosts the discriminative power of the learned deep features, leading to much higher recall and precision, outperforming all previous methods. Our network is also more compact, using only 1/5th the number of parameters of the previously best-performing network. To evaluate the generalization of our method, we also created a large-scale cross-view localization benchmark containing 100K geotagged ground-aerial pairs covering a city. Our codes and datasets are available at https://github.com/Liumouliu/OriCNN.
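A minimal sketch of what "encoding the orientation of each pixel" could look like for an equirectangular panorama, assuming two extra azimuth/elevation channels concatenated with the RGB input (the exact encoding used by OriCNN may differ):

```python
import numpy as np

def panorama_orientation_maps(height, width):
    """Per-pixel azimuth/elevation maps for an equirectangular panorama.

    A rough stand-in for a per-pixel orientation encoding: two extra
    channels in [-1, 1] that are concatenated with the RGB input.
    """
    azimuth = np.linspace(-np.pi, np.pi, width, endpoint=False)
    elevation = np.linspace(np.pi / 2, -np.pi / 2, height)
    az_map = np.tile(azimuth / np.pi, (height, 1))
    el_map = np.tile((elevation / (np.pi / 2))[:, None], (1, width))
    return np.stack([az_map, el_map], axis=-1).astype(np.float32)

rgb = np.zeros((256, 512, 3), dtype=np.float32)     # placeholder panorama
ori = panorama_orientation_maps(256, 512)
net_input = np.concatenate([rgb, ori], axis=-1)      # H x W x 5 tensor
print(net_input.shape)
```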

124 citations


Proceedings ArticleDOI
Lijie Liu1, Jiwen Lu1, Chunjing Xu2, Qi Tian2, Jie Zhou1 
01 Jun 2019
TL;DR: Liu et al. propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score the fitting degree between proposals and the object conclusively.
Abstract: In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score fitting degree between proposals and object conclusively. Different from most existing monocular frameworks which use tight constraint to get 3D location, our approach achieves high-precision localization through measuring the visual fitting degree between the projected 3D proposals and the object. We first regress the dimension and orientation of the object using an anchor-based method so that a suitable 3D proposal can be constructed. We propose FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues. Therefore, during the detection process, we sample a large number of candidates in the 3D space and project these 3D bounding boxes on 2D image individually. The best candidate can be picked out by simply exploring the spatial overlap between proposals and the object, in the form of the output 3D IoU score of FQNet. Experiments on the KITTI dataset demonstrate the effectiveness of our framework.
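The candidate sampling and projection step can be illustrated with a short sketch; the intrinsics, object dimensions, and sampling grid below are placeholder values, and the learned 3D IoU scoring network is only referenced in a comment:

```python
import numpy as np

def box3d_corners(location, dimensions, yaw):
    """Eight corners of a 3D box (KITTI-style h, w, l; yaw about the vertical axis)."""
    h, w, l = dimensions
    x = np.array([ l,  l, -l, -l,  l,  l, -l, -l]) / 2
    y = np.array([ 0,  0,  0,  0, -h, -h, -h, -h])
    z = np.array([ w, -w, -w,  w,  w, -w, -w,  w]) / 2
    rot = np.array([[ np.cos(yaw), 0, np.sin(yaw)],
                    [ 0,           1, 0          ],
                    [-np.sin(yaw), 0, np.cos(yaw)]])
    return rot @ np.vstack([x, y, z]) + np.asarray(location).reshape(3, 1)

def project(points3d, K):
    """Project 3 x N camera-frame points with intrinsics K."""
    uvw = K @ points3d
    return uvw[:2] / uvw[2]

K = np.array([[721.5, 0, 609.6], [0, 721.5, 172.9], [0, 0, 1.0]])  # illustrative
# Densely sample candidate locations/yaws around an initial guess (toy values).
candidates = [([4 + dx, 1.6, 20 + dz], (1.5, 1.6, 3.9), yaw)
              for dx in np.linspace(-1, 1, 5)
              for dz in np.linspace(-2, 2, 5)
              for yaw in np.linspace(-0.3, 0.3, 5)]
for loc, dim, yaw in candidates[:3]:
    uv = project(box3d_corners(loc, dim, yaw), K)
    # In FQNet each projected box would be scored by the learned 3D IoU network.
    print(uv.min(axis=1), uv.max(axis=1))
```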

122 citations


Proceedings ArticleDOI
01 Jan 2019
TL;DR: ToDayGAN as mentioned in this paper uses a modified image-translation model to alter nighttime driving images to a more useful daytime representation, and then compares the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image.
Abstract: Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query’s pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN – a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
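A toy version of the retrieval pipeline around the translation model, with a stub standing in for the night-to-day generator and a hand-rolled global descriptor (a real system would use learned image features):

```python
import numpy as np

rng = np.random.default_rng(0)

def global_descriptor(image):
    """Toy global descriptor (mean color per coarse cell); a real system
    would use a learned embedding such as DenseVLAD or a CNN feature."""
    h, w, _ = image.shape
    cells = image.reshape(4, h // 4, 4, w // 4, 3).mean(axis=(1, 3))
    d = cells.reshape(-1)
    return d / (np.linalg.norm(d) + 1e-8)

# Geo-tagged daytime database: descriptor plus known 6-DOF pose per image.
db_images = [rng.random((64, 64, 3)) for _ in range(100)]
db_poses = [rng.random(6) for _ in range(100)]
db_desc = np.stack([global_descriptor(im) for im in db_images])

night_image = rng.random((64, 64, 3))
translated = night_image            # stand-in for the night-to-day generator
query = global_descriptor(translated)

best = int(np.argmax(db_desc @ query))       # cosine-similarity retrieval
estimated_pose = db_poses[best]              # query inherits the retrieved pose
print("retrieved index:", best, "pose:", np.round(estimated_pose, 3))
```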

114 citations


Journal ArticleDOI
TL;DR: An end-to-end multiscale visual attention networks (MS-VANs) method that outperforms several state-of-the-art approaches in remote sensing applications and uses a skip-connected encoder–decoder model to extract multiscale features from a full-size image.
Abstract: Object detection plays an active role in remote sensing applications. Recently, deep convolutional neural network models have been applied to automatically extract features, generate region proposals, and predict the corresponding object class. However, these models face new challenges in VHR remote sensing images due to the orientation and scale variations and the cluttered background. In this letter, we propose an end-to-end multiscale visual attention networks (MS-VANs) method. We use a skip-connected encoder–decoder model to extract multiscale features from a full-size image. For feature maps in each scale, we learn a visual attention network, which is followed by a classification branch and a regression branch, so as to highlight the features from the object region and suppress the cluttered background. We train the MS-VANs model by a hybrid loss function which is a weighted sum of attention loss, classification loss, and regression loss. Experiments on a combined data set consisting of the Dataset for Object Detection in Aerial Images and NWPU VHR-10 show that the proposed method outperforms several state-of-the-art approaches.
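The hybrid loss is straightforward to express; the sketch below assumes binary cross-entropy for the attention term and smooth L1 for regression, with placeholder weights (the abstract does not specify these):

```python
import torch
import torch.nn.functional as F

def hybrid_loss(att_pred, att_target, cls_logits, cls_target,
                box_pred, box_target, w_att=1.0, w_cls=1.0, w_reg=1.0):
    """Weighted sum of attention, classification, and regression terms.

    The individual loss forms and weights are illustrative assumptions,
    not the paper's published values.
    """
    l_att = F.binary_cross_entropy_with_logits(att_pred, att_target)
    l_cls = F.cross_entropy(cls_logits, cls_target)
    l_reg = F.smooth_l1_loss(box_pred, box_target)
    return w_att * l_att + w_cls * l_cls + w_reg * l_reg

loss = hybrid_loss(torch.randn(2, 1, 32, 32), torch.rand(2, 1, 32, 32),
                   torch.randn(8, 5), torch.randint(0, 5, (8,)),
                   torch.randn(8, 4), torch.randn(8, 4))
print(float(loss))
```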

107 citations


Proceedings ArticleDOI
05 Apr 2019
TL;DR: In this paper, a U-Net is used to reconstruct color images of the scene from a 3D point cloud using color and SIFT descriptors, which can reveal scene appearance and compromise privacy.
Abstract: Many 3D vision systems localize cameras within a scene using 3D point clouds. Such point clouds are often obtained using structure from motion (SfM), after which the images are discarded to preserve privacy. In this paper, we show, for the first time, that such point clouds retain enough information to reveal scene appearance and compromise privacy. We present a privacy attack that reconstructs color images of the scene from the point cloud. Our method is based on a cascaded U-Net that takes as input a 2D multichannel image of the points rendered from a specific viewpoint, containing point depth and optionally color and SIFT descriptors, and outputs a color image of the scene from that viewpoint. Unlike previous feature inversion methods, we deal with highly sparse and irregular 2D point distributions and inputs where many point attributes are missing, namely keypoint orientation and scale, the descriptor image source, and the 3D point visibility. We evaluate our attack algorithm on public datasets and analyze the significance of the point cloud attributes. Finally, we show that novel views can also be generated, thereby enabling compelling virtual tours of the underlying scene.
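A simplified sketch of how such a sparse multichannel input might be rendered from an SfM point cloud (depth plus color only; real inputs also carry SIFT descriptors and keep the nearest point per pixel):

```python
import numpy as np

def render_sparse_input(points, colors, K, height, width):
    """Splat SfM points into a sparse multichannel image (depth + RGB).

    A toy version of the attack's U-Net input; occlusion handling
    (nearest point per pixel) is deliberately omitted here.
    """
    img = np.zeros((height, width, 4), dtype=np.float32)
    uvw = K @ points.T
    z = uvw[2]
    valid = z > 0
    u = np.round(uvw[0, valid] / z[valid]).astype(int)
    v = np.round(uvw[1, valid] / z[valid]).astype(int)
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    img[v[inside], u[inside], 0] = z[valid][inside]           # depth channel
    img[v[inside], u[inside], 1:] = colors[valid][inside]     # color channels
    return img

rng = np.random.default_rng(0)
pts = rng.uniform([-2, -2, 2], [2, 2, 8], size=(5000, 3))
cols = rng.random((5000, 3))
K = np.array([[500.0, 0, 320], [0, 500.0, 240], [0, 0, 1]])   # illustrative intrinsics
sparse = render_sparse_input(pts, cols, K, 480, 640)
print("occupied pixels:", int((sparse[..., 0] > 0).sum()))
```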

106 citations


Posted Content
Lijie Liu1, Jiwen Lu1, Chunjing Xu1, Qi Tian2, Jie Zhou2 
TL;DR: A deep fitting degree scoring network for monocular 3D object detection, which aims to score the fitting degree between proposals and the object conclusively, and proposes FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues.
Abstract: In this paper, we propose to learn a deep fitting degree scoring network for monocular 3D object detection, which aims to score fitting degree between proposals and object conclusively. Different from most existing monocular frameworks which use tight constraint to get 3D location, our approach achieves high-precision localization through measuring the visual fitting degree between the projected 3D proposals and the object. We first regress the dimension and orientation of the object using an anchor-based method so that a suitable 3D proposal can be constructed. We propose FQNet, which can infer the 3D IoU between the 3D proposals and the object solely based on 2D cues. Therefore, during the detection process, we sample a large number of candidates in the 3D space and project these 3D bounding boxes on 2D image individually. The best candidate can be picked out by simply exploring the spatial overlap between proposals and the object, in the form of the output 3D IoU score of FQNet. Experiments on the KITTI dataset demonstrate the effectiveness of our framework.

87 citations


Journal ArticleDOI
TL;DR: The orientation of a robot is directly estimated from the direction of the vanishing point using a forward-viewing monocular vision sensor, making the method applicable in real time on a low-cost embedded system for indoor service robots.
Abstract: This paper presents a new implementation method for efficient simultaneous localization and mapping using a forward-viewing monocular vision sensor. The method is developed to be applicable in real time on a low-cost embedded system for indoor service robots. In this paper, the orientation of a robot is directly estimated using the direction of the vanishing point. Then, the estimation models for the robot position and the line landmark are derived as simple linear equations. Using these models, the camera poses and landmark positions are efficiently corrected by a local map correction method. The performance of the proposed method is demonstrated under various challenging environments using dataset-based experiments using a desktop computer and real-time experiments using a low-cost embedded system. The experimental environments include a real home-like setting. These conditions contain low-textured areas, moving people, or changing environments. The proposed method is also tested using the Robotics Advancement through Web-publishing of Sensorial and Elaborated Extensive Data Sets (Rawseeds) benchmark dataset.
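The core geometric step, recovering heading from a vanishing point, can be sketched in a few lines; the intrinsics and sign conventions below are illustrative assumptions:

```python
import numpy as np

def heading_from_vanishing_point(vp_u, vp_v, K):
    """Camera yaw and pitch relative to the dominant corridor direction,
    recovered from the vanishing point of lines parallel to that direction.

    The vanishing point is the image of the direction vector, so
    back-projecting it with K^-1 gives that direction in camera coordinates.
    """
    direction = np.linalg.inv(K) @ np.array([vp_u, vp_v, 1.0])
    direction /= np.linalg.norm(direction)
    yaw = np.arctan2(direction[0], direction[2])    # rotation about the vertical axis
    pitch = np.arcsin(-direction[1])                # small for a level-mounted camera
    return np.degrees(yaw), np.degrees(pitch)

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])  # placeholder intrinsics
print(heading_from_vanishing_point(360.0, 235.0, K))
```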

87 citations


Journal ArticleDOI
TL;DR: TractSeg as discussed by the authors combines tract orientation mapping (TOM) with accurate segmentations of the tract outline and its start and end regions, which enables automatic creation of bundle-specific tractograms with previously unseen accuracy.

Proceedings ArticleDOI
01 May 2019
TL;DR: A static hand gesture recognition method deploying a CNN is proposed; the model with augmented data achieved an accuracy of 97.12%, which is nearly 4% higher than the model without augmentation.
Abstract: Computers are part and parcel of our day-to-day life and are used in various fields. The interaction between human and computer is accomplished by conventional input devices like the mouse, keyboard, etc. Hand gestures can be a useful medium of human-computer interaction and can make the interaction easier. Gestures vary in orientation and shape from person to person, so non-linearity exists in this problem. Recent research has proved the supremacy of the Convolutional Neural Network (CNN) for image representation and classification. Since a CNN can learn complex and non-linear relationships among images, in this paper, a static hand gesture recognition method deploying a CNN is proposed. Data augmentation such as re-scaling, zooming, shearing, rotation, and width and height shifting was applied to the dataset. The model was trained on 8000 images and tested on 1600 images which were divided into 10 classes. The model with augmented data achieved an accuracy of 97.12%, which is nearly 4% higher than that of the model without augmentation (92.87%).
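A hedged sketch of the described augmentation pipeline using Keras' ImageDataGenerator; the magnitude of each transform is our own guess, not the paper's setting:

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation mirroring the operations listed in the abstract (re-scaling,
# zooming, shearing, rotation, width/height shifting); exact magnitudes are
# illustrative assumptions.
augmenter = ImageDataGenerator(
    rescale=1.0 / 255,
    zoom_range=0.2,
    shear_range=0.2,
    rotation_range=15,
    width_shift_range=0.1,
    height_shift_range=0.1,
)

# Stand-in gesture images (the paper trains on 8000 images in 10 classes).
x = np.random.randint(0, 256, size=(32, 64, 64, 3)).astype("float32")
y = np.random.randint(0, 10, size=(32,))

batch_x, batch_y = next(augmenter.flow(x, y, batch_size=8))
print(batch_x.shape, batch_x.min(), batch_x.max())
```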

Proceedings ArticleDOI
06 Mar 2019
TL;DR: The proposed method performs better than state-of-the-art methods in terms of body-parts detection accuracy and pose estimation.
Abstract: The main purpose of human body part detection is to estimate the size, orientation or position of the human body parts within the digital scene information. Estimation of the various body parts of a human from an image is a critical step for several model-based systems and body-parts tracking. In this paper, body parts detection for pose estimation is implemented. During foreground silhouette detection, the proposed method uses segmentation techniques to obtain salient region areas and skin tone detection. After successful silhouette extraction, body parts estimation is applied using a body-parts model. Five basic body key points were determined, and in addition seven more body sub-key points were estimated with the help of the five basic body key points. The estimated key points of the body are then represented using circular marks on the original image. The experimental results over two challenging video datasets, the KTH multiview football and UCF sports action datasets, showed significant accuracies of 90.01% and 86.67%, respectively. The proposed method performs better than state-of-the-art methods in terms of body-parts detection accuracy.

Journal ArticleDOI
TL;DR: This study investigates three of the available commonly used open-source solutions, namely COLMAP, OpenMVG+OpenMVS and AliceVision, evaluating their results under diverse large scale scenarios and comparing them with respect to the corresponding ground truth data.
Abstract: State-of-the-art automated image orientation (Structure from Motion) and dense image matching (Multiple View Stereo) methods commonly used to produce 3D information from 2D images can generate 3D results – such as point clouds or meshes – of varying geometric and visual quality. Pipelines are generally robust and reliable enough, mostly capable of processing even large sets of unordered images, yet the final results often lack completeness and accuracy, especially while dealing with real-world cases where objects are typically characterized by complex geometries and textureless surfaces and obstacles or occluded areas may also occur. In this study we investigate three of the available, commonly used open-source solutions, namely COLMAP, OpenMVG+OpenMVS and AliceVision, evaluating their results under diverse large scale scenarios. Comparisons and critical evaluation of the image orientation and dense point cloud generation algorithms are performed with respect to the corresponding ground truth data. The presented FBK-3DOM datasets are available for research purposes.

Journal ArticleDOI
TL;DR: The proposed method fully exploits multiscale and structured prior information to conduct both accurate and efficient detection and achieves competitive performance compared with state-of-the-art methods.
Abstract: Power line detection plays an important role in an automated UAV-based electricity inspection system, which is crucial for real-time motion planning and navigation along power lines. Previous methods which adopt traditional filters and gradients may fail to capture complete power lines due to noisy backgrounds. To overcome this, we develop an accurate power line detection method using convolutional and structured features. Specifically, we first build a convolutional neural network to obtain hierarchical responses from each layer. Simultaneously, the rich feature maps are integrated to produce a fusion output, then we extract the structured information including length, width, orientation and area from the coarsest feature map. Finally, we combine the fusion output with structured information to get a result with clear background. The proposed method fully exploits multiscale and structured prior information to conduct both accurate and efficient detection. In addition, we release two power line datasets due to the scarcity in the public domain. The method is evaluated on the well-annotated power line datasets and achieves competitive performance compared with state-of-the-art methods.
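One plausible way to extract such structured information (length, width, orientation, area) from a coarse line mask, using minimum-area rectangles; this is an illustration, not the authors' implementation:

```python
import cv2
import numpy as np

def structured_line_features(binary_mask, min_area=50):
    """Length, width, orientation, and area of connected components in a
    coarse power-line mask, via minimum-area rectangles (an approximation
    of the 'structured information' the method fuses with CNN outputs)."""
    mask = (binary_mask > 0).astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    feats = []
    for c in contours:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        (cx, cy), (w, h), angle = cv2.minAreaRect(c)
        feats.append({"length": max(w, h), "width": min(w, h),
                      "orientation": angle, "area": area})
    return feats

mask = np.zeros((200, 300), np.uint8)
cv2.line(mask, (10, 40), (290, 60), 255, 3)       # synthetic 'power line'
print(structured_line_features(mask))
```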

Journal ArticleDOI
01 Feb 2019
TL;DR: This article uses global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual three-dimensional reconstruction, and tests the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it.
Abstract: We propose using global orientation from inertial measurements, and the bias it induces on the shape of objects populating the scene, to inform visual three-dimensional reconstruction. We test the effect of using the resulting prior in depth prediction from a single image, where the normal vectors to surfaces of objects of certain classes tend to align with gravity or be orthogonal to it. Adding such a prior to baseline methods for monocular depth prediction yields improvements beyond the state-of-the-art and illustrates the power of gravity as a supervisory signal.
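A minimal sketch of a gravity-alignment prior expressed as a loss on predicted surface normals; the cosine-based form and the class masks below are assumptions, not necessarily the paper's exact formulation:

```python
import torch

def gravity_prior_loss(normals, gravity, mask_vertical, mask_horizontal):
    """Penalty encouraging normals of selected pixels to align with gravity
    (horizontal surfaces) or to be orthogonal to it (vertical surfaces).

    A plain cosine-based surrogate; the paper's prior may take another form.
    """
    normals = torch.nn.functional.normalize(normals, dim=1)      # B x 3 x H x W
    g = torch.nn.functional.normalize(gravity, dim=0).view(1, 3, 1, 1)
    cos = (normals * g).sum(dim=1)                                # B x H x W
    loss_h = ((1.0 - cos.abs()) * mask_horizontal).sum() / mask_horizontal.sum().clamp(min=1)
    loss_v = (cos.abs() * mask_vertical).sum() / mask_vertical.sum().clamp(min=1)
    return loss_h + loss_v

normals = torch.randn(2, 3, 48, 64)
gravity = torch.tensor([0.0, -1.0, 0.0])        # direction from the inertial sensor
mask_v = (torch.rand(2, 48, 64) > 0.7).float()  # e.g. pixels of wall-like classes
mask_h = (torch.rand(2, 48, 64) > 0.7).float()  # e.g. floor / table-top pixels
print(float(gravity_prior_loss(normals, gravity, mask_v, mask_h)))
```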

Journal ArticleDOI
TL;DR: The main aim of this paper is to provide a method to compute a signal that explains most of the variability of the data contained in each ROI before computing, for instance, time-varying connectivity.
Abstract: In the last decade, the use of high-density electrode arrays for EEG recordings combined with the improvements of source reconstruction algorithms has allowed the investigation of brain networks dynamics at a sub-second scale. One powerful tool for investigating large-scale functional brain networks with EEG is time-varying effective connectivity applied to source signals obtained from electric source imaging. Due to computational and interpretation limitations, the brain is usually parcelled into a limited number of regions of interests (ROIs) before computing EEG connectivity. One specific need and still open problem is how to represent the time- and frequency-content carried by hundreds of dipoles with diverging orientation in each ROI with one unique representative time-series. The main aim of this paper is to provide a method to compute a signal that explains most of the variability of the data contained in each ROI before computing, for instance, time-varying connectivity. As the representative time-series for a ROI, we propose to use the first singular vector computed by a singular-value decomposition of all dipoles belonging to the same ROI. We applied this method to two real datasets (visual evoked potentials and epileptic spikes) and evaluated the time-course and the frequency content of the obtained signals. For each ROI, both the time-course and the frequency content of the proposed method reflected the expected time-course and the scalp-EEG frequency content, representing most of the variability of the sources (~ 80%) and improving connectivity results in comparison to other procedures used so far. We also confirm these results in a simulated dataset with a known ground truth.
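The proposed ROI signal is essentially the leading component of an SVD; a minimal sketch follows (array shapes and the scaling choice are assumptions):

```python
import numpy as np

def roi_representative_signal(dipole_timecourses):
    """First singular vector of a ROI's dipole time-courses, scaled by its
    singular value, as the ROI representative signal.

    dipole_timecourses: array of shape (n_dipoles, n_samples).
    """
    centered = dipole_timecourses - dipole_timecourses.mean(axis=1, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    representative = s[0] * vt[0]     # temporal pattern shared by most dipoles
    return representative, explained

rng = np.random.default_rng(0)
common = np.sin(np.linspace(0, 20 * np.pi, 1000))
dipoles = np.outer(rng.normal(size=50), common) + 0.3 * rng.normal(size=(50, 1000))
sig, var = roi_representative_signal(dipoles)
print("variance explained by first component: %.2f" % var)
```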

Journal ArticleDOI
TL;DR: In this article, the authors presented the development of a novel technique used to enhance the detection of micro cracks in solar cells using binary and discrete Fourier transform (DFT) image processing models.

Posted Content
TL;DR: A one-stage anchor-free detector for orientational objects in aerial images, built upon a per-pixel prediction detector, which develops a branch interacting module with a self-attention mechanism to fuse features from the classification and box regression branches.
Abstract: Object detection in aerial images is a challenging task due to the lack of visible features and variant orientation of objects. Significant progress has been made recently for predicting targets from aerial images with horizontal bounding boxes (HBBs) and oriented bounding boxes (OBBs) using two-stage detectors with region based convolutional neural networks (R-CNN), involving object localization in one stage and object classification in the other. However, the computational complexity in two-stage detectors is often high, especially for orientational object detection, due to anchor matching and using regions of interest (RoI) pooling for feature extraction. In this paper, we propose a one-stage anchor free detector for orientational object detection, namely, an interactive embranchment network (IENet), which is built upon a detector with prediction in per-pixel fashion. First, a novel geometric transformation is employed to better represent the oriented object in angle prediction, then a branch interactive module with a self-attention mechanism is developed to fuse features from classification and box regression branches. Finally, we introduce an enhanced intersection over union (IoU) loss for OBB detection, which is computationally more efficient than regular polygon IoU. Experiments conducted demonstrate the effectiveness and the superiority of our proposed method, as compared with state-of-the-art detectors.
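For reference, the exact polygon IoU between oriented boxes, which the proposed enhanced IoU loss aims to replace with something cheaper, can be written with shapely as a baseline (the box parameterization below is an assumption):

```python
import numpy as np
from shapely.geometry import Polygon

def obb_corners(cx, cy, w, h, angle):
    """Corners of an oriented box given centre, size, and rotation (radians)."""
    c, s = np.cos(angle), np.sin(angle)
    offsets = np.array([[ w,  h], [ w, -h], [-w, -h], [-w,  h]]) / 2.0
    rot = np.array([[c, -s], [s, c]])
    return offsets @ rot.T + np.array([cx, cy])

def obb_iou(box_a, box_b):
    """Exact polygon IoU between two oriented boxes (the slow baseline)."""
    pa, pb = Polygon(obb_corners(*box_a)), Polygon(obb_corners(*box_b))
    inter = pa.intersection(pb).area
    union = pa.area + pb.area - inter
    return inter / union if union > 0 else 0.0

print(obb_iou((50, 50, 40, 20, 0.0), (52, 49, 40, 20, np.radians(15))))
```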

Journal ArticleDOI
TL;DR: This work proposes a calibration method based on a calibration plate with protruding cylinders that can be applied to multiple cameras simultaneously on devices with limited computational power and produces a calibration error of only 0.027 mm.

Journal ArticleDOI
TL;DR: This work proposes an oriented object detection framework that is based on the single shot detector, namely, single shot anchor refinement network (S2ARN), which obtains the accurate detection results by performing two consecutive regressions.
Abstract: Object detection is a challenging task in the field of remote sensing applications due to the complex backgrounds and uncertain orientation of targets. Compared with the horizontal bounding box, the oriented bounding box can provide orientation information while retaining the true size. Most existing oriented object detection methods are based on Faster-RCNN, while the other, one-stage methods can achieve real-time speed but have shortcomings in localization and detection accuracy. To further enhance the performance of one-stage methods, we propose an oriented object detection framework that is based on the single shot detector, namely, the single shot anchor refinement network (S2ARN). The S2ARN obtains accurate detection results by performing two consecutive regressions. More precisely, the multilevel features of the backbone are used to regress the coordinate offsets between the predefined rotated anchors and the ground-truth boxes to generate the refined anchors. The classification and regression subnetworks assigned to the output features are used to perform the second regression to determine the class labels and further adjust the location of the refined anchors. In addition, receptive field amplification modules (RFAMs) are inserted to enlarge the receptive field and extract more discriminative features. Furthermore, in the anchor matching step, angle-related Intersection over Union (ArIoU) is used to calculate the Intersection over Union (IoU) score instead of the traditional method. Benefiting from the multiple regressions and the insensitivity of the ArIoU score to the angle deviation, the angle sampling interval of the rotated anchor can be reduced. The experimental results for the two public datasets, HRSC2016 and UCAS-AOD, demonstrate the effectiveness of the proposed network.

Journal ArticleDOI
10 Jul 2019
TL;DR: This letter presents a method of dimensionality reduction of an optical-based tactile sensor image output using a convolutional neural network encoder structure that allows simultaneous online deployment of multiple simple perception algorithms on a common set of black-box features.
Abstract: A common approach in the field of tactile robotics is the development of a new perception algorithm for each new application of existing hardware solutions. In this letter, we present a method of dimensionality reduction of an optical-based tactile sensor image output using a convolutional neural network encoder structure. Instead of using various complex perception algorithms, and/or manually choosing task-specific data features, this unsupervised feature extraction method allows simultaneous online deployment of multiple simple perception algorithms on a common set of black-box features. The method is validated on a set of benchmarking use cases. Contact object shape, edge position, orientation, and indentation depth are estimated using shallow neural networks and machine learning models. Furthermore, a contact force estimator is trained, affirming that the extracted features contain sufficient information on both spatial and mechanical characteristics of the manipulated object.
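A minimal sketch of such a convolutional encoder producing low-dimensional "black-box" features; the layer sizes and latent dimension are illustrative, not the paper's architecture:

```python
import torch
import torch.nn as nn

class TactileEncoder(nn.Module):
    """Small convolutional encoder compressing a tactile-sensor image into a
    low-dimensional feature vector; all sizes here are placeholders."""
    def __init__(self, latent_dim=32):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, latent_dim)

    def forward(self, x):
        z = self.features(x).flatten(1)
        return self.fc(z)

encoder = TactileEncoder()
tactile_image = torch.rand(4, 1, 64, 64)   # batch of grayscale tactile frames
features = encoder(tactile_image)           # shared features for downstream
print(features.shape)                       # shallow regressors/classifiers
```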

Journal ArticleDOI
15 Jul 2019
TL;DR: SilhoNet as discussed by the authors uses a convolutional neural network pipeline that takes in region of interest proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector.
Abstract: Autonomous robot manipulation involves estimating the translation and orientation of the object to be manipulated as a 6-degree-of-freedom (6D) pose. Methods using RGB-D data have shown great success in solving this problem. However, there are situations where cost constraints or the working environment may limit the use of RGB-D sensors. When limited to monocular camera data only, the problem of object pose estimation is very challenging. In this letter, we introduce a novel method called SilhoNet that predicts 6D object pose from monocular images. We use a convolutional neural network pipeline that takes in region of interest proposals to simultaneously predict an intermediate silhouette representation for objects with an associated occlusion mask and a 3D translation vector. The 3D orientation is then regressed from the predicted silhouettes. We show that our method achieves better overall performance on the YCB-Video dataset than two networks for 6D pose estimation from monocular image input.

Journal ArticleDOI
TL;DR: A new feature descriptor named Histogram of Optical flow Magnitude and Orientation (HOMO) is introduced and used to train an SVM classifier for violence detection in intelligent video surveillance systems.
Abstract: Violence detection is one of the substantial and challenging topics in intelligent video surveillance systems. As there is a growing demand on video surveillance systems with the capability of automatic violence detection, we focus on existing violence detection methods to improve them. In this paper, we introduce a new feature descriptor named Histogram of Optical flow Magnitude and Orientation (HOMO). First, the proposed method converts input frames to the grayscale format. Next, it computes the optical flow between two consecutive frames. Then, the optical flow magnitude and orientation of each pixel in each frame are compared separately with its predecessor frame to obtain meaningful changes of magnitude and orientation. Subsequently, different threshold values are applied to the magnitude and orientation changes for obtaining six binary indicators. Finally, these binary indicators are analyzed to get the HOMO descriptor which is used to train an SVM classifier. The system has been implemented using MATLAB. To evaluate the proposed method, two benchmark datasets have been used. The comparison of HOMO and other descriptors on benchmark datasets demonstrates satisfactory performance.
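A loose sketch of the descriptor computation with OpenCV's Farneback optical flow; the thresholds and the histogram layout are placeholders rather than the paper's exact choices:

```python
import cv2
import numpy as np

def homo_like_descriptor(frames, mag_thresholds=(1.0, 2.0, 4.0),
                         ori_thresholds=(0.5, 1.0, 2.0)):
    """HOMO-style sketch: per-pixel changes of optical flow magnitude and
    orientation between consecutive frames are thresholded into binary
    indicators and summarised as small histograms."""
    grays = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in frames]
    prev_mag = prev_ori = None
    hists = []
    for a, b in zip(grays[:-1], grays[1:]):
        flow = cv2.calcOpticalFlowFarneback(a, b, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ori = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        if prev_mag is not None:
            d_mag, d_ori = np.abs(mag - prev_mag), np.abs(ori - prev_ori)
            indicators = [(d_mag > t) for t in mag_thresholds] + \
                         [(d_ori > t) for t in ori_thresholds]
            # Fraction of 'active' pixels per binary indicator, per frame pair.
            hists.append(np.array([ind.mean() for ind in indicators], np.float32))
        prev_mag, prev_ori = mag, ori
    return np.concatenate(hists)

frames = [np.random.randint(0, 255, (120, 160, 3), np.uint8) for _ in range(4)]
print(homo_like_descriptor(frames).shape)      # (n_frame_pairs * 6,)
```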

Journal ArticleDOI
TL;DR: A novel coarse-to-fine graph spectral processing approach that exploits the fact that the sharp features reside in a low-dimensional structure hidden in the noisy 3D dataset, and that outperforms state-of-the-art schemes in terms of reconstruction quality and computational complexity.
Abstract: The increasing interest for reliable generation of large scale scenes and objects has facilitated several real-time applications. Although the resolution of new-generation geometry scanners is constantly improving, the output models are inevitably noisy, requiring sophisticated approaches that remove noise while preserving sharp features. Moreover, we no longer deal exclusively with individual shapes, but with entire scenes resulting in a sequence of 3D surfaces that are affected by noise with different characteristics due to variable environmental factors (e.g., lighting conditions, orientation of the scanning device). In this work, we introduce a novel coarse-to-fine graph spectral processing approach that exploits the fact that the sharp features reside in a low dimensional structure hidden in the noisy 3D dataset. In the coarse step, the mesh is processed in parts, using a model based Bayesian learning method that identifies the noise level in each part and the subspace where the features lie. In the feature-aware fine step, we iteratively smooth face normals and vertices, while preserving geometric features. Extensive evaluation studies carried out under a broad set of complex noise patterns verify the superiority of our approach as compared to the state-of-the-art schemes, in terms of reconstruction quality and computational complexity.

Posted Content
TL;DR: This paper proposes a novel keypoint-based approach for 3D object detection and localization from a single RGB image, building a multi-branch model around 2D keypoint detection in images and complement it with a conceptually simple geometric reasoning method.
Abstract: Monocular 3D object detection is well-known to be a challenging vision task due to the loss of depth information; attempts to recover depth using separate image-only approaches lead to unstable and noisy depth estimates, harming 3D detections. In this paper, we propose a novel keypoint-based approach for 3D object detection and localization from a single RGB image. We build our multi-branch model around 2D keypoint detection in images and complement it with a conceptually simple geometric reasoning method. Our network performs in an end-to-end manner, simultaneously and interdependently estimating 2D characteristics, such as 2D bounding boxes, keypoints, and orientation, along with full 3D pose in the scene. We fuse the outputs of distinct branches, applying a reprojection consistency loss during training. The experimental evaluation on the challenging KITTI dataset benchmark demonstrates that our network achieves state-of-the-art results among other monocular 3D detectors.
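A simplified version of a reprojection consistency term, comparing projected 3D keypoints with the 2D keypoint branch; the smooth L1 form and the intrinsics below are assumptions, not the paper's exact loss:

```python
import torch

def reprojection_consistency_loss(kp3d, kp2d_pred, K):
    """Penalise disagreement between projected 3D keypoints and the 2D
    keypoint branch (a simplified reprojection consistency term).

    kp3d: B x N x 3 camera-frame points; kp2d_pred: B x N x 2 pixels; K: 3 x 3.
    """
    proj = torch.einsum('ij,bnj->bni', K, kp3d)
    uv = proj[..., :2] / proj[..., 2:3].clamp(min=1e-6)
    return torch.nn.functional.smooth_l1_loss(uv, kp2d_pred)

K = torch.tensor([[721.5, 0.0, 609.6], [0.0, 721.5, 172.9], [0.0, 0.0, 1.0]])
kp3d = torch.rand(2, 9, 3) + torch.tensor([0.0, 0.0, 5.0])
kp2d = torch.rand(2, 9, 2) * 100
print(float(reprojection_consistency_loss(kp3d, kp2d, K)))
```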

Proceedings ArticleDOI
14 Apr 2019
TL;DR: A passively adaptive perching mechanism which allows an aerial vehicle to stably attach to a variety of surfaces including tree branches and pipelines, enabled by a compliant grapple module, which passively conforms to the surface of convex perching targets.
Abstract: The application of flying systems to practical tasks is consistently limited by the poor endurance of hovering robots. The ability to perch to fixed surfaces allows a robot to gather data and inspect structures in a low power state, while retaining the access and manoeuvrability that flight offers. In this paper we present a passively adaptive perching mechanism which allows an aerial vehicle to stably attach to a variety of surfaces including tree branches and pipelines. This is enabled by a compliant grapple module, which passively conforms to the surface of convex perching targets, ensuring reliable traction and a very high load capacity (tension tested to > 60 kg in some instances) whilst still releasing effortlessly. This is due to the mechanics of the grapple, which is designed to passively tighten and attach to a variety of branch diameters and shapes. The grapple is paired with a hybrid force-motion controller which allows the cable tension to be regulated as the vehicle achieves the desired attitude. The hybrid control approach exploits the mechanical compliance of the system to ensure reliable, stable attachment to irregular natural structures, and the addition of a winch allows the robot to stably orient itself in any position or orientation relative to the branch. This approach demonstrates tensile perching using adaptive anchors. The presented subsystems can be applied to other robots where high force authority is required.

Proceedings ArticleDOI
01 Oct 2019
TL;DR: This paper proposes an adaptive cropping method based on a Difficult Region Estimation Network (DREN) to enhance the detection of the difficult targets, which allows the detector to fully exploit its performance during the testing phase.
Abstract: Detecting objects in aerial images usually faces two major challenges: (1) detecting difficult targets (e.g., small objects, objects that are interfered by the background, or various orientation of the objects, etc.); (2) the imbalance problem inherent in object detection (e.g., imbalanced quantity in different categories, imbalanced sampling method, or imbalanced loss between classification and localization, etc.). Due to these challenges, detectors are often unable to perform the most effective training and testing. In this paper, we propose a simple but effective framework to address these concerns. First, we propose an adaptive cropping method based on a Difficult Region Estimation Network (DREN) to enhance the detection of the difficult targets, which allows the detector to fully exploit its performance during the testing phase. Second, we use the well-trained DREN to generate more diverse and representative training images, which is effective in enhancing the training set. Besides, in order to alleviate the impact of imbalance during training, we add a balance module in which the IoU balanced sampling method and balanced L1 loss are adopted. Finally, we evaluate our method on two aerial image datasets. Without bells and whistles, our framework achieves 8.0 points and 3.3 points higher Average Precision (AP) than the corresponding baselines on VisDrone and UAVDT, respectively.

Journal ArticleDOI
TL;DR: An intelligent system is presented for detecting regions to navigate a UAV when it requires an emergency landing due to technical causes, and experimental results on images from different situations for safe-landing detection show that the proposed system outperforms existing systems.
Abstract: As the demand increases for the use of Unmanned Aerial Vehicles (UAVs) to monitor natural disasters, protecting territories, spraying, vigilance in urban areas, etc., detecting safe landing zones becomes a new area that has gained interest. This paper presents an intelligent system for detecting regions to navigate a UAV when it requires an emergency landing due to technical causes. The proposed system explores the fact that safe regions in images have flat surfaces, which are extracted using the Gabor Transform. This results in images of different orientations. The proposed system then performs histogram operations on different Gabor-oriented images to select pixels that contribute to the highest peak, as Candidate Pixels (CP), for the respective Gabor-oriented images. Next, to group candidate pixels as one region, we explore Markov Chain Codes (MCCs), which estimate the probability of pixels being classified as candidates with neighboring pixels. This process results in Candidate Regions (CRs) detection. For each image of the respective Gabor orientation, including CRs, the proposed system finds a candidate region that has the highest area and considers it as a reference. We then estimate the degree of similarity between the reference CR with corresponding CRs in the respective Gabor-oriented images using a Chi square distance measure. Furthermore, the proposed system chooses the CR which gives the highest similarity to the reference CR to fuse with that reference, which results in the establishment of safe landing zones for the UAV. Experimental results on images from different situations for safe landing detection show that the proposed system outperforms the existing systems. Furthermore, experimental results on relative success rates for different emergency conditions of UAVs show that the proposed intelligent system is effective and useful compared to the existing UAV safe landing systems.
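A loose sketch of the first two stages (Gabor responses and histogram-peak candidate pixels) with OpenCV; the kernel parameters and histogram size are guesses, not the paper's values:

```python
import cv2
import numpy as np

def candidate_pixels_per_orientation(gray, orientations=8):
    """Gabor-filter an image at several orientations and, for each response,
    keep the pixels falling in the highest histogram peak as candidates."""
    masks = []
    for k in range(orientations):
        theta = k * np.pi / orientations
        kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
        response = cv2.filter2D(gray.astype(np.float32), cv2.CV_32F, kernel)
        hist, edges = np.histogram(response, bins=64)
        peak = np.argmax(hist)
        masks.append((response >= edges[peak]) & (response < edges[peak + 1]))
    return masks

gray = np.random.randint(0, 255, (240, 320), np.uint8)   # placeholder aerial image
masks = candidate_pixels_per_orientation(gray)
print([int(m.sum()) for m in masks])
```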

Journal ArticleDOI
TL;DR: A novel vessel-oriented image representation (VOIR) is proposed that can improve the machine perception of PE through a consistent, compact, and discriminative image representation, and can also improve radiologists' diagnostic capabilities for PE assessment by serving as the backbone of an effective PE visualization system.