
Showing papers on "Orientation (computer vision) published in 2018"


Journal ArticleDOI
06 Oct 2018-Sensors
TL;DR: An improved sparse convolution method for Voxel-based 3D convolutional networks is investigated, which significantly increases the speed of both training and inference and introduces a new form of angle loss regression to improve the orientation estimation performance.
Abstract: LiDAR-based or RGB-D-based object detection is used in numerous applications, ranging from autonomous driving to robot vision. Voxel-based 3D convolutional networks have been used for some time to enhance the retention of information when processing point cloud LiDAR data. However, problems remain, including a slow inference speed and low orientation estimation performance. We therefore investigate an improved sparse convolution method for such networks, which significantly increases the speed of both training and inference. We also introduce a new form of angle loss regression to improve the orientation estimation performance and a new data augmentation approach that can enhance the convergence speed and performance. The proposed network produces state-of-the-art results on the KITTI 3D object detection benchmarks while maintaining a fast inference speed.
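
A minimal sketch of a sine-based angle regression loss in the spirit of what the abstract describes is shown below, assuming PyTorch and a smooth-L1 penalty on the sine of the angle difference; the exact formulation in the paper (including any direction classifier) may differ.

```python
import torch
import torch.nn.functional as F

def angle_regression_loss(pred_angle: torch.Tensor, gt_angle: torch.Tensor) -> torch.Tensor:
    """Sine-based angle regression loss (illustrative sketch).

    Penalizing sin(pred - gt) instead of the raw difference makes the loss
    periodic, so orientations that differ by roughly pi are not over-penalized.
    """
    return F.smooth_l1_loss(torch.sin(pred_angle - gt_angle),
                            torch.zeros_like(pred_angle))

# Example: the second prediction is off by about pi, yet incurs almost no loss.
pred = torch.tensor([0.10, 3.00])
gt = torch.tensor([0.00, -0.14])
print(angle_regression_loss(pred, gt))
```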

1,624 citations


Proceedings ArticleDOI
01 Jun 2018
TL;DR: The Dataset for Object Detection in Aerial Images (DOTA) as discussed by the authors is a large-scale dataset of aerial images collected from different sensors and platforms and contains objects exhibiting a wide variety of scales, orientations, and shapes.
Abstract: Object detection is an important and challenging problem in computer vision. Although the past decade has witnessed major advances in object detection in natural scenes, such successes have been slow to carry over to aerial imagery, not only because of the huge variation in the scale, orientation and shape of the object instances on the earth's surface, but also due to the scarcity of well-annotated datasets of objects in aerial scenes. To advance object detection research in Earth Vision, also known as Earth Observation and Remote Sensing, we introduce a large-scale Dataset for Object deTection in Aerial images (DOTA). To this end, we collect 2806 aerial images from different sensors and platforms. Each image is about 4000 × 4000 pixels in size and contains objects exhibiting a wide variety of scales, orientations, and shapes. These DOTA images are then annotated by experts in aerial image interpretation using 15 common object categories. The fully annotated DOTA images contain 188,282 instances, each of which is labeled by an arbitrary (8 d.o.f.) quadrilateral. To build a baseline for object detection in Earth Vision, we evaluate state-of-the-art object detection algorithms on DOTA. Experiments demonstrate that DOTA well represents real Earth Vision applications and is quite challenging.
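
As a hedged illustration of working with the 8 d.o.f. quadrilateral annotations mentioned above, the sketch below parses one annotation and fits a minimal rotated rectangle with OpenCV; the assumed line layout (eight coordinates followed by a category label and a difficulty flag) is an approximation, not a guaranteed reproduction of the released annotation format.

```python
import numpy as np
import cv2

def quad_to_rotated_box(line: str):
    """Parse one annotation line 'x1 y1 ... x4 y4 category difficulty' (assumed layout)
    and return the minimal enclosing rotated rectangle ((cx, cy), (w, h), angle)."""
    parts = line.split()
    quad = np.array(parts[:8], dtype=np.float32).reshape(4, 2)
    category = parts[8] if len(parts) > 8 else None
    rect = cv2.minAreaRect(quad)   # tightest rotated rectangle around the quadrilateral
    return category, rect

line = "100 100 220 110 210 180 95 170 plane 0"
print(quad_to_rotated_box(line))
```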

1,502 citations


Book ChapterDOI
08 Sep 2018
TL;DR: This work proposes a real-time RGB-based pipeline for object detection and 6D pose estimation based on a variant of the Denoising Autoencoder trained on simulated views of a 3D model using Domain Randomization.
Abstract: We propose a real-time RGB-based pipeline for object detection and 6D pose estimation. Our novel 3D orientation estimation is based on a variant of the Denoising Autoencoder that is trained on simulated views of a 3D model using Domain Randomization.
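
The orientation estimator above is described as a Denoising-Autoencoder variant trained on domain-randomized renderings. A minimal PyTorch sketch of such a training step is shown below, assuming 128 × 128 RGB inputs and a placeholder augmentation function standing in for the domain randomization; it is not the paper's architecture.

```python
import torch
import torch.nn as nn

class ConvAutoencoder(nn.Module):
    """Tiny convolutional autoencoder (illustrative sketch, assumes 3x128x128 inputs)."""
    def __init__(self, code_dim: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2, padding=2), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 32 * 32, code_dim))
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 64 * 32 * 32), nn.ReLU(),
            nn.Unflatten(1, (64, 32, 32)),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x))

def train_step(model, optimizer, clean_render, augment):
    """Reconstruct the clean rendering from a randomly augmented (domain-randomized) copy."""
    noisy = augment(clean_render)                  # placeholder augmentation
    recon = model(noisy)
    loss = nn.functional.mse_loss(recon, clean_render)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```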

549 citations


Journal ArticleDOI
Yi Li1, Gu Wang1, Xiangyang Ji1, Yu Xiang2, Dieter Fox2 
TL;DR: A novel deep neural network for 6D pose matching named DeepIM is proposed, trained to predict a relative pose transformation using a disentangled representation of 3D location and 3D orientation and an iterative training process.
Abstract: Estimating the 6D pose of objects from images is an important problem in various applications such as robot manipulation and virtual reality. While direct regression of images to object poses has limited accuracy, matching rendered images of an object against the observed image can produce accurate results. In this work, we propose a novel deep neural network for 6D pose matching named DeepIM. Given an initial pose estimation, our network is able to iteratively refine the pose by matching the rendered image against the observed image. The network is trained to predict a relative pose transformation using an untangled representation of 3D location and 3D orientation and an iterative training process. Experiments on two commonly used benchmarks for 6D pose estimation demonstrate that DeepIM achieves large improvements over state-of-the-art methods. We furthermore show that DeepIM is able to match previously unseen objects.
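
At a high level, the iterative matching described above alternates between rendering at the current pose estimate and predicting a relative transform. The sketch below illustrates that loop with placeholder render and network functions; it is not the authors' implementation, and the pose update shown (rotation composed with the current orientation, translation added in camera coordinates) is a simplification of DeepIM's untangled parameterization.

```python
import numpy as np

def refine_pose(R, t, observed_img, render, predict_delta, num_iters=4):
    """Iteratively refine a 6D pose (R: 3x3 rotation matrix, t: 3-vector).

    render(R, t) -> synthetic image of the object at the given pose (placeholder).
    predict_delta(rendered, observed) -> (dR, dt), a relative transform
    estimated by a trained network (placeholder).
    """
    for _ in range(num_iters):
        rendered = render(R, t)
        dR, dt = predict_delta(rendered, observed_img)
        R = dR @ R          # update orientation
        t = t + dt          # update translation (simplified coupling)
    return R, t
```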

220 citations


Book ChapterDOI
02 Dec 2018
TL;DR: A new method is proposed consisting of a joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features, which are fed into rotation-based region proposal and region-of-interest networks to produce object detections.
Abstract: Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.
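
The rotational non-maximum suppression step mentioned above needs an overlap measure between oriented boxes. A hedged sketch is given below, computing the IoU of rotated boxes as polygon intersections with Shapely; the paper's actual implementation may differ.

```python
import numpy as np
from shapely.geometry import Polygon

def rotated_box_to_polygon(cx, cy, w, h, angle_deg):
    """Corners of an oriented box (center, size, angle in degrees) as a Shapely polygon."""
    a = np.deg2rad(angle_deg)
    R = np.array([[np.cos(a), -np.sin(a)], [np.sin(a), np.cos(a)]])
    corners = np.array([[-w / 2, -h / 2], [w / 2, -h / 2],
                        [w / 2, h / 2], [-w / 2, h / 2]]) @ R.T + [cx, cy]
    return Polygon(corners)

def rotated_nms(boxes, scores, iou_thresh=0.3):
    """Greedy NMS over oriented boxes given as (cx, cy, w, h, angle_deg) tuples."""
    order = np.argsort(scores)[::-1]
    polys = [rotated_box_to_polygon(*b) for b in boxes]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        remaining = []
        for j in order[1:]:
            inter = polys[i].intersection(polys[j]).area
            union = polys[i].area + polys[j].area - inter
            if union == 0 or inter / union <= iou_thresh:
                remaining.append(j)
        order = np.array(remaining, dtype=int)
    return keep
```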

158 citations


Journal ArticleDOI
TL;DR: The experimental results show that the proposed OS-SIFT algorithm gives a robust registration result for optical-to-SAR images and outperforms other state-of-the-art algorithms in terms of registration accuracy.
Abstract: Although the scale-invariant feature transform (SIFT) algorithm has been successfully applied to both optical image registration and synthetic aperture radar (SAR) image registration, SIFT-like algorithms have failed to register high-resolution (HR) optical and SAR images due to large geometric differences and intensity differences. In this paper, to perform optical-to-SAR (OS) image registration, we propose an advanced SIFT-like algorithm (OS-SIFT) that consists of three main modules: keypoint detection in two Harris scale spaces, orientation assignment and descriptor extraction, and keypoint matching. Considering the inherent properties of SAR images and optical images, the multiscale ratio of exponentially weighted averages and multiscale Sobel operators are used to calculate consistent gradients for the SAR and optical images, respectively, from which two Harris scale spaces are then constructed. Keypoints are detected by finding the local maxima in the scale space followed by a localization refinement method based on the spatial relationship of the keypoints. Moreover, gradient location orientation histogram-like descriptors are extracted using multiple image patches to increase the distinctiveness. The experimental results on simulated images and several HR satellite images show that the proposed OS-SIFT algorithm gives a robust registration result for optical-to-SAR images and outperforms other state-of-the-art algorithms in terms of registration accuracy.
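
For the optical branch, the abstract describes multiscale Sobel gradients feeding a Harris scale space. The sketch below shows one plausible realization of a single scale of that idea with OpenCV; the smoothing sigma and the Harris constant k are illustrative assumptions, and the SAR branch (ratio of exponentially weighted averages) is not shown.

```python
import cv2
import numpy as np

def harris_response(gray, sigma=1.6, k=0.04):
    """Harris response from Sobel gradients at one scale (sketch).

    gray: float32 image with values in [0, 1].
    """
    ksize = int(2 * round(3 * sigma) + 1)            # odd Gaussian kernel size
    blurred = cv2.GaussianBlur(gray, (ksize, ksize), sigma)
    gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0, ksize=3)
    gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1, ksize=3)
    # Structure-tensor components, smoothed over a local window
    ixx = cv2.GaussianBlur(gx * gx, (ksize, ksize), sigma)
    iyy = cv2.GaussianBlur(gy * gy, (ksize, ksize), sigma)
    ixy = cv2.GaussianBlur(gx * gy, (ksize, ksize), sigma)
    det = ixx * iyy - ixy * ixy
    trace = ixx + iyy
    return det - k * trace * trace

# Keypoints would be local maxima of this response across the scale space (not shown).
```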

145 citations


Journal ArticleDOI
TL;DR: This architecture represents a substantive advancement over prior approaches, with implications both for biomedical image segmentation and for convolutional neural network architectures more generally.

111 citations


Posted Content
TL;DR: This paper proposes ToDayGAN – a modified image-translation model to alter nighttime driving images to a more useful daytime representation, and improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
Abstract: Visual localization is a key step in many robotics pipelines, allowing the robot to (approximately) determine its position and orientation in the world. An efficient and scalable approach to visual localization is to use image retrieval techniques. These approaches identify the image most similar to a query photo in a database of geo-tagged images and approximate the query's pose via the pose of the retrieved database image. However, image retrieval across drastically different illumination conditions, e.g. day and night, is still a problem with unsatisfactory results, even in this age of powerful neural models. This is due to a lack of a suitably diverse dataset with true correspondences to perform end-to-end learning. A recent class of neural models allows for realistic translation of images among visual domains with relatively little training data and, most importantly, without ground-truth pairings. In this paper, we explore the task of accurately localizing images captured from two traversals of the same area in both day and night. We propose ToDayGAN - a modified image-translation model to alter nighttime driving images to a more useful daytime representation. We then compare the daytime and translated night images to obtain a pose estimate for the night image using the known 6-DOF position of the closest day image. Our approach improves localization performance by over 250% compared to the current state-of-the-art, in the context of standard metrics in multiple categories.
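
Retrieval-based localization as described above reduces to finding the nearest day-time database descriptor for the (translated) night query and inheriting that image's pose. Below is a minimal sketch of this lookup step with placeholder functions for the image-translation model and the global descriptor; it is illustrative only and does not reproduce ToDayGAN itself.

```python
import numpy as np

def localize_night_image(night_img, db_descriptors, db_poses,
                         translate_to_day, describe):
    """Return the 6-DOF pose of the most similar day-time database image.

    translate_to_day: placeholder for the night-to-day translation model.
    describe: placeholder mapping an image to a global descriptor vector.
    db_descriptors: (N, D) array of L2-normalized descriptors of geo-tagged day images.
    db_poses: list of the N known 6-DOF poses.
    """
    day_like = translate_to_day(night_img)
    q = describe(day_like)
    q = q / np.linalg.norm(q)
    sims = db_descriptors @ q            # cosine similarity with each database image
    best = int(np.argmax(sims))
    return db_poses[best], float(sims[best])
```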

102 citations


Journal ArticleDOI
TL;DR: This paper presents a thorough review focusing on the recent applications of borehole image logs for sedimentological and structural description and interpretation, and aims to establish image-log facies which can provide guidelines in sedimentary reservoir interpretation.

101 citations


Posted Content
TL;DR: This work presents the first method to capture the 3D total motion of a target person from a monocular view input, and leverages a 3D deformable human model to reconstruct total body pose from the CNN outputs with the aid of the pose and shape prior in the model.
Abstract: We present the first method to capture the 3D total motion of a target person from a monocular view input. Given an image or a monocular video, our method reconstructs the motion from body, face, and fingers represented by a 3D deformable mesh model. We use an efficient representation called 3D Part Orientation Fields (POFs), to encode the 3D orientations of all body parts in the common 2D image space. POFs are predicted by a Fully Convolutional Network (FCN), along with the joint confidence maps. To train our network, we collect a new 3D human motion dataset capturing diverse total body motion of 40 subjects in a multiview system. We leverage a 3D deformable human model to reconstruct total body pose from the CNN outputs by exploiting the pose and shape prior in the model. We also present a texture-based tracking method to obtain temporally coherent motion capture output. We perform thorough quantitative evaluations including comparison with the existing body-specific and hand-specific methods, and performance analysis on camera viewpoint and human pose changes. Finally, we demonstrate the results of our total body motion capture on various challenging in-the-wild videos. Our code and newly collected human motion dataset will be publicly shared.
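
A 3D Part Orientation Field stores, at 2D pixels belonging to a body part, the 3D unit direction of that part. The sketch below rasterizes one bone's orientation into such a field; it is a simplified, hedged illustration and does not reproduce the paper's exact weighting or support region around the 2D bone.

```python
import numpy as np

def rasterize_pof(joint3d_a, joint3d_b, joint2d_a, joint2d_b,
                  height, width, radius=3, n_samples=50):
    """Write the 3D unit direction of bone a->b into pixels near its 2D projection.

    Returns a (3, height, width) field; pixels away from the bone stay zero.
    """
    pof = np.zeros((3, height, width), dtype=np.float32)
    direction = np.asarray(joint3d_b, dtype=np.float32) - np.asarray(joint3d_a, dtype=np.float32)
    direction /= (np.linalg.norm(direction) + 1e-8)
    for s in np.linspace(0.0, 1.0, n_samples):
        p = (1 - s) * np.asarray(joint2d_a) + s * np.asarray(joint2d_b)
        x, y = int(round(p[0])), int(round(p[1]))
        y0, y1 = max(0, y - radius), min(height, y + radius + 1)
        x0, x1 = max(0, x - radius), min(width, x + radius + 1)
        pof[:, y0:y1, x0:x1] = direction[:, None, None]
    return pof
```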

100 citations


Journal ArticleDOI
TL;DR: This study presents a new image-based indoor localization method using building information modeling (BIM) and convolutional neural networks (CNNs) that constructs a dataset with rendered BIM images and searches the dataset for images most similar to indoor photographs, thereby estimating the indoor position and orientation of the photograph.

Journal ArticleDOI
TL;DR: A vision-based approach is used to build a dynamic hand gesture recognition system and it is concluded that classifier fusion provides satisfactory results compared to other individual classifiers.
Abstract: In this work, a vision-based approach is used to build a dynamic hand gesture recognition system. Various challenges such as complicated background, change in illumination and occlusion make the detection and tracking of the hand difficult in any vision-based approach. To overcome such challenges, a hand detection technique is developed by combining three-frame differencing and skin filtering. The three-frame differencing is performed for both colored and grayscale frames. The hand is then tracked using a modified Kanade-Lucas-Tomasi feature tracker where the features were selected using the compact criteria. Velocity and orientation information were added to remove the redundant feature points. Finally, color cue information is used to locate the final hand region in the tracked region. During the feature extraction, 44 features were selected from the existing literature. Using all the features could lead to overfitting, information redundancy and the curse of dimensionality. Thus, a system with optimal features was selected using analysis of variance combined with incremental feature selection. These selected features were then fed as an input to the ANN, SVM and kNN models. These individual classifiers were combined to produce a classifier fusion model. Fivefold cross-validation has been used to evaluate the performance of the proposed model. Based on the experimental results, it may be concluded that classifier fusion provides satisfactory results (92.23%) compared to other individual classifiers. A one-way analysis of variance test, Friedman's test and the Kruskal-Wallis test have also been conducted to validate the statistical significance of the results.
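
The hand-detection stage combines three-frame differencing with skin filtering. A minimal OpenCV sketch of that combination is shown below; the YCrCb skin thresholds are commonly used illustrative values, not the ones tuned in the paper.

```python
import cv2
import numpy as np

def detect_hand_mask(prev_frame, curr_frame, next_frame):
    """Motion mask from three-frame differencing intersected with a skin-color mask."""
    g = [cv2.cvtColor(f, cv2.COLOR_BGR2GRAY) for f in (prev_frame, curr_frame, next_frame)]
    d1 = cv2.absdiff(g[1], g[0])
    d2 = cv2.absdiff(g[2], g[1])
    motion = cv2.bitwise_and(
        cv2.threshold(d1, 25, 255, cv2.THRESH_BINARY)[1],
        cv2.threshold(d2, 25, 255, cv2.THRESH_BINARY)[1])

    # Skin filter in YCrCb space (illustrative thresholds)
    ycrcb = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2YCrCb)
    skin = cv2.inRange(ycrcb, np.array([0, 133, 77]), np.array([255, 173, 127]))

    mask = cv2.bitwise_and(motion, skin)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    return mask
```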

Journal ArticleDOI
TL;DR: A data-driven, learning-based approach trained on a very large dataset that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination is presented.
Abstract: In this paper, we present a method that estimates reflectance and illumination information from a single image depicting a single-material specular object from a given class under natural illumination. We follow a data-driven, learning-based approach trained on a very large dataset, but in contrast to earlier work we do not assume one or more components (shape, reflectance, or illumination) to be known. We propose a two-step approach, where we first estimate the object’s reflectance map, and then further decompose it into reflectance and illumination. For the first step, we introduce a Convolutional Neural Network (CNN) that directly predicts a reflectance map from the input image itself, as well as an indirect scheme that uses additional supervision, first estimating surface orientation and afterwards inferring the reflectance map using a learning-based sparse data interpolation technique. For the second step, we suggest a CNN architecture to reconstruct both Phong reflectance parameters and high-resolution spherical illumination maps from the reflectance map. We also propose new datasets to train these CNNs. We demonstrate the effectiveness of our approach for both steps by extensive quantitative and qualitative evaluation on both synthetic and real data, as well as through numerous applications that show improvements over the state-of-the-art.

Journal ArticleDOI
TL;DR: This paper proposes the first method in the literature able to extract the coordinates of the pores from touch-based, touchless, and latent fingerprint images, and uses specifically designed and trained Convolutional Neural Networks to estimate and refine the centroid of each pore.

Posted Content
TL;DR: This work proposes a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features that can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations.
Abstract: Automatic multi-class object detection in remote sensing images in unconstrained scenarios is of high interest for several applications including traffic monitoring and disaster management. The huge variation in object scale, orientation, category, and complex backgrounds, as well as the different camera sensors pose great challenges for current algorithms. In this work, we propose a new method consisting of a novel joint image cascade and feature pyramid network with multi-size convolution kernels to extract multi-scale strong and weak semantic features. These features are fed into rotation-based region proposal and region of interest networks to produce object detections. Finally, rotational non-maximum suppression is applied to remove redundant detections. During training, we minimize joint horizontal and oriented bounding box loss functions, as well as a novel loss that enforces oriented boxes to be rectangular. Our method achieves 68.16% mAP on horizontal and 72.45% mAP on oriented bounding box detection tasks on the challenging DOTA dataset, outperforming all published methods by a large margin (+6% and +12% absolute improvement, respectively). Furthermore, it generalizes to two other datasets, NWPU VHR-10 and UCAS-AOD, and achieves competitive results with the baselines even when trained on DOTA. Our method can be deployed in multi-class object detection applications, regardless of the image and object scales and orientations, making it a great choice for unconstrained aerial and satellite imagery.

Journal ArticleDOI
TL;DR: It is shown that the CNN solution is able to automatically learn location patterns, thus significantly lowering the workforce burden of designing a localization system, and achieves an accuracy of about 1 m under different smartphone orientations, users, and use patterns.
Abstract: Wi-Fi and magnetic field fingerprinting have been a hot topic in indoor positioning research because of their ubiquity and location-related features. Wi-Fi signals can provide rough initial positions, and magnetic fields can further improve the positioning accuracies; therefore, many researchers have tried to combine the two signals for high-accuracy indoor localization. Currently, state-of-the-art solutions design separate algorithms to process different indoor signals. Outputs of these algorithms are generally used as inputs of data fusion strategies. These methods rely on computationally expensive particle filters, labor-intensive feature analysis, and time-consuming parameter tuning to achieve better accuracies. Besides, particle filters need to estimate the moving directions of particles, requiring the smartphone orientation to remain stable and aligned with the user's moving direction. In this paper, we adopted a convolutional neural network (CNN) to implement an accurate and orientation-free positioning system. Inspired by state-of-the-art image classification methods, we design a novel hybrid location image using Wi-Fi and magnetic field fingerprints, and then a CNN is employed to classify the locations of the fingerprint images. In order to prevent the overfitting problem of the positioning CNN on limited training datasets, we also propose to divide the learning process into two steps to adopt proper learning strategies for different network branches. We show that the CNN solution is able to automatically learn location patterns, thus significantly lowering the workforce burden of designing a localization system. Our experimental results convincingly reveal that the proposed positioning method achieves an accuracy of about 1 m under different smartphone orientations, users, and use patterns.
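
The key idea above is to pack Wi-Fi RSSI and magnetic-field readings into a single "location image" that a CNN can classify. Below is a hedged sketch of one way to build such an image and a tiny classifier; the channel layout, the normalization constants, and the network shape are illustrative assumptions, not the paper's design.

```python
import numpy as np
import torch
import torch.nn as nn

def build_location_image(rssi, mag_window, side=16):
    """Stack a Wi-Fi fingerprint and a magnetic-field window into a 2-channel image.

    rssi: 1D sequence of RSSI values for known access points (padded/truncated to side*side).
    mag_window: (side, side) array of recent magnetic-field magnitudes (microtesla, assumed).
    """
    wifi = np.full(side * side, -100.0, dtype=np.float32)   # -100 dBm = "not heard"
    wifi[:min(len(rssi), side * side)] = rssi[:side * side]
    wifi = (wifi + 100.0) / 100.0                            # roughly normalize to [0, 1]
    mag = (np.asarray(mag_window, dtype=np.float32) - 25.0) / 50.0
    return np.stack([wifi.reshape(side, side), mag])         # shape (2, side, side)

class LocationCNN(nn.Module):
    """Tiny CNN that classifies fingerprint images into discrete location labels."""
    def __init__(self, num_locations: int, side: int = 16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * side * side, num_locations))

    def forward(self, x):
        return self.net(x)
```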

Journal ArticleDOI
29 Aug 2018-Sensors
TL;DR: A semantic aggregation method is developed which fuses features in a top-down way and can provide abundant location and semantic information, which is helpful for classification and localization.
Abstract: Ship detection and angle estimation in SAR images play an important role in marine surveillance. Previous works have detected ships first and estimated their orientations second. This is time-consuming and tedious. In order to solve the problems above, we attempt to combine these two tasks using a convolutional neural network so that ships may be detected and their orientations estimated simultaneously. The proposed method is based on the original SSD (Single Shot Detector), but using a rotatable bounding box. This method can learn and predict the class, location, and angle information of ships using only one forward computation. The generated oriented bounding box is much tighter than the traditional bounding box and is robust to background disturbances. We develop a semantic aggregation method which fuses features in a top-down way. This method can provide abundant location and semantic information, which is helpful for classification and location. We adopt the attention module for the six prediction layers. It can adaptively select meaningful features and neglect weak ones. This is helpful for detecting small ships. Multi-orientation anchors are designed with different sizes, aspect ratios, and orientations. These can consider both speed and accuracy. Angular regression is embedded into the existing bounding box regression module, and thus the angle prediction is output with the position and score, without requiring too many extra computations. The loss function with angular regression is used for optimizing the model. AAP (average angle precision) is used for evaluating the performance. The experiments on the dataset demonstrate the effectiveness of our method.
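
Embedding angular regression into the box regression module amounts to adding an angle offset to the usual anchor-relative targets. The sketch below shows one common encoding of (x, y, w, h, θ) regression targets relative to an oriented anchor; it is an illustrative convention, not necessarily the exact one used in the paper.

```python
import numpy as np

def encode_rotated_box(anchor, gt):
    """Regression targets for an oriented box relative to an oriented anchor.

    anchor, gt: (cx, cy, w, h, theta) with theta in radians.
    """
    ax, ay, aw, ah, atheta = anchor
    gx, gy, gw, gh, gtheta = gt
    return np.array([
        (gx - ax) / aw,        # center offsets, normalized by anchor size
        (gy - ay) / ah,
        np.log(gw / aw),       # log-scale size ratios
        np.log(gh / ah),
        gtheta - atheta,       # angle offset
    ], dtype=np.float32)

# Example: axis-aligned anchor, target rotated by about 15 degrees
print(encode_rotated_box((50, 50, 40, 20, 0.0), (54, 48, 44, 22, 0.26)))
```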

Journal ArticleDOI
Fuxi Jia, Cunzhao Shi, Kun He, Chunheng Wang, Baihua Xiao
TL;DR: Structural symmetric pixels (SSPs) are utilized to calculate the local threshold in a neighborhood, the voting result of multiple thresholds determines whether a pixel belongs to the foreground or not, and an adaptive global threshold selection algorithm is proposed.

Journal ArticleDOI
TL;DR: In this paper, a real-time G2 continuous local smoothing method was proposed by replacing the corners of tool position and tool orientation paths with cubic B-splines, which can be implemented on-line and integrated into a self-developed open-architecture CNC system.
Abstract: Five-axis linear toolpaths (or G01 blocks) are widely used in CNC machine tools. The tangential and curvature discontinuities at the corners of linear toolpaths result in feed fluctuation and deteriorate the machining efficiency and quality. Several methods have been proposed to locally smooth the corners for three-axis toolpaths, but local smoothing for five-axis toolpaths is still challenging due to two main difficulties: control of the tool orientation smoothing error and parameter synchronization between tool position and tool orientation. This paper proposes a real-time G2 continuous local smoothing method by replacing the corners of tool position and tool orientation paths with cubic B-splines. The two difficulties in five-axis local smoothing are both resolved in simple and analytical ways. With a two-step method, the tool orientation smoothing error is directly and analytically constrained. By converting the remaining linear segments into B-splines, the C1 continuity of the smooth tool position and smooth tool orientation paths is achieved. The parameter synchronization is realized by sharing the parameter of the tool orientation with that of the tool position. Compared with the existing analytical methods, the proposed method has a higher computation efficiency and a tighter tolerance in control of the orientation smoothing error. We have developed an open-source benchmark which validates the computation efficiency and error control ability of the proposed method. After smoothing and synchronization, the inserted B-spline of tool position is traversed with constant feedrate and the feedrate of the remaining linear segment is planned with a jerk-bounded trajectory profile. The proposed smoothing method can be implemented on-line and has been integrated into a self-developed open-architecture CNC system. Its effectiveness for on-line generation of smooth motion has been validated via simulations and experiments.
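
The corner-smoothing idea above replaces the sharp corner of two adjacent linear segments with a cubic B-spline. The sketch below builds one such clamped cubic B-spline from five control points placed symmetrically around the corner; the control-point placement rule and the smoothing length d are simplified assumptions and do not reproduce the paper's error-constrained construction.

```python
import numpy as np
from scipy.interpolate import BSpline

def smooth_corner(p_prev, p_corner, p_next, d=2.0):
    """Cubic B-spline replacing the corner at p_corner (illustrative sketch).

    d controls how far from the corner the spline departs from the linear segments.
    Returns a callable spline over the parameter u in [0, 1].
    """
    p_prev, p_corner, p_next = map(np.asarray, (p_prev, p_corner, p_next))
    u_in = (p_corner - p_prev) / np.linalg.norm(p_corner - p_prev)
    u_out = (p_next - p_corner) / np.linalg.norm(p_next - p_corner)
    ctrl = np.array([
        p_corner - 2 * d * u_in,
        p_corner - d * u_in,
        p_corner,
        p_corner + d * u_out,
        p_corner + 2 * d * u_out,
    ])
    # Clamped cubic knot vector for 5 control points
    knots = np.array([0, 0, 0, 0, 0.5, 1, 1, 1, 1], dtype=float)
    return BSpline(knots, ctrl, k=3)

spline = smooth_corner([0, 0], [10, 0], [10, 10], d=1.5)
print(spline(np.linspace(0, 1, 5)))   # sample points along the smoothed corner
```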

Journal ArticleDOI
TL;DR: A material-based salient object detection method is presented which can effectively distinguish objects with similar perceived color but different spectral responses, and which outperforms several existing hyperspectral salient object detection approaches and the state-of-the-art methods proposed for RGB images.

Journal ArticleDOI
TL;DR: Multi-orientation EPIs and optimal orientation selection are proved to be effective in detecting and excluding occlusions, and the resulting method outperforms state-of-the-art depth estimation methods, especially near occlusion boundaries.

Journal ArticleDOI
TL;DR: Experiments and extensive comparisons show the effectiveness and overall superiority of the proposed LoVS descriptor and LoVS-based point cloud registration algorithm for low-quality data, e.g., with noise and varying data resolutions.

Journal ArticleDOI
TL;DR: An accurate and robust algorithm is developed that quantitatively compares the similarity of the observed CT artifacts with calculated artifact patterns based on the lead’s orientation marker and a geometric model of the segmented electrodes, and provides highly accurate results for the orientation of the segmented electrodes for all angular constellations that typically occur in clinical cases.
Abstract: Background Directional deep brain stimulation (DBS) allows steering the stimulation in an axial direction which offers greater flexibility in programming. However, accurate anatomical visualization of the lead orientation is required for interpreting the observed stimulation effects and to guide programming. Objectives In this study we aimed to develop and test an accurate and robust algorithm for determining the orientation of segmented electrodes based on standard postoperative CT imaging used in DBS. Methods Orientation angles of directional leads (Cartesia™; Boston Scientific, Marlborough, MA, USA) were determined using CT imaging. Therefore, a sequential algorithm was developed that quantitatively compares the similarity of the observed CT artifacts with calculated artifact patterns based on the lead's orientation marker and a geometric model of the segmented electrodes. Measurements of seven ground truth phantoms and three leads with 60 different configurations of lead implantation and orientation angles were analyzed for validation. Results The accuracy of the determined electrode orientation angles was -0.6 ± 1.5° (range: -5.4 to 4.2°). This accuracy proved to be sufficiently high to resolve even subtle differences between individual leads. Conclusions The presented algorithm is user independent and provides highly accurate results for the orientation of the segmented electrodes for all angular constellations that typically occur in clinical cases.

Journal ArticleDOI
TL;DR: An appearance-based method for pedestrian head-pose and full-body orientation prediction is presented, employing a deep-learning mechanism, and the comparison with existing state-of-the-art approaches demonstrates the effectiveness of the presented approach.

Journal ArticleDOI
TL;DR: This article introduces a new, non-linear operator, called RORPO (Ranking the Orientation Responses of Path Operators), inspired by the multidirectional paradigm currently used in linear filtering for thin structure analysis and built upon the notion of path operator from mathematical morphology.
Abstract: The analysis of thin curvilinear objects in 3D images is a complex and challenging task. In this article, we introduce a new, non-linear operator, called RORPO (Ranking the Orientation Responses of Path Operators). Inspired by the multidirectional paradigm currently used in linear filtering for thin structure analysis, RORPO is built upon the notion of path operator from mathematical morphology. This operator, unlike most operators commonly used for 3D curvilinear structure analysis, is discrete, non-linear and non-local. From this new operator, two main curvilinear structure characteristics can be estimated: an intensity feature, that can be assimilated to a quantitative measure of curvilinearity; and a directional feature, providing a quantitative measure of the structure's orientation. We provide a full description of the structural and algorithmic details for computing these two features from RORPO, and we discuss computational issues. We experimentally assess RORPO by comparison with three of the most popular curvilinear structure analysis filters, namely Frangi Vesselness, Optimally Oriented Flux, and Hybrid Diffusion with Continuous Switch. In particular, we show that our method provides up to 8 percent more true positives and 50 percent fewer false positives than the next best method, on synthetic and real 3D images.

Patent
25 Sep 2018
TL;DR: The disclosed subject matter is directed to employing machine learning models configured to predict 3D data from 2D images, using deep learning techniques and 3D-from-2D neural network models to derive three-dimensional data for the two-dimensional images.
Abstract: The disclosed subject matter is directed to employing machine learning models configured to predict 3D data from 2D images using deep learning techniques to derive 3D data for the 2D images. In some embodiments, a method is provided that comprises receiving, by a system operatively coupled to a processor, a two-dimensional image, and determining, by the system, auxiliary data for the two-dimensional image, wherein the auxiliary data comprises orientation information regarding a capture orientation of the two-dimensional image. The method further comprises deriving, by the system, three-dimensional information for the two-dimensional image using one or more neural network models configured to infer the three-dimensional information based on the two-dimensional image and the auxiliary data.

Journal ArticleDOI
TL;DR: The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive patterns and significant image distortions.
Abstract: Accurate 3D reconstruction/modelling from unmanned aerial vehicle (UAV)-based imagery has become the key prerequisite in various applications. Although current commercial software has automated the process of image-based reconstruction, a transparent system, which can be incorporated with different user-defined constraints, is still preferred by the photogrammetric research community. In this regard, this paper presents a transparent framework for the automated aerial triangulation of UAV images. The proposed framework is conducted in three steps. In the first step, two approaches, which take advantage of prior information regarding the flight trajectory, are implemented for reliable relative orientation recovery. Then, initial recovery of image exterior orientation parameters (EOPs) is achieved through either an incremental or global approach. Finally, a global bundle adjustment involving Ground Control Points (GCPs) and check points is carried out to refine all estimated parameters in the defined mapping coordinate system. Four real image datasets, which are acquired by two different UAV platforms, have been utilized to evaluate the feasibility of the proposed framework. In addition, a comparative analysis between the proposed framework and the existing commercial software is performed. The derived experimental results demonstrate the superior performance of the proposed framework in providing an accurate 3D model, especially when dealing with acquired UAV images containing repetitive patterns and significant image distortions.
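
Relative orientation recovery between two overlapping UAV images can be sketched with OpenCV's essential-matrix routines, as below; this is a generic two-view illustration under assumed calibrated conditions, not the trajectory-aware approach proposed in the paper.

```python
import cv2
import numpy as np

def relative_orientation(pts1, pts2, K):
    """Estimate the relative rotation and (unit-norm) translation between two views.

    pts1, pts2: (N, 2) float arrays of matched image points; K: 3x3 camera matrix.
    """
    E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    # recoverPose resolves the four-fold ambiguity of the essential matrix decomposition
    _, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
    return R, t, inliers
```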

Journal ArticleDOI
TL;DR: A novel technique called Weber Local Binary Image Cosine Transform (WLBI-CT) extracts and integrates the frequency components of images obtained through the Weber local descriptor and the local binary descriptor; these frequency components help in accurate classification of various facial expressions in the challenging domain of multi-scale and multi-orientation facial images.
Abstract: Accurate recognition of facial expression is a challenging problem, especially from multi-scale and multi-orientation face images. In this article, we propose a novel technique called Weber Local Binary Image Cosine Transform (WLBI-CT). WLBI-CT extracts and integrates the frequency components of images obtained through the Weber local descriptor and the local binary descriptor. These frequency components help in accurate classification of various facial expressions in the challenging domain of multi-scale and multi-orientation facial images. Identification of a significant feature set plays a vital role in the success of any facial expression recognition system. The effect of multiple feature sets with varying block sizes has been investigated using different multi-scale images taken from the well-known JAFFE, MMI and CK+ datasets. Extensive experimentation has been performed to demonstrate that the proposed technique outperforms contemporary techniques in terms of recognition rate and computational time.

Proceedings ArticleDOI
Yang Yang, Shi Jin, Ruiyang Liu, Sing Bing Kang, Jingyi Yu
18 Jun 2018
TL;DR: A system that automatically extracts 3D geometry of an indoor scene from a single 2D panorama and uses the recovered layout to guide shape estimation of the remaining objects using their normal information is described.
Abstract: We describe a system that automatically extracts 3D geometry of an indoor scene from a single 2D panorama. Our system recovers the spatial layout by finding the floor, walls, and ceiling; it also recovers shapes of typical indoor objects such as furniture. Using sampled perspective sub-views, we extract geometric cues (lines, vanishing points, orientation map, and surface normals) and semantic cues (saliency and object detection information). These cues are used for ground plane estimation and occlusion reasoning. The global spatial layout is inferred through a constraint graph on line segments and planar superpixels. The recovered layout is then used to guide shape estimation of the remaining objects using their normal information. Experiments on synthetic and real datasets show that our approach is state-of-the-art in both accuracy and efficiency. Our system can handle cluttered scenes with complex geometry that are challenging to existing techniques.
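
Among the geometric cues listed above, vanishing points can be estimated by intersecting detected line segments and voting. The sketch below shows a simple version of that idea with OpenCV's probabilistic Hough transform and homogeneous-coordinate intersections; it is a generic illustration, not the cue extraction used in the paper.

```python
import cv2
import numpy as np
from itertools import combinations

def estimate_vanishing_point(gray):
    """Crude vanishing-point estimate: intersect line pairs, keep the point
    supported by the most detected lines."""
    edges = cv2.Canny(gray, 50, 150)
    segs = cv2.HoughLinesP(edges, 1, np.pi / 180, threshold=80,
                           minLineLength=40, maxLineGap=5)
    if segs is None:
        return None
    # Homogeneous line for each segment: l = p1 x p2 (cap the count for speed)
    lines = [np.cross([x1, y1, 1.0], [x2, y2, 1.0]) for x1, y1, x2, y2 in segs[:60, 0]]

    best_point, best_votes = None, -1
    for l1, l2 in combinations(lines, 2):
        p = np.cross(l1, l2)
        if abs(p[2]) < 1e-6:
            continue                      # near-parallel lines, no finite intersection
        p = p / p[2]
        # A line votes for p if p lies within ~2 px of it (normalized incidence)
        votes = sum(abs(np.dot(l / np.linalg.norm(l[:2]), p)) < 2.0 for l in lines)
        if votes > best_votes:
            best_point, best_votes = p[:2], votes
    return best_point
```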

Book ChapterDOI
16 Sep 2018
TL;DR: A multi-scale reinforcement learning (RL) agent framework is employed to find standardized view planes in 3D image acquisitions; it can be used to mimic experienced operators and achieves accuracies of 1.53 mm, 1.98 mm and 4.84 mm on the evaluated brain and cardiac MRI planes.
Abstract: We propose a fully automatic method to find standardized view planes in 3D image acquisitions. Standard view images are important in clinical practice as they provide a means to perform biometric measurements from similar anatomical regions. These views are often constrained to the native orientation of a 3D image acquisition. Navigating through target anatomy to find the required view plane is tedious and operator-dependent. For this task, we employ a multi-scale reinforcement learning (RL) agent framework and extensively evaluate several Deep Q-Network (DQN) based strategies. RL enables a natural learning paradigm by interaction with the environment, which can be used to mimic experienced operators. We evaluate our results using the distance between the anatomical landmarks and detected planes, and the angles between their normal vector and target. The proposed algorithm is assessed on the mid-sagittal and anterior-posterior commissure planes of brain MRI, and the 4-chamber long-axis plane commonly used in cardiac MRI, achieving accuracy of 1.53 mm, 1.98 mm and 4.84 mm, respectively.