
Showing papers on "3D single-object recognition" published in 2017


Journal ArticleDOI
TL;DR: The central concept is to use the rich temporal information provided by events to create contexts in the form of time-surfaces which represent the recent temporal activity within a local spatial neighborhood and it is demonstrated that this concept can robustly be used at all stages of an event-based hierarchical model.
Abstract: This paper describes novel event-based spatio-temporal features called time-surfaces and how they can be used to create a hierarchical event-based pattern recognition architecture. Unlike existing hierarchical architectures for pattern recognition, the presented model relies on a time oriented approach to extract spatio-temporal features from the asynchronously acquired dynamics of a visual scene. These dynamics are acquired using biologically inspired frameless asynchronous event-driven vision sensors. Similarly to cortical structures, subsequent layers in our hierarchy extract increasingly abstract features using increasingly large spatio-temporal windows. The central concept is to use the rich temporal information provided by events to create contexts in the form of time-surfaces which represent the recent temporal activity within a local spatial neighborhood. We demonstrate that this concept can robustly be used at all stages of an event-based hierarchical model. First layer feature units operate on groups of pixels, while subsequent layer feature units operate on the output of lower level feature units. We report results on a previously published 36 class character recognition task and a four class canonical dynamic card pip task, achieving near 100 percent accuracy on each. We introduce a new seven class moving face recognition task, achieving 79 percent accuracy.
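The time-surface idea lends itself to a compact sketch: keep the latest event timestamp per pixel and apply an exponential decay over a small spatial neighborhood around each new event. The window radius, decay constant tau, and per-pixel bookkeeping below are illustrative assumptions, not the authors' exact parameters (border handling is also omitted).

```python
import numpy as np

def time_surface(last_timestamps, t_now, center, radius=2, tau=50e-3):
    """Build a time-surface patch around `center` from per-pixel latest
    event timestamps, using the exponential-decay kernel described in the
    abstract.  Radius and tau are illustrative defaults."""
    x, y = center
    patch = last_timestamps[y - radius:y + radius + 1,
                            x - radius:x + radius + 1]
    # Pixels that have never fired keep a very old timestamp (-inf -> 0 response).
    return np.exp(-(t_now - patch) / tau)

# Toy usage: maintain per-pixel latest timestamps while streaming events.
H, W = 32, 32
last_ts = np.full((H, W), -np.inf)
events = [(10, 12, 0.010), (11, 12, 0.012), (10, 13, 0.015)]  # (x, y, t) in seconds
for x, y, t in events:
    last_ts[y, x] = t
    ts = time_surface(last_ts, t, (x, y))
```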

405 citations


Journal ArticleDOI
TL;DR: This paper attempts to make a contribution to the field of object recognition in X-ray testing by evaluating different computer vision strategies that have been proposed in recent years, and concludes that it is possible to design an automated aid for the human inspection task using these computer vision algorithms.
Abstract: X-ray screening systems have been used to safeguard environments in which access control is of paramount importance. Security checkpoints have been placed at the entrances to many public places to detect prohibited items, such as handguns and explosives. Generally, human operators are in charge of these tasks as automated recognition in baggage inspection is still far from perfect. Research and development on X-ray testing is, however, exploring new approaches based on computer vision that can be used to aid human operators. This paper attempts to make a contribution to the field of object recognition in X-ray testing by evaluating different computer vision strategies that have been proposed in recent years. We tested ten approaches. They are based on bag of words, sparse representations, deep learning, and classic pattern recognition schemes among others. For each method, we: 1) present a brief explanation; 2) show experimental results on the same database; and 3) provide concluding remarks discussing pros and cons of each method. In order to make fair comparisons, we define a common experimental protocol based on training, validation, and testing data (selected from the public GDXray database). The effectiveness of each method was tested in the recognition of three different threat objects: 1) handguns; 2) shuriken (ninja stars); and 3) razor blades. In our experiments, the highest recognition rate was achieved by methods based on visual vocabularies and deep features with more than 95% accuracy. We strongly believe that it is possible to design an automated aid for the human inspection task using these computer vision algorithms.
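Since the best-performing methods in this study build on visual vocabularies, a generic bag-of-visual-words baseline is easy to sketch. The SIFT/KMeans/SVM pipeline below is a standard stand-in rather than the paper's exact implementation; the vocabulary size and classifier settings are illustrative, and `train_imgs`/`train_labels` are hypothetical placeholders for GDXray crops.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(gray_images):
    """Extract SIFT descriptors per image (empty array when none found)."""
    sift = cv2.SIFT_create()
    per_image = []
    for img in gray_images:
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    return per_image

def bow_histograms(per_image_desc, kmeans):
    """Quantize descriptors against the vocabulary and build normalized histograms."""
    hists = []
    for desc in per_image_desc:
        h = np.zeros(kmeans.n_clusters)
        if len(desc):
            for w in kmeans.predict(desc.astype(np.float64)):
                h[w] += 1
            h /= h.sum()
        hists.append(h)
    return np.array(hists)

# train_imgs / train_labels are placeholders for GDXray crops and their classes.
# desc_train = sift_descriptors(train_imgs)
# vocab = KMeans(n_clusters=200, n_init=10).fit(np.vstack(desc_train))
# clf = SVC(kernel="linear").fit(bow_histograms(desc_train, vocab), train_labels)
```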

126 citations


Proceedings ArticleDOI
14 May 2017
TL;DR: This paper presents a review of various state-of-the-art deep learning-based techniques proposed for human action recognition on three types of datasets, namely, single viewpoint, multiple viewpoint and RGB-depth videos.
Abstract: Video-based human action recognition has become one of the most popular research areas in the field of computer vision and pattern recognition in recent years. It has a wide variety of applications such as surveillance, robotics, health care, video searching and human-computer interaction. There are many challenges involved in human action recognition in videos, such as cluttered backgrounds, occlusions, viewpoint variation, execution rate, and camera motion. A large number of techniques have been proposed to address the challenges over the decades. Three different types of datasets namely, single viewpoint, multiple viewpoint and RGB-depth videos, are used for research. This paper presents a review of various state-of-the-art deep learning-based techniques proposed for human action recognition on the three types of datasets. In light of the growing popularity and the recent developments in video-based human action recognition, this review imparts details of current trends and potential directions for future work to assist researchers.

109 citations


Journal ArticleDOI
TL;DR: Results for 3D-2D face recognition on the UHDB11 3D/2D database with 2D images under large illumination and pose variations support the hypothesis that, in challenging datasets, 3D-2D outperforms 2D-2D and decreases the performance gap against 3D-3D.

66 citations


Journal ArticleDOI
TL;DR: A 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization.
Abstract: In this paper, we present a 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization. The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions of human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. On the time axis, the interactions are represented as a sequence of atomic event transitions with coherent objects. The 4DHOI model is a hierarchical spatial-temporal graph representation which can be used for inferring scene functionality and object affordance. The graph structures and parameters are learned using an ordered expectation maximization algorithm which mines the spatial-temporal structures of events from RGB-D video samples. Given an input RGB-D video, the inference is performed by a dynamic programming beam search algorithm which simultaneously carries out event segmentation, recognition, and object localization. We collected a large multiview RGB-D event dataset which contains 3,815 video sequences and 383,036 RGB-D frames captured by three RGB-D cameras. The experimental results on three challenging datasets demonstrate the strength of the proposed method.

63 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: Program of Paper Session I, including "Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval" and "LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition".
Abstract: 09.15–10.45, Paper Session I: "Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval" (Konstantinos Sfikas, Theoharis Theoharis and Ioannis Pratikakis); "LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition" (Shuaifeng Zhi, Yongxiang Liu, Xiang Li and Yulan Guo); "Unstructured point cloud semantic labeling using deep segmentation networks" (Alexandre Boulch, Bertrand Le Saux and Nicolas Audebert).

58 citations


Journal ArticleDOI
TL;DR: The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images and extended to a semi-supervised framework which can leverage both labeled and unlabeled videos.
Abstract: Human action recognition has been well explored in applications of computer vision. Many successful action recognition methods have shown that action knowledge can be effectively learned from motion videos or still images. For the same action, the appropriate action knowledge learned from different types of media, e.g., videos or images, may be related. However, less effort has been made to improve the performance of action recognition in videos by adapting the action knowledge conveyed from images to videos. Most of the existing video action recognition methods suffer from the problem of lacking sufficient labeled training videos. In such cases, over-fitting would be a potential problem and the performance of action recognition is restrained. In this paper, we propose an adaptation method to enhance action recognition in videos by adapting knowledge from images. The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images. Meanwhile, we extend the adaptation method to a semi-supervised framework which can leverage both labeled and unlabeled videos. Thus, the over-fitting can be alleviated and the performance of action recognition is improved. Experiments on public benchmark datasets and real-world datasets show that our method outperforms several other state-of-the-art action recognition methods.

56 citations


Proceedings ArticleDOI
Liping Yuan1, Zhiyi Qu1, Yufeng Zhao1, Hongshuai Zhang1, Qing Nian 
25 Mar 2017
TL;DR: Experimental results show that the proposed Convolutional Neural Network based on TensorFlow, an open source deep learning framework, has better recognition accuracy and higher robustness in complex environments.
Abstract: Face recognition is a hot research field in computer vision, and it has a high practical value for the detection and recognition of specific sensitive characters. Research has found that traditional hand-crafted features perform poorly under uncontrolled conditions such as pose, facial expression, illumination and occlusion, so a deep learning method is adopted. On the basis of face detection, a Convolutional Neural Network (CNN) based on TensorFlow, an open source deep learning framework, is proposed for face recognition. Experimental results show that the proposed method has better recognition accuracy and higher robustness in complex environments.
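As a rough illustration of the kind of TensorFlow-based CNN classifier described, the Keras sketch below builds a small convolutional network over aligned face crops. The layer sizes, input resolution and identity count are assumptions for illustration, not the architecture reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_face_cnn(num_identities, input_shape=(64, 64, 3)):
    """A small CNN face classifier in TensorFlow/Keras; sizes are illustrative."""
    return models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_identities, activation="softmax"),
    ])

model = build_face_cnn(num_identities=40)  # 40 identities assumed for a small database
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(face_crops, identity_labels, epochs=20, validation_split=0.1)
```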

54 citations


Book ChapterDOI
01 Jan 2017
TL;DR: It is argued that the computational mechanisms subserving recognition of heavily occluded objects rely on neural circuits with recurrent connectivity that are capable of interpreting incoming inputs in the context of prior knowledge.
Abstract: Pattern completion is a ubiquitous and critical component of visual recognition under natural conditions whereby we need to make inferences from partial information. Pattern recognition from sparse information is essential when objects are rendered under poor illumination or when they are significantly occluded. Here we provide an overview of the behavioral, physiological and computational studies of pattern completion. We argue that the computational mechanisms subserving recognition of heavily occluded objects rely on neural circuits with recurrent connectivity that are capable of interpreting incoming inputs in the context of prior knowledge.

53 citations


Journal ArticleDOI
TL;DR: This tutorial paper reviews typical face recognition algorithms with implications for the design of CFs, and discusses and compares the numerical and optical implementations of correlators.
Abstract: In recent years, correlation-filter (CF)-based face recognition algorithms have attracted increasing interest in the field of pattern recognition and have achieved impressive results in discrimination, efficiency, location accuracy, and robustness. In this tutorial paper, our goal is to help the reader get a broad overview of CFs in three respects: design, implementation, and application. We review typical face recognition algorithms with implications for the design of CFs. We discuss and compare the numerical and optical implementations of correlators. Some newly proposed implementation schemes and application examples are also presented to verify the feasibility and effectiveness of CFs as a powerful recognition tool.
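The core operation of any correlation filter is a frequency-domain correlation followed by peak analysis. The sketch below shows that step with a plain template; an actual CF design (e.g. MACE or UMACE) would additionally shape the filter spectrum, and the peak-to-sidelobe score here is a simplified variant.

```python
import numpy as np

def correlate_fft(image, template):
    """Cross-correlate image and template via the frequency domain and
    return the peak location plus a simplified peak-to-sidelobe ratio."""
    F = np.fft.fft2(image)
    H = np.fft.fft2(template, s=image.shape)       # zero-padded template spectrum
    corr = np.real(np.fft.ifft2(F * np.conj(H)))   # circular cross-correlation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # A real CF system excludes a window around the peak when computing the PSR.
    psr = (corr.max() - corr.mean()) / (corr.std() + 1e-9)
    return peak, psr
```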

51 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: This work proposes a framework to deal with cross-modal visuo-tactile object recognition, which means that the object recognition algorithm is trained only with visual data and is able to recognize objects leveraging only tactile perception.
Abstract: In this work, we propose a framework to deal with cross-modal visuo-tactile object recognition. By cross-modal visuo-tactile object recognition, we mean that the object recognition algorithm is trained only with visual data and is able to recognize objects leveraging only tactile perception. The proposed cross-modal framework is constituted by three main elements. The first is a unified representation of visual and tactile data, which is suitable for cross-modal perception. The second is a set of features able to encode the chosen representation for classification applications. The third is a supervised learning algorithm, which takes advantage of the chosen descriptor. In order to show the results of our approach, we performed experiments with 15 objects common in domestic and industrial environments. Moreover, we compare the performance of the proposed framework with the performance of 10 humans in a simple cross-modal recognition task.

Journal ArticleDOI
TL;DR: It is proved that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post-processing.
Abstract: Highlights: an approach for candidate pre-selection based on corners and color information; a robust approach for object detection and recognition in which Bag of Words and Deep Neural Networks are compared; a post-processing step used to combine multiple detections of the same object; a deep experimental evaluation on the complex Grozi-120 public dataset. Object detection and recognition are challenging computer vision tasks receiving great attention due to the large number of applications. This work focuses on the detection/recognition of products on supermarket shelves; this framework has a number of practical applications such as providing additional product/price information to the user or guiding visually impaired customers during shopping. The automatic creation of planograms (i.e., the actual layout of products on shelves) is also useful for commercial analysis and management of large stores. Although in many object detection/recognition contexts it can be assumed that training images are representative of the real operational conditions, in our scenario such an assumption is not realistic because the only training images available are acquired in well-controlled conditions. This gap between the training and test data makes the object detection and recognition tasks far more complex and requires very robust techniques. In this paper we prove that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post-processing. For fine-selection we compared a classical Bag of Words technique with a more recent Deep Neural Networks approach and found interesting outcomes. Extensive experiments on datasets of varying complexity are discussed to highlight the main issues characterizing this problem, and to guide toward the practical development of a real application.

Journal ArticleDOI
TL;DR: This article presents a vision system for assistive robots that is able to detect and recognize objects from a visual input in ordinary environments in real time, taking some inspiration from vision science.
Abstract: Technological advances are being made to assist humans in performing ordinary tasks in everyday settings. A key issue is the interaction with objects of varying size, shape, and degree of mobility. Autonomous assistive robots must be provided with the ability to process visual data in real time so that they can react adequately for quickly adapting to changes in the environment. Reliable object detection and recognition is usually a necessary early step to achieve this goal. In spite of significant research achievements, this issue still remains a challenge when real-life scenarios are considered. In this article, we present a vision system for assistive robots that is able to detect and recognize objects from a visual input in ordinary environments in real time. The system computes color, motion, and shape cues, combining them in a probabilistic manner to accurately achieve object detection and recognition, taking some inspiration from vision science. In addition, with the purpose of processing the input visual data in real time, a graphics processing unit (GPU) has been employed. The presented approach has been implemented and evaluated on a humanoid robot torso located in realistic scenarios. For further experimental validation, a public image repository for object recognition has been used, allowing a quantitative comparison with respect to other state-of-the-art techniques when real-world scenes are considered. Finally, a temporal analysis of the performance is provided with respect to image resolution and the number of target objects in the scene.

Proceedings ArticleDOI
25 Mar 2017
TL;DR: The Convolutional Neural Network is one of the most representative network structures in deep learning, and it has achieved great success in the field of image processing and recognition.
Abstract: Face recognition is a form of biometrics based on human facial feature information, and it has wide application value in computer information security, medical treatment, security monitoring, human-computer interaction and finance. Facial feature extraction is the key to face recognition technology, and it determines the selection of the recognition algorithm. The Local Binary Pattern is a texture description method that describes the local texture features of an image in a gray-scale range. In recent years, many researchers have successfully applied it to facial feature description and recognition, and achieved remarkable results. The Convolutional Neural Network is one of the most representative network structures in deep learning, and it has achieved great success in the field of image processing and recognition.
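For reference, the basic 3x3 Local Binary Pattern mentioned above can be computed in a few lines: each pixel is encoded by thresholding its eight neighbors against the center value. This is the plain variant; uniform or rotation-invariant extensions (and the paper's own setup) may differ, and libraries such as skimage.feature.local_binary_pattern offer richer versions.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP codes for a grayscale image (interior pixels only)."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    codes = np.zeros_like(center)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((neighbor >= center).astype(np.int32) << bit)
    return codes

# A face descriptor is typically a concatenation of per-block code histograms:
# hist, _ = np.histogram(lbp_image(face), bins=256, range=(0, 256))
```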

Journal ArticleDOI
TL;DR: It is observed that humans relied on specific (diagnostic) object regions for accurate recognition which remained relatively consistent across variations; but feed-forward feature-extraction models selected view-specific (non-invariant) features across variations, suggesting that models can develop different strategies, but reach human-level recognition performance.
Abstract: One key ability of human brain is invariant object recognition, which refers to rapid and accurate recognition of objects in the presence of variations such as size, rotation and position. Despite decades of research into the topic, it remains unknown how the brain constructs invariant representations of objects. Providing brain-plausible object representations and reaching human-level accuracy in recognition, hierarchical models of human vision have suggested that, human brain implements similar feed-forward operations to obtain invariant representations. However, conducting two psychophysical object recognition experiments on humans with systematically controlled variations of objects, we observed that humans relied on specific (diagnostic) object regions for accurate recognition which remained relatively consistent (invariant) across variations; but feed-forward feature-extraction models selected view-specific (non-invariant) features across variations. This suggests that models can develop different strategies, but reach human-level recognition performance. Moreover, human individuals largely disagreed on their diagnostic features and flexibly shifted their feature extraction strategy from view-invariant to view-specific when objects became more similar. This implies that, even in rapid object recognition, rather than a set of feed-forward mechanisms which extract diagnostic features from objects in a hard-wired fashion, the bottom-up visual pathways receive, through top-down connections, task-related information possibly processed in prefrontal cortex.

Journal ArticleDOI
TL;DR: An object detection system finds objects of the real world present either in a digital image or a video, where the object can belong to any class of objects namely humans, cars, etc.
Abstract: An object detection system finds objects of the real world present either in a digital image or a video, where the object can belong to any class of objects namely humans, cars, etc. In order to detect an object in an image or a video the system needs to have a few components in order to complete the task of detecting an object, they are a model database, a feature detector, a hypothesiser and a hypothesiser verifier. This paper presents a review of the various techniques that are used to detect an object, localise an object, categorise an object, extract features, appearance information, and many more, in images and videos. The comments are drawn based on the studied literature and key issues are also identified relevant to the object detection. Information about the source codes and online datasets is provided to facilitate the new researcher in object detection area. An idea about the possible solution for the multi class object detection is also presented. This paper is suitable for the researchers who are the beginners in this domain.

Proceedings ArticleDOI
06 Nov 2017
TL;DR: A new network layer is introduced that can extend a convolutional layer to encode the co-occurrence between the visual parts detected by the numerous neurons, instead of a few pre-specified parts, and is end-to-end trainable.
Abstract: This paper addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts. However, the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle the three issues by introducing a new network layer, called co-occurrence layer. It can extend a convolutional layer to encode the co-occurrence between the visual parts detected by the numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to the VGG-16 and ResNet-152, we achieve the recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
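A simplified reading of the co-occurrence idea, in which channels of a feature map act as both images and filters and the maximal pairwise correlation response is kept, can be sketched in PyTorch as below. This is an interpretation for illustration only, not the authors' released layer (their code is linked in the abstract).

```python
import torch
import torch.nn.functional as F

def cooccurrence_matrix(feat_map):
    """For one feature map (C, H, W): correlate every channel with every
    other channel over all spatial offsets and keep the maximum response
    per channel pair.  A simplified sketch of the co-occurrence layer."""
    c, h, w = feat_map.shape
    images = feat_map.unsqueeze(1)    # (C, 1, H, W): channels as images
    filters = feat_map.unsqueeze(1)   # (C, 1, H, W): channels as filters
    resp = F.conv2d(images, filters, padding=(h - 1, w - 1))  # (C, C, 2H-1, 2W-1)
    return resp.amax(dim=(-2, -1))    # (C, C) co-occurrence features

# Example: pool co-occurrences of a conv activation before the classifier.
feats = torch.randn(64, 7, 7)
cooc = cooccurrence_matrix(feats)     # 64 x 64 matrix
```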

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways.
Abstract: It is well-established by cognitive neuroscience that human perception of objects constitutes a complex process, where object appearance information is combined with evidence about the so-called object affordances, namely the types of actions that humans typically perform when interacting with them. This fact has recently motivated the sensorimotor approach to the challenging task of automatic object recognition, where both information sources are fused to improve robustness. In this work, the aforementioned paradigm is adopted, surpassing current limitations of sensorimotor object recognition research. Specifically, the deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways. The proposed methods are evaluated using a large RGB-D corpus, which is specifically collected for the task of sensorimotor object recognition and is made publicly available. Experimental results demonstrate the utility of affordance information to object recognition, achieving an up to 29% relative error reduction by its inclusion.

Book ChapterDOI
01 Jan 2017
TL;DR: The proposed work judges their performance under different circumstances such as rotation, scaling, illumination and blurring effects, and investigates the speed of each algorithm in different situations.
Abstract: Object recognition can be done with either a local feature description algorithm or a global feature description algorithm. Both types of descriptors can recognize an object quickly and accurately. The proposed work judges their performance under different circumstances such as rotation, scaling, illumination and blurring effects. The authors also investigate the speed of each algorithm in different situations. The experimental results show that each one has some advantages as well as some drawbacks. SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) perform relatively better under scale and rotation change. MSER (Maximally Stable Extremal Regions) performs better under scale change, and MinEigen under affine and illumination change, while FAST (Features from Accelerated Segment Test) and SURF consume less time.
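A speed comparison of this kind can be reproduced with OpenCV's built-in detectors, as in the rough sketch below. SURF is omitted because it ships only in non-free OpenCV builds, and the image path is a placeholder.

```python
import time
import cv2

def time_detector(name, detector, gray):
    """Time a single OpenCV keypoint detector on one grayscale image."""
    t0 = time.perf_counter()
    keypoints = detector.detect(gray, None)
    dt = time.perf_counter() - t0
    print(f"{name}: {len(keypoints)} keypoints in {dt * 1000:.1f} ms")

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
for name, det in [("SIFT", cv2.SIFT_create()),
                  ("FAST", cv2.FastFeatureDetector_create()),
                  ("MSER", cv2.MSER_create())]:
    time_detector(name, det, gray)
```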

Journal ArticleDOI
TL;DR: This work proposes to connect, in a closed-loop, detectors and object proposal generator functions exploiting the ordered and continuous nature of video sequences and obtains three to four points of improvement in mAP and a detection time that is lower than Faster Regions with CNN features (R-CNN), which is the fastest Convolutional Neural Network (CNN) based generic object detector known at the moment.
Abstract: Object detection is one of the most important tasks of computer vision. It is usually performed by evaluating a subset of the possible locations of an image that are more likely to contain the object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay of detectors and proposal algorithms has not been fully analyzed and exploited up to now, although this is a very relevant problem for object detection in video sequences. We propose to connect, in a closed loop, detectors and object proposal generator functions, exploiting the ordered and continuous nature of video sequences. Different from tracking, we only require a previous frame to improve both proposal and detection: no prediction based on local motion is performed, thus avoiding tracking errors. We obtain three to four points of improvement in mAP and a detection time that is lower than Faster Regions with CNN features (R-CNN), which is the fastest Convolutional Neural Network (CNN) based generic object detector known at the moment.

Posted Content
Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng 
26 Jun 2017
TL;DR: This paper studies object detection using a large pool of unlabeled images and only a few labeled images per category, named “few-shot object detection”, and embeds multiple detection models in this framework, which has proven to outperform the single model baseline and the model ensemble method.
Abstract: In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, named “few-shot object detection”. The key challenge consists in generating as many trustworthy training samples as possible from the pool. Using a few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first, and then the poorly initialized model undergoes improvement. As the model becomes more discriminative, challenging but reliable samples are selected. After that, another round of model improvement takes place. To further improve the precision and recall of the generated training samples, we embed multiple detection models in our framework, which has proven to outperform the single-model baseline and the model ensemble method. Experiments on PASCAL VOC’07 and ILSVRC’13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to state-of-the-art weakly-supervised approaches using a large number of image-level labels.
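The iterate-between-training-and-selection procedure can be summarized as a small loop skeleton. The `train_fn` and `score_fn` hooks, the confidence thresholds, and the round count below are assumptions for illustration, not the paper's settings.

```python
def self_paced_detection(train_fn, score_fn, seed_data, unlabeled_pool,
                         rounds=4, start_thresh=0.95, thresh_decay=0.05):
    """Skeleton of a self-paced few-shot detection loop.
    train_fn(labeled_data) -> model, score_fn(model, image) -> (label, confidence)
    are assumed hooks supplied by the caller."""
    labeled = list(seed_data)
    model = train_fn(labeled)
    thresh = start_thresh
    for _ in range(rounds):
        # Easy (high-confidence) pseudo-labels are harvested first ...
        selected = []
        for image in unlabeled_pool:
            label, conf = score_fn(model, image)
            if conf >= thresh:
                selected.append((image, label))
        labeled.extend(selected)
        # ... then the model is retrained and the bar is gradually lowered so
        # harder but still reliable samples enter in later rounds.
        model = train_fn(labeled)
        thresh = max(0.5, thresh - thresh_decay)
    return model
```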

Proceedings ArticleDOI
01 Mar 2017
TL;DR: A novel method to update assets for telecommunication infrastructure using Google Street View (GSV) images is presented, comparing HOG descriptors with SVM, the Deformable Parts Model (DPM), and deep learning using Faster R-CNN.
Abstract: We present a novel method to update assets for telecommunication infrastructure using Google Street View (GSV) images. The problem is formulated as an object recognition task, followed by the use of triangulation to estimate the object coordinates from sensor-plane coordinates. To this end, we have explored different state-of-the-art object recognition techniques, both from feature engineering and deep learning, namely HOG descriptors with SVM, the Deformable Parts Model (DPM), and deep learning (DL) using Faster R-CNN. While HOG+SVM has proved to be a robust human detector, DPM, which is based on probabilistic graphical models, and DL, which is a non-linear classifier, have proved their versatility in different types of object recognition problems. Asset recognition from street view images, however, poses unique challenges, as assets can be installed on the ground in various poses and orientations, with occlusions, with objects camouflaged in the background, and in some cases with small inter-class variation. We present the comparative performance of these techniques for a specific use-case involving telecom equipment, targeting the highest precision and recall. The blocks of the proposed pipeline are detailed and compared to traditional inventory management methods.
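Of the three compared techniques, the HOG-plus-SVM branch is the simplest to sketch with off-the-shelf tools. The descriptor parameters below are standard defaults rather than the paper's, and the crop/label variables are hypothetical placeholders for annotated GSV patches.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(images):
    """HOG descriptors for fixed-size grayscale crops (Dalal-Triggs style
    defaults; not necessarily the parameters used in the paper)."""
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2), block_norm="L2-Hys")
                     for img in images])

# crops / labels would come from annotated GSV patches (positives = telecom
# assets, negatives = background); they are placeholders here.
# clf = LinearSVC(C=1.0).fit(hog_features(crops), labels)
# predictions = clf.predict(hog_features(test_crops))
```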

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper presents a method for face recognition adapted to real-world conditions that can be trained using very few training examples and is computationally efficient and significantly outperform state-of-the-art methods.
Abstract: Face recognition systems are designed to handle well-aligned images captured under controlled situations. However real-world images present varying orientations, expressions, and illumination conditions. Traditional face recognition algorithms perform poorly on such images. In this paper we present a method for face recognition adapted to real-world conditions that can be trained using very few training examples and is computationally efficient. Our method consists of performing a novel alignment process followed by classification using sparse representation techniques. We present our recognition rates on a difficult dataset that represents real-world faces where we significantly outperform state-of-the-art methods.
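Classification by sparse representation typically codes the probe face over a dictionary of training faces and picks the class with the smallest reconstruction residual. The sketch below uses orthogonal matching pursuit as a stand-in for whichever sparse solver the authors actually use, and assumes the alignment step has already been applied.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(train_X, train_y, test_x, n_nonzero=10):
    """Sparse-representation classification: code test_x over the columns of
    train_X (one vectorized face per column) and return the class whose
    atoms reconstruct it best."""
    D = train_X / (np.linalg.norm(train_X, axis=0, keepdims=True) + 1e-9)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, test_x)
    coef = omp.coef_
    residuals = {}
    for cls in np.unique(train_y):
        mask = (train_y == cls)
        recon = D[:, mask] @ coef[mask]
        residuals[cls] = np.linalg.norm(test_x - recon)
    return min(residuals, key=residuals.get)

# train_X: (n_features, n_train_faces), columns = vectorized aligned faces.
# test_x:  (n_features,) vectorized aligned probe face.
```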

Proceedings ArticleDOI
10 Jul 2017
TL;DR: This work proposes a method that integrates the spatial distribution of objects to facilitate visual relation detection, and establishes a modeling method that makes spatial, visual, and concept features work together for visual relationship detection.
Abstract: Recently, object recognition techniques have developed rapidly. Most existing object recognition work has focused on recognizing several independent concepts. The relationship between objects is also an important problem, as it reveals in-depth semantic information about images. In this work, toward general visual relationship detection, we propose a method that integrates the spatial distribution of objects to facilitate visual relation detection. Spatial distribution can not only reflect the positional relation of objects but also describe structural information between objects. Spatial distributions are described with different features such as positional relation, size relation, shape relation, and so on. By combining spatial distribution features with visual and concept features, we establish a modeling method that makes these three aspects work together to facilitate visual relationship detection. To evaluate the proposed method, we conduct experiments on two datasets: the Stanford VRD dataset, and a newly proposed larger dataset containing 15k images. Experimental results demonstrate that our approach is effective.
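Positional, size and shape relations between a pair of detected boxes can be encoded with a handful of normalized quantities, as in the sketch below; the exact feature set used in the paper may differ.

```python
def spatial_features(box_a, box_b):
    """Simple positional/size/shape cues between two boxes (x1, y1, x2, y2),
    in the spirit of the spatial-distribution features discussed above."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    bcx, bcy = bx1 + bw / 2, by1 + bh / 2
    return {
        "dx": (bcx - acx) / max(aw, 1e-6),             # positional relation
        "dy": (bcy - acy) / max(ah, 1e-6),
        "size_ratio": (bw * bh) / max(aw * ah, 1e-6),  # size relation
        "aspect_a": aw / max(ah, 1e-6),                # shape relation
        "aspect_b": bw / max(bh, 1e-6),
    }

# Example: a person box above a bicycle box yields a positive dy
# (the bicycle is lower in image coordinates).
print(spatial_features((100, 50, 180, 200), (90, 180, 200, 300)))
```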

Journal ArticleDOI
TL;DR: Experimental results support the theory that non-affine variations such as pose and lighting may need top-down feedback information from higher areas such as IT and PFC for precise object recognition.

Journal ArticleDOI
10 May 2017
TL;DR: This paper introduces an innovative method for generating thermal-mapped point clouds of a robot’s work environment and performing automatic object recognition with the aid of thermal data fused to 3D point clouds.
Abstract: Many civil structures are more than halfway through or nearing their intended service life; frequently assessing and maintaining structural integrity is a top maintenance priority. Robotic inspection technologies using ground and aerial robots with 3D scanning and imaging capabilities have the potential to improve the safety and efficiency of infrastructure management. To provide more valuable information to inspectors and agency decision makers, automatic environment sensing and semantic information extraction are fundamental issues in this field. This paper introduces an innovative method for generating thermal-mapped point clouds of a robot’s work environment and performing automatic object recognition with the aid of thermal data fused to 3D point clouds. The laser-scanned point cloud and thermal data were collected using a custom-designed mobile robot. The multimodal data were combined through a data fusion process based on texture mapping. The automatic object recognition was performed by two processes: segmentation with thermal data and classification with scanned geometric features. The proposed method was validated with scan data collected on an entire building floor. Experimental results show that the thermal-integrated object recognition approach achieved better performance than a geometry-only approach, with an average recognition accuracy of 93%, precision of 83%, and recall rate of 86% for objects in the tested environment, including humans, display monitors and light fixtures.

Proceedings ArticleDOI
26 Apr 2017
TL;DR: This paper describes different classifier methods based on minimum cluster means for face recognition, using features extracted from training face images over several image sets as a database.
Abstract: This paper describes different classifier methods based on minimum cluster means to achieve face recognition from features extracted from training face images, using several image sets as a database. Principal Component Analysis (PCA) is a robust feature extraction technique for face recognition, but recognition decreases with variation in a person's actions. The features extracted from face images are light-insensitive, individual, hidden, and effective for biometric recognition. Face recognition is treated as a two-dimensional recognition problem, taking advantage of the fact that faces, which are generally in an upright pose, can be represented by a small set of two-dimensional characteristic views. The training and testing face images are selected from the Olivetti and Oracle Research Laboratory (ORL) face database, which has minimal pose variation. Three classifier methods are used to obtain the recognition distance: the Euclidean distance, the squared Euclidean distance, and the city-block distance. By clustering the differences between a training image and the image sets for each person and computing their mean, the minimum mean identifies the recognized person. The clustering method with the squared Euclidean distance produces the highest recognition rate (100%), close to the Euclidean distance method, which gives a face recognition rate of 98%, both higher than the city-block distance method, which gives a recognition rate of 95%.
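A generic eigenface version of this pipeline, PCA features followed by a nearest-class-mean rule under the three compared distances, looks like the sketch below; the component count and the clustering step are simplifications of the authors' setup.

```python
import numpy as np
from sklearn.decomposition import PCA

def nearest_mean_identity(train_feats, train_ids, probe_feat, metric="sqeuclidean"):
    """Classify a probe by the minimum distance to each person's mean PCA
    feature, with the three distances compared above."""
    best_id, best_d = None, np.inf
    for pid in np.unique(train_ids):
        diff = probe_feat - train_feats[train_ids == pid].mean(axis=0)
        if metric == "euclidean":
            d = np.sqrt(np.sum(diff ** 2))
        elif metric == "sqeuclidean":
            d = np.sum(diff ** 2)
        else:  # "cityblock"
            d = np.sum(np.abs(diff))
        if d < best_d:
            best_id, best_d = pid, d
    return best_id

# faces: (n_images, n_pixels) vectorized ORL images; ids: person labels.
# pca = PCA(n_components=50).fit(faces)
# feats = pca.transform(faces)
# predicted = nearest_mean_identity(feats, ids, pca.transform(probe[None])[0])
```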

Journal ArticleDOI
TL;DR: A new control strategy, based on a minimal spring model of the objects, is presented and used for the control of the robot hand and an adaptable tactile-servo control scheme is presented that can be used in in-hand manipulation tasks of deformable objects.
Abstract: Grasping and manipulating objects with robotic hands depend largely on the features of the object to be used. In particular, features such as softness and deformability are crucial to take into account during manipulation tasks. Indeed, the positions of the fingers and the forces to be applied by the robot hand when manipulating an object must be adapted to the deformation caused. For unknown objects, a prior recognition stage is usually needed to obtain the features of the object, and the manipulation strategies must be adapted depending on that recognition stage. To obtain precise control in the manipulation task, a complex object model is usually needed, for example using the Finite Element Method. However, these models require a complete discretization of the object and are time-consuming for the performance of manipulation tasks. For that reason, in this paper a new control strategy, based on a minimal spring model of the objects, is presented and used for the control of the robot hand. This paper also presents an adaptable tactile-servo control scheme that can be used in in-hand manipulation tasks of deformable objects. Tactile control is based on achieving and maintaining a force value at the contact points which changes according to the object softness, a feature estimated in an initial recognition stage.
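The minimal spring model amounts to Hooke's law plus a servo rule that squeezes or releases depending on the force error. The toy update below is only meant to convey that idea; the gains, units and stiffness estimate are illustrative assumptions, not the paper's controller.

```python
def spring_force(stiffness, rest_length, current_length):
    """Hooke's-law contact force of a minimal spring model; the stiffness is
    assumed to come from the initial object-recognition stage."""
    return stiffness * (rest_length - current_length)

def position_update(current_length, measured_force, target_force, gain=0.001):
    # Simple tactile-servo step: squeeze more (shorten the spring) if the
    # measured force is below the set-point, release if it is above.
    return current_length - gain * (target_force - measured_force)
```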

Book ChapterDOI
10 Jul 2017
TL;DR: In this article, the authors propose a data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system, which can be used with any convolutional deep architecture.
Abstract: Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. This kind of image is very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, brings an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
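The augmentation layer essentially crops around the object with some surrounding context before feeding the recognition network. A minimal NumPy version of that zoom operation, with an assumed context margin, is sketched below; the authors' publicly released layer is the reference implementation.

```python
import numpy as np

def zoom_on_object(image, box, context=0.2):
    """Crop around an object box (x1, y1, x2, y2) with a small surrounding
    margin, mimicking what a robot's object detector would hand to the
    recognition network.  The context margin is an illustrative choice."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    mx, my = context * (x2 - x1), context * (y2 - y1)
    x1, x2 = int(max(0, x1 - mx)), int(min(w, x2 + mx))
    y1, y2 = int(max(0, y1 - my)), int(min(h, y2 + my))
    return image[y1:y2, x1:x2]

# Toy usage: zoom on a centered 100x100 "object" in a 224x224 frame.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
crop = zoom_on_object(frame, (62, 62, 162, 162))
```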

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The aim is to develop an efficient method which uses a custom image to train the classifier, which extract distinct features from the input image for classifying its contents as characters specifically letters and digits.
Abstract: The aim is to develop an efficient method which uses a custom image to train the classifier. This OCR extract distinct features from the input image for classifying its contents as characters specifically letters and digits. Input to the system is digital images containing the patterns to be classified. The analysis and recognition of the patterns in images are becoming more complex, yet easy with advances in technological knowledge. Therefore it is proposed to develop sophisticated strategies of pattern analysis to cope with these difficulties. The present work involves application of pattern recognition using KNN to recognize handwritten or printed text.