
Showing papers on "3D single-object recognition" published in 2017


Journal ArticleDOI
TL;DR: The central concept is to use the rich temporal information provided by events to create contexts in the form of time-surfaces which represent the recent temporal activity within a local spatial neighborhood and it is demonstrated that this concept can robustly be used at all stages of an event-based hierarchical model.
Abstract: This paper describes novel event-based spatio-temporal features called time-surfaces and how they can be used to create a hierarchical event-based pattern recognition architecture. Unlike existing hierarchical architectures for pattern recognition, the presented model relies on a time oriented approach to extract spatio-temporal features from the asynchronously acquired dynamics of a visual scene. These dynamics are acquired using biologically inspired frameless asynchronous event-driven vision sensors. Similarly to cortical structures, subsequent layers in our hierarchy extract increasingly abstract features using increasingly large spatio-temporal windows. The central concept is to use the rich temporal information provided by events to create contexts in the form of time-surfaces which represent the recent temporal activity within a local spatial neighborhood. We demonstrate that this concept can robustly be used at all stages of an event-based hierarchical model. First layer feature units operate on groups of pixels, while subsequent layer feature units operate on the output of lower level feature units. We report results on a previously published 36 class character recognition task and a four class canonical dynamic card pip task, achieving near 100 percent accuracy on each. We introduce a new seven class moving face recognition task, achieving 79 percent accuracy.
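The time-surface idea lends itself to a compact sketch: keep the latest event timestamp per pixel and apply an exponential decay over a small spatial neighborhood around each new event. The window radius, decay constant tau, and per-pixel bookkeeping below are illustrative assumptions, not the authors' exact parameters (border handling is also omitted).

```python
import numpy as np

def time_surface(last_timestamps, t_now, center, radius=2, tau=50e-3):
    """Build a time-surface patch around `center` from per-pixel latest
    event timestamps, using the exponential-decay kernel described in the
    abstract.  Radius and tau are illustrative defaults."""
    x, y = center
    patch = last_timestamps[y - radius:y + radius + 1,
                            x - radius:x + radius + 1]
    # Pixels that have never fired keep a very old timestamp (-inf -> 0 response).
    return np.exp(-(t_now - patch) / tau)

# Toy usage: maintain per-pixel latest timestamps while streaming events.
H, W = 32, 32
last_ts = np.full((H, W), -np.inf)
events = [(10, 12, 0.010), (11, 12, 0.012), (10, 13, 0.015)]  # (x, y, t) in seconds
for x, y, t in events:
    last_ts[y, x] = t
    ts = time_surface(last_ts, t, (x, y))
```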

405 citations


Journal ArticleDOI
TL;DR: This paper attempts to make a contribution to the field of object recognition in X-ray testing by evaluating different computer vision strategies that have been proposed in recent years, and concludes that it is possible to design an automated aid for the human inspection task using these computer vision algorithms.
Abstract: X-ray screening systems have been used to safeguard environments in which access control is of paramount importance. Security checkpoints have been placed at the entrances to many public places to detect prohibited items, such as handguns and explosives. Generally, human operators are in charge of these tasks as automated recognition in baggage inspection is still far from perfect. Research and development on X-ray testing is, however, exploring new approaches based on computer vision that can be used to aid human operators. This paper attempts to make a contribution to the field of object recognition in X-ray testing by evaluating different computer vision strategies that have been proposed in recent years. We tested ten approaches. They are based on bag of words, sparse representations, deep learning, and classic pattern recognition schemes among others. For each method, we: 1) present a brief explanation; 2) show experimental results on the same database; and 3) provide concluding remarks discussing pros and cons of each method. In order to make fair comparisons, we define a common experimental protocol based on training, validation, and testing data (selected from the public GDXray database). The effectiveness of each method was tested in the recognition of three different threat objects: 1) handguns; 2) shuriken (ninja stars); and 3) razor blades. In our experiments, the highest recognition rate was achieved by methods based on visual vocabularies and deep features with more than 95% accuracy. We strongly believe that it is possible to design an automated aid for the human inspection task using these computer vision algorithms.
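Since the best-performing methods in this study build on visual vocabularies, a generic bag-of-visual-words baseline is easy to sketch. The SIFT/KMeans/SVM pipeline below is a standard stand-in rather than the paper's exact implementation; the vocabulary size and classifier settings are illustrative, and `train_imgs`/`train_labels` are hypothetical placeholders for GDXray crops.

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import SVC

def sift_descriptors(gray_images):
    """Extract SIFT descriptors per image (empty array when none found)."""
    sift = cv2.SIFT_create()
    per_image = []
    for img in gray_images:
        _, desc = sift.detectAndCompute(img, None)
        per_image.append(desc if desc is not None else np.zeros((0, 128), np.float32))
    return per_image

def bow_histograms(per_image_desc, kmeans):
    """Quantize descriptors against the vocabulary and build normalized histograms."""
    hists = []
    for desc in per_image_desc:
        h = np.zeros(kmeans.n_clusters)
        if len(desc):
            for w in kmeans.predict(desc.astype(np.float64)):
                h[w] += 1
            h /= h.sum()
        hists.append(h)
    return np.array(hists)

# train_imgs / train_labels are placeholders for GDXray crops and their classes.
# desc_train = sift_descriptors(train_imgs)
# vocab = KMeans(n_clusters=200, n_init=10).fit(np.vstack(desc_train))
# clf = SVC(kernel="linear").fit(bow_histograms(desc_train, vocab), train_labels)
```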

126 citations


Proceedings ArticleDOI
14 May 2017
TL;DR: This paper presents a review of various state-of-the-art deep learning-based techniques proposed for human action recognition on three types of datasets, namely, single viewpoint, multiple viewpoint and RGB-depth videos.
Abstract: Video-based human action recognition has become one of the most popular research areas in the field of computer vision and pattern recognition in recent years. It has a wide variety of applications such as surveillance, robotics, health care, video searching and human-computer interaction. There are many challenges involved in human action recognition in videos, such as cluttered backgrounds, occlusions, viewpoint variation, execution rate, and camera motion. A large number of techniques have been proposed to address the challenges over the decades. Three different types of datasets namely, single viewpoint, multiple viewpoint and RGB-depth videos, are used for research. This paper presents a review of various state-of-the-art deep learning-based techniques proposed for human action recognition on the three types of datasets. In light of the growing popularity and the recent developments in video-based human action recognition, this review imparts details of current trends and potential directions for future work to assist researchers.

109 citations


Journal ArticleDOI
TL;DR: Results for 3D-2D face recognition on the UHDB11 3D/2D database with 2D images under large illumination and pose variations support the hypothesis that, in challenging datasets, 3D-2D outperforms 2D-2D and decreases the performance gap against 3D-3D.

66 citations


Journal ArticleDOI
TL;DR: A 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization.
Abstract: In this paper, we present a 4D human-object interaction (4DHOI) model for solving three vision tasks jointly: i) event segmentation from a video sequence, ii) event recognition and parsing, and iii) contextual object localization. The 4DHOI model represents the geometric, temporal, and semantic relations in daily events involving human-object interactions. In 3D space, the interactions of human poses and contextual objects are modeled by semantic co-occurrence and geometric compatibility. On the time axis, the interactions are represented as a sequence of atomic event transitions with coherent objects. The 4DHOI model is a hierarchical spatial-temporal graph representation which can be used for inferring scene functionality and object affordance. The graph structures and parameters are learned using an ordered expectation maximization algorithm which mines the spatial-temporal structures of events from RGB-D video samples. Given an input RGB-D video, the inference is performed by a dynamic programming beam search algorithm which simultaneously carries out event segmentation, recognition, and object localization. We collected a large multiview RGB-D event dataset which contains 3,815 video sequences and 383,036 RGB-D frames captured by three RGB-D cameras. The experimental results on three challenging datasets demonstrate the strength of the proposed method.

63 citations


Proceedings ArticleDOI
01 Jan 2017
TL;DR: Program of Paper Session I, including "Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval" and "LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition".
Abstract: 09.15–10.45, Paper Session I: "Exploiting the PANORAMA Representation for Convolutional Neural Network Classification and Retrieval" (Konstantinos Sfikas, Theoharis Theoharis and Ioannis Pratikakis); "LightNet: A Lightweight 3D Convolutional Neural Network for Real-Time 3D Object Recognition" (Shuaifeng Zhi, Yongxiang Liu, Xiang Li and Yulan Guo); "Unstructured point cloud semantic labeling using deep segmentation networks" (Alexandre Boulch, Bertrand Le Saux and Nicolas Audebert).

58 citations


Journal ArticleDOI
TL;DR: The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images and extended to a semi-supervised framework which can leverage both labeled and unlabeled videos.
Abstract: Human action recognition has been well explored in applications of computer vision. Many successful action recognition methods have shown that action knowledge can be effectively learned from motion videos or still images. For the same action, the appropriate action knowledge learned from different types of media, e.g., videos or images, may be related. However, less effort has been made to improve the performance of action recognition in videos by adapting the action knowledge conveyed from images to videos. Most of the existing video action recognition methods suffer from the problem of lacking sufficient labeled training videos. In such cases, over-fitting would be a potential problem and the performance of action recognition is restrained. In this paper, we propose an adaptation method to enhance action recognition in videos by adapting knowledge from images. The adapted knowledge is utilized to learn the correlated action semantics by exploring the common components of both labeled videos and images. Meanwhile, we extend the adaptation method to a semi-supervised framework which can leverage both labeled and unlabeled videos. Thus, the over-fitting can be alleviated and the performance of action recognition is improved. Experiments on public benchmark datasets and real-world datasets show that our method outperforms several other state-of-the-art action recognition methods.

56 citations


Proceedings ArticleDOI
Liping Yuan1, Zhiyi Qu1, Yufeng Zhao1, Hongshuai Zhang1, Qing Nian 
25 Mar 2017
TL;DR: Experimental results show that the proposed Convolutional Neural Network based on TensorFlow, an open source deep learning framework, has better recognition accuracy and higher robustness in complex environments.
Abstract: Face recognition is a hot research field in computer vision, and it has a high practical value for the detection and recognition of specific sensitive characters. Research has found that traditional hand-crafted features perform poorly under uncontrolled conditions such as pose, facial expression, illumination and occlusion, so a deep learning method is adopted. On the basis of face detection, a Convolutional Neural Network (CNN) based on TensorFlow, an open source deep learning framework, is proposed for face recognition. Experimental results show that the proposed method has better recognition accuracy and higher robustness in complex environments.
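As a rough illustration of the kind of TensorFlow-based CNN classifier described, the Keras sketch below builds a small convolutional network over aligned face crops. The layer sizes, input resolution and identity count are assumptions for illustration, not the architecture reported in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_face_cnn(num_identities, input_shape=(64, 64, 3)):
    """A small CNN face classifier in TensorFlow/Keras; sizes are illustrative."""
    return models.Sequential([
        layers.Conv2D(32, 3, activation="relu", input_shape=input_shape),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(num_identities, activation="softmax"),
    ])

model = build_face_cnn(num_identities=40)  # 40 identities assumed for a small database
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(face_crops, identity_labels, epochs=20, validation_split=0.1)
```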

54 citations


Book ChapterDOI
01 Jan 2017
TL;DR: It is argued that the computational mechanisms subserving recognition of heavily occluded objects rely on neural circuits with recurrent connectivity that are capable of interpreting incoming inputs in the context of prior knowledge.
Abstract: Pattern completion is a ubiquitous and critical component of visual recognition under natural conditions whereby we need to make inferences from partial information. Pattern recognition from sparse information is essential when objects are rendered under poor illumination or when they are significantly occluded. Here we provide an overview of the behavioral, physiological and computational studies of pattern completion. We argue that the computational mechanisms subserving recognition of heavily occluded objects rely on neural circuits with recurrent connectivity that are capable of interpreting incoming inputs in the context of prior knowledge.

53 citations


Journal ArticleDOI
TL;DR: This tutorial paper reviews typical face recognition algorithms with implications for the design of CFs, and discusses and compares the numerical and optical implementations of correlators.
Abstract: In recent years, correlation-filter (CF)-based face recognition algorithms have attracted increasing interest in the field of pattern recognition and have achieved impressive results in discrimination, efficiency, location accuracy, and robustness. In this tutorial paper, our goal is to help the reader get a broad overview of CFs in three respects: design, implementation, and application. We review typical face recognition algorithms with implications for the design of CFs. We discuss and compare the numerical and optical implementations of correlators. Some newly proposed implementation schemes and application examples are also presented to verify the feasibility and effectiveness of CFs as a powerful recognition tool.
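The core operation of any correlation filter is a frequency-domain correlation followed by peak analysis. The sketch below shows that step with a plain template; an actual CF design (e.g. MACE or UMACE) would additionally shape the filter spectrum, and the peak-to-sidelobe score here is a simplified variant.

```python
import numpy as np

def correlate_fft(image, template):
    """Cross-correlate image and template via the frequency domain and
    return the peak location plus a simplified peak-to-sidelobe ratio."""
    F = np.fft.fft2(image)
    H = np.fft.fft2(template, s=image.shape)       # zero-padded template spectrum
    corr = np.real(np.fft.ifft2(F * np.conj(H)))   # circular cross-correlation
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # A real CF system excludes a window around the peak when computing the PSR.
    psr = (corr.max() - corr.mean()) / (corr.std() + 1e-9)
    return peak, psr
```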

51 citations


Proceedings ArticleDOI
01 May 2017
TL;DR: This work proposes a framework to deal with cross-modal visuo-tactile object recognition, which means that the object recognition algorithm is trained only with visual data and is able to recognize objects leveraging only tactile perception.
Abstract: In this work, we propose a framework to deal with cross-modal visuo-tactile object recognition. By cross-modal visuo-tactile object recognition, we mean that the object recognition algorithm is trained only with visual data and is able to recognize objects leveraging only tactile perception. The proposed cross-modal framework is constituted by three main elements. The first is a unified representation of visual and tactile data, which is suitable for cross-modal perception. The second is a set of features able to encode the chosen representation for classification applications. The third is a supervised learning algorithm, which takes advantage of the chosen descriptor. In order to show the results of our approach, we performed experiments with 15 objects common in domestic and industrial environments. Moreover, we compare the performance of the proposed framework with the performance of 10 humans in a simple cross-modal recognition task.

Journal ArticleDOI
TL;DR: It is proved that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post-processing.
Abstract: Highlights: an approach for candidate pre-selection based on corners and color information; a robust approach for object detection and recognition in which Bag of Words and Deep Neural Networks are compared; a post-processing step used to combine multiple detections of the same object; a deep experimental evaluation on the complex Grozi-120 public dataset. Object detection and recognition are challenging computer vision tasks receiving great attention due to the large number of applications. This work focuses on the detection/recognition of products on supermarket shelves; this framework has a number of practical applications such as providing additional product/price information to the user or guiding visually impaired customers during shopping. The automatic creation of planograms (i.e., the actual layout of products on shelves) is also useful for commercial analysis and management of large stores. Although in many object detection/recognition contexts it can be assumed that training images are representative of the real operational conditions, in our scenario such an assumption is not realistic because the only training images available are acquired in well-controlled conditions. This gap between the training and test data makes the object detection and recognition tasks far more complex and requires very robust techniques. In this paper we prove that good results can be obtained by exploiting color and texture information in a multi-stage process: pre-selection, fine-selection and post-processing. For fine-selection we compared a classical Bag of Words technique with a more recent Deep Neural Networks approach and found interesting outcomes. Extensive experiments on datasets of varying complexity are discussed to highlight the main issues characterizing this problem, and to guide toward the practical development of a real application.

Journal ArticleDOI
TL;DR: This article presents a vision system for assistive robots that is able to detect and recognize objects from a visual input in ordinary environments in real time, taking some inspiration from vision science.
Abstract: Technological advances are being made to assist humans in performing ordinary tasks in everyday settings. A key issue is the interaction with objects of varying size, shape, and degree of mobility. Autonomous assistive robots must be provided with the ability to process visual data in real time so that they can react adequately for quickly adapting to changes in the environment. Reliable object detection and recognition is usually a necessary early step to achieve this goal. In spite of significant research achievements, this issue still remains a challenge when real-life scenarios are considered. In this article, we present a vision system for assistive robots that is able to detect and recognize objects from a visual input in ordinary environments in real time. The system computes color, motion, and shape cues, combining them in a probabilistic manner to accurately achieve object detection and recognition, taking some inspiration from vision science. In addition, with the purpose of processing the input visual data in real time, a graphics processing unit (GPU) has been employed. The presented approach has been implemented and evaluated on a humanoid robot torso located in realistic scenarios. For further experimental validation, a public image repository for object recognition has been used, allowing a quantitative comparison with respect to other state-of-the-art techniques when real-world scenes are considered. Finally, a temporal analysis of the performance is provided with respect to image resolution and the number of target objects in the scene.

Proceedings ArticleDOI
25 Mar 2017
TL;DR: The Convolutional Neural Network is one of the most representative network structures in deep learning, and it has achieved great success in the field of image processing and recognition.
Abstract: Face recognition is a form of biometrics based on human facial feature information, and it has wide application value in computer information security, medical treatment, security monitoring, human-computer interaction and finance. Facial feature extraction is the key to face recognition technology, and it determines the selection of the recognition algorithm. The Local Binary Pattern is a texture description method that describes the local texture features of an image in a gray-scale range. In recent years, many researchers have successfully applied it to facial feature description and recognition, and achieved remarkable results. The Convolutional Neural Network is one of the most representative network structures in deep learning, and it has achieved great success in the field of image processing and recognition.
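For reference, the basic 3x3 Local Binary Pattern mentioned above can be computed in a few lines: each pixel is encoded by thresholding its eight neighbors against the center value. This is the plain variant; uniform or rotation-invariant extensions (and the paper's own setup) may differ, and libraries such as skimage.feature.local_binary_pattern offer richer versions.

```python
import numpy as np

def lbp_image(gray):
    """Basic 3x3 LBP codes for a grayscale image (interior pixels only)."""
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    codes = np.zeros_like(center)
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
              (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(shifts):
        neighbor = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        codes |= ((neighbor >= center).astype(np.int32) << bit)
    return codes

# A face descriptor is typically a concatenation of per-block code histograms:
# hist, _ = np.histogram(lbp_image(face), bins=256, range=(0, 256))
```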

Journal ArticleDOI
TL;DR: It is observed that humans relied on specific (diagnostic) object regions for accurate recognition which remained relatively consistent across variations; but feed-forward feature-extraction models selected view-specific (non-invariant) features across variations, suggesting that models can develop different strategies, but reach human-level recognition performance.
Abstract: One key ability of human brain is invariant object recognition, which refers to rapid and accurate recognition of objects in the presence of variations such as size, rotation and position. Despite decades of research into the topic, it remains unknown how the brain constructs invariant representations of objects. Providing brain-plausible object representations and reaching human-level accuracy in recognition, hierarchical models of human vision have suggested that, human brain implements similar feed-forward operations to obtain invariant representations. However, conducting two psychophysical object recognition experiments on humans with systematically controlled variations of objects, we observed that humans relied on specific (diagnostic) object regions for accurate recognition which remained relatively consistent (invariant) across variations; but feed-forward feature-extraction models selected view-specific (non-invariant) features across variations. This suggests that models can develop different strategies, but reach human-level recognition performance. Moreover, human individuals largely disagreed on their diagnostic features and flexibly shifted their feature extraction strategy from view-invariant to view-specific when objects became more similar. This implies that, even in rapid object recognition, rather than a set of feed-forward mechanisms which extract diagnostic features from objects in a hard-wired fashion, the bottom-up visual pathways receive, through top-down connections, task-related information possibly processed in prefrontal cortex.

Journal ArticleDOI
TL;DR: An object detection system finds objects of the real world present either in a digital image or a video, where the object can belong to any class of objects namely humans, cars, etc.
Abstract: An object detection system finds objects of the real world present either in a digital image or a video, where the object can belong to any class of objects namely humans, cars, etc. In order to detect an object in an image or a video the system needs to have a few components in order to complete the task of detecting an object, they are a model database, a feature detector, a hypothesiser and a hypothesiser verifier. This paper presents a review of the various techniques that are used to detect an object, localise an object, categorise an object, extract features, appearance information, and many more, in images and videos. The comments are drawn based on the studied literature and key issues are also identified relevant to the object detection. Information about the source codes and online datasets is provided to facilitate the new researcher in object detection area. An idea about the possible solution for the multi class object detection is also presented. This paper is suitable for the researchers who are the beginners in this domain.

Proceedings ArticleDOI
06 Nov 2017
TL;DR: A new network layer is introduced that can extend a convolutional layer to encode the co-occurrence between the visual parts detected by the numerous neurons, instead of a few pre-specified parts, and is end-to-end trainable.
Abstract: This paper addresses three issues in integrating part-based representations into convolutional neural networks (CNNs) for object recognition. First, most part-based models rely on a few pre-specified object parts. However, the optimal object parts for recognition often vary from category to category. Second, acquiring training data with part-level annotation is labor-intensive. Third, modeling spatial relationships between parts in CNNs often involves an exhaustive search of part templates over multiple network streams. We tackle the three issues by introducing a new network layer, called co-occurrence layer. It can extend a convolutional layer to encode the co-occurrence between the visual parts detected by the numerous neurons, instead of a few pre-specified parts. To this end, the feature maps serve as both filters and images, and mutual correlation filtering is conducted between them. The co-occurrence layer is end-to-end trainable. The resultant co-occurrence features are rotation- and translation-invariant, and are robust to object deformation. By applying this new layer to the VGG-16 and ResNet-152, we achieve the recognition rates of 83.6% and 85.8% on the Caltech-UCSD bird benchmark, respectively. The source code is available at https://github.com/yafangshih/Deep-COOC.
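A simplified reading of the co-occurrence idea, in which channels of a feature map act as both images and filters and the maximal pairwise correlation response is kept, can be sketched in PyTorch as below. This is an interpretation for illustration only, not the authors' released layer (their code is linked in the abstract).

```python
import torch
import torch.nn.functional as F

def cooccurrence_matrix(feat_map):
    """For one feature map (C, H, W): correlate every channel with every
    other channel over all spatial offsets and keep the maximum response
    per channel pair.  A simplified sketch of the co-occurrence layer."""
    c, h, w = feat_map.shape
    images = feat_map.unsqueeze(1)    # (C, 1, H, W): channels as images
    filters = feat_map.unsqueeze(1)   # (C, 1, H, W): channels as filters
    resp = F.conv2d(images, filters, padding=(h - 1, w - 1))  # (C, C, 2H-1, 2W-1)
    return resp.amax(dim=(-2, -1))    # (C, C) co-occurrence features

# Example: pool co-occurrences of a conv activation before the classifier.
feats = torch.randn(64, 7, 7)
cooc = cooccurrence_matrix(feats)     # 64 x 64 matrix
```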

Proceedings ArticleDOI
21 Jul 2017
TL;DR: The deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways.
Abstract: It is well-established by cognitive neuroscience that human perception of objects constitutes a complex process, where object appearance information is combined with evidence about the so-called object affordances, namely the types of actions that humans typically perform when interacting with them. This fact has recently motivated the sensorimotor approach to the challenging task of automatic object recognition, where both information sources are fused to improve robustness. In this work, the aforementioned paradigm is adopted, surpassing current limitations of sensorimotor object recognition research. Specifically, the deep learning paradigm is introduced to the problem for the first time, developing a number of novel neuro-biologically and neuro-physiologically inspired architectures that utilize state-of-the-art neural networks for fusing the available information sources in multiple ways. The proposed methods are evaluated using a large RGB-D corpus, which is specifically collected for the task of sensorimotor object recognition and is made publicly available. Experimental results demonstrate the utility of affordance information to object recognition, achieving an up to 29% relative error reduction by its inclusion.

Book ChapterDOI
01 Jan 2017
TL;DR: The proposed work judges their performance under different circumstances such as rotation, scaling, illumination and blurring effects, and investigates the speed of each algorithm in different situations.
Abstract: Object recognition can be done with either a local feature description algorithm or a global feature description algorithm. Both types of descriptors can recognize an object quickly and accurately. The proposed work judges their performance under different circumstances such as rotation, scaling, illumination and blurring effects. The authors also investigate the speed of each algorithm in different situations. The experimental results show that each one has some advantages as well as some drawbacks. SIFT (Scale-Invariant Feature Transform) and SURF (Speeded Up Robust Features) perform relatively better under scale and rotation change. MSER (Maximally Stable Extremal Regions) performs better under scale change, and MinEigen under affine and illumination change, while FAST (Features from Accelerated Segment Test) and SURF consume less time.
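A speed comparison of this kind can be reproduced with OpenCV's built-in detectors, as in the rough sketch below. SURF is omitted because it ships only in non-free OpenCV builds, and the image path is a placeholder.

```python
import time
import cv2

def time_detector(name, detector, gray):
    """Time a single OpenCV keypoint detector on one grayscale image."""
    t0 = time.perf_counter()
    keypoints = detector.detect(gray, None)
    dt = time.perf_counter() - t0
    print(f"{name}: {len(keypoints)} keypoints in {dt * 1000:.1f} ms")

gray = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
for name, det in [("SIFT", cv2.SIFT_create()),
                  ("FAST", cv2.FastFeatureDetector_create()),
                  ("MSER", cv2.MSER_create())]:
    time_detector(name, det, gray)
```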

Journal ArticleDOI
TL;DR: This work proposes to connect, in a closed-loop, detectors and object proposal generator functions exploiting the ordered and continuous nature of video sequences and obtains three to four points of improvement in mAP and a detection time that is lower than Faster Regions with CNN features (R-CNN), which is the fastest Convolutional Neural Network (CNN) based generic object detector known at the moment.
Abstract: Object detection is one of the most important tasks of computer vision. It is usually performed by evaluating a subset of the possible locations of an image that are more likely to contain the object of interest. Exhaustive approaches have now been superseded by object proposal methods. The interplay of detectors and proposal algorithms has not been fully analyzed and exploited up to now, although this is a very relevant problem for object detection in video sequences. We propose to connect, in a closed loop, detectors and object proposal generator functions, exploiting the ordered and continuous nature of video sequences. Different from tracking, we only require a previous frame to improve both proposal and detection: no prediction based on local motion is performed, thus avoiding tracking errors. We obtain three to four points of improvement in mAP and a detection time that is lower than Faster Regions with CNN features (R-CNN), which is the fastest Convolutional Neural Network (CNN) based generic object detector known at the moment.

Posted Content
Xuanyi Dong, Liang Zheng, Fan Ma, Yi Yang, Deyu Meng 
26 Jun 2017
TL;DR: This paper studies object detection using a large pool of unlabeled images and only a few labeled images per category, named “few-shot object detection”, and embeds multiple detection models in this framework, which has proven to outperform the single model baseline and the model ensemble method.
Abstract: In this paper, we study object detection using a large pool of unlabeled images and only a few labeled images per category, named “few-shot object detection”. The key challenge consists in generating as many trustworthy training samples as possible from the pool. Using a few training examples as seeds, our method iterates between model training and high-confidence sample selection. In training, easy samples are generated first, and then the poorly initialized model undergoes improvement. As the model becomes more discriminative, challenging but reliable samples are selected. After that, another round of model improvement takes place. To further improve the precision and recall of the generated training samples, we embed multiple detection models in our framework, which has proven to outperform the single-model baseline and the model ensemble method. Experiments on PASCAL VOC’07 and ILSVRC’13 indicate that by using as few as three or four samples selected for each category, our method produces very competitive results when compared to state-of-the-art weakly-supervised approaches using a large number of image-level labels.
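The iterate-between-training-and-selection procedure can be summarized as a small loop skeleton. The `train_fn` and `score_fn` hooks, the confidence thresholds, and the round count below are assumptions for illustration, not the paper's settings.

```python
def self_paced_detection(train_fn, score_fn, seed_data, unlabeled_pool,
                         rounds=4, start_thresh=0.95, thresh_decay=0.05):
    """Skeleton of a self-paced few-shot detection loop.
    train_fn(labeled_data) -> model, score_fn(model, image) -> (label, confidence)
    are assumed hooks supplied by the caller."""
    labeled = list(seed_data)
    model = train_fn(labeled)
    thresh = start_thresh
    for _ in range(rounds):
        # Easy (high-confidence) pseudo-labels are harvested first ...
        selected = []
        for image in unlabeled_pool:
            label, conf = score_fn(model, image)
            if conf >= thresh:
                selected.append((image, label))
        labeled.extend(selected)
        # ... then the model is retrained and the bar is gradually lowered so
        # harder but still reliable samples enter in later rounds.
        model = train_fn(labeled)
        thresh = max(0.5, thresh - thresh_decay)
    return model
```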

Proceedings ArticleDOI
01 Mar 2017
TL;DR: A novel method to update assets for telecommunication infrastructure using Google Street View (GSV) images is presented, comparing HOG descriptors with SVM, the Deformable Parts Model (DPM), and deep learning using Faster R-CNN.
Abstract: We present a novel method to update assets for telecommunication infrastructure using Google Street View (GSV) images. The problem is formulated as an object recognition task, followed by the use of triangulation to estimate the object coordinates from sensor-plane coordinates. To this end, we have explored different state-of-the-art object recognition techniques, both from feature engineering and deep learning, namely HOG descriptors with SVM, the Deformable Parts Model (DPM), and deep learning (DL) using Faster R-CNN. While HOG+SVM has proved to be a robust human detector, DPM, which is based on probabilistic graphical models, and DL, which is a non-linear classifier, have proved their versatility in different types of object recognition problems. Asset recognition from street view images, however, poses unique challenges, as assets can be installed on the ground in various poses and orientations, with occlusions, with objects camouflaged in the background, and in some cases with small inter-class variation. We present the comparative performance of these techniques for a specific use-case involving telecom equipment, targeting the highest precision and recall. The blocks of the proposed pipeline are detailed and compared to traditional inventory management methods.
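Of the three compared techniques, the HOG-plus-SVM branch is the simplest to sketch with off-the-shelf tools. The descriptor parameters below are standard defaults rather than the paper's, and the crop/label variables are hypothetical placeholders for annotated GSV patches.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(images):
    """HOG descriptors for fixed-size grayscale crops (Dalal-Triggs style
    defaults; not necessarily the parameters used in the paper)."""
    return np.array([hog(img, orientations=9, pixels_per_cell=(8, 8),
                         cells_per_block=(2, 2), block_norm="L2-Hys")
                     for img in images])

# crops / labels would come from annotated GSV patches (positives = telecom
# assets, negatives = background); they are placeholders here.
# clf = LinearSVC(C=1.0).fit(hog_features(crops), labels)
# predictions = clf.predict(hog_features(test_crops))
```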

Proceedings ArticleDOI
01 Apr 2017
TL;DR: This paper presents a method for face recognition adapted to real-world conditions that can be trained using very few training examples and is computationally efficient and significantly outperform state-of-the-art methods.
Abstract: Face recognition systems are designed to handle well-aligned images captured under controlled situations. However real-world images present varying orientations, expressions, and illumination conditions. Traditional face recognition algorithms perform poorly on such images. In this paper we present a method for face recognition adapted to real-world conditions that can be trained using very few training examples and is computationally efficient. Our method consists of performing a novel alignment process followed by classification using sparse representation techniques. We present our recognition rates on a difficult dataset that represents real-world faces where we significantly outperform state-of-the-art methods.
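Classification by sparse representation typically codes the probe face over a dictionary of training faces and picks the class with the smallest reconstruction residual. The sketch below uses orthogonal matching pursuit as a stand-in for whichever sparse solver the authors actually use, and assumes the alignment step has already been applied.

```python
import numpy as np
from sklearn.linear_model import OrthogonalMatchingPursuit

def src_classify(train_X, train_y, test_x, n_nonzero=10):
    """Sparse-representation classification: code test_x over the columns of
    train_X (one vectorized face per column) and return the class whose
    atoms reconstruct it best."""
    D = train_X / (np.linalg.norm(train_X, axis=0, keepdims=True) + 1e-9)
    omp = OrthogonalMatchingPursuit(n_nonzero_coefs=n_nonzero,
                                    fit_intercept=False).fit(D, test_x)
    coef = omp.coef_
    residuals = {}
    for cls in np.unique(train_y):
        mask = (train_y == cls)
        recon = D[:, mask] @ coef[mask]
        residuals[cls] = np.linalg.norm(test_x - recon)
    return min(residuals, key=residuals.get)

# train_X: (n_features, n_train_faces), columns = vectorized aligned faces.
# test_x:  (n_features,) vectorized aligned probe face.
```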

Proceedings ArticleDOI
10 Jul 2017
TL;DR: This work proposes a method that integrates the spatial distribution of objects to facilitate visual relation detection, and establishes a modeling method that makes spatial, visual, and concept features work together for visual relationship detection.
Abstract: Recently, object recognition techniques have developed rapidly. Most existing object recognition work has focused on recognizing several independent concepts. The relationship between objects is also an important problem, as it reveals in-depth semantic information about images. In this work, toward general visual relationship detection, we propose a method that integrates the spatial distribution of objects to facilitate visual relation detection. Spatial distribution can not only reflect the positional relation of objects but also describe structural information between objects. Spatial distributions are described with different features such as positional relation, size relation, shape relation, and so on. By combining spatial distribution features with visual and concept features, we establish a modeling method that makes these three aspects work together to facilitate visual relationship detection. To evaluate the proposed method, we conduct experiments on two datasets: the Stanford VRD dataset, and a newly proposed larger dataset containing 15k images. Experimental results demonstrate that our approach is effective.
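Positional, size and shape relations between a pair of detected boxes can be encoded with a handful of normalized quantities, as in the sketch below; the exact feature set used in the paper may differ.

```python
def spatial_features(box_a, box_b):
    """Simple positional/size/shape cues between two boxes (x1, y1, x2, y2),
    in the spirit of the spatial-distribution features discussed above."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    aw, ah = ax2 - ax1, ay2 - ay1
    bw, bh = bx2 - bx1, by2 - by1
    acx, acy = ax1 + aw / 2, ay1 + ah / 2
    bcx, bcy = bx1 + bw / 2, by1 + bh / 2
    return {
        "dx": (bcx - acx) / max(aw, 1e-6),             # positional relation
        "dy": (bcy - acy) / max(ah, 1e-6),
        "size_ratio": (bw * bh) / max(aw * ah, 1e-6),  # size relation
        "aspect_a": aw / max(ah, 1e-6),                # shape relation
        "aspect_b": bw / max(bh, 1e-6),
    }

# Example: a person box above a bicycle box yields a positive dy
# (the bicycle is lower in image coordinates).
print(spatial_features((100, 50, 180, 200), (90, 180, 200, 300)))
```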

Journal ArticleDOI
TL;DR: Experimental results support the theory that non-affine variations such as pose and lighting may need top-down feedback information from higher areas such as IT and PFC for precise object recognition.

Journal ArticleDOI
10 May 2017
TL;DR: This paper introduces an innovative method for generating thermal-mapped point clouds of a robot’s work environment and performing automatic object recognition with the aid of thermal data fused to 3D point clouds.
Abstract: Many civil structures are more than halfway through or nearing their intended service life; frequently assessing and maintaining structural integrity is a top maintenance priority. Robotic inspection technologies using ground and aerial robots with 3D scanning and imaging capabilities have the potential to improve the safety and efficiency of infrastructure management. To provide more valuable information to inspectors and agency decision makers, automatic environment sensing and semantic information extraction are fundamental issues in this field. This paper introduces an innovative method for generating thermal-mapped point clouds of a robot’s work environment and performing automatic object recognition with the aid of thermal data fused to 3D point clouds. The laser-scanned point cloud and thermal data were collected using a custom-designed mobile robot. The multimodal data were combined through a data fusion process based on texture mapping. The automatic object recognition was performed by two processes: segmentation with thermal data and classification with scanned geometric features. The proposed method was validated with scan data collected on an entire building floor. Experimental results show that the thermal-integrated object recognition approach achieved better performance than a geometry-only approach, with an average recognition accuracy of 93%, precision of 83%, and recall rate of 86% for objects in the tested environment, including humans, display monitors and light fixtures.

Proceedings ArticleDOI
26 Apr 2017
TL;DR: This paper describes different classifier methods based on minimum cluster means for face recognition, using features extracted from training face images over several image sets as a database.
Abstract: This paper describes different classifier methods based on minimum cluster means to achieve face recognition from features extracted from training face images, using several image sets as a database. Principal Component Analysis (PCA) is a robust feature extraction technique for face recognition, but recognition decreases with variation in a person's actions. The features extracted from face images are light-insensitive, individual, hidden, and effective for biometric recognition. Face recognition is treated as a two-dimensional recognition problem, taking advantage of the fact that faces, which are generally in an upright pose, can be represented by a small set of two-dimensional characteristic views. The training and testing face images are selected from the Olivetti and Oracle Research Laboratory (ORL) face database, which has minimal pose variation. Three classifier methods are used to obtain the recognition distance: the Euclidean distance, the squared Euclidean distance, and the city-block distance. By clustering the differences between a training image and the image sets for each person and computing their mean, the minimum mean identifies the recognized person. The clustering method with the squared Euclidean distance produces the highest recognition rate (100%), close to the Euclidean distance method, which gives a face recognition rate of 98%, both higher than the city-block distance method, which gives a recognition rate of 95%.
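A generic eigenface version of this pipeline, PCA features followed by a nearest-class-mean rule under the three compared distances, looks like the sketch below; the component count and the clustering step are simplifications of the authors' setup.

```python
import numpy as np
from sklearn.decomposition import PCA

def nearest_mean_identity(train_feats, train_ids, probe_feat, metric="sqeuclidean"):
    """Classify a probe by the minimum distance to each person's mean PCA
    feature, with the three distances compared above."""
    best_id, best_d = None, np.inf
    for pid in np.unique(train_ids):
        diff = probe_feat - train_feats[train_ids == pid].mean(axis=0)
        if metric == "euclidean":
            d = np.sqrt(np.sum(diff ** 2))
        elif metric == "sqeuclidean":
            d = np.sum(diff ** 2)
        else:  # "cityblock"
            d = np.sum(np.abs(diff))
        if d < best_d:
            best_id, best_d = pid, d
    return best_id

# faces: (n_images, n_pixels) vectorized ORL images; ids: person labels.
# pca = PCA(n_components=50).fit(faces)
# feats = pca.transform(faces)
# predicted = nearest_mean_identity(feats, ids, pca.transform(probe[None])[0])
```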

Journal ArticleDOI
TL;DR: A new control strategy, based on a minimal spring model of the objects, is presented and used for the control of the robot hand and an adaptable tactile-servo control scheme is presented that can be used in in-hand manipulation tasks of deformable objects.
Abstract: Grasping and manipulating objects with robotic hands depend largely on the features of the object to be used. In particular, features such as softness and deformability are crucial to take into account during manipulation tasks. Indeed, the positions of the fingers and the forces to be applied by the robot hand when manipulating an object must be adapted to the deformation caused. For unknown objects, a prior recognition stage is usually needed to obtain the features of the object, and the manipulation strategies must be adapted depending on that recognition stage. To obtain precise control in the manipulation task, a complex object model is usually needed, for example using the Finite Element Method. However, these models require a complete discretization of the object and are time-consuming for the performance of manipulation tasks. For that reason, in this paper a new control strategy, based on a minimal spring model of the objects, is presented and used for the control of the robot hand. This paper also presents an adaptable tactile-servo control scheme that can be used in in-hand manipulation tasks of deformable objects. Tactile control is based on achieving and maintaining a force value at the contact points which changes according to the object softness, a feature estimated in an initial recognition stage.
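The minimal spring model amounts to Hooke's law plus a servo rule that squeezes or releases depending on the force error. The toy update below is only meant to convey that idea; the gains, units and stiffness estimate are illustrative assumptions, not the paper's controller.

```python
def spring_force(stiffness, rest_length, current_length):
    """Hooke's-law contact force of a minimal spring model; the stiffness is
    assumed to come from the initial object-recognition stage."""
    return stiffness * (rest_length - current_length)

def position_update(current_length, measured_force, target_force, gain=0.001):
    # Simple tactile-servo step: squeeze more (shorten the spring) if the
    # measured force is below the set-point, release if it is above.
    return current_length - gain * (target_force - measured_force)
```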

Book ChapterDOI
10 Jul 2017
TL;DR: In this article, the authors propose a data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system, which can be used with any convolutional deep architecture.
Abstract: Despite the impressive progress brought by deep networks in visual object recognition, robot vision is still far from being a solved problem. The most successful convolutional architectures are developed starting from ImageNet, a large-scale collection of images of object categories downloaded from the Web. This kind of image is very different from the situated and embodied visual experience of robots deployed in unconstrained settings. To reduce the gap between these two visual experiences, this paper proposes a simple yet effective data augmentation layer that zooms on the object of interest and simulates the object detection outcome of a robot vision system. The layer, which can be used with any convolutional deep architecture, brings an increase in object recognition performance of up to 7% in experiments performed over three different benchmark databases. An implementation of our robot data augmentation layer has been made publicly available.
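The augmentation layer essentially crops around the object with some surrounding context before feeding the recognition network. A minimal NumPy version of that zoom operation, with an assumed context margin, is sketched below; the authors' publicly released layer is the reference implementation.

```python
import numpy as np

def zoom_on_object(image, box, context=0.2):
    """Crop around an object box (x1, y1, x2, y2) with a small surrounding
    margin, mimicking what a robot's object detector would hand to the
    recognition network.  The context margin is an illustrative choice."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = box
    mx, my = context * (x2 - x1), context * (y2 - y1)
    x1, x2 = int(max(0, x1 - mx)), int(min(w, x2 + mx))
    y1, y2 = int(max(0, y1 - my)), int(min(h, y2 + my))
    return image[y1:y2, x1:x2]

# Toy usage: zoom on a centered 100x100 "object" in a 224x224 frame.
frame = np.zeros((224, 224, 3), dtype=np.uint8)
crop = zoom_on_object(frame, (62, 62, 162, 162))
```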

Proceedings ArticleDOI
01 Aug 2017
TL;DR: The aim is to develop an efficient method which uses a custom image to train the classifier, which extract distinct features from the input image for classifying its contents as characters specifically letters and digits.
Abstract: The aim is to develop an efficient method which uses a custom image to train the classifier. This OCR extract distinct features from the input image for classifying its contents as characters specifically letters and digits. Input to the system is digital images containing the patterns to be classified. The analysis and recognition of the patterns in images are becoming more complex, yet easy with advances in technological knowledge. Therefore it is proposed to develop sophisticated strategies of pattern analysis to cope with these difficulties. The present work involves application of pattern recognition using KNN to recognize handwritten or printed text.