
Showing papers on "3D single-object recognition" published in 2010


Book ChapterDOI
05 Sep 2010
TL;DR: This paper introduces a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution.
Abstract: Domain adaptation is an important emerging topic in computer vision. In this paper, we present one of the first studies of domain shift in the context of object recognition. We introduce a method that adapts object models acquired in a particular visual domain to new imaging conditions by learning a transformation that minimizes the effect of domain-induced changes in the feature distribution. The transformation is learned in a supervised manner and can be applied to categories for which there are no labeled examples in the new domain. While we focus our evaluation on object recognition tasks, the transform-based adaptation technique we develop is general and could be applied to nonimage data. Another contribution is a new multi-domain object database, freely available for download. We experimentally demonstrate the ability of our method to improve recognition on categories with few or no target domain labels and moderate to large changes in the imaging conditions.
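The transform-based idea admits a compact sketch: given corresponding feature pairs from the two domains, fit a linear map that pulls source features toward their target counterparts, then apply that map to new categories. The ridge-regression objective below is an illustrative stand-in for the authors' supervised metric-learning formulation:

```python
import numpy as np

def learn_transform(src, tgt, lam=1e-3):
    """Fit W so that W @ src[i] ~ tgt[i] for paired same-class examples
    from the two domains (ridge regression; a stand-in for the paper's
    learned transformation)."""
    d = src.shape[1]
    return tgt.T @ src @ np.linalg.inv(src.T @ src + lam * np.eye(d))

# Toy domains: "webcam" features are a scaled, shifted copy of "dslr" ones.
rng = np.random.default_rng(0)
dslr = rng.normal(size=(200, 64))
webcam = 0.5 * dslr + 0.1 + rng.normal(scale=0.01, size=dslr.shape)
W = learn_transform(webcam, dslr)
adapted = webcam @ W.T                   # map webcam features into dslr space
print(np.abs(adapted - dslr).mean())     # small residual after adaptation
```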

2,624 citations


Book ChapterDOI
05 Sep 2010
TL;DR: It is argued that the appearance of words in the wild spans this range of difficulties and a new word recognition approach based on state-of-the-art methods from generic object recognition is proposed, in which object categories are considered to be the words themselves.
Abstract: We present a method for spotting words in the wild, i.e., in real images taken in unconstrained environments. Text found in the wild has a surprising range of difficulty. At one end of the spectrum, Optical Character Recognition (OCR) applied to scanned pages of well formatted printed text is one of the most successful applications of computer vision to date. At the other extreme lie visual CAPTCHAs - text that is constructed explicitly to fool computer vision algorithms. Both tasks involve recognizing text, yet one is nearly solved while the other remains extremely challenging. In this work, we argue that the appearance of words in the wild spans this range of difficulties and propose a new word recognition approach based on state-of-the-art methods from generic object recognition, in which we consider object categories to be the words themselves. We compare performance of leading OCR engines - one open source and one proprietary - with our new approach on the ICDAR Robust Reading data set and a new word spotting data set we introduce in this paper: the Street View Text data set. We show improvements of up to 16% on the data sets, demonstrating the feasibility of a new approach to a seemingly old problem.

503 citations


Book ChapterDOI
05 Sep 2010
TL;DR: The results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.
Abstract: We present an interactive, hybrid human-computer method for object classification. The method applies to classes of objects that are recognizable by people with appropriate expertise (e.g., animal species or airplane model), but not (in general) by people without such expertise. It can be seen as a visual version of the 20 questions game, where questions based on simple visual attributes are posed interactively. The goal is to identify the true class while minimizing the number of questions asked, using the visual content of the image. We introduce a general framework for incorporating almost any off-the-shelf multi-class object recognition algorithm into the visual 20 questions game, and provide methodologies to account for imperfect user responses and unreliable computer vision algorithms. We evaluate our methods on Birds-200, a difficult dataset of 200 tightly-related bird species, and on the Animals With Attributes dataset. Our results demonstrate that incorporating user input drives up recognition accuracy to levels that are good enough for practical applications, while at the same time, computer vision reduces the amount of human interaction required.
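The question-selection step has a standard information-theoretic core: ask the attribute whose answer is expected to shrink the class posterior's entropy the most. A minimal sketch under a noiseless binary-answer assumption (the paper additionally models imperfect user responses and folds in computer-vision scores):

```python
import numpy as np

def entropy(p):
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def pick_question(posterior, answers):
    """Choose the attribute question with the highest expected information
    gain. posterior: (n_classes,) probabilities. answers[q, c] = 1 iff
    class c has attribute q (noiseless-user simplification)."""
    h = entropy(posterior)
    best_q, best_gain = None, -1.0
    for q in range(answers.shape[0]):
        p_yes = posterior[answers[q] == 1].sum()
        gain = h
        for val, p_val in ((1, p_yes), (0, 1.0 - p_yes)):
            if p_val > 0:
                post = posterior * (answers[q] == val)
                gain -= p_val * entropy(post / post.sum())
        if gain > best_gain:
            best_q, best_gain = q, gain
    return best_q

post = np.full(4, 0.25)                  # 4 bird species, uniform prior
A = np.array([[1, 1, 0, 0],              # e.g. "does it have a red wing?"
              [1, 0, 1, 0],
              [1, 1, 1, 0]])
print(pick_question(post, A))            # question 0: a full 1-bit split
```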

492 citations


Journal ArticleDOI
TL;DR: This work addresses the problem of incorporating different types of contextual information for robust object categorization in computer vision, surveying the most common levels at which context is extracted and the different levels of contextual interaction.

383 citations


Posted Content
TL;DR: This work proposes a simple and efficient algorithm to learn basis functions, which provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks.
Abstract: Adaptive sparse coding methods learn a possibly overcomplete set of basis functions, such that natural image patches can be reconstructed by linearly combining a small subset of these bases. The applicability of these methods to visual object recognition tasks has been limited because of the prohibitive cost of the optimization algorithms required to compute the sparse representation. In this work we propose a simple and efficient algorithm to learn basis functions. After training, this model also provides a fast and smooth approximator to the optimal representation, achieving even better accuracy than exact sparse coding algorithms on visual object recognition tasks.
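Both halves of the recipe fit in a short sketch: ISTA computes the (near-)exact sparse code, and a feed-forward encoder of the form g * tanh(W x), as in predictive-sparse-decomposition-style models, is regressed onto those codes so that inference becomes a single pass. Treat the encoder form and learning rate as assumptions, not the paper's exact architecture:

```python
import numpy as np

def ista(x, D, lam=0.1, n_iter=100):
    """Sparse code of x under dictionary D via iterative soft-thresholding."""
    L = np.linalg.norm(D, 2) ** 2               # Lipschitz constant of D^T D
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        g = z - (D.T @ (D @ z - x)) / L         # gradient step on the fit term
        z = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0.0)  # soft-threshold
    return z

def encoder_step(x, z_star, W, g, lr=1e-2):
    """One SGD step regressing the fast encoder g * tanh(W x) onto an
    exact code z_star (shapes: W (k, d), g (k,), x (d,))."""
    pre = np.tanh(W @ x)
    err = g * pre - z_star
    grad_W = np.outer(err * g * (1 - pre ** 2), x)  # d(0.5*||err||^2)/dW
    g = g - lr * err * pre                          # d(0.5*||err||^2)/dg
    W = W - lr * grad_W
    return W, g
```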

266 citations


Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work develops a bottom-up motion-based approach to robustly segment out foreground objects in egocentric video and shows that it greatly improves object recognition accuracy.
Abstract: Identifying handled objects, i.e. objects being manipulated by a user, is essential for recognizing the person's activities. An egocentric camera as worn on the body enjoys many advantages such as having a natural first-person view and not needing to instrument the environment. It is also a challenging setting, where background clutter is known to be a major source of problems and is difficult to handle with the camera constantly and arbitrarily moving. In this work we develop a bottom-up motion-based approach to robustly segment out foreground objects in egocentric video and show that it greatly improves object recognition accuracy. Our key insight is that egocentric video of object manipulation is a special domain and many domain-specific cues can readily help. We compute dense optical flow and fit it into multiple affine layers. We then use a max-margin classifier to combine motion with empirical knowledge of object location and background movement as well as temporal cues of support region and color appearance. We evaluate our segmentation algorithm on the large Intel Egocentric Object Recognition dataset with 42 objects and 100K frames. We show that, when combined with temporal integration, figure-ground segmentation improves the accuracy of a SIFT-based recognition system from 33% to 60%, and that of a latent-HOG system from 64% to 86%.
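The dominant-motion cue at the heart of this pipeline is easy to reproduce with OpenCV: estimate dense flow, robustly fit one affine motion model (the camera/background), and flag large flow residuals as candidate foreground. The paper fits multiple affine layers and fuses further cues with a max-margin classifier; this single-layer version is a simplified sketch:

```python
import cv2
import numpy as np

def foreground_mask(prev_gray, gray, thresh=2.0):
    """Mark pixels whose optical flow disagrees with the dominant
    (background) affine motion as candidate foreground."""
    flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    h, w = gray.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel()], 1).astype(np.float32)
    dst = pts + flow.reshape(-1, 2)
    # RANSAC affine fit on a subsample; independently moving hands and
    # handled objects fall out as outliers.
    idx = np.random.default_rng(0).choice(len(pts), 2000, replace=False)
    A, _ = cv2.estimateAffine2D(pts[idx], dst[idx])
    residual = np.linalg.norm(dst - (pts @ A[:, :2].T + A[:, 2]), axis=1)
    return residual.reshape(h, w) > thresh
```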

191 citations


Dissertation
01 Jan 2010
TL;DR: This dissertation spans domain adaptation, dictionary learning, object recognition, activity recognition, and shape representation in computer vision, together with sparse representation and related methods in signal and image processing.
Abstract: Research interests. Security and privacy: active authentication, biometrics template protection, biometrics recognition. Computer vision: domain adaptation, dictionary learning, object recognition, activity recognition, shape representation. Machine learning: dimensionality reduction, clustering, kernel methods, weakly-supervised learning. Signal/image processing: sparse representation, compressive sampling, synthetic aperture radar imaging, millimeter wave imaging.

160 citations


Patent
13 Oct 2010
TL;DR: In this article, a system and method for controlling a device based on computer vision is described. It is based on receiving a sequence of images of a field of view; detecting movement of at least one object in the images; applying a shape recognition algorithm to the at least one moving object; confirming that the object is a user hand by combining information from at least two images of the object; and tracking the object to control the device.
Abstract: A system and method are provided for controlling a device based on computer vision. Embodiments of the system and method of the invention are based on receiving a sequence of images of a field of view; detecting movement of at least one object in the images; applying a shape recognition algorithm on the at least one moving object; confirming that the object is a user hand by combining information from at least two images of the object; and tracking the object to control the device.
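A toy version of the detect-then-verify pipeline the claims describe: frame differencing finds moving blobs, and a crude convexity test (standing in for the patent's unspecified shape-recognition algorithm) screens for hand-like outlines before any tracking begins:

```python
import cv2

def candidate_hands(prev, frame, min_area=500):
    """Moving blobs whose outlines have finger-like concavities."""
    diff = cv2.absdiff(cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY),
                       cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY))
    _, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)
    mask = cv2.dilate(mask, None, iterations=2)
    cnts, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
    hands = []
    for c in cnts:
        area = cv2.contourArea(c)
        if area < min_area:
            continue
        hull_area = max(cv2.contourArea(cv2.convexHull(c)), 1.0)
        if area / hull_area < 0.9:       # concave outline, hand-like
            hands.append(c)
    return hands
```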

155 citations


Journal ArticleDOI
TL;DR: In this paper, a probabilistic framework for encoding the relationships between context and object properties is proposed, which can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places.
Abstract: Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places. Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
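In its simplest form, this holistic use of context amounts to combining a local detector's score with a scene-conditioned prior over image locations. A toy log-linear fusion with a Gaussian location prior; both the prior form and the fusion rule are illustrative assumptions, not the paper's exact model:

```python
import numpy as np

def contextual_score(det_scores, prior_mu, prior_cov):
    """Fuse per-location detector scores (h, w) with a scene-driven
    Gaussian prior over normalized image coordinates."""
    h, w = det_scores.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([xs / w, ys / h], axis=-1) - prior_mu
    inv = np.linalg.inv(prior_cov)
    log_prior = -0.5 * np.einsum('hwi,ij,hwj->hw', pos, inv, pos)
    return det_scores + log_prior        # unlikely places are suppressed

# A street scene: cars are expected in the lower half of the image.
scores = np.random.rand(120, 160)
fused = contextual_score(scores, prior_mu=np.array([0.5, 0.7]),
                         prior_cov=np.diag([0.08, 0.02]))
```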

147 citations


Journal ArticleDOI
TL;DR: This work builds a probabilistic model to transfer the labels from the retrieval set to the input image, demonstrates the effectiveness of this approach, and studies algorithm component contributions using held-out test sets from the LabelMe database.
Abstract: Current object recognition systems can only recognize a limited number of object categories; scaling up to many categories is the next challenge. We seek to build a system to recognize and localize many different object categories in complex scenes. We achieve this through a simple approach: by matching the input image, in an appropriate representation, to images in a large training set of labeled images. Due to regularities in object identities across similar scenes, the retrieved matches provide hypotheses for object identities and locations. We build a probabilistic model to transfer the labels from the retrieval set to the input image. We demonstrate the effectiveness of this approach and study algorithm component contributions using held-out test sets from the LabelMe database.
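The retrieval-plus-transfer recipe in miniature: match the query's global scene descriptor against the training set and let the nearest scenes vote per-pixel labels. The paper builds a full probabilistic transfer model; the majority vote below is a deliberately simple sketch, with the descriptor (e.g. GIST) left to the caller:

```python
import numpy as np

def transfer_labels(query_desc, train_descs, train_label_maps, k=5):
    """Nearest-neighbor scene matching followed by per-pixel label voting.
    train_label_maps: (n, h, w) integer label maps, one per training image."""
    dist = np.linalg.norm(train_descs - query_desc, axis=1)
    votes = train_label_maps[np.argsort(dist)[:k]]        # (k, h, w)
    n_labels = int(votes.max()) + 1
    counts = np.stack([(votes == l).sum(0) for l in range(n_labels)])
    return counts.argmax(0)                               # (h, w) label map
```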

119 citations


Journal ArticleDOI
TL;DR: Interactions between image segmentation (using different edge detection methods) and object recognition are discussed; the Expectation-Maximization (EM) algorithm, Otsu's method, and genetic algorithms were used to demonstrate the synergy between the segmented images and object recognition.
Abstract: Image segmentation is the task of partitioning an image into meaningful regions with respect to a particular application. Object recognition is the task of finding a given object in an image or video sequence. In this paper, interactions between image segmentation (using different edge detection methods) and object recognition are discussed. Edge detection methods such as Sobel, Prewitt, Roberts, Canny, and Laplacian of Gaussian (LoG) are used for segmenting the image. The Expectation-Maximization (EM) algorithm, Otsu's method, and genetic algorithms were used to demonstrate the synergy between the segmented images and object recognition.
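Most of the operators named above are single calls in OpenCV (Prewitt and Roberts need only a small custom kernel via cv2.filter2D), and Otsu's threshold is a flag; a sketch of the segmentation front end, with the EM and genetic-algorithm stages omitted and 'scene.png' as a placeholder input:

```python
import cv2

img = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)   # placeholder image

edges_canny = cv2.Canny(img, 100, 200)
edges_sobel = cv2.Sobel(img, cv2.CV_64F, 1, 0, ksize=3)
edges_log = cv2.Laplacian(cv2.GaussianBlur(img, (5, 5), 0), cv2.CV_64F)

# Otsu's method picks the binarization threshold automatically.
_, otsu_mask = cv2.threshold(img, 0, 255,
                             cv2.THRESH_BINARY + cv2.THRESH_OTSU)
```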

Proceedings ArticleDOI
13 Jun 2010
TL;DR: A framework that retains ambiguity in feature matching to increase the performance of 3D object recognition systems is presented; model features are vector quantized and matched in a hierarchical manner to preserve ambiguity during matching.
Abstract: We present a framework that retains ambiguity in feature matching to increase the performance of 3D object recognition systems. Whereas previous systems removed ambiguous correspondences during matching, we show that ambiguity should be resolved during hypothesis testing and not at the matching phase. To preserve ambiguity during matching, we vector quantize and match model features in a hierarchical manner. This matching technique allows our system to be more robust to the distribution of model descriptors in feature space. We also show that we can address recognition under arbitrary viewpoint by using our framework to facilitate matching of additional features extracted from affine transformed model images. The evaluation of our algorithms in 3D object recognition is demonstrated on a difficult dataset of 620 images.
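Hierarchical, ambiguity-preserving matching can be sketched as a small k-means tree in which a query keeps several near branches at every level instead of committing to the single nearest one; the branching factor and the `keep` width here are illustrative choices:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_tree(descs, branch=4, depth=2):
    """Small hierarchical k-means tree over model feature descriptors."""
    if depth == 0 or len(descs) < branch:
        return {'leaf': descs}
    centers, labels = kmeans2(descs, branch, minit='points')
    return {'centers': centers,
            'kids': [build_tree(descs[labels == i], branch, depth - 1)
                     for i in range(branch)]}

def match(tree, q, keep=2):
    """Descend the tree keeping the `keep` nearest branches per level,
    so ambiguous correspondences survive until hypothesis testing."""
    if 'leaf' in tree:
        return list(tree['leaf'])
    d = np.linalg.norm(tree['centers'] - q, axis=1)
    out = []
    for i in np.argsort(d)[:keep]:
        out += match(tree['kids'][i], q, keep)
    return out

descs = np.random.rand(500, 32)
tree = build_tree(descs)
candidates = match(tree, descs[0])    # several candidates, not one "best"
```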

Patent
07 Dec 2010
TL;DR: In this article, the authors describe embodiments that facilitate or enhance the implementation of image recognition processes which can perform recognition on images to identify objects and/or faces by class or by people.
Abstract: Embodiments described herein facilitate or enhance the implementation of image recognition processes which can perform recognition on images to identify objects and/or faces by class or by people.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work has the unique ability to jointly reduce false-alarm and false-negative object detection rates, recover object locations and supporting planes within the 3D camera reference system, and infer camera parameters from a single uncalibrated image.
Abstract: Detecting objects in complex scenes while recovering the scene layout is a critical functionality in many vision-based applications. Inspired by the work of [18], we advocate the importance of geometric contextual reasoning for object recognition. We start from the intuition that objects' location and pose in the 3D space are not arbitrarily distributed but rather constrained by the fact that objects must lie on one or multiple supporting surfaces. We model such supporting surfaces by means of hidden parameters (i.e. not explicitly observed) and formulate the problem of joint scene reconstruction and object recognition as the one of finding the set of parameters that maximizes the joint probability of having a number of detected objects on K supporting planes given the observations. As a key ingredient for solving this optimization problem, we have demonstrated a novel relationship between object location and pose in the image, and the scene layout parameters (i.e. normal of one or more supporting planes in 3D and camera pose, location and focal length). Using the probabilistic formulation and the above relationship our method has the unique ability to jointly: i) reduce false alarm and false negative object detection rate; ii) recover object location and supporting planes within the 3D camera reference system; iii) infer camera parameters (view point and the focal length) from just one single uncalibrated image. Quantitative and qualitative experimental evaluation on a number of datasets (a novel in-house dataset and label-me[28] on car and pedestrian) demonstrates our theoretical claims.

Book ChapterDOI
21 Jun 2010
TL;DR: A novel face recognition technique that computes the SIFT descriptors at predefined (fixed) locations learned during the training stage is presented, which renders the approach more robust to illumination changes than related approaches from the literature.
Abstract: The Scale Invariant Feature Transform (SIFT) is an algorithm used to detect and describe scale-, translation- and rotation-invariant local features in images. The original SIFT algorithm has been successfully applied in general object detection and recognition tasks, panorama stitching and others. One of its more recent uses also includes face recognition, where it was shown to deliver encouraging results. SIFT-based face recognition techniques found in the literature rely heavily on the so-called keypoint detector, which locates interest points in the given image that are ultimately used to compute the SIFT descriptors. While these descriptors are known to be, among other things, (partially) invariant to illumination changes, the keypoint detector is not. Since varying illumination is one of the main issues affecting the performance of face recognition systems, the keypoint detector represents the main source of errors in face recognition systems relying on SIFT features. To overcome this shortcoming of SIFT-based methods, we present in this paper a novel face recognition technique that computes the SIFT descriptors at predefined (fixed) locations learned during the training stage. By doing so, it eliminates the need for keypoint detection on the test images and renders our approach more robust to illumination changes than related approaches from the literature. Experiments, performed on the Extended Yale B face database, show that the proposed technique compares favorably with several popular techniques from the literature in terms of performance.
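OpenCV's SIFT implementation can compute descriptors at caller-supplied keypoints, which is exactly the ingredient this technique needs; in the sketch below a regular grid stands in for the fixed locations that the paper learns during training:

```python
import cv2
import numpy as np

def fixed_location_sift(gray, step=16, size=16.0):
    """SIFT descriptors at fixed positions (no keypoint detection), so the
    same image locations are described for every aligned face image."""
    sift = cv2.SIFT_create()
    h, w = gray.shape
    kps = [cv2.KeyPoint(float(x), float(y), size)
           for y in range(step, h - step, step)
           for x in range(step, w - step, step)]
    _, desc = sift.compute(gray, kps)
    return np.asarray(desc)

# desc = fixed_location_sift(cv2.imread('face.png', cv2.IMREAD_GRAYSCALE))
```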

Journal ArticleDOI
TL;DR: A robust image processing methodology to effectively extract the objects of interest from construction-site digital images makes use of advanced imaging algorithms and a three-dimensional computer aided design perspective view to increase the accuracy of the object recognition.
Abstract: Construction-site images that are now easily obtained from digital cameras have the potential to automatically provide the project status information. For example, once construction objects such as concrete columns are accurately identified and counted, the current level of project progress in the column installation activity can easily be measured. However, in order to identify and count the number of concrete columns installed at a particular point of time, a robust object recognition methodology is required. Without the successful recognition and extraction of the construction object of interest, it is almost impossible to understand the current level of project progress. This paper presents a robust image processing methodology to effectively extract the objects of interest from construction-site digital images. The proposed methodology makes use of advanced imaging algorithms and a three-dimensional computer aided design perspective view to increase the accuracy of the object recognition. Tests show that the methodology is promising and expected to provide a solid base for the successful, automatic acquisition of project information.

BookDOI
19 Nov 2010
TL;DR: This book applies graph theory to low-level processing of digital images, presents graph-theoretic learning algorithms for high-level computer vision and pattern recognition applications, and provides detailed descriptions of several applications of graph-based methods to real-world pattern recognition tasks.
Abstract: This book presents novel graph-theoretic methods for complex computer vision and pattern recognition tasks. It presents the application of graph theory to low-level processing of digital images, presents graph-theoretic learning algorithms for high-level computer vision and pattern recognition applications, and provides detailed descriptions of several applications of graph-based methods to real-world pattern recognition tasks.

Proceedings ArticleDOI
03 May 2010
TL;DR: This paper presents a system for the autonomous acquisition of visual object representations, which endows a humanoid robot with the ability to enrich its internal object representation and allows the realization of complex visual tasks.
Abstract: The autonomous acquisition of object representations that allow recognition, localization and grasping of objects in the environment is a challenging task, which has proven difficult. In this paper, we present a system for autonomous acquisition of visual object representations, which endows a humanoid robot with the ability to enrich its internal object representation and allows the realization of complex visual tasks. More precisely, we present techniques for segmentation and modeling of objects held in the five-fingered robot hand. Multiple object views are generated by rotating the held objects in the robot's field of view. The acquired object representations are evaluated in the context of visual search and object recognition tasks in cluttered environments. Experimental results show successful implementation of the complete cycle from object exploration to object recognition on a humanoid robot.

Book ChapterDOI
Caifeng Shan1
01 Jan 2010
TL;DR: This chapter reviews existing research on face recognition and retrieval in video, and the relevant techniques are comprehensively surveyed and discussed.
Abstract: Automatic face recognition has long been established as one of the most active research areas in computer vision. Face recognition in unconstrained environments remains challenging for most practical applications. In contrast to traditional still-image based approaches, recently the research focus has shifted towards video-based approaches. Video data provides rich and redundant information, which can be exploited to resolve the inherent ambiguities of image-based recognition like sensitivity to low resolution, pose variations and occlusion, leading to more accurate and robust recognition. Face recognition has also been considered in the content-based video retrieval setup, for example, character-based video search. In this chapter, we review existing research on face recognition and retrieval in video. The relevant techniques are comprehensively surveyed and discussed.

Patent
20 Aug 2010
TL;DR: In this article, a system for translating user motion into multiple object responses of an on-screen object based on user interaction with an application executing on a computing device is provided, where user motion data is received from a capture device from one or more users.
Abstract: A system for translating user motion into multiple object responses of an on-screen object based on user interaction of an application executing on a computing device is provided. User motion data is received from a capture device from one or more users. The user motion data corresponds to user interaction with an on-screen object presented in the application. The on-screen object corresponds to an object other than an on-screen representation of a user that is displayed by the computing device. The user motion data is automatically translated into multiple object responses of the on-screen object. The multiple object responses of the on-screen object are simultaneously displayed to the users.

Proceedings ArticleDOI
26 Jul 2010
TL;DR: This paper presents a public multiple-view object recognition database, called the Berkeley Multiview Wireless (BMW) database, and proposes a fast multiple-view classification method to jointly classify the object observed by the cameras.
Abstract: We propose an efficient distributed object recognition system for sensing, compression, and recognition of 3-D objects and landmarks using a network of wireless smart cameras. The foundation is based on a recent work that shows the representation of scale-invariant image features exhibit certain degree of sparsity: If a common object is observed by multiple cameras from different vantage points, the corresponding features can be efficiently compressed in a distributed fashion, and the joint signals can be simultaneously decoded based on distributed compressive sensing theory. In this paper, we first present a public multiple-view object recognition database, called the Berkeley Multiview Wireless (BMW) database. It captures the 3-D appearance of 20 landmark buildings sampled by five low-power, low-resolution camera sensors from multiple vantage points. Then we review and benchmark state-of-the-art methods to extract image features and compress their sparse representations. Finally, we propose a fast multiple-view recognition method to jointly classify the object observed by the cameras. To this end, a distributed object recognition system is implemented on the Berkeley CITRIC smart camera platform. The system is capable of adapting to different network configurations and the wireless bandwidth. The multiple-view classification improves the performance of object recognition upon the traditional per-view classification algorithms.
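The per-camera compression step can be sketched as a random projection of a sparse feature histogram: each camera transmits only the short measurement vector, and the joint l1 decoding at the base station is omitted here. The dimensions and sparsity level are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
d, m = 10000, 600                 # visual-word histogram dim, compressed dim

hist = np.zeros(d)                # sparse: most visual words unseen per view
hist[rng.choice(d, 80, replace=False)] = rng.random(80)

Phi = rng.normal(size=(m, d)) / np.sqrt(m)   # random matrix shared by cameras
measurement = Phi @ hist          # the only thing the camera transmits
print(measurement.shape)          # (600,) -- roughly 17x fewer values sent
```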

Book ChapterDOI
05 Sep 2010
TL;DR: Experimental results demonstrate, for the first time, the feasibility and effectiveness of a high-level syntactic method in face recognition, showing a new strategy for face representation and recognition.
Abstract: Automatically recognizing human faces with partial occlusions is one of the most challenging problems in face analysis community. This paper presents a novel string-based face recognition approach to address the partial occlusion problem in face recognition. In this approach, a new face representation, Stringface, is constructed to integrate the relational organization of intermediate-level features (line segments) into a high-level global structure (string). The matching of two faces is done by matching two Stringfaces through a string-to-string matching scheme, which is able to efficiently find the most discriminative local parts (substrings) for recognition without making any assumption on the distributions of the deformed facial regions. The proposed approach is compared against the state-of-the-art algorithms using both the AR database and FRGC (Face Recognition Grand Challenge) ver2.0 database. Very encouraging experimental results demonstrate, for the first time, the feasibility and effectiveness of a high-level syntactic method in face recognition, showing a new strategy for face representation and recognition.

Journal ArticleDOI
TL;DR: This work investigates the effects of two kinds of image processing methods, two common shapes of pixels (square and circular) and six resolutions (8x8, 16x16, 24x24, 32x32, 48x48 and 64x64) and shows that the mean recognition accuracy increased with the number of pixels.

Proceedings Article
06 Dec 2010
TL;DR: This work develops an efficient algorithm for multi-label multiple kernel learning (ML-MKL) that combines the worst-case analysis with stochastic approximation and shows that the complexity of the algorithm is O(m^{1/3}√(ln m)), where m is the number of classes.
Abstract: Recent studies have shown that multiple kernel learning is very effective for object recognition, leading to the popularity of kernel learning in computer vision problems. In this work, we develop an efficient algorithm for multi-label multiple kernel learning (ML-MKL). We assume that all the classes under consideration share the same combination of kernel functions, and the objective is to find the optimal kernel combination that benefits all the classes. Although several algorithms have been developed for ML-MKL, their computational cost is linear in the number of classes, making them unscalable when the number of classes is large, a challenge frequently encountered in visual object recognition. We address this computational challenge by developing a framework for ML-MKL that combines the worst-case analysis with stochastic approximation. Our analysis shows that the complexity of our algorithm is O(m^{1/3}√(ln m)), where m is the number of classes. Empirical studies with object recognition show that while achieving similar classification accuracy, the proposed method is significantly more efficient than the state-of-the-art algorithms for ML-MKL.
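The shared-combination assumption just means one weight vector over the base kernels serves every class. A sketch of evaluating such a combination over RBF base kernels, with the weights given rather than learned, since the optimization itself is the paper's contribution:

```python
import numpy as np

def combined_kernel(Ks, mu):
    """K = sum_t mu_t * K_t with one weight vector mu shared by all
    classes, as in the ML-MKL setup above (mu assumed, not learned)."""
    mu = np.asarray(mu, dtype=float)
    return np.tensordot(mu / mu.sum(), Ks, axes=1)       # (n, n)

n = 50
X = np.random.rand(n, 8)
sq = ((X[:, None] - X[None]) ** 2).sum(-1)
Ks = np.stack([np.exp(-sq / (2 * s ** 2)) for s in (0.5, 1.0, 2.0)])
K = combined_kernel(Ks, [1.0, 1.0, 1.0])                 # uniform placeholder
```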

Journal ArticleDOI
TL;DR: This work presents a statistical framework for 3D passive object recognition in the presence of noise and suggests that, with proper translation of the physical characteristics of the imaging system into the information processing algorithms, photon-counting imagery can be used for object classification.
Abstract: Three dimensional (3D) imaging systems have been recently suggested for passive sensing and recognition of objects in photon-starved environments where only a few photons are emitted or reflected from the object. In this paradigm, it is important to make optimal use of the limited information carried by photons. We present a statistical framework for 3D passive object recognition in the presence of noise. Since detector dark noise is present in the quantum-limited regime, our approach takes into account the effect of noise on information-bearing photons. The model is tested for identifying a target in a 3D scene when background noise and dark noise sources are present. It is shown that reliable object recognition is possible in the photon-counting domain. The results suggest that with proper translation of the physical characteristics of the imaging system into the information processing algorithms, photon-counting imagery can be used for object classification.
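A minimal instance of such a photon-counting classifier: model each pixel's count as Poisson with the object's photon rate plus an additive dark-noise rate, then pick the template with the highest log-likelihood. The rates and the additive noise model here are illustrative assumptions:

```python
import numpy as np
from scipy.stats import poisson

def classify(counts, templates, dark=0.1):
    """Maximum-likelihood template choice for photon counts with dark noise."""
    lls = [poisson.logpmf(counts, t + dark).sum() for t in templates]
    return int(np.argmax(lls))

rng = np.random.default_rng(2)
templates = [np.full(64, 0.3), np.full(64, 0.8)]   # per-pixel photon rates
obs = rng.poisson(templates[1] + 0.1)              # photon-starved observation
print(classify(obs, templates))                    # usually 1
```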

Journal ArticleDOI
TL;DR: This paper considers the performance of about twenty-five different subspace algorithms on data taken from four standard face and object databases, namely ORL, Yale, FERET, and the COIL-20 object database.

Proceedings ArticleDOI
13 Jun 2010
TL;DR: This work details a discriminative approach for optimizing one-shot recognition using micro-sets and presents experiments on the Animals with Attributes and Caltech-101 datasets that demonstrate the benefits of the formulation.
Abstract: For object category recognition to scale beyond a small number of classes, it is important that algorithms be able to learn from a small amount of labeled data per additional class. One-shot recognition aims to apply the knowledge gained from a set of categories with plentiful data to categories for which only a single exemplar is available for each. As with earlier efforts motivated by transfer learning, we seek an internal representation for the domain that generalizes across classes. However, in contrast to existing work, we formulate the problem in a fundamentally new manner by optimizing the internal representation for the one-shot task using the notion of micro-sets. A micro-set is a sample of data that contains only a single instance of each category, sampled from the pool of available data, which serves as a mechanism to force the learned representation to explicitly address the variability and noise inherent in the one-shot recognition task. We optimize our learned domain features so that they minimize an expected loss over micro-sets drawn from the training set and show that these features generalize effectively to previously unseen categories. We detail a discriminative approach for optimizing one-shot recognition using micro-sets and present experiments on the Animals with Attributes and Caltech-101 datasets that demonstrate the benefits of our formulation.
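Constructing a micro-set is itself a one-liner: draw exactly one exemplar per category from the pool, so each inner optimization step is scored under precisely the one-shot condition. A sketch assuming a dict-of-arrays data layout:

```python
import numpy as np

def sample_micro_set(features_by_class, rng):
    """One exemplar per category -- a micro-set -- forcing the learned
    representation to cope with one-shot variability and noise."""
    return {c: feats[rng.integers(len(feats))]
            for c, feats in features_by_class.items()}

rng = np.random.default_rng(0)
pool = {c: np.random.rand(30, 128) for c in range(10)}   # toy feature pool
micro = sample_micro_set(pool, rng)                      # 10 single exemplars
```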

Book
02 Aug 2010
TL;DR: Written in a tutorial style that avoids extensive use of mathematics, this book is suitable as an introduction to the field of object recognition for interested readers who are not yet experts.
Abstract: Object recognition has been an area of extensive research for a long time. During the last decades, a large number of algorithms have been proposed. This is due to the fact that, at a closer look, "object recognition" is an umbrella term for different algorithms designed for a wide variety of applications, where each application has its specific requirements and constraints. This book demonstrates the diversity of applications and highlights some important algorithm classes by presenting representative example algorithms for each class. This book is written in a tutorial style and is therefore suitable as an introduction into the field of object recognition for interested readers who are not yet experts. The presentation of each algorithm focuses on the main idea, which is described in detail, and avoids extensive usage of mathematics. Graphic illustrations of the algorithm flow facilitate understanding. The algorithms presented are classified according to the following categories: global approaches, transformation-search-based methods, geometrical model driven methods, 3D object recognition schemes, flexible contour fitting algorithms and feature-based methods. Typical example algorithms are presented for each of the categories.

Proceedings ArticleDOI
03 Dec 2010
TL;DR: A novel 3D measurement system that yields both depth and color information in real time by calibrating a time-of-flight camera against two CCD cameras is presented, along with a robust object recognition method that uses the resulting 3D visual sensor.
Abstract: This paper presents a novel 3D measurement system, which yields both depth and color information in real time, by calibrating a time-of-flight camera and two CCD cameras. The problem of occlusions is solved by the proposed fast occluded-pixel detection algorithm. Since the system uses two CCD cameras, missing color information for pixels occluded in one camera's view is recovered from the other. We also propose a robust object recognition method using the 3D visual sensor. Multiple cues, such as color, texture and 3D (depth) information, are integrated in order to recognize various types of objects under varying lighting conditions. We have implemented the system on our autonomous robot and made the robot perform recognition tasks (object learning, detection, and recognition) in various environments. The results revealed that the proposed recognition system provides far better performance than the previous system, which is based only on color and texture information.

Proceedings ArticleDOI
07 Jul 2010
TL;DR: The object search and recognition scheme proposed in the paper can improve the accuracy rate of object recognition, reduce the impact of lighting, and maintain a high recognition rate even when the target is partly occluded.
Abstract: A complete scheme for object search and recognition is proposed in this paper in order to realize object recognition in complex indoor environments. We design a new kind of object mark to assist object recognition. The mark is composed of two parts: an inner information representation and an outer logo. The inner information, including attribute information and operating information, is stored in a QR Code. The outer logo includes two concentric colored circles and four orientation regions. The concentric red circles are used to locate the mark from a distance, and the orientation regions help the robot operate the target properly. The mark can only be recognized at close range, so RFID technology is used to locate the object on a larger scale. Large furniture is tagged with reference tags, and the target is pasted with a target tag. As the robot moves around the space, it reads the tags one by one and can obtain the rough position of the target from the time sequence of tags. The object search and recognition scheme proposed in the paper can improve the accuracy rate of object recognition, reduce the impact of lighting, and maintain a high recognition rate even when the target is partly occluded. The experiments demonstrate the effectiveness and feasibility of the scheme.
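Decoding the mark's inner QR payload is directly supported by OpenCV; a sketch, with the colored-circle logo localization and the RFID coarse positioning left out and 'mark.png' as a placeholder input:

```python
import cv2

detector = cv2.QRCodeDetector()
img = cv2.imread('mark.png')                     # placeholder image
payload, corners, _ = detector.detectAndDecode(img)
if payload:
    # The mark stores attribute and operating information in the QR Code.
    print('object info:', payload)
```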