
Showing papers on "3D single-object recognition published in 2012"


Journal ArticleDOI
TL;DR: A hypergraph analysis approach to view-based 3-D object retrieval and recognition that avoids estimating the distance between objects by constructing multiple hypergraphs from their 2-D views.
Abstract: View-based 3-D object retrieval and recognition has become popular in practice, e.g., in computer aided design. It is difficult to precisely estimate the distance between two objects represented by multiple views. Thus, current view-based 3-D object retrieval and recognition methods may not perform well. In this paper, we propose a hypergraph analysis approach to address this problem by avoiding the estimation of the distance between objects. In particular, we construct multiple hypergraphs for a set of 3-D objects based on their 2-D views. In these hypergraphs, each vertex is an object, and each edge is a cluster of views. Therefore, an edge connects multiple vertices. We define the weight of each edge based on the similarities between any two views within the cluster. Retrieval and recognition are performed based on the hypergraphs. Therefore, our method can explore the higher order relationship among objects and does not use the distance between objects. We conduct experiments on the National Taiwan University 3-D model dataset and the ETH 3-D object collection. Experimental results demonstrate the effectiveness of the proposed method by comparing with the state-of-the-art methods.
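
Purely as an illustration of the edge construction described in the abstract, the sketch below builds weighted hyperedges from clusters of 2-D views; the `view_similarity` callback, the data layout, and all names are assumptions for this sketch, not the authors' implementation.

```python
import itertools

def build_hyperedges(view_clusters, view_similarity):
    """Each hyperedge is a cluster of 2-D views: it connects every object
    that contributed a view to the cluster, and its weight sums the pairwise
    similarities between views inside the cluster."""
    edges = []
    for cluster in view_clusters:  # cluster: list of (object_id, view) pairs
        weight = sum(view_similarity(v1, v2)
                     for (_, v1), (_, v2) in itertools.combinations(cluster, 2))
        vertices = {obj_id for obj_id, _ in cluster}  # objects joined by this edge
        edges.append((vertices, weight))
    return edges
```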

573 citations


Journal ArticleDOI
TL;DR: A rapidly growing group of people can acquire 3-D data cheaply and in real time, as these sensors are commodity hardware and sold at low cost.
Abstract: With the advent of new-generation depth sensors, the use of three-dimensional (3-D) data is becoming increasingly popular. As these sensors are commodity hardware and sold at low cost, a rapidly growing group of people can acquire 3-D data cheaply and in real time.

368 citations


Journal ArticleDOI
10 Jan 2012-PLOS ONE
TL;DR: Thirteen color-to-grayscale algorithms are tested using a modern descriptor-based image recognition framework; a simple method is identified that generally works best for face and object recognition, and two that work well for recognizing textures.
Abstract: In image recognition it is often assumed the method used to convert color images to grayscale has little impact on recognition performance. We compare thirteen different grayscale algorithms with four types of image descriptors and demonstrate that this assumption is wrong: not all color-to-grayscale algorithms work equally well, even when using descriptors that are robust to changes in illumination. These methods are tested using a modern descriptor-based image recognition framework, on face, object, and texture datasets, with relatively few training instances. We identify a simple method that generally works best for face and object recognition, and two that work well for recognizing textures.
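
For context, here is a minimal sketch of three common colour-to-grayscale mappings of the kind the paper compares; the weightings are standard textbook choices and are not claimed to match the thirteen variants actually tested.

```python
import numpy as np

def to_grayscale(rgb, method="luminance"):
    """Convert an HxWx3 float image in [0, 1] to grayscale.
    Method names here are illustrative placeholders."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    if method == "luminance":   # ITU-R BT.601 weighted combination
        return 0.299 * r + 0.587 * g + 0.114 * b
    if method == "intensity":   # plain channel average
        return (r + g + b) / 3.0
    if method == "value":       # HSV value: per-pixel channel maximum
        return rgb.max(axis=-1)
    raise ValueError(f"unknown method: {method}")
```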

278 citations


Journal ArticleDOI
TL;DR: The system described in this article was constructed specifically for the generation of model data for object recognition, localization and manipulation tasks and it allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically.
Abstract: For the execution of object recognition, localization and manipulation tasks, most algorithms use object models. Most models are derived from, or consist of, two-dimensional (2D) images and/or three-dimensional (3D) geometric data. The system described in this article was constructed specifically for the generation of such model data. It allows 2D image and 3D geometric data of everyday objects to be obtained semi-automatically. The calibration provided allows 2D data to be related to 3D data. Through the use of high-quality sensors, high-accuracy data is achieved. So far over 100 objects have been digitized using this system and the data has been successfully used in several international research projects. All of the models are freely available on the web via a front-end that allows preview and filtering of the data.

220 citations


Journal ArticleDOI
TL;DR: The paper reviews some domains that appeared as emerging fields in the last years of the 20th century and have been developed later on in the 21st century, such as three-dimensional object recognition, biometric pattern matching, optical security and hybrid optical–digital processors.
Abstract: On the verge of the 50th anniversary of Vander Lugt’s formulation for pattern matching based on matched filtering and optical correlation, we acknowledge the very intense research activity developed in the field of correlation-based pattern recognition during this period of time. The paper reviews some domains that appeared as emerging fields in the last years of the 20th century and have been developed later on in the 21st century. Such is the case of three-dimensional (3D) object recognition, biometric pattern matching, optical security and hybrid optical–digital processors. 3D object recognition is a challenging case of multidimensional image recognition because of its implications in the recognition of real-world objects independent of their perspective. Biometric recognition is essentially pattern recognition for which the personal identification is based on the authentication of a specific physiological characteristic possessed by the subject (e.g. fingerprint, face, iris, retina, and multifactor combinations). Biometric recognition often appears combined with encryption–decryption processes to secure information. The optical implementations of correlation-based pattern recognition processes still rely on the 4f-correlator, the joint transform correlator, or some of their variants. But the many applications developed in the field have been pushing the systems for a continuous improvement of their architectures and algorithms, thus leading towards merged optical–digital solutions.

197 citations


Book ChapterDOI
07 Oct 2012
TL;DR: Peculiar to this approach is the inherent ability to detect significantly occluded objects without increasing the number of false positives, so that the operating point of the object recognition algorithm can nicely move toward a higher recall without sacrificing precision.
Abstract: We propose a novel approach for verifying model hypotheses in cluttered and heavily occluded 3D scenes. Instead of verifying one hypothesis at a time, as done by most state-of-the-art 3D object recognition methods, we determine object and pose instances according to a global optimization stage based on a cost function which encompasses geometrical cues. Peculiar to our approach is the inherent ability to detect significantly occluded objects without increasing the number of false positives, so that the operating point of the object recognition algorithm can nicely move toward a higher recall without sacrificing precision. Our approach outperforms the state of the art on a challenging dataset including 35 household models obtained with the Kinect sensor, as well as on the standard 3D object recognition benchmark dataset.
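
The paper's verification stage minimizes a geometric cost over all hypotheses jointly; the toy local search below only conveys that flavor. The cost structure (`support` as negative values for well-supported hypotheses, a `conflict` matrix for mutually inconsistent pairs) and the iteration budget are hypothetical simplifications, not the authors' cost function or optimizer.

```python
import random

def verify_hypotheses(support, conflict, iters=2000, seed=0):
    """Toy global verification: flip hypotheses on/off to minimise a cost
    that rewards supported hypotheses (negative entries in `support`) and
    penalises jointly active, conflicting ones (conflict[i][j] >= 0)."""
    rng = random.Random(seed)
    n = len(support)
    active = [False] * n

    def total_cost():
        cost = sum(support[i] for i in range(n) if active[i])
        cost += sum(conflict[i][j] for i in range(n) for j in range(i + 1, n)
                    if active[i] and active[j])
        return cost

    best = total_cost()
    for _ in range(iters):
        i = rng.randrange(n)
        active[i] = not active[i]      # propose flipping one hypothesis
        cost = total_cost()
        if cost <= best:
            best = cost                # keep improving (or equal) flips
        else:
            active[i] = not active[i]  # revert worsening flips
    return active
```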

168 citations


Patent
26 Mar 2012
TL;DR: In this paper, a language model is applied to the concatenated word recognition lattice to determine the relationships between the word-recognition lattices and repeated until the generated word-reconfigurable lattices are acceptable or differ from a predetermined value only by a threshold amount.
Abstract: Speech recognition models are dynamically re-configurable based on user information, application information, background information such as background noise and transducer information such as transducer response characteristics to provide users with alternate input modes to keyboard text entry. Word recognition lattices are generated for each data field of an application and dynamically concatenated into a single word recognition lattice. A language model is applied to the concatenated word recognition lattice to determine the relationships between the word recognition lattices and repeated until the generated word recognition lattices are acceptable or differ from a predetermined value only by a threshold amount. These techniques of dynamic re-configurable speech recognition provide for deployment of speech recognition on small devices such as mobile phones and personal digital assistants as well environments such as office, home or vehicle while maintaining the accuracy of the speech recognition.

161 citations


Proceedings Article
03 Dec 2012
TL;DR: This work proposes a template model for fine-grained recognition which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns; the recognition results achieved significantly outperform state-of-the-art algorithms.
Abstract: Fine-grained recognition refers to a subordinate level of recognition, such as recognizing different species of animals and plants. It differs from recognition of basic categories, such as humans, tables, and computers, in that there are global similarities in shape and structure shared across different categories, and the differences are in the details of object parts. We suggest that the key to identifying the fine-grained differences lies in finding the right alignment of image regions that contain the same object parts. We propose a template model for the purpose, which captures common shape patterns of object parts, as well as the co-occurrence relation of the shape patterns. Once the image regions are aligned, extracted features are used for classification. Learning of the template model is efficient, and the recognition results we achieve significantly outperform the state-of-the-art algorithms.

153 citations


Proceedings ArticleDOI
14 May 2012
TL;DR: This work introduces a methodology for learning 3D descriptors from synthetic CAD models and classifying never-before-seen objects at first glance, with classification rates and speed suited to robotics tasks.
Abstract: 3D object and object class recognition gained momentum with the arrival of low-cost RGB-D sensors and enables robotics tasks not feasible years ago. Scaling object class recognition to hundreds of classes still requires extensive time and many objects for learning. To overcome the training issue, we introduce a methodology for learning 3D descriptors from synthetic CAD models and classifying never-before-seen objects at first glance, where classification rates and speed are suited for robotics tasks. We provide this in 3DNet (3d-net.org), a free resource for object class recognition and 6DOF pose estimation from point cloud data. 3DNet provides a large-scale hierarchical CAD-model database with increasing numbers of classes and difficulty, with 10, 50, 100 and 200 object classes, together with evaluation datasets that contain thousands of scenes captured with an RGB-D sensor. 3DNet further provides an open-source framework based on the Point Cloud Library (PCL) for testing new descriptors and benchmarking state-of-the-art descriptors, together with pose estimation procedures to enable robotics tasks such as search and grasping.
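
As a rough sketch of the "never-before-seen objects" idea, the snippet below classifies a query by nearest-neighbour matching of a global shape descriptor against descriptors extracted from rendered CAD views. Descriptor extraction itself (e.g. via PCL) is assumed to happen upstream; array layouts and names are assumptions.

```python
import numpy as np

def classify_by_descriptor(query_desc, model_descs, model_labels):
    """Match one query descriptor (d,) against CAD-model descriptors (n, d)
    and return the label of the closest model plus its distance."""
    dists = np.linalg.norm(model_descs - query_desc, axis=1)
    best = int(np.argmin(dists))
    return model_labels[best], float(dists[best])
```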

150 citations


Proceedings ArticleDOI
14 May 2012
TL;DR: An object recognition system which leverages the additional sensing and calibration information available in a robotics setting together with large amounts of training data to build high fidelity object models for a dataset of textured household objects is presented.
Abstract: We present an object recognition system which leverages the additional sensing and calibration information available in a robotics setting together with large amounts of training data to build high fidelity object models for a dataset of textured household objects. We then demonstrate how these models can be used for highly accurate detection and pose estimation in an end-to-end robotic perception system incorporating simultaneous segmentation, object classification, and pose fitting. The system can handle occlusions, illumination changes, multiple objects, and multiple instances of the same object. The system placed first in the ICRA 2011 Solutions in Perception instance recognition challenge. We believe the presented paradigm of building rich 3D models at training time and including depth information at test time is a promising direction for practical robotic perception systems.

140 citations


Proceedings ArticleDOI
16 Jun 2012
TL;DR: This paper's framework is capable of accurately estimating pose and location of objects, regions, and points in the 3D scene; it recognizes objects and regions more accurately than state-of-the-art single image recognition methods.
Abstract: Structure from motion (SFM) aims at jointly recovering the structure of a scene as a collection of 3D points and estimating the camera poses from a number of input images. In this paper we generalize this concept: not only do we want to recover 3D points, but also recognize and estimate the location of high level semantic scene components such as regions and objects in 3D. As a key ingredient for this joint inference problem, we seek to model various types of interactions between scene components. Such interactions help regularize our solution and obtain more accurate results than solving these problems in isolation. Experiments on public datasets demonstrate that: 1) our framework estimates camera poses more robustly than SFM algorithms that use points only; 2) our framework is capable of accurately estimating pose and location of objects, regions, and points in the 3D scene; 3) our framework recognizes objects and regions more accurately than state-of-the-art single image recognition methods.

Journal ArticleDOI
TL;DR: It is suggested that non-rigid motion provides a source of information for recognizing dynamic objects that is not affected by changes to viewpoint; sequence reversal decreased sensitivity across different tasks.
Abstract: There is evidence that observers use learned object motion to recognize objects. For instance, studies have shown that reversing the learned direction in which a rigid object rotated in depth impaired recognition accuracy. This motion reversal can be achieved by playing animation sequences of moving objects in reverse frame order. In the current study, we used this sequence-reversal manipulation to investigate whether observers encode the motion of dynamic objects in visual memory, and whether such dynamic representations are encoded in a way that is dependent on the viewing conditions. Participants first learned dynamic novel objects, presented as animation sequences. Following learning, they were then tested on their ability to recognize these learned objects when their animation sequence was shown in the same sequence order as during learning or in the reverse sequence order. In Experiment 1, we found that non-rigid motion contributed to recognition performance; that is, sequence-reversal decreased sensitivity across different tasks. In subsequent experiments, we tested the recognition of non-rigidly deforming (Experiment 2) and rigidly rotating (Experiment 3) objects across novel viewpoints. Recognition performance was affected by viewpoint changes for both experiments. Learned non-rigid motion continued to contribute to recognition performance and this benefit was the same across all viewpoint changes. By comparison, learned rigid motion did not contribute to recognition performance. These results suggest that non-rigid motion provides a source of information for recognizing dynamic objects, which is not affected by changes to viewpoint.

Journal ArticleDOI
TL;DR: A hierarchical view-based approach that addresses typical problems of previous methods is applied and is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications, and is invariant to contrast changes.
Abstract: This paper describes an approach for recognizing instances of a 3D object in a single camera image and for determining their 3D poses. A hierarchical model is generated solely based on the geometry information of a 3D CAD model of the object. The approach does not rely on texture or reflectance information of the object's surface, making it useful for a wide range of industrial and robotic applications, e.g., bin-picking. A hierarchical view-based approach that addresses typical problems of previous methods is applied: It handles true perspective, is robust to noise, occlusions, and clutter to an extent that is sufficient for many practical applications, and is invariant to contrast changes. For the generation of this hierarchical model, a new model image generation technique by which scale-space effects can be taken into account is presented. The necessary object views are derived using a similarity-based aspect graph. The high robustness of an exhaustive search is combined with an efficient hierarchical search. The 3D pose is refined by using a least-squares adjustment that minimizes geometric distances in the image, yielding a position accuracy of up to 0.12 percent with respect to the object distance, and an orientation accuracy of up to 0.35 degree in our tests. The recognition time is largely independent of the complexity of the object, but depends mainly on the range of poses within which the object may appear in front of the camera. For efficiency reasons, the approach allows the restriction of the pose range depending on the application. Typical runtimes are in the range of a few hundred ms.

Proceedings ArticleDOI
24 Dec 2012
TL;DR: This work describes and experimentally verifies a semantic querying system aboard a mobile robot equipped with a Microsoft Kinect RGB-D sensor, which allows the system to operate in large, dynamic, and unconstrained environments, where modeling every object that occurs or might occur is impractical.
Abstract: Recent years have seen rising interest in robotic mapping algorithms that operate at the level of objects, rather than two- or three-dimensional occupancy. Such “semantic maps” permit higher-level reasoning than occupancy maps, and are useful for any application that involves dealing with objects, including grasping, change detection, and object search. We describe and experimentally verify such a system aboard a mobile robot equipped with a Microsoft Kinect RGB-D sensor. Our representation is object-based, and makes uniquely weak assumptions about the quality of the perceptual data available; in particular, we perform no explicit object recognition. This allows our system to operate in large, dynamic, and unconstrained environments, where modeling every object that occurs (or might occur) is impractical. Our dataset, which is publicly available, consists of 67 autonomous runs of our robot over a six-week period in a roughly 1600 m² office environment. We demonstrate two applications built on our system: semantic querying and change detection.

Journal ArticleDOI
TL;DR: A 2D ear recognition approach based on local information fusion is proposed to deal with ear recognition under partial occlusion; results illustrate that only a few sub-windows are needed to represent the most meaningful region of the ear, and that the multi-classifier model achieves a higher recognition rate than using the whole image for recognition.

Proceedings ArticleDOI
14 May 2012
TL;DR: A perception-driven, multisensory exploration and recognition scheme that actively resolves ambiguities emerging at certain viewpoints and takes into account proprioceptive information to create more reliable hypotheses.
Abstract: Interaction with its environment is a key requisite for a humanoid robot. Especially the ability to recognize and manipulate unknown objects is crucial to successfully work in natural environments. Visual object recognition, however, still remains a challenging problem, as three-dimensional objects often give rise to ambiguous, two-dimensional views. Here, we propose a perception-driven, multisensory exploration and recognition scheme to actively resolve ambiguities that emerge at certain viewpoints. We define an efficient method to acquire two-dimensional views in an object-centered task space and sample characteristic views on a view sphere. Information is accumulated during the recognition process and used to select actions expected to be most beneficial in discriminating similar objects. Besides visual information we take into account proprioceptive information to create more reliable hypotheses. Simulation and real-world results clearly demonstrate the efficiency of active, multisensory exploration over passive, vision-only recognition methods.
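
A minimal sketch of the view-selection intuition (pick the viewpoint expected to best separate the remaining object hypotheses). The `posterior` belief and the `view_confusability` table are hypothetical stand-ins; the paper's actual action-selection criterion is not reproduced here.

```python
def next_best_view(posterior, view_confusability):
    """posterior[i]: current belief that the object is class i.
    view_confusability[v][i][j]: how similar classes i and j look from view v.
    Returns the view index with the lowest expected residual ambiguity."""
    n = len(posterior)

    def expected_ambiguity(v):
        conf = view_confusability[v]
        return sum(posterior[i] * posterior[j] * conf[i][j]
                   for i in range(n) for j in range(n) if i != j)

    return min(range(len(view_confusability)), key=expected_ambiguity)
```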

Journal ArticleDOI
Wei Wu1, Zheng Liu2, Mo Chen1, Xiaomin Yang1, Xiaohai He1 
TL;DR: An automatic container-code recognition system is developed using computer vision, with a two-step method to segment characters under various imaging conditions; experiments demonstrate the efficiency and effectiveness of the proposed technique for practical usage.
Abstract: Highlights: (1) An automatic container-code recognition system is developed using computer vision. (2) The characteristics of characters are fully exploited to locate the container code. (3) A two-step method is proposed to segment characters under various imaging conditions. Automatic container-code recognition is of great importance to the modern container management system. Similar techniques have been proposed for vehicle license plate recognition in past decades. Compared with license plate recognition, automatic container-code recognition faces more challenges due to the severity of nonuniform illumination and the invalidation of color information. In this paper, a computer vision based container-code recognition technique is proposed. The system consists of three function modules, namely location, isolation, and character recognition. In the location module, we propose a text-line region location algorithm, which takes into account the characteristics of a single character as well as the spatial relationship between successive characters. This module locates the text-line regions by using a horizontal high-pass filter and scanline analysis. To resolve nonuniform illumination, a two-step procedure is applied to segment container-code characters, and a projection process is adopted to isolate characters in the isolation module. In the character recognition module, character recognition is achieved by classifying the extracted features, which represent the character image, with trained support vector machines (SVMs). The experimental results demonstrate the efficiency and effectiveness of the proposed technique for practical usage.
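
A rough sketch of the location module's idea (horizontal high-pass filtering followed by scanline analysis), using a Sobel x-gradient as the high-pass filter; the normalised threshold and the band-grouping logic are illustrative choices, not the paper's algorithm.

```python
import cv2
import numpy as np

def locate_text_lines(gray, band_thresh=0.35):
    """Return (top, bottom) row intervals likely to contain a text line.
    `gray` is a single-channel image."""
    grad = np.abs(cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3))  # horizontal high-pass
    row_energy = grad.sum(axis=1)
    row_energy /= row_energy.max() + 1e-9                      # normalise per image
    on_rows = row_energy > band_thresh                         # scanline analysis
    bands, start = [], None
    for y, on in enumerate(on_rows):
        if on and start is None:
            start = y
        elif not on and start is not None:
            bands.append((start, y))
            start = None
    if start is not None:
        bands.append((start, len(on_rows)))
    return bands
```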

Book ChapterDOI
07 Oct 2012
TL;DR: This paper applies a generic sparse coding feature, inspired by object recognition, to multi-view facial expression recognition, and presents a detailed analysis of the variations in expression recognition performance for various pose changes.
Abstract: Expression recognition from non-frontal faces is a challenging research area with growing interest. This paper works with a generic sparse coding feature, inspired by object recognition, for multi-view facial expression recognition. Our extensive experiments on face images with seven pan angles and five tilt angles, rendered from the BU-3DFE database, achieve state-of-the-art results. We achieve a recognition rate of 69.1% on all images with four expression intensity levels, and a recognition performance of 76.1% on images with the strongest expression intensity. We then also present a detailed analysis of the variations in expression recognition performance for various pose changes.

Proceedings ArticleDOI
01 Dec 2012
TL;DR: It is proposed that the number of fingertips and the distance of the fingertips from the centroid of the hand can be used along with PCA for robust and efficient results, and recognition with neural networks is also proposed.
Abstract: Understanding human motions can be posed as a pattern recognition problem. Applications of pattern recognition in information processing problems are diverse, ranging from speech and handwritten character recognition to medical research and astronomy. Humans express time-varying motion patterns (gestures), such as a wave, in order to convey a message to a recipient. If a computer can detect and distinguish these human motion patterns, the desired message can be reconstructed, and the computer can respond appropriately. This paper presents a framework for a human-computer interface capable of recognizing gestures from the Indian sign language. The complexity of Indian sign language recognition increases due to the involvement of both hands and the overlapping of the hands. Alphabets and numbers have been recognized successfully, and the system can be extended to words and sentences. Recognition is done with PCA (Principal Component Analysis). This paper also proposes recognition with neural networks. Further, it is proposed that the number of fingertips and the distance of the fingertips from the centroid of the hand can be used along with PCA for robustness and efficient results.

Journal ArticleDOI
27 Feb 2012-PLOS ONE
TL;DR: A biologically motivated hierarchical model is extended for different object recognition tasks, using an evolutionary algorithm to select a set of patches that are more informative than the usual random patches.
Abstract: Humans can effectively and swiftly recognize objects in complex natural scenes. This outstanding ability has motivated many computational object recognition models. Most of these models try to emulate the behavior of this remarkable system. The human visual system hierarchically recognizes objects in several processing stages. Along these stages a set of features with increasing complexity is extracted by different parts of the visual system. Elementary features like bars and edges are processed in earlier levels of the visual pathway, and the further one goes up this pathway, the more complex the features that are detected. An important question in the field of visual processing is which features of an object are selected and represented by the visual cortex. To address this issue, we extended a biologically motivated hierarchical model for different object recognition tasks. In this model, a set of object parts, named patches, is extracted in the intermediate stages. These object parts are used in the training procedure of the model and play an important role in object recognition. These patches are selected indiscriminately from different positions of an image, which can lead to the extraction of non-discriminating patches that eventually reduce performance. In the proposed model we use an evolutionary algorithm approach to select a set of informative patches. Our reported results indicate that these patches are more informative than the usual random patches. We demonstrate the strength of the proposed model on a range of object recognition tasks. The proposed model outperforms the original model in diverse object recognition tasks. It can be seen from the experiments that the selected features are generally particular parts of the target images. Our results suggest that selected features which are parts of target objects provide an efficient set for robust object recognition.
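
In the spirit of the evolutionary patch selection described above, here is a toy genetic algorithm over index subsets of candidate patches. The `fitness` callback (e.g. validation accuracy of the recognition model on a patch subset) and all hyper-parameters are illustrative assumptions.

```python
import random

def evolve_patch_subset(fitness, n_patches, subset_size, pop=20, gens=50, seed=0):
    """Evolve a subset of patch indices that maximises fitness(subset)."""
    rng = random.Random(seed)
    population = [rng.sample(range(n_patches), subset_size) for _ in range(pop)]
    for _ in range(gens):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop // 2]                    # truncation selection
        children = []
        while len(children) < pop - len(parents):
            a, b = rng.sample(parents, 2)              # one-point crossover
            child = list(set(a[:subset_size // 2] + b[subset_size // 2:]))
            while len(child) < subset_size:            # repair with random genes
                idx = rng.randrange(n_patches)
                if idx not in child:
                    child.append(idx)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)
```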

Proceedings ArticleDOI
18 Jul 2012
TL;DR: This work presents a robust approach for ear recognition using multi-scale dense HOG features as a descriptor of 2D ear images; dimensionality reduction avoids feature redundancy and provides a more efficient recognition process that is less prone to over-fitting.
Abstract: Ear recognition is a promising biometric measure, especially with the growing interest in multi-modal biometrics. Histograms of Oriented Gradients (HOG) have been used effectively and efficiently to solve the problems of object detection and recognition, especially when illumination variations are present. This work presents a robust approach for ear recognition using multi-scale dense HOG features as a descriptor of 2D ear images. The multi-scale features capture the different and complicated structures of ear images. Dimensionality reduction was performed to avoid feature redundancy and provide a more efficient recognition process while being less prone to over-fitting. Finally, a test was performed on a large and realistic database and the results were compared to state-of-the-art ear recognition approaches tested on the same dataset and under the same test procedure.
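
A minimal sketch of a multi-scale dense HOG descriptor followed by PCA, assuming fixed-size grayscale ear images; the cell sizes, block size, and number of retained components are illustrative, not the paper's settings.

```python
import numpy as np
from skimage.feature import hog
from sklearn.decomposition import PCA

def multiscale_hog(gray, cell_sizes=(4, 8, 16)):
    """Concatenate dense HOG descriptors computed at several cell sizes."""
    return np.concatenate([
        hog(gray, orientations=9, pixels_per_cell=(c, c), cells_per_block=(2, 2))
        for c in cell_sizes])

def reduce_descriptors(X, n_components=128):
    """PCA over the (n_images, n_features) descriptor matrix, standing in
    for the dimensionality-reduction step mentioned in the abstract."""
    reducer = PCA(n_components=n_components)
    return reducer.fit_transform(X), reducer
```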

Book ChapterDOI
07 Oct 2012
TL;DR: The co-detector designed in this paper obtains more accurate detection results than if objects were to be detected from each image individually, and the relevance of the scheme to other recognition problems such as single instance object recognition, wide-baseline matching, and image query is demonstrated.
Abstract: In this paper we introduce a new problem which we call object co-detection. Given a set of images in which objects are observed in two or more of the images, the goal of co-detection is to detect the objects, establish the identity of each individual object instance, and estimate the viewpoint transformation of corresponding object instances. In designing a co-detector, we follow the intuition that an object has a consistent appearance when observed from the same or different viewpoints. By modeling an object using state-of-the-art part-based representations such as [1,2], we measure appearance consistency between objects by comparing part appearance and geometry across images. This makes it possible to effectively account for object self-occlusions and viewpoint transformations. Extensive experimental evaluation indicates that our co-detector obtains more accurate detection results than if objects were to be detected from each image individually. Moreover, we demonstrate the relevance of our co-detection scheme to other recognition problems such as single instance object recognition, wide-baseline matching, and image query.

Patent
29 Nov 2012
TL;DR: In this article, a moving object recognition system includes a camera that is installed in a vehicle and captures continuous single-view images, a detecting unit that detects moving objects from the images captured by the camera, a relative approach angle estimating unit, and a collision risk calculating unit that calculates the risk of the moving object colliding with the vehicle.
Abstract: The moving object recognition system includes: a camera that is installed in a vehicle and captures continuous single-view images; a moving object detecting unit that detects a moving object from the images captured by the camera; a relative approach angle estimating unit that estimates the relative approach angle of the moving object detected by the moving object detecting unit with respect to the camera; a collision risk calculating unit that calculates the risk of the moving object colliding with the vehicle, based on the relationship between the relative approach angle and the moving object direction from the camera toward the moving object; and a reporting unit that reports a danger to the driver of the vehicle in accordance with the risk calculated by the collision risk calculating unit.

Journal ArticleDOI
TL;DR: An object recognition approach based on shape masks, generalizations of segmentation masks, that can improve the recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time.
Abstract: In this paper we propose an object recognition approach that is based on shape masks (generalizations of segmentation masks). As shape masks carry information about the extent (outline) of objects, they provide a convenient tool to exploit the geometry of objects. We apply our ideas to two common object class recognition tasks: classification and localization. For classification, we extend the orderless bag-of-features image representation. In the proposed setup shape masks can be seen as weak geometrical constraints over bag-of-features. Those constraints can be used to reduce background clutter and help recognition. For localization, we propose a new recognition scheme based on high-dimensional hypothesis clustering. Shape masks make it possible to go beyond bounding boxes and determine the outline (approximate segmentation) of the object during localization. Furthermore, the method easily learns and detects possible object viewpoints and articulations, which are often well characterized by the object outline. Our experiments reveal that shape masks can improve the recognition accuracy of state-of-the-art methods while returning richer recognition answers at the same time. We evaluate the proposed approach on the challenging natural-scene Graz-02 object classes dataset.

Proceedings ArticleDOI
TL;DR: This paper explores the efficacy of several object recognition algorithms at classifying ships and other ocean vessels in commercial panchromatic satellite imagery and discusses how these algorithms are being used in existing systems to detect and classify vessels in satellite imagery.
Abstract: Recognition and classification of vessels in maritime imagery is a challenging problem with applications to security and military scenarios. Aspects of this problem are similar to well-studied problems in object recognition, but it is in many ways more complex than a problem such as face recognition. A vessel's appearance can vary significantly from image to image depending on factors such as lighting condition, viewing geometry, and sea state, and there is often wide variation between ships of the same class. This paper explores the efficacy of several object recognition algorithms at classifying ships and other ocean vessels in commercial panchromatic satellite imagery. The recognition algorithms tested include traditional classification methods as well as more recent methods utilizing sparse matrix representations and dictionary learning. The impacts on classification accuracy of various pre-processing steps on vessel imagery are explored, and we discuss how these algorithms are being used in existing systems to detect and classify vessels in satellite imagery.

Journal ArticleDOI
Seokwon Yeom1, Dong-Su Lee1, Yushin Jang2, Mun-Kyo Lee2, Sang-Won Jung2 
TL;DR: Experiments confirm that the proposed methods provide fast and reliable recognition of the concealed object carried by a moving human subject.
Abstract: Millimeter wave (MMW) imaging is finding rapid adoption in security applications such as concealed object detection under clothing. A passive MMW imaging system can operate as a stand-off type sensor that scans people both indoors and outdoors. However, the imaging system often suffers from the diffraction limit and a low signal level. Therefore, suitable intelligent image processing algorithms are required for automatic detection and recognition of the concealed objects. This paper proposes real-time outdoor concealed-object detection and recognition with a radiometric imaging system. The concealed object region is extracted by multi-level segmentation. A novel approach is proposed to measure the similarity between two binary images. Principal component analysis (PCA) regularizes the shape in terms of translation and rotation. A geometric-based feature vector is composed of shape descriptors, which achieve scale and orientation invariance and distortion tolerance. The class is decided by the minimum Euclidean distance between normalized feature vectors. Experiments confirm that the proposed methods provide fast and reliable recognition of a concealed object carried by a moving human subject.
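
To illustrate the PCA-based shape normalisation idea, the sketch below centres a segmented binary mask, rotates it onto its principal axes, and derives two simple scale/orientation-tolerant descriptors; the specific descriptors and the per-class mean classifier are illustrative, not the paper's exact feature set.

```python
import numpy as np

def shape_features(binary_mask):
    """Translation/rotation-normalised shape descriptors of an object mask."""
    ys, xs = np.nonzero(binary_mask)
    pts = np.stack([xs, ys], axis=1).astype(float)
    pts -= pts.mean(axis=0)                        # remove translation
    _, _, vt = np.linalg.svd(pts, full_matrices=False)
    pts = pts @ vt.T                               # rotate onto principal axes
    extents = pts.max(axis=0) - pts.min(axis=0)
    elongation = extents[0] / (extents[1] + 1e-9)  # aspect ratio of the shape
    fill = len(pts) / (extents[0] * extents[1] + 1e-9)  # how solid the region is
    return np.array([elongation, fill])

def classify(feat, class_means):
    """Minimum Euclidean distance to per-class mean feature vectors."""
    return min(class_means, key=lambda c: np.linalg.norm(feat - class_means[c]))
```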

Proceedings ArticleDOI
29 Oct 2012
TL;DR: In this paper, the authors propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections; the representation is extended to video by aggregating key-frame-level image representations through mean and max pooling.
Abstract: While low-level image features have proven to be effective representations for visual recognition tasks such as object recognition and scene classification, they are inadequate to capture the complex semantic meaning required to solve high-level visual tasks such as multimedia event detection and recognition. Recognition or retrieval of events and activities can be improved if specific discriminative objects are detected in a video sequence. In this paper, we propose an image representation, called Detection Bank, based on the detection images from a large number of windowed object detectors, where an image is represented by different statistics derived from these detections. This representation is extended to video by aggregating the key frame level image representations through mean and max pooling. We empirically show that it captures complementary information to state-of-the-art representations such as Spatial Pyramid Matching and Object Bank. These descriptors combined with our Detection Bank representation significantly outperform any of the representations alone on TRECVID MED 2011 data.
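
The pooling step lends itself to a one-liner; below, `frame_responses` is assumed to be an (n_keyframes, n_detectors) matrix of per-frame detector statistics, and the video-level descriptor concatenates mean and max pooling over key frames as the abstract describes.

```python
import numpy as np

def detection_bank_video(frame_responses):
    """Aggregate key-frame detector statistics into one video descriptor."""
    fr = np.asarray(frame_responses)      # shape: (n_keyframes, n_detectors)
    return np.concatenate([fr.mean(axis=0), fr.max(axis=0)])
```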

Journal ArticleDOI
TL;DR: This paper investigates the visual extent of an object on the Pascal VOC dataset using a Bag-of-Words implementation with (colour) SIFT descriptors, and confirms an early observation from human psychology: in the ideal situation with known object locations, recognition is no longer improved by considering surroundings; in contrast, in the normal situation with unknown object locations, the surroundings significantly contribute to the recognition of most classes.
Abstract: The visual extent of an object reaches beyond the object itself. This is a long-standing fact in psychology and is reflected in image retrieval techniques which aggregate statistics from the whole image in order to identify the object within. However, it is unclear to what degree and how the visual extent of an object affects classification performance. In this paper we investigate the visual extent of an object on the Pascal VOC dataset using a Bag-of-Words implementation with (colour) SIFT descriptors. Our analysis is performed from two angles. (a) Not knowing the object location, we determine where in the image the support for object classification resides. We call this the normal situation. (b) Assuming that the object location is known, we evaluate the relative potential of the object and its surround, and of the object border and object interior. We call this the ideal situation. Our most important discoveries are: (i) Surroundings can adequately distinguish between groups of classes: furniture, animals, and land-vehicles. For distinguishing categories within one group the surroundings become a source of confusion. (ii) The physically rigid plane, bike, bus, car, and train classes are recognised by interior boundaries and shape, not by texture. The non-rigid animals dog, cat, cow, and sheep are recognised primarily by texture, i.e. fur, as their projected shape varies greatly. (iii) We confirm an early observation from human psychology (Biederman in Perceptual Organization, pp. 213–263, 1981): in the ideal situation with known object locations, recognition is no longer improved by considering surroundings. In contrast, in the normal situation with unknown object locations, the surroundings significantly contribute to the recognition of most classes.
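
A minimal sketch of the kind of region-restricted Bag-of-Words histogram such an analysis relies on: restricting the histogram to a mask lets object, surround, border, and interior be scored separately. The codebook layout and hard visual-word assignment are assumptions of this sketch.

```python
import numpy as np

def bow_histogram(keypoints_xy, descriptors, codebook, mask=None):
    """L1-normalised visual-word histogram over (colour-)SIFT descriptors,
    optionally restricted to a boolean region mask."""
    hist = np.zeros(len(codebook))
    for (x, y), d in zip(keypoints_xy, descriptors):
        if mask is not None and not mask[int(y), int(x)]:
            continue                                   # keep in-region features only
        word = int(np.argmin(np.linalg.norm(codebook - d, axis=1)))
        hist[word] += 1
    return hist / max(hist.sum(), 1)
```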

Proceedings Article
01 Nov 2012
TL;DR: This paper proposes a new approach to the problem of Car Make and Model recognition that combines global and local information and utilizes discriminative information labeled by a human expert; it is validated through experiments on recognizing the make and model of sedan cars from single-view images.
Abstract: This paper addresses the problem of Car Make and Model recognition as an example of within-category object class recognition. In this problem, it is assumed that the general category of the object is given and the goal is to recognize the object class within the same category. As compared to general object recognition, this problem is more challenging because the variations among classes within the same category are subtle, mostly dominated by the category overall characteristics, and easily missed due to pose and illumination variations. Therefore, this specific problem may not be effectively addressed using generic object recognition approaches. In this paper, we propose a new approach to address this specific problem by combining global and local information and utilizing discriminative information labeled by a human expert. We validate our approach through experiments on recognizing the make and model of sedan cars from single view images.

Book ChapterDOI
07 Oct 2012
TL;DR: Results show that text helps for object class recognition if the text is not uniquely coupled to individual object instances.
Abstract: We propose to use text recognition to aid in visual object class recognition. To this end we first propose a new algorithm for text detection in natural images. The proposed text detection is based on saliency cues and a context fusion step. The algorithm does not need any parameter tuning and can deal with varying imaging conditions. We evaluate three different tasks: 1. Scene text recognition, where we increase the state-of-the-art by 0.17 on the ICDAR 2003 dataset. 2. Saliency based object recognition, where we outperform other state-of-the-art saliency methods for object recognition on the PASCAL VOC 2011 dataset. 3. Object recognition with the aid of recognized text, where we are the first to report multi-modal results on the IMET set. Results show that text helps for object class recognition if the text is not uniquely coupled to individual object instances.