Journal ArticleDOI

3D object recognition and classification: a systematic literature review

01 Nov 2019-Pattern Analysis and Applications (Springer London)-Vol. 22, Iss: 4, pp 1243-1292
TL;DR: A systematic literature review concerning 3D object recognition and classification published between 2006 and 2016 is presented, using the methodology for systematic review proposed by Kitchenham.
Abstract: In this paper, we present a systematic literature review concerning 3D object recognition and classification. We cover articles published between 2006 and 2016 available in three scientific databases (ScienceDirect, IEEE Xplore and ACM), using the methodology for systematic review proposed by Kitchenham. Based on this methodology, we used tags and exclusion criteria to select papers about the topic under study. After selecting the works, we applied a categorization process to group similar object representation types, analyzing the steps applied for object recognition, the tests and evaluation performed and the databases used. Lastly, we condensed all the obtained information into a general overview and presented future prospects for the area.
Citations
Journal ArticleDOI
01 Sep 2021-Displays
TL;DR: A comprehensive review and classification of the latest developments in deep learning methods for multi-view 3D object recognition is presented, which summarizes the results of these methods on several mainstream datasets and puts forward future research directions.

101 citations

Journal ArticleDOI
TL;DR: A comprehensive review of the state-of-the-art object detection technologies focusing on both the sensory systems and algorithms used is presented in this article, where different sensory systems employed on existing AVs are elaborated while illustrating their advantages, limitations and applications.

17 citations

Journal ArticleDOI
TL;DR: This paper presents the first fully annotated MicroCT-acquired microfossil dataset made publicly available, and proposes and validates a method for fully automated microfossil identification and segmentation.

16 citations

Journal ArticleDOI
TL;DR: The authors explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition, extensively evaluating object recognition approaches in three configurations (color-only, shape-only, and combined color and shape) in both offline and online settings.
Abstract: Despite the recent success of state-of-the-art 3D object recognition approaches, service robots still frequently fail to recognize many objects in real human-centric environments. For these robots, object recognition is a challenging task due to the high demand for accurate and real-time response under changing and unpredictable environmental conditions. Most of the recent approaches use either the shape information only and ignore the role of color information or vice versa. Furthermore, they mainly utilize the $$L_n$$ Minkowski family functions to measure the similarity of two object views, while there are various distance measures that are applicable to compare two object views. In this paper, we explore the importance of shape information, color constancy, color spaces, and various similarity measures in open-ended 3D object recognition. Toward this goal, we extensively evaluate the performance of object recognition approaches in three different configurations, including color-only, shape-only, and combinations of color and shape, in both offline and online settings. Experimental results concerning scalability, memory usage, and object recognition performance show that all of the combinations of color and shape yield significant improvements over the shape-only and color-only approaches. The underlying reason is that color information is an important feature to distinguish objects that have very similar geometric properties with different colors and vice versa. Moreover, by combining color and shape information, we demonstrate that the robot can learn new object categories from very few training examples in a real-world setting.
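The abstract above contrasts the $L_n$ Minkowski family with other applicable distance measures for comparing two object views. As a minimal, illustrative sketch (not code from the paper; the histograms are made-up stand-ins for object-view features), the snippet below computes a Minkowski distance and one common alternative, the chi-square distance, between two normalized feature histograms:

```python
import numpy as np

def minkowski_distance(u: np.ndarray, v: np.ndarray, n: float = 2.0) -> float:
    """L_n Minkowski distance between two object-view feature histograms."""
    return float(np.sum(np.abs(u - v) ** n) ** (1.0 / n))

def chi_square_distance(u: np.ndarray, v: np.ndarray, eps: float = 1e-12) -> float:
    """Chi-square distance, one alternative measure for comparing histograms."""
    return float(0.5 * np.sum((u - v) ** 2 / (u + v + eps)))

# Hypothetical normalized shape/color histograms for two object views.
view_a = np.array([0.1, 0.4, 0.3, 0.2])
view_b = np.array([0.2, 0.3, 0.3, 0.2])
print(minkowski_distance(view_a, view_b, n=1))  # L1 (Manhattan)
print(minkowski_distance(view_a, view_b, n=2))  # L2 (Euclidean)
print(chi_square_distance(view_a, view_b))
```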

9 citations

References
Journal ArticleDOI
TL;DR: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene and can robustly identify objects among clutter and occlusion while achieving near real-time performance.
Abstract: This paper presents a method for extracting distinctive invariant features from images that can be used to perform reliable matching between different views of an object or scene. The features are invariant to image scale and rotation, and are shown to provide robust matching across a substantial range of affine distortion, change in 3D viewpoint, addition of noise, and change in illumination. The features are highly distinctive, in the sense that a single feature can be correctly matched with high probability against a large database of features from many images. This paper also describes an approach to using these features for object recognition. The recognition proceeds by matching individual features to a database of features from known objects using a fast nearest-neighbor algorithm, followed by a Hough transform to identify clusters belonging to a single object, and finally performing verification through least-squares solution for consistent pose parameters. This approach to recognition can robustly identify objects among clutter and occlusion while achieving near real-time performance.
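The matching pipeline described here (keypoint extraction, nearest-neighbor descriptor matching, then geometric verification) can be sketched with OpenCV's SIFT implementation. This is an illustration, not the paper's code: the image paths are placeholders, and RANSAC homography fitting stands in for the paper's Hough-transform clustering followed by a least-squares pose solution:

```python
import cv2
import numpy as np

# Two views of the same object (paths are placeholders).
img1 = cv2.imread("view1.png", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.png", cv2.IMREAD_GRAYSCALE)

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Nearest-neighbor matching with Lowe's ratio test to discard ambiguous matches.
matcher = cv2.BFMatcher(cv2.NORM_L2)
good = []
for pair in matcher.knnMatch(des1, des2, k=2):
    if len(pair) == 2 and pair[0].distance < 0.75 * pair[1].distance:
        good.append(pair[0])

# Geometric verification of the surviving matches.
if len(good) >= 4:
    src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    print(f"{int(mask.sum())} of {len(good)} matches survive verification")
```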

46,906 citations


"3D object recognition and classific..." refers background or methods in this paper

  • ...The local features are extracted via SIFT chain features, which are employed for subspace construction through PCA.... (a minimal sketch of this PCA projection follows the excerpt list below)

  • ..., in the feature-based representation there is a variety of feature descriptors (SHOT [288], SI [116], VFH [254], SIFT [175] and so on)....

  • ...The proposed method is composed of two stages: an offline stage, where the Bundler structure-from-motion method is applied to the object data set and the background points are manually removed to obtain the object point cloud model and, from this model, generate the aspect-graph-aware representation; and an online stage, where coarse 2D–3D correspondences are produced by computing the similarity between SIFT descriptors from the input image and the 3D model, and are then refined via a two-stage filter that removes false correspondences....

  • ...In the recognition process, candidate evidences include global and local features extracted from the RGB-D data, i.e., the 3D SIFT, CLB and shape descriptors extracted from the point cloud....

  • ...The traditional pooling in the space domain cannot be applied directly to high-dimensional pooling domains such as SIFT and FPFH due to the exponential growth in the number of pooling bins....

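The first excerpt above mentions projecting SIFT features into a PCA subspace. Here is a minimal sketch of that projection step, with randomly generated stand-ins for the 128-D SIFT descriptors (illustrative only, not the reviewed method's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for a matrix of 128-D SIFT descriptors extracted from an object.
descriptors = rng.random((500, 128)).astype(np.float32)

# PCA via SVD: center the data, then keep the top-k principal directions.
mean = descriptors.mean(axis=0)
centered = descriptors - mean
_, _, vt = np.linalg.svd(centered, full_matrices=False)
k = 32
basis = vt[:k]                  # k x 128 subspace basis

# Project each descriptor into the low-dimensional subspace.
projected = centered @ basis.T  # 500 x k
print(projected.shape)
```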

Journal ArticleDOI
01 Jan 1998
TL;DR: This paper reviews methods for handwritten character recognition, shows that convolutional neural networks outperform all other techniques on a standard digit recognition task, and introduces graph transformer networks (GTNs) for globally training multi-module recognition systems with gradient-based methods.
Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional patterns, such as handwritten characters, with minimal preprocessing. This paper reviews various methods applied to handwritten character recognition and compares them on a standard handwritten digit recognition task. Convolutional neural networks, which are specifically designed to deal with the variability of 2D shapes, are shown to outperform all other techniques. Real-life document recognition systems are composed of multiple modules including field extraction, segmentation recognition, and language modeling. A new learning paradigm, called graph transformer networks (GTN), allows such multimodule systems to be trained globally using gradient-based methods so as to minimize an overall performance measure. Two systems for online handwriting recognition are described. Experiments demonstrate the advantage of global training, and the flexibility of graph transformer networks. A graph transformer network for reading a bank cheque is also described. It uses convolutional neural network character recognizers combined with global training techniques to provide record accuracy on business and personal cheques. It is deployed commercially and reads several million cheques per day.
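A LeNet-style convolutional network of the kind this paper describes can be sketched in a few lines of PyTorch. This is an illustrative reconstruction following the classic LeNet-5 layer sizes, not the authors' original implementation (which predates PyTorch):

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    """LeNet-5-style convolutional network for 32x32 grayscale inputs."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.Tanh(),
            nn.AvgPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.Tanh(),
            nn.Linear(120, 84),
            nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

model = LeNet5()
logits = model(torch.randn(1, 1, 32, 32))  # one dummy 32x32 digit image
print(logits.shape)  # torch.Size([1, 10])
```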

42,067 citations

Proceedings ArticleDOI
20 Sep 1999
TL;DR: Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
Abstract: An object recognition system has been developed that uses a new class of local image features. The features are invariant to image scaling, translation, and rotation, and partially invariant to illumination changes and affine or 3D projection. These features share similar properties with neurons in inferior temporal cortex that are used for object recognition in primate vision. Features are efficiently detected through a staged filtering approach that identifies stable points in scale space. Image keys are created that allow for local geometric deformations by representing blurred image gradients in multiple orientation planes and at multiple scales. The keys are used as input to a nearest neighbor indexing method that identifies candidate object matches. Final verification of each match is achieved by finding a low residual least squares solution for the unknown model parameters. Experimental results show that robust object recognition can be achieved in cluttered partially occluded images with a computation time of under 2 seconds.
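The staged filtering mentioned here finds stable points as extrema of a difference-of-Gaussian (DoG) scale space. The naive sketch below shows the core idea only; the image path is a placeholder, the loop is deliberately unoptimized, and the 0.03 contrast threshold is a conventional choice rather than a value from the paper:

```python
import cv2
import numpy as np

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0

# Build a small Gaussian scale space and its difference-of-Gaussian layers.
sigmas = [1.6 * (2 ** (i / 3)) for i in range(5)]
gaussians = [cv2.GaussianBlur(img, (0, 0), s) for s in sigmas]
dogs = np.stack([g2 - g1 for g1, g2 in zip(gaussians, gaussians[1:])])

# Keypoints: extrema over the 3x3x3 neighborhood in space and scale.
keypoints = []
for s in range(1, dogs.shape[0] - 1):
    for y in range(1, img.shape[0] - 1):
        for x in range(1, img.shape[1] - 1):
            patch = dogs[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
            v = dogs[s, y, x]
            if abs(v) > 0.03 and (v == patch.max() or v == patch.min()):
                keypoints.append((x, y, sigmas[s]))
print(len(keypoints), "candidate keypoints")
```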

16,989 citations


"3D object recognition and classific..." refers methods in this paper

  • ...The scale-invariant feature transform (SIFT), presented by Lowe [174], used for describing salient points (keypoints) and representing the objects, was employed in several analyzed works as a way to extract keypoints....

Proceedings ArticleDOI
01 Jan 1988
TL;DR: The problem the authors are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for top-down recognition techniques to work.
Abstract: The problem we are addressing in Alvey Project MMI149 is that of using computer vision to understand the unconstrained 3D world, in which the viewed scenes will in general contain too wide a diversity of objects for topdown recognition techniques to work. For example, we desire to obtain an understanding of natural scenes, containing roads, buildings, trees, bushes, etc., as typified by the two frames from a sequence illustrated in Figure 1. The solution to this problem that we are pursuing is to use a computer vision system based upon motion analysis of a monocular image sequence from a mobile camera. By extraction and tracking of image features, representations of the 3D analogues of these features can be constructed.
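The feature extraction step referred to here is the Harris corner detector that this paper introduced. A minimal sketch using OpenCV's implementation (the frame path is a placeholder and the parameter values are common defaults, not taken from the paper):

```python
import cv2
import numpy as np

# One frame of a monocular sequence (path is a placeholder).
frame = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)
response = cv2.cornerHarris(np.float32(frame), blockSize=2, ksize=3, k=0.04)

# Keep locally strong responses as trackable corner features.
corners = np.argwhere(response > 0.01 * response.max())
print(len(corners), "corner pixels detected")
```

In a full motion-analysis system such as the one described, corners like these would then be tracked across the image sequence, for example with pyramidal Lucas-Kanade optical flow.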

13,993 citations


"3D object recognition and classific..." refers methods in this paper

  • ...Using this theory and the steps presented in [112] and [98], the 3D keypoint detector BIK-BUS is generated....

Journal ArticleDOI
TL;DR: In this article, a visual attention system inspired by the behavior and the neuronal architecture of the early primate visual system is presented, where multiscale image features are combined into a single topographical saliency map.
Abstract: A visual attention system, inspired by the behavior and the neuronal architecture of the early primate visual system, is presented. Multiscale image features are combined into a single topographical saliency map. A dynamical neural network then selects attended locations in order of decreasing saliency. The system breaks down the complex problem of scene understanding by rapidly selecting, in a computationally efficient manner, conspicuous locations to be analyzed in detail.
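The model's core operation is a center-surround difference computed across a multiscale pyramid, with the resulting maps combined into a single saliency map. The sketch below applies that idea to the intensity channel only; the full model also uses color and orientation channels, map normalization, and a winner-take-all selection network, and the image path here is a placeholder:

```python
import cv2
import numpy as np

img = cv2.imread("scene.png").astype(np.float32) / 255.0
intensity = img.mean(axis=2)  # simple intensity channel

# Gaussian pyramid; center-surround = |fine scale - coarse scale|.
pyramid = [intensity]
for _ in range(6):
    pyramid.append(cv2.pyrDown(pyramid[-1]))

h, w = intensity.shape
saliency = np.zeros((h, w), np.float32)
for c in (1, 2):           # "center" pyramid levels
    for delta in (2, 3):   # "surround" level offsets
        center = cv2.resize(pyramid[c], (w, h))
        surround = cv2.resize(pyramid[c + delta], (w, h))
        saliency += np.abs(center - surround)

saliency /= saliency.max()  # normalized single-channel saliency map
cv2.imwrite("saliency.png", (saliency * 255).astype(np.uint8))
```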

10,525 citations