Showing papers on "3D single-object recognition published in 2018"


Journal ArticleDOI
TL;DR: A CNN knowledge-transfer framework for underwater object recognition that tackles the problem of extracting discriminative features from relatively low-contrast images and introduces a weighted-probabilities decision mechanism to identify the object in an underwater video.

68 citations
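
The exact weighting scheme is not described in the summary above; as a rough illustration only, a weighted-probabilities decision over video frames might be sketched as follows, where weighting each frame by its top-1 confidence is an assumption.

```python
# Illustrative sketch (assumed scheme): fuse per-frame CNN class probabilities
# into one video-level decision, down-weighting low-confidence frames.
import numpy as np

def weighted_video_decision(frame_probs):
    """frame_probs: (num_frames, num_classes) softmax outputs from a CNN."""
    frame_probs = np.asarray(frame_probs, dtype=float)
    # Assumption: weight each frame by its top-1 confidence, so blurry or
    # low-contrast frames contribute less to the final decision.
    weights = frame_probs.max(axis=1)
    weights = weights / (weights.sum() + 1e-12)
    fused = (weights[:, None] * frame_probs).sum(axis=0)
    return int(np.argmax(fused)), fused

# Toy usage: three frames, two classes.
label, scores = weighted_video_decision([[0.6, 0.4], [0.9, 0.1], [0.55, 0.45]])
print(label, scores)
```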


Journal ArticleDOI
TL;DR: This paper considers the problem of online learning of a feature transformation matrix expressed in the original feature space and proposes an online passive-aggressive feature transformation algorithm, which can further improve the performance of online feature transformation learning in large-scale applications.
Abstract: In this paper, we introduce a new research problem termed online feature transformation learning in the context of multiclass object category recognition. The learning of a feature transformation is viewed as learning a global similarity metric function in an online manner. We first consider the problem of online learning of a feature transformation matrix expressed in the original feature space and propose an online passive-aggressive feature transformation algorithm. Then these original features are mapped to kernel space and an online single kernel feature transformation (OSKFT) algorithm is developed to learn a nonlinear feature transformation. Based on the OSKFT and the existing Hedge algorithm, a novel online multiple kernel feature transformation algorithm is also proposed, which can further improve the performance of online feature transformation learning in large-scale applications. The classifier is trained with the k-nearest-neighbor algorithm together with the learned similarity metric function. Finally, we experimentally examine the effect of setting different parameter values in the proposed algorithms and evaluate the model performance on several multiclass object recognition data sets. The experimental results demonstrate the validity and good performance of our methods on cross-domain and multiclass object recognition applications.

55 citations
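
As a rough, illustrative sketch (not the paper's exact formulation), an online passive-aggressive-style update of a linear feature transformation followed by k-NN classification in the transformed space could look like the following; the pairwise hinge loss, margin, and aggressiveness parameter are assumptions.

```python
# Passive-aggressive-style online update of a transformation W, then k-NN
# classification on the transformed features (illustrative, simplified).
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def pa_update(W, x1, x2, same_class, margin=1.0, C=1.0):
    """One online update on a pair: pull same-class pairs together,
    push different-class pairs apart (hinge loss on squared distance)."""
    diff = W @ (x1 - x2)
    dist2 = float(diff @ diff)
    y = 1.0 if same_class else -1.0
    # Hinge loss: same-class pairs should be closer than `margin`,
    # different-class pairs farther than `margin`.
    loss = max(0.0, y * (dist2 - margin))
    if loss > 0.0:
        grad = 2.0 * y * np.outer(diff, x1 - x2)       # y * d(dist2)/dW
        tau = min(C, loss / (np.linalg.norm(grad) ** 2 + 1e-12))
        W = W - tau * grad                             # passive-aggressive step
    return W

# Toy usage: learn W from random pairs, then classify with k-NN on W x.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(200, 10)), rng.integers(0, 3, size=200)
W = np.eye(10)
for _ in range(500):
    i, j = rng.integers(0, len(X), size=2)
    W = pa_update(W, X[i], X[j], y[i] == y[j])
knn = KNeighborsClassifier(n_neighbors=5).fit(X @ W.T, y)
```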


Journal ArticleDOI
TL;DR: Experimental results suggest that the key features that distinguish an object can be derived at around the 10 mm level and that any further increase in the level of detail does not significantly increase the recognition accuracy.

45 citations


Journal ArticleDOI
01 Feb 2018
TL;DR: The proposed 3-D object recognition method is ideal for detecting structural objects and has high scalability and parallelism; it is implemented on a robotic navigation aid to allow real-time detection of indoor structural objects for the navigation of a blind person.
Abstract: This paper presents a 3-D object recognition method and its implementation on a robotic navigation aid to allow real-time detection of indoor structural objects for the navigation of a blind person. The method segments a point cloud into numerous planar patches and extracts their inter-plane relationships (IPRs). Based on the existing IPRs of the object models, the method defines six high level features (HLFs) and determines the HLFs for each patch. A Gaussian-mixture-model-based plane classifier is then devised to classify each planar patch into one belonging to a particular object model. Finally, a recursive plane clustering procedure is used to cluster the classified planes into the model objects. As the proposed method uses geometric context to detect an object, it is robust to the object’s visual appearance change. As a result, it is ideal for detecting structural objects (e.g., stairways, doorways, and so on). In addition, it has high scalability and parallelism. The method is also capable of detecting some indoor non-structural objects. Experimental results demonstrate that the proposed method has a high success rate in object recognition.

41 citations
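
A simplified sketch of the Gaussian-mixture-model plane classification step described above: one GMM is fit per object model on per-patch feature vectors, and a patch is assigned to the model with the highest log-likelihood. The feature vectors here are generic placeholders standing in for the paper's six high-level features.

```python
# GMM-based plane classifier sketch: one GaussianMixture per object model,
# patches assigned by maximum log-likelihood.
import numpy as np
from sklearn.mixture import GaussianMixture

class GMMPlaneClassifier:
    def __init__(self, n_components=2):
        self.n_components = n_components
        self.models = {}

    def fit(self, patch_features_by_class):
        """patch_features_by_class: dict mapping class name -> (N, d) array."""
        for name, feats in patch_features_by_class.items():
            gmm = GaussianMixture(n_components=self.n_components, covariance_type="full")
            self.models[name] = gmm.fit(np.asarray(feats))
        return self

    def predict(self, patch_features):
        """Return the best class per patch by comparing log-likelihoods."""
        patch_features = np.atleast_2d(patch_features)
        names = list(self.models)
        scores = np.stack([self.models[n].score_samples(patch_features) for n in names])
        return [names[i] for i in scores.argmax(axis=0)]

# Toy usage with made-up patch features (e.g., tilt angle, height, area, ...).
rng = np.random.default_rng(1)
clf = GMMPlaneClassifier().fit({
    "stairway": rng.normal(0.0, 1.0, size=(50, 4)),
    "doorway":  rng.normal(3.0, 1.0, size=(50, 4)),
})
print(clf.predict(rng.normal(3.0, 1.0, size=(5, 4))))
```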


Journal ArticleDOI
TL;DR: This paper proposes a robust and efficient sensor-assisted face recognition system on smart glasses by exploring the power of multimodal sensors including the camera and Inertial Measurement Unit (IMU) sensors and proposes two novel sampling optimization strategies using the less expensive inertial sensors.
Abstract: Face recognition is a hot research topic with a variety of application possibilities, including video surveillance and mobile payment. It has been well researched in the traditional computer vision community. However, new research issues arise when it comes to resource constrained devices, such as smart glasses, due to the overwhelming computation and energy requirements of the accurate face recognition methods. In this paper, we propose a robust and efficient sensor-assisted face recognition system on smart glasses by exploring the power of multimodal sensors including the camera and Inertial Measurement Unit (IMU) sensors. The system is based on a novel face recognition algorithm, namely Multi-view Sparse Representation Classification (MVSRC), which exploits the rich information among multi-view face images. To improve the efficiency of MVSRC on smart glasses, we propose two novel sampling optimization strategies using the less expensive inertial sensors. Our evaluations on public and private datasets show that the proposed method is up to 10 percent more accurate than the state-of-the-art multi-view face recognition methods while its computation cost is of the same order as an efficient benchmark method (e.g., Eigenfaces). Finally, extensive real-world experiments show that our proposed system improves recognition accuracy by up to 15 percent while achieving the same level of system overhead compared to the existing face recognition system (OpenCV algorithms) on smart glasses.

37 citations
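
MVSRC builds on sparse representation classification (SRC); a minimal single-view SRC sketch is shown below (the multi-view fusion and IMU-guided sampling are not reproduced), with an off-the-shelf L1 solver standing in as the sparse coder.

```python
# Single-view SRC sketch: code a test sample as a sparse combination of gallery
# samples, then classify by the class with the smallest reconstruction residual.
import numpy as np
from sklearn.linear_model import Lasso

def src_classify(train_X, train_y, test_x, alpha=0.01):
    """train_X: (N, d) gallery features; train_y: (N,) labels; test_x: (d,)."""
    train_y = np.asarray(train_y)
    D = np.asarray(train_X, dtype=float).T           # dictionary: columns are samples
    D = D / (np.linalg.norm(D, axis=0, keepdims=True) + 1e-12)
    # Sparse coding of the test sample over the gallery dictionary.
    coder = Lasso(alpha=alpha, max_iter=5000, fit_intercept=False).fit(D, test_x)
    coef = coder.coef_
    # Classify by the class whose coefficients reconstruct the sample best.
    residuals = {}
    for c in np.unique(train_y):
        mask = (train_y == c)
        recon = D[:, mask] @ coef[mask]
        residuals[c] = np.linalg.norm(test_x - recon)
    return min(residuals, key=residuals.get)

# Toy usage with random "face features".
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (20, 64)), rng.normal(2, 1, (20, 64))])
y = np.array([0] * 20 + [1] * 20)
print(src_classify(X, y, X[25]))
```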


Journal ArticleDOI
TL;DR: A novel pipeline for first-person daily activity recognition that significantly outperforms state-of-the-art recognition performance on a challenging first-person daily activity benchmark and includes a non-linear feature fusion scheme, which better combines object and motion features.
Abstract: Most previous works on first-person video recognition focus on measuring the similarity of different actions by using low-level features of objects interacting with humans. However, due to noisy camera motion and frequent changes in viewpoint and scale, they fail to capture and model highly discriminative object features. In this paper, we propose a novel pipeline for first-person daily activity recognition. Our object feature extraction pipeline is inspired by the recent success of object hypotheses and deep convolutional neural network (CNN)-based detection frameworks. Our key contribution is a simple yet effective manipulated object proposal generation scheme. This scheme leverages motion cues, such as motion boundary and motion magnitude (in contrast, camera motion is usually considered as “noise” by most previous methods), to generate a more compact and discriminative set of object proposals, which are more closely related to the objects being manipulated. Then, we learn more discriminative object detectors from these manipulated object proposals based on a region-based CNN. Meanwhile, we develop a non-linear feature fusion scheme, which better combines object and motion features. We show in experiments that the proposed framework significantly outperforms the state-of-the-art recognition performance on a challenging first-person daily activity benchmark.

35 citations
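
One of the motion cues above, motion magnitude, can be illustrated with a simplified proposal-ranking sketch: score candidate boxes by the mean optical-flow magnitude inside them and keep the strongest. The motion-boundary cue and the region-based CNN detector are omitted, and the (x, y, w, h) box format is an assumption.

```python
# Rank candidate object proposals by the dense optical-flow magnitude inside
# each box (simplified motion cue only).
import cv2
import numpy as np

def rank_proposals_by_motion(prev_gray, curr_gray, boxes, top_k=10):
    flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        0.5, 3, 15, 3, 5, 1.2, 0)
    magnitude = np.linalg.norm(flow, axis=2)           # per-pixel flow magnitude
    scores = []
    for (x, y, w, h) in boxes:
        roi = magnitude[y:y + h, x:x + w]
        scores.append(roi.mean() if roi.size else 0.0)
    order = np.argsort(scores)[::-1][:top_k]           # strongest motion first
    return [boxes[i] for i in order]

# Toy usage with synthetic frames (in practice these come from the video).
prev = np.zeros((120, 160), np.uint8)
curr = prev.copy(); curr[40:80, 60:100] = 255          # a "moving" bright patch
print(rank_proposals_by_motion(prev, curr, [(60, 40, 40, 40), (0, 0, 40, 40)]))
```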


Journal ArticleDOI
TL;DR: A modified geometric mapping technique is proposed for the 3D reconstruction of the recognized object; the algorithm performed well and enabled recognition of objects with up to 80% occlusion.
Abstract: Object recognition is one of the key areas in computer vision, comprising object detection, recognition and reconstruction. The image of the object to be recognized is captured using a camera and matched with pre-stored templates of the model object. Recognizing the 3D view of an object is difficult in the presence of object occlusion and viewpoint variation. This paper focuses on the problem of occlusion and provides a solution for handling self and inter-object occlusion. Self-occlusion has been addressed by suitable calibration of the cameras, and a novel algorithm has been proposed to address inter-object occlusion. A modified geometric mapping technique has been proposed for the 3D reconstruction of the recognized object. A real-time setup has been used to test the proposed solutions to identify objects of multiple shapes and sizes. The results show that the performance of the algorithm was superior and enabled recognition of objects with up to 80% occlusion.

16 citations


Journal ArticleDOI
TL;DR: A technique combining two algorithms well known among machine learning practitioners is proposed to build a ranking function used to select the bounding box that ranks highest, i.e., the one most likely to enclose the target object.
Abstract: Object tracking is one of the most important processes for object recognition in the field of computer vision. The aim is to accurately locate the target object in every frame of a video sequence. In this paper we propose a technique that combines two algorithms well known among machine learning practitioners. Firstly, we propose a deep learning approach to automatically extract the features that will be used to represent the original images. Deep learning has been successfully applied in different computer vision applications. Secondly, object tracking can be seen as a ranking problem, since the regions of an image can be ranked according to their level of overlapping with the target object (ground truth in each video frame). During object tracking, the target position and size can change, so the algorithms have to propose several candidate regions in which the target can be found. We propose to use a preference learning approach to build a ranking function which will be used to select the bounding box that ranks higher, i.e., that will likely enclose the target object. The experimental results obtained by our method, called DPL² (Deep and Preference Learning), are competitive with respect to other algorithms.

14 citations
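
A minimal pairwise preference-learning sketch in the spirit of the ranking step above: learn a linear scoring function from pairs of candidate boxes ordered by their overlap with the ground truth, then pick the highest-scoring candidate in a new frame. Generic feature vectors stand in for the deep features, and the pairwise construction and margin are assumptions.

```python
# Pairwise preference learning for tracking: a RankSVM-style scorer trained on
# feature differences of candidate boxes ordered by ground-truth overlap.
import numpy as np
from sklearn.svm import LinearSVC

def fit_rank_function(features, ious):
    """features: (N, d) per-candidate features; ious: (N,) overlap with ground truth."""
    diffs, labels = [], []
    for i in range(len(features)):
        for j in range(len(features)):
            if ious[i] > ious[j] + 0.05:               # i is clearly preferred over j
                diffs.append(features[i] - features[j]); labels.append(1)
                diffs.append(features[j] - features[i]); labels.append(-1)
    svm = LinearSVC(C=1.0).fit(np.array(diffs), np.array(labels))
    return svm.coef_.ravel()                           # w such that score(x) = w . x

def best_candidate(w, candidate_features):
    return int(np.argmax(candidate_features @ w))

# Toy usage: the candidate whose features best match the learned preference wins.
rng = np.random.default_rng(3)
feats = rng.normal(size=(30, 16))
ious = feats[:, 0].clip(0, 1)                          # pretend dim 0 tracks overlap
w = fit_rank_function(feats, ious)
print(best_candidate(w, rng.normal(size=(5, 16))))
```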


Journal ArticleDOI
TL;DR: The detailed experimentation reveals that the proposed approach performs better than state-of-the-art action localization and recognition approaches.
Abstract: This paper addresses the problem of activity localization and recognition in large scale video datasets by the collaborative use of holistic and motion based information (called motion cues). The concept of salient objects is used to obtain the holistic information while the motion cues are obtained by an affine motion model and optical flow. The motion cues compensate the camera motion and localize the object of interest in a set of object proposals. Furthermore, the holistic information and motion cues are fused to get a reliable object of interest. In the recognition phase, the holistic and motion based features are extracted from the object of interest for the training and testing of the classifier. The extreme learning machine is adopted as a classifier to reduce the training and testing time and increase the classification accuracy. The effectiveness of the proposed approach is tested on the UCF Sports dataset. The detailed experimentation reveals that the proposed approach performs better than state-of-the-art action localization and recognition approaches.

9 citations
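
The extreme learning machine used as the classifier above can be sketched compactly: a random, untrained hidden layer whose output weights are solved in closed form by least squares. The feature vectors below are placeholders for the paper's holistic and motion features.

```python
# Compact extreme learning machine (ELM) classifier: random hidden layer,
# output weights from a pseudo-inverse least-squares fit.
import numpy as np

class ELMClassifier:
    def __init__(self, n_hidden=200, seed=0):
        self.n_hidden, self.rng = n_hidden, np.random.default_rng(seed)

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)   # one-hot targets
        # Random, untrained hidden layer; only the output weights are learned.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = np.tanh(X @ self.W + self.b)
        self.beta = np.linalg.pinv(H) @ T                           # least-squares solution
        return self

    def predict(self, X):
        H = np.tanh(np.asarray(X, float) @ self.W + self.b)
        return self.classes_[np.argmax(H @ self.beta, axis=1)]

# Toy usage with random feature vectors standing in for action descriptors.
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (40, 32)), rng.normal(2, 1, (40, 32))])
y = np.array([0] * 40 + [1] * 40)
print(ELMClassifier().fit(X, y).predict(X[:5]))
```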


Book ChapterDOI
01 Jan 2018
TL;DR: An information-theoretic framework is described that combines and unifies two common techniques: view planning for resolving ambiguities and occlusions, and online feature selection for reducing computational costs.
Abstract: Adaptive action selection is crucial for object recognition on robots with limited operating capabilities. It offers much desired flexibility in trading off between the costs of acquiring information and making robust and reliable inference under uncertainty. In this paper, we describe an information-theoretic framework that combines and unifies two common techniques: view planning for resolving ambiguities and occlusions and online feature selection for reducing computational costs. Concretely, our algorithm adaptively chooses between two strategies: utilize simple-to-compute features that are the most informative for the recognition task, or move to new viewpoints that optimally reduce the expected uncertainty about the identity of the object. Extensive empirical studies have validated the effectiveness of the proposed framework. On a large RGB-D dataset, dynamic feature selection alone reduces the computation time at runtime fivefold, and when combining it with viewpoint selection, we significantly increase the recognition accuracy on average by 8–15% absolute, compared to systems that do not use these two strategies. Lastly, we have also successfully demonstrated the effectiveness of the framework on a quadcopter platform with limited operating time.

9 citations
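
The selection rule at the core of such a framework can be illustrated as expected-entropy reduction: among candidate actions (a new viewpoint or a cheap feature), pick the one whose observation is expected to shrink the entropy of the class belief the most. The observation models below are toy placeholders, and action costs are omitted.

```python
# Information-gain action selection: choose the action whose observation
# minimizes the expected posterior entropy of the class belief.
import numpy as np

def entropy(p):
    p = np.clip(p, 1e-12, 1.0)
    return float(-(p * np.log(p)).sum())

def expected_posterior_entropy(belief, obs_model):
    """obs_model[c, o] = P(observation o | class c) for one candidate action."""
    p_obs = belief @ obs_model                         # P(o) = sum_c P(c) P(o|c)
    exp_H = 0.0
    for o, po in enumerate(p_obs):
        if po > 1e-12:
            posterior = belief * obs_model[:, o] / po  # Bayes update for outcome o
            exp_H += po * entropy(posterior)
    return exp_H

def select_action(belief, obs_models):
    """Return the index of the action with the largest expected information gain."""
    gains = [entropy(belief) - expected_posterior_entropy(belief, m) for m in obs_models]
    return int(np.argmax(gains))

# Toy usage: action 1 is more discriminative, so it should be chosen.
belief = np.array([0.5, 0.5])
uninformative = np.array([[0.5, 0.5], [0.5, 0.5]])
discriminative = np.array([[0.9, 0.1], [0.2, 0.8]])
print(select_action(belief, [uninformative, discriminative]))   # -> 1
```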


Journal ArticleDOI
TL;DR: A human action recognition system called “RegFrame” is developed, which can rapidly and accurately recognize simple human actions, including 3D actions, on a stand-alone mobile device and can be flexibly integrated with a variety of applications.
Abstract: In recent years, human action recognition in videos has become an active research topic, being applied in surveillance, security, somatic games, interactive operations, etc. Since most human action recognition systems are designed for PCs, their performance is poor when transplanted to mobile devices. In this paper, we develop a human action recognition system called “RegFrame,” which can rapidly and accurately recognize simple human actions, including 3D actions, on a stand-alone mobile device. The system divides an action recognition process into two steps: object recognition and movement detection. The movement detection is implemented by a novel Nine-Square algorithm that nearly avoids floating-point computation, which reduces the recognition time. The experimental results show that the proposed “RegFrame” works reliably in different testing scenarios, and it outperforms the action recognition method of the SAMSUNG Galaxy V (S5) by up to 20% in terms of action recognition time. In addition, the proposed system can be flexibly integrated with a variety of applications.
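
The Nine-Square algorithm itself is not detailed in the abstract; the sketch below is a speculative, integer-only reading of a nine-square-style movement check (a 3x3 grid of changed-pixel counts between consecutive frames), given purely to illustrate how floating-point arithmetic can be avoided, not as the paper's actual algorithm.

```python
# Speculative nine-square-style motion check: split the frame into a 3x3 grid
# and count changed pixels per cell using integer arithmetic only.
import numpy as np

def nine_square_motion(prev_gray, curr_gray, threshold=30):
    """Return a 3x3 integer grid of changed-pixel counts between two frames."""
    diff = np.abs(prev_gray.astype(np.int32) - curr_gray.astype(np.int32))
    h, w = diff.shape
    counts = np.zeros((3, 3), dtype=np.int64)
    for r in range(3):
        for c in range(3):
            cell = diff[r * h // 3:(r + 1) * h // 3, c * w // 3:(c + 1) * w // 3]
            counts[r, c] = int((cell > threshold).sum())   # integer ops only
    return counts

# Toy usage: movement shows up in the cells covering the changed region.
prev = np.zeros((90, 90), np.uint8)
curr = prev.copy(); curr[0:30, 60:90] = 200
print(nine_square_motion(prev, curr))                      # top-right cell lights up
```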

Book ChapterDOI
01 Jan 2018
TL;DR: In this paper, an edge detection method is used to detect the object in a frame and compare its angle with angle values stored in a database; a Raspberry Pi 2 is used to reduce hardware size and speed up performance.
Abstract: Image processing is a vast domain in which object recognition is one of the toughest challenges in computer vision. Since objects have different key features that describe them, we use silhouette images of the objects for the recognition procedure. In this paper, an edge detection method is used to detect the object in a frame, and its angle is then compared with the angle values stored in a database. To reduce hardware size and speed up performance, we use a Raspberry Pi 2 model.
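
A rough sketch of such an edge-plus-angle pipeline is given below, assuming OpenCV's Canny edge detector, the largest contour's minimum-area-rectangle orientation, and a simple tolerance match against stored angles; these specific choices are assumptions, not the chapter's implementation.

```python
# Edge-based angle matching sketch: Canny edges -> dominant contour ->
# orientation angle -> nearest stored angle within a tolerance.
import cv2
import numpy as np

def object_angle(gray):
    edges = cv2.Canny(gray, 50, 150)
    contours, _ = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)        # assume the object dominates
    rect = cv2.minAreaRect(largest)                     # ((cx, cy), (w, h), angle)
    return rect[2]

def match_against_database(angle, database, tolerance=5.0):
    """database: dict of object name -> reference angle in degrees."""
    best = min(database, key=lambda name: abs(database[name] - angle))
    return best if abs(database[best] - angle) <= tolerance else None

# Toy usage: a synthetic rectangle silhouette matched against stored angles
# (the reported angle depends on OpenCV's minAreaRect convention).
img = np.zeros((200, 200), np.uint8)
cv2.rectangle(img, (60, 80), (140, 120), 255, -1)
a = object_angle(img)
if a is not None:
    print(a, match_against_database(a, {"box": 0.0, "wedge": 45.0}))
```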