Author

S. Hamidreza Kasaei

Other affiliations: Islamic Azad University
Bio: S. Hamidreza Kasaei is an academic researcher from the University of Aveiro. The author has contributed to research in the topics Computer science and Object (computer science), has an h-index of 13, and has co-authored 27 publications receiving 457 citations. Previous affiliations of S. Hamidreza Kasaei include Islamic Azad University.

Papers
Journal ArticleDOI
TL;DR: A novel sign disambiguation method is proposed for computing a unique reference frame from the eigenvectors obtained through Principal Component Analysis of the point cloud of the target object view captured by a 3D sensor.
Abstract: Object representation is one of the most challenging tasks in robotics because it must provide reliable information in real time to enable the robot to physically interact with the objects in its environment. To ensure robustness, a global object descriptor must be computed based on a unique and repeatable object reference frame. Moreover, the descriptor should contain enough information to enable recognition of the same or similar objects seen from different perspectives. This paper presents a new object descriptor named Global Orthographic Object Descriptor (GOOD), designed to be robust, descriptive, and efficient to compute and use. We propose a novel sign disambiguation method for computing a unique reference frame from the eigenvectors obtained through Principal Component Analysis of the point cloud of the target object view captured by a 3D sensor. Three principal orthographic projections and their distribution matrices are computed by exploiting the object reference frame. The descriptor is finally obtained by concatenating the distribution matrices in a sequence determined by entropy and variance features of the projections. Experimental results show that the overall classification performance obtained with GOOD is comparable to the best performances obtained with state-of-the-art descriptors. Concerning memory and computation time, GOOD clearly outperforms the other descriptors. Therefore, GOOD is especially suited for real-time applications. The estimated object pose is precise enough for real-time object manipulation tasks.
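As a rough illustration of the pipeline the abstract describes (PCA-based reference frame, sign disambiguation, orthographic projections, entropy-ordered concatenation), here is a minimal NumPy sketch; the bin count, the majority-vote sign rule, and the entropy-only ordering are simplifying assumptions, not the paper's exact formulation.

```python
import numpy as np

def good_like_descriptor(points, bins=15):
    """points: (N, 3) array for one object view; returns a 1D descriptor."""
    centered = points - points.mean(axis=0)
    # Reference frame from the PCA eigenvectors of the view.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    proj = centered @ vt.T  # coordinates in the object reference frame
    # Naive sign disambiguation: flip each axis toward the side with more points.
    signs = np.where((proj > 0).sum(axis=0) >= (proj < 0).sum(axis=0), 1.0, -1.0)
    proj *= signs
    # Three orthographic projections (XY, XZ, YZ) as normalized 2D histograms.
    half = np.abs(proj).max() + 1e-9
    edges = np.linspace(-half, half, bins + 1)
    mats = [np.histogram2d(proj[:, a], proj[:, b], bins=[edges, edges])[0]
            for a, b in [(0, 1), (0, 2), (1, 2)]]
    mats = [m / m.sum() for m in mats]
    # Concatenate in an order determined by projection entropy.
    ent = [-(m[m > 0] * np.log2(m[m > 0])).sum() for m in mats]
    order = np.argsort(ent)[::-1]
    return np.concatenate([mats[i].ravel() for i in order])
```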

59 citations

Journal ArticleDOI
TL;DR: Experimental results show that the proposed system is able to interact with human users, learn new object categories over time, as well as perform complex tasks.
Abstract: This paper presents an artificial cognitive system tightly integrating object perception and manipulation for assistive robotics. This is necessary for assistive robots, not only to perform manipulation tasks in a reasonable amount of time and in an appropriate manner, but also to robustly adapt to new environments by handling new objects. In particular, this system includes perception capabilities that allow robots to incrementally learn object categories from the set of accumulated experiences and reason about how to perform complex tasks. To achieve these goals, it is critical to detect, track and recognize objects in the environment as well as to conceptualize experiences and learn novel object categories in an open-ended manner, based on human–robot interaction. Interaction capabilities were developed to enable human users to teach new object categories and instruct the robot to perform complex tasks. A naive Bayes learning approach with a Bag-of-Words object representation is used to acquire and refine object category models. Perceptual memory is used to store object experiences, the feature dictionary, and object category models. Working memory is employed to support communication between the different modules of the architecture. A reactive planning approach is used to carry out complex tasks. To examine the performance of the proposed architecture, a quantitative evaluation and a qualitative analysis were carried out. Experimental results show that the proposed system is able to interact with human users, learn new object categories over time, and perform complex tasks.
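The category-learning core described above, naive Bayes over Bag-of-Words histograms, can be sketched as follows; the class name, the `teach`/`classify` API, and the Laplace smoothing are illustrative assumptions, and dictionary construction is out of scope.

```python
import numpy as np

class OpenEndedNaiveBayes:
    """Incrementally accumulated naive Bayes over BoW histograms (hypothetical API)."""

    def __init__(self, dictionary_size):
        self.counts = {}    # category -> accumulated word counts
        self.n_views = {}   # category -> number of taught views
        self.d = dictionary_size

    def teach(self, category, bow_histogram):
        # One user-mediated experience refines (or creates) a category model.
        self.counts.setdefault(category, np.zeros(self.d))
        self.counts[category] += bow_histogram
        self.n_views[category] = self.n_views.get(category, 0) + 1

    def classify(self, bow_histogram):
        # Return argmax_c of log P(c) + sum_w n_w log P(w | c), Laplace-smoothed.
        total = sum(self.n_views.values())
        best, best_score = None, -np.inf
        for c, counts in self.counts.items():
            prior = np.log(self.n_views[c] / total)
            word_p = (counts + 1.0) / (counts.sum() + self.d)
            score = prior + float(bow_histogram @ np.log(word_p))
            if score > best_score:
                best, best_score = c, score
        return best
```

Because teaching only adds counts, new categories can be introduced at any time, which matches the open-ended learning setting the paper describes.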

58 citations

Proceedings ArticleDOI
01 Oct 2017
TL;DR: This work proposes a complete active multi-view framework to recognize the 6DOF pose of multiple object instances in a crowded scene and includes several components in an active vision setting to increase accuracy.
Abstract: Recovering object pose in a crowd is a challenging task due to severe occlusions and clutter. In an active scenario, whenever an observer fails to recover the poses of objects from the current viewpoint, it can determine the next view position and capture a new scene from another viewpoint to improve its knowledge of the environment, which may reduce the 6D pose estimation uncertainty. We propose a complete active multi-view framework to recognize the 6DOF pose of multiple object instances in a crowded scene. We include several components in the active vision setting to increase accuracy: hypothesis accumulation and verification combines single-shot hypotheses estimated from previous views and extracts the most likely set of hypotheses; an entropy-based Next-Best-View prediction generates the next camera position from which to capture new data; and camera motion planning plans the trajectory of the camera based on the view entropy and the cost of movement. Different approaches for each component are implemented and evaluated to show the increase in performance.
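A minimal sketch of the entropy-based Next-Best-View idea described above, assuming a simple score of view entropy minus a weighted motion cost; the weighting scheme and the entropy model are assumptions, not the paper's exact components.

```python
import numpy as np

def view_entropy(hypothesis_probs):
    """Shannon entropy of the pose-hypothesis distribution at a view."""
    p = np.asarray(hypothesis_probs, dtype=float)
    p = p / p.sum()
    return -(p[p > 0] * np.log2(p[p > 0])).sum()

def next_best_view(candidates, current_pose, cost_weight=0.5):
    """candidates: list of (camera_pose, predicted_hypothesis_probs) pairs.

    Picks the view whose predicted entropy is highest after discounting
    the cost of moving the camera there (an illustrative trade-off).
    """
    def score(cand):
        pose, probs = cand
        motion_cost = np.linalg.norm(np.asarray(pose) - np.asarray(current_pose))
        return view_entropy(probs) - cost_weight * motion_cost
    return max(candidates, key=score)[0]
```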

57 citations

Journal ArticleDOI
TL;DR: The main stage of this system is the isolation of the license plate from the digital image of the car, obtained by a digital camera under varying conditions of illumination, slope, distance, and angle; the results show the high accuracy and robustness of the proposed method.
Abstract: matching. In this system main stage is the isolation of the license plate, from the digital image of the car obtained by a digital camera under different circumstances such as illumination, slop, distance, and angle. The algorithm starts with preprocessing and signal conditioning. Next license plate is localized using morphological operators. Then a template matching scheme will be used to recognize the digits and characters within the plate. The system is tested on Iranian car plate images, and the performance was 97.3% of correct plates identification and localization and 92% of correct recognized characters. The results regarding the complexity of the problem and diversity of the test cases show the high accuracy and robustness of the proposed method. The method could also be applicable for other applications in the transport information systems, where automatic recognition of registration plates, shields, signs, and so on is often necessary.

55 citations

Journal ArticleDOI
TL;DR: An object perception and perceptual learning system developed for a complex artificial cognitive agent working in a restaurant scenario; it integrates detection, tracking, learning, and recognition of tabletop objects, and the Point Cloud Library is used in nearly all modules.
Abstract: This paper describes a 3D object perception and perceptual learning system developed for a complex artificial cognitive agent working in a restaurant scenario. This system, developed within the scope of the European project RACE, integrates detection, tracking, learning and recognition of tabletop objects. Interaction capabilities were also developed to enable a human user to take the role of instructor and teach new object categories. Thus, the system learns in an incremental and open-ended way from user-mediated experiences. Based on the analysis of memory requirements for storing both semantic and perceptual data, a dual memory approach, comprising a semantic memory and a perceptual memory, was adopted. The perceptual memory is the central data structure of the described perception and learning system. The goal of this paper is twofold: on one hand, we provide a thorough description of the developed system, starting with motivations, cognitive considerations and architecture design, then providing details on the developed modules, and finally presenting a detailed evaluation of the system; on the other hand, we emphasize the crucial importance of the Point Cloud Library (PCL) for developing such a system. (This paper is a revised and extended version of Oliveira et al. (2014).) Highlights: We describe an object perception and perceptual learning system. The system is able to detect, track and recognize tabletop objects. The system learns novel object categories in an open-ended fashion. The Point Cloud Library is used in nearly all modules of the system. The system was developed and used in the European project RACE.
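The dual-memory split described above might be pictured as two plain data structures; the field names below are illustrative assumptions, not the RACE project's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class PerceptualMemory:
    """Central store of the perception and learning system (hypothetical fields)."""
    object_experiences: list = field(default_factory=list)   # stored object views
    feature_dictionary: dict = field(default_factory=dict)   # visual words
    category_models: dict = field(default_factory=dict)      # name -> learned model

@dataclass
class SemanticMemory:
    """High-level facts and instructions used for reasoning and planning."""
    concepts: dict = field(default_factory=dict)
    instructions: list = field(default_factory=list)
```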

46 citations


Cited by
Proceedings ArticleDOI
18 Jun 2018
TL;DR: A single-shot approach is proposed for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses; it substantially outperforms other recent CNN-based approaches when they are all used without post-processing.
Abstract: We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task [10] that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster - 50 fps on a Titan X (Pascal) GPU - and more suitable for real-time processing. The key component of our method is a new CNN architecture inspired by [27, 28] that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. For single object and multiple object pose estimation on the LINEMOD and OCCLUSION datasets, our approach substantially outperforms other recent CNN-based approaches [10, 25] when they are all used without post-processing. During post-processing, a pose refinement step can be used to boost the accuracy of these two methods, but at 10 fps or less, they are much slower than our method.
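The final geometric step described above, recovering a 6D pose from the predicted 2D projections of the 3D bounding-box corners, can be sketched with OpenCV's PnP solver; the function shown and the EPnP flag are stand-ins, not the authors' exact solver.

```python
import cv2
import numpy as np

def pose_from_bbox_corners(corners_2d, bbox_3d, camera_matrix):
    """corners_2d: (8, 2) predicted pixels; bbox_3d: (8, 3) model-frame corners."""
    ok, rvec, tvec = cv2.solvePnP(
        bbox_3d.astype(np.float32),
        corners_2d.astype(np.float32),
        camera_matrix.astype(np.float32),
        distCoeffs=None,
        flags=cv2.SOLVEPNP_EPNP,   # a common choice for >= 4 point correspondences
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)     # rotation vector -> 3x3 rotation matrix
    return R, tvec
```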

642 citations

Proceedings ArticleDOI
01 Oct 2019
TL;DR: A novel deep learning method that estimates dense multi-class 2D-3D correspondence maps between an input image and available 3D models and demonstrates that a large number of correspondences is beneficial for obtaining high-quality 6D poses both before and after refinement.
Abstract: In this paper we present a novel deep learning method for 3D object detection and 6D pose estimation from RGB images. Our method, named DPOD (Dense Pose Object Detector), estimates dense multi-class 2D-3D correspondence maps between an input image and available 3D models. Given the correspondences, a 6DoF pose is computed via PnP and RANSAC. An additional RGB pose refinement of the initial pose estimates is performed using a custom deep learning-based refinement scheme. Our results and comparison to a vast number of related works demonstrate that a large number of correspondences is beneficial for obtaining high-quality 6D poses both before and after refinement. Unlike other methods that mainly use real data for training and do not train on synthetic renderings, we perform evaluation on both synthetic and real training data, demonstrating superior results before and after refinement when compared to all recent detectors. While being precise, the presented approach is still real-time capable.
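The pose-recovery step described above (PnP plus RANSAC over dense 2D-3D correspondences) can be sketched with OpenCV; the reprojection threshold and iteration count are illustrative values, and the DPOD network that produces the correspondences is assumed, not shown.

```python
import cv2
import numpy as np

def pose_from_dense_correspondences(pts_3d, pts_2d, camera_matrix):
    """pts_3d: (N, 3) model points; pts_2d: (N, 2) matched pixels, N >= 4."""
    ok, rvec, tvec, inliers = cv2.solvePnPRansac(
        pts_3d.astype(np.float32),
        pts_2d.astype(np.float32),
        camera_matrix.astype(np.float32),
        distCoeffs=None,
        iterationsCount=100,
        reprojectionError=3.0,   # pixel threshold for RANSAC inliers
    )
    if not ok:
        return None
    R, _ = cv2.Rodrigues(rvec)   # rotation vector -> 3x3 rotation matrix
    return R, tvec, inliers
```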

362 citations

Posted Content
Mark Newman
TL;DR: In this paper, it was shown that one's acquaintances, one's immediate neighbors in the acquaintance network, are far from being a random sample of the population, and that this biases the numbers of neighbors two and more steps away.
Abstract: Recent work has demonstrated that many social networks, and indeed many networks of other types also, have broad distributions of vertex degree. Here we show that this has a substantial impact on the shape of ego-centered networks, i.e., sets of network vertices that are within a given distance of a specified central vertex, the ego. This in turn affects concepts and methods based on ego-centered networks, such as snowball sampling and the "ripple effect". In particular, we argue that one's acquaintances, one's immediate neighbors in the acquaintance network, are far from being a random sample of the population, and that this biases the numbers of neighbors two and more steps away. We demonstrate this concept using data drawn from academic collaboration networks, for which, as we show, current simple theories for the typical size of ego-centered networks give numbers that differ greatly from those measured in reality. We present an improved theoretical model which gives significantly better results.
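The sampling bias the paper argues for can be reproduced on a synthetic scale-free graph in a few lines; the graph model and sizes below are arbitrary choices for illustration, not the paper's collaboration data.

```python
import random
import networkx as nx

# A graph with a broad degree distribution, as the abstract assumes.
G = nx.barabasi_albert_graph(10_000, 3, seed=0)
egos = random.Random(0).sample(list(G.nodes), 500)

mean_degree = sum(d for _, d in G.degree()) / G.number_of_nodes()
neighbor_degrees = [G.degree(v) for u in egos for v in G.neighbors(u)]
mean_neighbor_degree = sum(neighbor_degrees) / len(neighbor_degrees)

# Neighbors of a random ego are not a random population sample: their mean
# degree is noticeably larger, which inflates two-step neighborhood sizes.
print(f"population mean degree:   {mean_degree:.1f}")
print(f"mean degree of neighbors: {mean_neighbor_degree:.1f}")
```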

239 citations

Posted Content
TL;DR: In this paper, a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses is proposed.
Abstract: We propose a single-shot approach for simultaneously detecting an object in an RGB image and predicting its 6D pose without requiring multiple stages or having to examine multiple hypotheses. Unlike a recently proposed single-shot technique for this task (Kehl et al., ICCV'17) that only predicts an approximate 6D pose that must then be refined, ours is accurate enough not to require additional post-processing. As a result, it is much faster - 50 fps on a Titan X (Pascal) GPU - and more suitable for real-time processing. The key component of our method is a new CNN architecture inspired by the YOLO network design that directly predicts the 2D image locations of the projected vertices of the object's 3D bounding box. The object's 6D pose is then estimated using a PnP algorithm. For single object and multiple object pose estimation on the LINEMOD and OCCLUSION datasets, our approach substantially outperforms other recent CNN-based approaches when they are all used without post-processing. During post-processing, a pose refinement step can be used to boost the accuracy of the existing methods, but at 10 fps or less, they are much slower than our method.

198 citations

Journal ArticleDOI
Weiping Liu, Jia Sun, Wanyi Li, Ting Hu, Peng Wang
26 Sep 2019 - Sensors
TL;DR: Recent point cloud feature learning methods are classified as point-based and tree-based; the latter first employs a k-dimensional tree structure to give the point cloud a regular representation and then feeds these representations into deep learning models.
Abstract: The point cloud is a widely used 3D data form, which can be produced by depth sensors such as Light Detection and Ranging (LIDAR) and RGB-D cameras. Because point clouds are unordered and irregular, many researchers have focused on feature engineering for them. Being able to learn complex hierarchical structures, deep learning has achieved great success with images from cameras, and recently many researchers have adapted it to applications involving point clouds. In this paper, the existing point cloud feature learning methods are classified as point-based and tree-based. The former directly takes the raw point cloud as the input for deep learning. The latter first employs a k-dimensional tree (Kd-tree) structure to represent the point cloud with a regular representation and then feeds these representations into deep learning models. Their advantages and disadvantages are analyzed. The applications related to point cloud feature learning, including 3D object classification, semantic segmentation, and 3D object detection, are introduced, and the datasets and evaluation metrics are also collected. Finally, the future research trend is predicted.
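The tree-based preprocessing described above can be sketched with SciPy's Kd-tree, which turns an unordered cloud into fixed-size, regular neighborhoods; cKDTree here is only a stand-in for the learned Kd-network partitioning such methods actually use.

```python
import numpy as np
from scipy.spatial import cKDTree

points = np.random.default_rng(0).random((1024, 3))  # an unordered, irregular cloud
tree = cKDTree(points, leafsize=16)

# Fixed-size local neighborhoods give every point a regular representation
# that can be fed to a deep model (k nearest neighbors per point).
dists, neighbor_idx = tree.query(points, k=8)
neighborhoods = points[neighbor_idx]   # shape (1024, 8, 3): regular tensor input
print(neighborhoods.shape)
```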

151 citations