Author

G. C. Nandi

Bio: G. C. Nandi is an academic researcher from the Indian Institute of Information Technology, Allahabad. The author has contributed to research in the topics Robot and GRASP, has an h-index of 3, and has co-authored 13 publications receiving 19 citations. Previous affiliations of G. C. Nandi include the Indian Institutes of Information Technology.

Papers
Proceedings ArticleDOI
03 Dec 2020
TL;DR: In this paper, the authors propose a multiple-generator GAN architecture that mitigates mode collapse, using several generators to provide a better solution to the missing-modes problem.
Abstract: With the advancement of deep neural networks and their growing range of applications, the requirement for data has increased exponentially. To fulfil this requirement, deep generative models, specifically Generative Adversarial Networks (GANs), have emerged as a very powerful tool. However, tuning GAN parameters is extremely difficult due to training instability, and GANs are very prone to missing modes during training, a failure termed mode collapse. Mode collapse leads generators to produce images of one particular mode while ignoring the other mode classes. In the present research, we propose a novel method that deals with mode collapse by using a multiple-generator architecture. We first compare different GAN architectures that address the mode collapse problem, using the Inception Score (IS) as the evaluation metric for GAN performance. We begin by analysing a GAN on a simple dataset (MNIST) using the DCGAN architecture. To produce better results, the present work then describes two further approaches. We experiment with the Wasserstein GAN (WGAN), which improves GAN training by adopting a different metric, the Wasserstein distance, for measuring the distance between two probability distributions. Subsequently, we propose a multiple-generator GAN architecture, which uses several generators to provide a better solution to the missing-modes problem. We evaluate our approach on several datasets (MNIST, CIFAR-10, SVHN, and the CelebA face dataset) with encouraging results compared to other existing architectures.
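As a rough illustration of the approach this abstract describes, here is a minimal PyTorch sketch of one training step of a multi-generator GAN with a WGAN-style critic. All names, layer sizes, and hyper-parameters (Generator, Critic, NUM_GENERATORS, the learning rates) are illustrative assumptions, not the paper's actual architecture.

import torch
import torch.nn as nn

Z_DIM, NUM_GENERATORS = 100, 4

class Generator(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(Z_DIM, 256), nn.ReLU(),
            nn.Linear(256, 784), nn.Tanh())   # 28x28 MNIST-sized output
    def forward(self, z):
        return self.net(z)

class Critic(nn.Module):                      # WGAN critic: unbounded score
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 256), nn.LeakyReLU(0.2),
            nn.Linear(256, 1))
    def forward(self, x):
        return self.net(x)

gens = [Generator() for _ in range(NUM_GENERATORS)]
critic = Critic()
opt_g = [torch.optim.RMSprop(g.parameters(), lr=5e-5) for g in gens]
opt_c = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def train_step(real):                         # real: (batch, 784) in [-1, 1]
    # Critic step: approximate the Wasserstein distance between real data
    # and the mixture distribution produced by all generators together.
    with torch.no_grad():
        fake = torch.cat([g(torch.randn(real.size(0) // NUM_GENERATORS, Z_DIM))
                          for g in gens])
    loss_c = critic(fake).mean() - critic(real).mean()
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()
    for p in critic.parameters():             # weight clipping, as in the
        p.data.clamp_(-0.01, 0.01)            # original WGAN formulation
    # Generator step: each generator independently raises its critic score,
    # so the ensemble can spread over modes a single generator would drop.
    for g, opt in zip(gens, opt_g):
        loss_g = -critic(g(torch.randn(64, Z_DIM))).mean()
        opt.zero_grad(); loss_g.backward(); opt.step()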

16 citations

Proceedings ArticleDOI
19 Jul 2020
TL;DR: A semi-supervised grasp detection approach is presented that models a discrete latent space using a Vector Quantized Variational AutoEncoder (VQ-VAE) and performs significantly better than existing approaches that do not use unlabelled images to improve grasp detection.
Abstract: For a robot to perform complex manipulation tasks, it needs a good grasping ability. However, vision-based robotic grasp detection is hindered by the unavailability of sufficient labelled data, and the application of semi-supervised learning techniques to grasp detection remains underexplored. In this paper, a semi-supervised grasp detection approach is presented that models a discrete latent space using a Vector Quantized Variational AutoEncoder (VQ-VAE). To the best of our knowledge, this is the first time a Variational AutoEncoder (VAE) has been applied in the domain of robotic grasp detection. The VAE helps the model generalize beyond the Cornell Grasping Dataset (CGD), despite the limited amount of labelled data, by also utilizing unlabelled data. This claim has been validated by testing the model on images that are not available in the CGD. In addition, we augment the Generative Grasping Convolutional Neural Network (GGCNN) architecture with the decoder structure used in the VQ-VAE model, with the intuition that it should help regression in the vector-quantized latent space. As a result, the model performs significantly better than existing approaches that do not make use of unlabelled images to improve grasp detection.
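For readers unfamiliar with VQ-VAE, here is a minimal PyTorch sketch of the vector-quantization step it relies on: a nearest-codebook lookup with a straight-through gradient estimator. The codebook size, code dimension, and commitment weight are illustrative assumptions, not the authors' grasp-detection network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VectorQuantizer(nn.Module):
    def __init__(self, num_codes=512, code_dim=64, beta=0.25):
        super().__init__()
        self.codebook = nn.Embedding(num_codes, code_dim)
        self.codebook.weight.data.uniform_(-1 / num_codes, 1 / num_codes)
        self.beta = beta                      # commitment-loss weight

    def forward(self, z_e):                   # z_e: (batch, code_dim)
        # Snap each encoder output to its nearest codebook entry.
        dist = torch.cdist(z_e, self.codebook.weight)    # (batch, num_codes)
        idx = dist.argmin(dim=1)
        z_q = self.codebook(idx)
        # Codebook and commitment terms of the VQ-VAE objective.
        loss = F.mse_loss(z_q, z_e.detach()) \
             + self.beta * F.mse_loss(z_e, z_q.detach())
        # Straight-through estimator: gradients flow around the argmin.
        z_q = z_e + (z_q - z_e).detach()
        return z_q, idx, loss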

10 citations

Posted Content
TL;DR: This paper develops learning-based grasp pose estimation by decomposing the problem into position and orientation learning, and proposes a deep reinforcement learning (DRL) model named grasp deep Q-network (GDQN).
Abstract: Intelligent object manipulation for grasping is a challenging problem for robots. Unlike robots, humans almost immediately know how to manipulate objects for grasping due to learning over the years. A grown woman can grasp objects more skilfully than a child because of skills learned over the years; the absence of such learning compels present-day robotic grasping to perform well below human benchmarks. In this paper, we take up the challenge of developing learning-based pose estimation by decomposing the problem into position and orientation learning. More specifically, for grasp position estimation, we explore three different methods: a genetic algorithm (GA) based optimization method that minimizes the error between calculated image points and the predicted end-effector (EE) position; a regression-based method (RM) in which collected robot EE and image data points are regressed with a linear model; and a pseudoinverse (PI) model formulated as a mapping matrix between robot EE positions and image points over several observations. Further, for grasp orientation learning, we develop a deep reinforcement learning (DRL) model, which we name Grasp Deep Q-Network (GDQN), and benchmark our results against a Modified VGG16 (MVGG16). Rigorous experimentation shows that, owing to its inherent capability of producing very high-quality solutions to optimization and search problems, the GA-based predictor performs much better than the other two models for position estimation. For orientation learning, the results indicate that off-policy learning through GDQN outperforms MVGG16, since the GDQN architecture is specifically tailored to reinforcement learning. Based on our proposed architectures and algorithms, the robot is capable of grasping all rigid-body objects having regular shapes.
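As a toy illustration of the GA idea for position estimation, the sketch below evolves candidate end-effector positions to minimize reprojection error against an observed image point. The project() camera model and all constants are hypothetical placeholders, not the paper's experimental setup.

import random

def project(ee):          # placeholder pinhole-style mapping: EE -> pixel
    x, y, z = ee
    return (500 * x / z + 320, 500 * y / z + 240)

def fitness(ee, target_px):        # negative squared pixel error
    u, v = project(ee)
    return -((u - target_px[0]) ** 2 + (v - target_px[1]) ** 2)

def ga(target_px, pop_size=50, gens=100, sigma=0.01):
    pop = [[random.uniform(-0.5, 0.5), random.uniform(-0.5, 0.5),
            random.uniform(0.3, 1.0)] for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda ee: fitness(ee, target_px), reverse=True)
        parents = pop[:pop_size // 5]          # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            children.append([(ai + bi) / 2 + random.gauss(0, sigma)
                             for ai, bi in zip(a, b)])   # crossover + mutation
        pop = parents + children
    return max(pop, key=lambda ee: fitness(ee, target_px))

print(ga((400, 300)))     # best EE guess for one observed image point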

8 citations

Journal ArticleDOI
TL;DR: In this article, learning-based grasp pose estimation is decomposed into position and orientation learning, and a grasp deep Q-network (GDQN) is proposed to learn the grasp orientation.
Abstract: Intelligent object manipulation for grasping is a challenging problem for robots. Unlike robots, humans almost immediately know how to manipulate objects for grasping due to learning over the years. In this paper, we have developed learning-based pose estimation by decomposing the problem into position and orientation learning. More specifically, for grasp position estimation, we explore three different methods: a genetic algorithm (GA) based optimization method that minimizes the error between calculated image points and the predicted end-effector (EE) position; a regression-based method (RM) in which collected robot EE and image data points are regressed with a linear model; and a pseudoinverse (PI) model formulated as a mapping matrix between robot EE positions and image points over several observations. Further, for grasp orientation learning, we develop a deep reinforcement learning (DRL) model, which we name grasp deep Q-network (GDQN), and benchmark our results against a Modified VGG16 (MVGG16). Rigorous experimentation shows that, owing to its inherent capability of producing very high-quality solutions to optimization and search problems, the GA-based predictor performs much better than the other two models for position estimation. For orientation learning, the results indicate that off-policy learning through GDQN outperforms MVGG16, since the GDQN architecture is specifically tailored to reinforcement learning. Experimentation based on our proposed architectures and algorithms shows that the robot is capable of grasping nearly all rigid-body objects having regular shapes.
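To make the orientation-learning component concrete, here is a hedged sketch of a GDQN-style learner: a small Q-network scores a discrete set of gripper orientations for an image crop and is trained off-policy with the standard DQN temporal-difference target. The bin count, network shapes, and reward handling are illustrative assumptions, not the published GDQN.

import torch
import torch.nn as nn
import torch.nn.functional as F

N_BINS = 18                        # gripper angle discretized every 10 degrees

class GDQNSketch(nn.Module):       # hypothetical stand-in, not the paper's net
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 13 * 13, N_BINS))   # one Q-value per orientation
    def forward(self, x):                      # x: (batch, 3, 64, 64) crops
        return self.net(x)

q_net, target_net = GDQNSketch(), GDQNSketch()
target_net.load_state_dict(q_net.state_dict())
opt = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def td_update(obs, action, reward, next_obs, done, gamma=0.99):
    # Off-policy DQN target: r + gamma * max_a' Q_target(s', a').
    q = q_net(obs).gather(1, action.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = reward + gamma * (1 - done) * target_net(next_obs).max(1).values
    loss = F.smooth_l1_loss(q, target)
    opt.zero_grad(); loss.backward(); opt.step()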

6 citations


Cited by
Journal ArticleDOI
TL;DR: Three key tasks in vision-based robotic grasping are identified: object localization, object pose estimation, and grasp estimation, where grasp estimation comprises 2D planar grasp methods and 6DoF grasp methods.
Abstract: This paper presents a comprehensive survey of vision-based robotic grasping. We identify three key tasks in vision-based robotic grasping: object localization, object pose estimation, and grasp estimation. In detail, the object localization task comprises object localization without classification, object detection, and object instance segmentation; this task provides the regions of the target object in the input data. The object pose estimation task mainly refers to estimating the 6D object pose and includes correspondence-based, template-based, and voting-based methods; it enables the generation of grasp poses for known objects. The grasp estimation task includes 2D planar grasp methods and 6DoF grasp methods, where the former are constrained to grasping from one direction. Different combinations of these three tasks can accomplish robotic grasping. Many object pose estimation methods do not require separate object localization and instead conduct localization and pose estimation jointly; likewise, many grasp estimation methods require neither object localization nor pose estimation and conduct grasp estimation in an end-to-end manner. Both traditional methods and the latest deep-learning-based methods operating on RGB-D image inputs are reviewed in detail. Related datasets and comparisons between state-of-the-art methods are summarized as well. In addition, challenges in vision-based robotic grasping and future directions for addressing them are pointed out.
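To make the survey's central distinction concrete, here are illustrative Python data structures for the two grasp parameterizations it contrasts. The field choices follow common conventions in the grasping literature and are assumptions, not definitions taken from the survey.

from dataclasses import dataclass

@dataclass
class PlanarGrasp:        # 2D planar grasp: approach fixed to one direction
    x: float              # pixel column of the grasp center
    y: float              # pixel row of the grasp center
    theta: float          # in-plane gripper rotation (rad)
    width: float          # gripper opening
    quality: float = 0.0  # predicted grasp success score

@dataclass
class Grasp6DoF:          # unconstrained grasp pose in 3D space
    position: tuple       # (x, y, z) in the camera or robot frame (m)
    quaternion: tuple     # (qx, qy, qz, qw) gripper orientation
    width: float          # gripper opening (m)
    quality: float = 0.0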

137 citations

Journal ArticleDOI
TL;DR: A comprehensive survey of recent optimization methods employed to enhance DNN performance on various tasks is presented, and the importance of these methods in generating optimal hyper-parameters and structures of DNNs for massive-scale data is analyzed.
Abstract: Deep neural networks (DNNs) have evolved into a beneficial machine learning method that has been successfully used in various applications. Currently, DNNs are a superior technique for extracting information from massive data sets in a self-organized manner. DNNs have different structures and parameters, which are usually produced for particular applications. Nevertheless, the training procedures of DNNs can be protracted depending on the given application and the size of the training set, and determining the most precise and practical structure of a deep learning model in a reasonable time is an open problem related to this procedure. Meta-heuristic techniques, such as swarm intelligence (SI) and evolutionary computing (EC), represent optimization frameworks with specific theories and objective functions. These methods are adaptable and have demonstrated their effectiveness in various applications; hence, they can be used to optimize DNN models. This paper presents a comprehensive survey of the recent optimization methods (i.e., SI and EC) employed to enhance DNN performance on various tasks. It also analyzes the importance of optimization methods in generating optimal hyper-parameters and structures of DNNs when massive-scale data must be taken into consideration. Finally, several potential directions that still need improvement and open problems in evolutionary DNNs are identified.
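As a minimal illustration of the evolutionary search this survey covers, the toy sketch below evolves two DNN hyper-parameters (learning rate and hidden-layer width). The train_and_score() function is a stand-in for real model training and validation, so the whole example illustrates the general pattern rather than any specific method from the survey.

import math
import random

def train_and_score(lr, hidden):    # placeholder objective; higher is better
    return -abs(math.log10(lr) + 3) - abs(hidden - 128) / 128

def evolve(pop_size=20, gens=15):
    pop = [(10 ** random.uniform(-5, -1), random.randint(16, 512))
           for _ in range(pop_size)]
    for _ in range(gens):
        pop.sort(key=lambda h: train_and_score(*h), reverse=True)
        elites = pop[:pop_size // 4]           # keep the best quarter
        pop = elites + [
            (e[0] * 10 ** random.gauss(0, 0.3),                # mutate lr
             max(16, int(e[1] * random.uniform(0.8, 1.25))))   # mutate width
            for e in random.choices(elites, k=pop_size - len(elites))]
    return max(pop, key=lambda h: train_and_score(*h))

print(evolve())   # e.g. roughly (1e-3, 128) for this toy objective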

43 citations

Journal ArticleDOI
23 Jan 2023 - Drones
TL;DR: In this paper, a cost-efficient, socially designed robot called 'Tinku' is presented to assist in teaching children with autism spectrum disorder, both in the classroom and in in-house clinical practice.
Abstract: Recent studies state that, for a person with autism spectrum disorder, learning and improvement are often seen in environments where technological tools are involved. A robot is an excellent tool for therapy and teaching: it can transform teaching methods, not just in classrooms but also in in-house clinical practice. With the rapid advancement of deep learning techniques, robots have become more capable of handling human behaviour. In this paper, we present a cost-efficient, socially designed robot called 'Tinku', developed to assist in teaching children with special needs. 'Tinku' is low-cost yet full of features, with the ability to produce human-like expressions. Its design is inspired by the widely recognized animated character 'WALL-E'. Its capabilities include offline speech processing and computer vision (using light object detection models such as Yolo v3-tiny and the single-shot detector, SSD) for obstacle avoidance, non-verbal communication, expressing emotions in an anthropomorphic way, and more. It uses on-board deep learning to localize objects in the scene and uses this information for semantic perception. We have developed several training lessons that use these features; a sample lesson about brushing is discussed to show the robot's capabilities. The robot was developed under the supervision of clinical experts, and its conditions for application have been taken into account. A small survey on its appearance is also discussed. More importantly, it has been tested with young children for acceptance of the technology and compatibility in terms of voice interaction. Autism spectrum disorders are being identified increasingly often in today's world, and studies show that children tend to interact more comfortably with technology than with a human instructor. To meet this need, we present a cost-effective robotic solution with a set of common lessons for the training of a child affected by autism.
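As a hedged sketch of the kind of lightweight on-board detection the abstract mentions (Yolo v3-tiny through OpenCV's DNN module), the snippet below runs one detection pass over a frame. The file paths are placeholders, and the robot's actual pipeline and post-processing may well differ.

import cv2
import numpy as np

# Placeholder model files; the robot's real weights/config are not given here.
net = cv2.dnn.readNetFromDarknet("yolov3-tiny.cfg", "yolov3-tiny.weights")
layer_names = net.getUnconnectedOutLayersNames()

def detect(frame, conf_threshold=0.5):
    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True)
    net.setInput(blob)
    h, w = frame.shape[:2]
    boxes = []
    for out in net.forward(layer_names):
        for det in out:                 # det: [cx, cy, bw, bh, obj, class...]
            scores = det[5:]
            cls = int(np.argmax(scores))
            if scores[cls] > conf_threshold:
                boxes.append((int(det[0] * w), int(det[1] * h),
                              int(det[2] * w), int(det[3] * h), cls))
    return boxes   # (center-x, center-y, width, height, class-id) per object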

21 citations