
Showing papers on "Sketch recognition published in 2015"


Journal ArticleDOI
TL;DR: An analysis of comparative surveys done in the field of gesture-based HCI and an analysis of existing literature related to gesture recognition systems for human-computer interaction, categorized under different key parameters, are provided.
Abstract: As computers become more pervasive in society, facilitating natural human-computer interaction (HCI) will have a positive impact on their use. Hence, there has been growing interest in the development of new approaches and technologies for bridging the human-computer barrier. The ultimate aim is to bring HCI to a regime where interactions with computers are as natural as interactions between humans, and to this end, incorporating gestures in HCI is an important research area. Gestures have long been considered an interaction technique that can potentially deliver more natural, creative and intuitive methods for communicating with our computers. This paper provides an analysis of comparative surveys done in this area. The use of hand gestures as a natural interface motivates research in gesture taxonomies, their representations and recognition techniques, and software platforms and frameworks, all of which are discussed briefly in this paper. It focuses on the three main phases of hand gesture recognition, i.e. detection, tracking and recognition. Different applications which employ hand gestures for efficient interaction are discussed under core and advanced application domains. This paper also provides an analysis of existing literature related to gesture recognition systems for human-computer interaction by categorizing it under different key parameters. It further discusses the advances that are needed to improve present hand gesture recognition systems so that they can be widely used for efficient human-computer interaction in the future. The main goal of this survey is to provide researchers in the field of gesture-based HCI with a summary of progress achieved to date and to help identify areas where further research is needed.

1,338 citations


Journal ArticleDOI
TL;DR: A survey of domain adaptation methods for visual recognition discusses the merits and drawbacks of existing domain adaptation approaches and identifies promising avenues for research in this rapidly evolving field.
Abstract: In pattern recognition and computer vision, one is often faced with scenarios where the training data used to learn a model have different distribution from the data on which the model is applied. Regardless of the cause, any distributional change that occurs after learning a classifier can degrade its performance at test time. Domain adaptation tries to mitigate this degradation. In this article, we provide a survey of domain adaptation methods for visual recognition. We discuss the merits and drawbacks of existing domain adaptation approaches and identify promising avenues for research in this rapidly evolving field.

871 citations


Journal ArticleDOI
TL;DR: A review of vision-based hand gesture recognition algorithms reported over the last 16 years using RGB and RGB-D cameras is provided, together with qualitative and quantitative comparisons of the algorithms.

259 citations


Posted Content
TL;DR: In this paper, a multi-scale multi-channel deep neural network framework is proposed for sketch recognition, achieving state-of-the-art performance on the largest human sketch dataset.
Abstract: We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.

164 citations
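
As a rough illustration of the architecture described above, here is a hedged PyTorch sketch (an assumption, not the authors' published code): a CNN whose input channels encode stroke order rather than RGB color, plus a crude multi-scale fusion that averages softmax outputs in place of the paper's joint Bayesian fusion. Layer sizes are illustrative, not the exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SketchCNN(nn.Module):
    """Toy sketch-recognition CNN: the input has one channel per
    stroke-order segment (e.g. early/middle/late strokes) instead of
    RGB, so the drawing sequence is encoded in the channels."""
    def __init__(self, in_channels=3, num_classes=250):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=15, stride=3), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(64, 128, kernel_size=5), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(128, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # tolerates multiple scales
        )
        self.classifier = nn.Linear(256, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

# Multi-scale fusion, crudely: score the sketch at several renderings
# and average the softmax outputs (a stand-in for Bayesian fusion).
net = SketchCNN()
x = torch.randn(1, 3, 225, 225)               # fake 3-channel rendering
scales = [x, F.interpolate(x, size=192), F.interpolate(x, size=160)]
probs = torch.stack([net(s).softmax(-1) for s in scales]).mean(0)
```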


Proceedings ArticleDOI
07 Sep 2015
TL;DR: A multi-scale multi-channel deep neural network framework that yields sketch recognition performance surpassing that of humans is proposed; it not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.
Abstract: We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.

147 citations


Journal ArticleDOI
TL;DR: This work proposes a Multiple Kernel Learning (MKL) framework for sketch recognition that fuses several features common to sketches, investigates the use of attributes as a high-level feature for sketches, and shows how attributes complement low-level features to improve recognition performance under the MKL framework.

95 citations
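
A hedged scikit-learn sketch of the kernel-fusion idea: one base kernel per feature type, combined as a weighted sum and fed to an SVM through a precomputed kernel. The fixed weights stand in for the kernel weights an MKL solver would learn, and the feature sets are random placeholders.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics.pairwise import rbf_kernel

# Placeholder feature sets: imagine HOG, shape-context, and attribute
# vectors extracted from the same N sketches.
rng = np.random.default_rng(0)
N = 100
feats = [rng.normal(size=(N, d)) for d in (64, 32, 16)]
y = rng.integers(0, 5, size=N)

# One base kernel per feature type; MKL would learn the weights beta,
# here they are fixed just to illustrate the fusion.
beta = [0.5, 0.3, 0.2]
K = sum(b * rbf_kernel(F) for b, F in zip(beta, feats))

clf = SVC(kernel="precomputed").fit(K, y)
print(clf.score(K, y))   # training accuracy on the toy data
```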


Proceedings ArticleDOI
18 Apr 2015
TL;DR: This paper introduces crowdsourcing techniques and tools for prototyping interactive systems in the time it takes to describe the idea; the Apparition system is powered by the first self-coordinated, real-time crowdsourcing infrastructure.
Abstract: Prototyping allows designers to quickly iterate and gather feedback, but the time it takes to create even a Wizard-of-Oz prototype reduces the utility of the process. In this paper, we introduce crowdsourcing techniques and tools for prototyping interactive systems in the time it takes to describe the idea. Our Apparition system uses paid microtask crowds to make even hard-to-automate functions work immediately, allowing more fluid prototyping of interfaces that contain interactive elements and complex behaviors. As users sketch their interface and describe it aloud in natural language, crowd workers and sketch recognition algorithms translate the input into user interface elements, add animations, and provide Wizard-of-Oz functionality. We discuss how design teams can use our approach to reflect on prototypes or begin user studies within seconds, and how, over time, Apparition prototypes can become fully-implemented versions of the systems they simulate. Powering Apparition is the first self-coordinated, real-time crowdsourcing infrastructure. We anchor this infrastructure on a new, lightweight write-locking mechanism that workers can use to signal their intentions to each other.

93 citations


Proceedings ArticleDOI
19 May 2015
TL;DR: In the proposed algorithm, a deep-learning-based facial representation is first learned using a large database of face photos, and the representation is then updated using a small problem-specific training database.
Abstract: Sketch recognition is one of the integral components used by law enforcement agencies in solving crime. In the recent past, software-generated composite sketches have been preferred, as they are more consistent and faster to construct than hand-drawn sketches. Matching these composite sketches to face photographs is a complex task because the composite sketches are drawn based on the witness description and lack the minute details which are present in photographs. This paper presents a novel algorithm for matching composite sketches with photographs using transfer learning with a deep learning representation. In the proposed algorithm, a deep-learning-based facial representation is first learned using a large database of face photos, and the representation is then updated using a small problem-specific training database. Experiments are performed on the extended PRIP database, and it is observed that the proposed algorithm outperforms a recently proposed approach and a commercial face recognition system.

79 citations
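
A minimal sketch of the two-stage recipe in PyTorch terms: start from a representation trained on a large photo corpus, then update it on a small problem-specific set. The torchvision ResNet and the frozen/unfrozen split are assumptions; the abstract does not name the architecture.

```python
import torch
import torch.nn as nn
from torchvision import models

# Stage 1 (already done for us): a representation learned on a large
# photo database; ImageNet-pretrained weights stand in here.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Stage 2: update the representation on the small problem-specific
# set by freezing early layers and fine-tuning only the top block.
for p in model.parameters():
    p.requires_grad = False
for p in model.layer4.parameters():
    p.requires_grad = True
model.fc = nn.Linear(model.fc.in_features, 128)   # new embedding head

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
# ...train on sketch/photo pairs, then match faces by embedding distance.
```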


Proceedings ArticleDOI
10 Jun 2015
TL;DR: A ConvNet is proposed that is both more accurate and lighter/faster than the only two previous attempts at using ConvNets for hand-sketch recognition; convolutional neural network features are also used as a basis for similarity search using k-Nearest Neighbors.
Abstract: In this paper, we present a system for sketch classification and similarity search. We use deep convolutional neural networks (ConvNets), the state of the art in the field of image recognition. They enable both classification and medium/high-level feature extraction. We make use of ConvNet features as a basis for similarity search using k-Nearest Neighbors (kNN). Evaluations are performed on the TU-Berlin benchmark. Our main contributions are threefold: first, we use ConvNets, in contrast to most previous approaches based essentially on hand-crafted features. Second, we propose a ConvNet that is both more accurate and lighter/faster than the only two previous attempts at making use of ConvNets for hand-sketch recognition; we reached an accuracy of 75.42%. Third, we show that, similarly to their application on natural images, ConvNets allow the extraction of medium-level and high-level features (depending on the depth) which can be used for similarity search.

60 citations
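
The classification-plus-retrieval pattern the paper describes can be sketched as follows: take activations from the penultimate layer of a trained ConvNet as the descriptor and run k-nearest-neighbour search over them. The torchvision backbone is a stand-in for the authors' custom net.

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.neighbors import NearestNeighbors

# Everything up to the penultimate layer becomes a feature extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
extractor = nn.Sequential(*list(backbone.children())[:-1])
extractor.eval()

def describe(batch):                       # batch: (N, 3, 224, 224)
    with torch.no_grad():
        return extractor(batch).flatten(1).numpy()

gallery = describe(torch.randn(50, 3, 224, 224))   # indexed sketches
query = describe(torch.randn(1, 3, 224, 224))

knn = NearestNeighbors(n_neighbors=5).fit(gallery)
dist, idx = knn.kneighbors(query)          # the 5 most similar sketches
```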


Proceedings ArticleDOI
26 Feb 2015
TL;DR: This paper presents a comparative study of various hand gesture recognition techniques; segmentation and tracking, feature extraction, and recognition techniques are studied and analyzed.
Abstract: From the ancient age, gesture was the first mode of communication; after the evolution of human civilization, verbal communication developed, but non-verbal communication is still equally significant. Such non-verbal communication is not only used by physically challenged persons but can also be efficiently used for various applications such as 3D gaming, aviation, surveying, etc. It is the best method to interact with a computer without any peripheral devices. Many researchers are still developing robust and efficient new hand gesture recognition techniques. The major steps in designing such a system are: data acquisition, segmentation and tracking, feature extraction, and gesture recognition. There are different methodologies associated with the several sub-steps present at each stage. Various segmentation and tracking, feature extraction, and recognition techniques are studied and analyzed. This paper reviews and compares the various hand gesture recognition techniques presented up to now.

48 citations


Proceedings ArticleDOI
01 Oct 2015
TL;DR: This paper proposes a database-driven hand gesture recognition scheme based on a skin color model and thresholding, along with effective template matching using Principal Component Analysis (PCA) for recognition.
Abstract: Gesture recognition has turned out to be an important field in recent years. Communication through gestures has been used since early ages, not only by physically challenged persons but nowadays for many other applications. Interacting with the physical world using expressive body movements is much easier and more effective than just speaking. As the hand is most predominantly used to perform gestures, hand gesture recognition has been widely accepted for numerous applications such as human-computer interaction, robotics, sign language recognition, etc. Hand gesture recognition techniques are basically divided into vision-based and sensor-based techniques. This paper focuses on a vision-based hand gesture recognition system, proposing a scheme using database-driven hand gesture recognition based on a skin color model and thresholding, along with effective template matching using PCA. Initially, the hand region is segmented by applying a skin color model in YCbCr color space. In the next stage, Otsu thresholding is applied to separate foreground and background. Finally, a template-based matching technique is developed using Principal Component Analysis (PCA) for recognition. The system is tested with 4 gestures, with 5 different poses per gesture from 4 subjects, making 20 images per gesture, and shows 91.25% average accuracy and 0.098251 seconds average recognition time; finally, a confusion matrix is drawn.
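
A hedged OpenCV rendering of the segmentation front end described above (skin mask in YCbCr, then Otsu thresholding); the Cb/Cr bounds are common rule-of-thumb values, not the paper's.

```python
import cv2
import numpy as np

def segment_hand(bgr):
    """Skin-color mask in YCbCr combined with Otsu foreground mask."""
    ycrcb = cv2.cvtColor(bgr, cv2.COLOR_BGR2YCrCb)
    # Rule-of-thumb skin range in (Y, Cr, Cb); tune per dataset.
    skin = cv2.inRange(ycrcb, (0, 133, 77), (255, 173, 127))
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    _, fg = cv2.threshold(gray, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.bitwise_and(skin, fg)       # hand = skin AND foreground

mask = segment_hand(np.zeros((120, 160, 3), np.uint8))  # dummy frame
# The masked hand region would then be projected onto PCA templates.
```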

Posted Content
30 Jan 2015
TL;DR: This work outlines a sketch image retrieval system in a unified neural network framework that is the first deep neural network model for sketch classification, and it has outperformed state-of-the-art results on the TU-Berlin sketch benchmark.
Abstract: We propose a multi-scale multi-channel deep neural network framework that, for the first time, yields sketch recognition performance surpassing that of humans. Our superior performance is a result of explicitly embedding the unique characteristics of sketches in our model: (i) a network architecture designed for sketch rather than natural photo statistics, (ii) a multi-channel generalisation that encodes sequential ordering in the sketching process, and (iii) a multi-scale network ensemble with joint Bayesian fusion that accounts for the different levels of abstraction exhibited in free-hand sketches. We show that state-of-the-art deep networks specifically engineered for photos of natural objects fail to perform well on sketch recognition, regardless of whether they are trained using photos or sketches. Our network, on the other hand, not only delivers the best performance on the largest human sketch dataset to date, but is also small in size, making efficient training possible using just CPUs.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: A real-time Kinect-based dynamic hand gesture recognition (HGR) system comprising hand tracking, data processing, model training and gesture classification is proposed; it shows an average recognition rate of 95.42% with real-time performance.
Abstract: The use of hand gestures provides an attractive alternative to cumbersome interface devices for Human-Computer Interaction (HCI). However, in the dynamic gesture recognition area, hand tracking in a complicated environment and gesture spotting, namely detecting the start and end points, are the two most challenging topics. In our work, a real-time Kinect-based dynamic hand gesture recognition (HGR) system which contains hand tracking, data processing, model training and gesture classification is proposed. In the first stage, two states of the performing hand, open and closed, are utilized to achieve gesture spotting, and the 3D motion trajectories of gestures are captured by the Kinect sensor. Further, motion orientation is extracted as the unique feature, and a Support Vector Machine (SVM) is used as the recognition algorithm in the proposed system. The results of experiments conducted on our database, containing the 10 Arabic numerals from 0 to 9 and the 26 characters of the alphabet, show the efficiency of our method, with an average recognition rate of 95.42% and real-time performance.
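
A minimal sketch of the feature/classifier pairing the abstract names: the direction of each step along the captured trajectory is quantized into an orientation histogram and fed to an SVM. The bin count and the toy random-walk data are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def orientation_histogram(traj, bins=8):
    """traj: (T, 3) hand positions. Quantize the direction of each
    step in the x-y plane into `bins` orientations and histogram them."""
    d = np.diff(traj[:, :2], axis=0)
    ang = np.arctan2(d[:, 1], d[:, 0])             # angles in [-pi, pi)
    hist, _ = np.histogram(ang, bins=bins, range=(-np.pi, np.pi))
    return hist / max(hist.sum(), 1)

rng = np.random.default_rng(1)
X = np.array([orientation_histogram(rng.normal(size=(30, 3)).cumsum(0))
              for _ in range(40)])                 # 40 toy trajectories
y = rng.integers(0, 4, size=40)                    # 4 toy gesture classes
clf = SVC(kernel="rbf").fit(X, y)
```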

Journal ArticleDOI
TL;DR: Experimental results on two challenging data sets demonstrate that the proposed model is comparable with state-of-the-art models and show that learning person-person interactions plays a critical role in collective activity recognition.
Abstract: Collective activity is a collection of atomic activities (individual persons' activities) and can hardly be distinguished by an atomic activity in isolation. The interactions among people are important cues for recognizing collective activity. In this paper, we concentrate on modeling person-person interactions for collective activity recognition. Rather than relying on hand-crafted descriptions of person-person interactions, we propose a novel learning-based approach that is capable of computing class-specific person-person interaction patterns. In particular, we model each class of collective activity by an interaction matrix, which is designed to measure the connection between any pair of atomic activities in a collective activity instance. We then formulate an interaction response (IR) model by assembling all these measurements, making the IRs class-specific and distinct from each other. A multitask IR is further proposed to jointly learn different person-person interaction patterns simultaneously, in order to learn the relations between different person-person interactions while keeping a more distinct activity-specific factor for each interaction. Our model is able to exploit discriminative low-rank representations of person-person interactions. Experimental results on two challenging data sets demonstrate that our proposed model is comparable with the state-of-the-art models and show that learning person-person interactions plays a critical role in collective activity recognition.
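
Read in matrix terms, the interaction response for a class c amounts to summing a learned interaction matrix M_c over the pairs of atomic-activity labels present in an instance; below is a toy numpy rendering of that reading, with all shapes and data assumed.

```python
import numpy as np

rng = np.random.default_rng(2)
A = 5                            # number of atomic-activity types
Mc = rng.normal(size=(A, A))     # (toy) interaction matrix for class c

def interaction_response(Mc, atomic_labels):
    """Sum the pairwise interaction scores over every ordered pair of
    people in the instance; classification would take the argmax of
    this response across the per-class matrices."""
    return sum(Mc[a, b]
               for i, a in enumerate(atomic_labels)
               for j, b in enumerate(atomic_labels) if i != j)

people = [0, 2, 2, 4]            # atomic activity of each person
print(interaction_response(Mc, people))
```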

01 Jan 2015
TL;DR: The concept of recognition, one phase of the speech recognition process using a Hidden Markov Model, is discussed in this paper; the third phase of the speech recognition process, 'Recognition', and the Hidden Markov Model are studied in detail.
Abstract: The concept of recognition, one phase of the speech recognition process using a Hidden Markov Model, is discussed in this paper. Three steps, preprocessing, feature extraction and recognition, together with the Hidden Markov Model (used in the recognition phase), are used to build an automatic speech recognition (ASR) system. In today's life, humans are able to interact with computer hardware and related machines in their own language. Researchers are trying to develop a perfect ASR system because, despite all the advancements in ASR and digital signal processing research, computer machines are still unable to match the performance of humans in terms of accuracy of matching and speed of response. In speech recognition, researchers mainly use three different approaches, namely the acoustic-phonetic approach, the knowledge-based approach and the pattern recognition approach. This paper's study is based on the pattern recognition approach, and the third phase of the speech recognition process, 'Recognition', together with the Hidden Markov Model, is studied in detail.
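
A minimal sketch of the recognition phase using hmmlearn (an assumed library choice; the paper names none): one Gaussian HMM per vocabulary word, with an utterance assigned to the word whose model scores it highest.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(3)

# One HMM per vocabulary word, trained on that word's feature frames
# (in practice MFCCs; random data keeps the sketch self-contained).
models = {}
for word in ("yes", "no"):
    m = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
    m.fit(rng.normal(size=(200, 13)))       # (frames, feature dims)
    models[word] = m

# Recognition: pick the word whose model gives the highest log-likelihood.
utterance = rng.normal(size=(50, 13))
best = max(models, key=lambda w: models[w].score(utterance))
print("recognized:", best)
```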

Proceedings ArticleDOI
01 Oct 2015
TL;DR: A rapid recognition method for dynamic hand gestures using a Leap Motion sensor is proposed; the database contains the three-dimensional motion trajectories of the numbers and the alphabet, captured with the device.
Abstract: Human-computer interaction would be much smoother with the implementation of rapid recognition, the aim of which is to recognize a hand gesture before it is completed. In this paper, a rapid recognition method for dynamic hand gestures using a Leap Motion sensor is proposed. The database contains the three-dimensional motion trajectories of the numbers and the alphabet (36 gestures in total), captured by utilizing a Leap Motion sensor. In order to enhance the effectiveness of rapid recognition, the SVM algorithm is utilized. Experimental results show the high recognition rate and accuracy of the proposed system.

Journal ArticleDOI
TL;DR: This paper presents a comprehensive review of recent approaches to human action recognition based on depth maps, skeleton joints, and other hybrid approaches and focuses on the advantages and limitations of the existing approaches and on future directions.
Abstract: Human action recognition from a video scene has remained a challenging problem in the area of computer vision and pattern recognition. The development of the low-cost RGB depth camera (RGB-D) allows new opportunities to solve the problem of human action recognition. In this paper, we present a comprehensive review of recent approaches to human action recognition based on depth maps, skeleton joints, and other hybrid approaches. In particular, we focus on the advantages and limitations of the existing approaches and on future directions.

Posted Content
TL;DR: This paper introduces a freehand sketch recognition framework based on "deep" features extracted from CNNs, and provides a preliminary glimpse of how such features can help identify crucial attributes of the sketched objects.
Abstract: Freehand sketches often contain sparse visual detail. In spite of the sparsity, they are easily and consistently recognized by humans across cultures, languages and age groups. Therefore, analyzing such sparse sketches can aid our understanding of the neuro-cognitive processes involved in visual representation and recognition. In the recent past, Convolutional Neural Networks (CNNs) have emerged as a powerful framework for feature representation and recognition for a variety of image domains. However, the domain of sketch images has not been explored. This paper introduces a freehand sketch recognition framework based on "deep" features extracted from CNNs. We use two popular CNNs for our experiments -- Imagenet CNN and a modified version of LeNet CNN. We evaluate our recognition framework on a publicly available benchmark database containing thousands of freehand sketches depicting everyday objects. Our results are an improvement over the existing state-of-the-art accuracies by 3% - 11%. The effectiveness and relative compactness of our deep features also make them an ideal candidate for related problems such as sketch-based image retrieval. In addition, we provide a preliminary glimpse of how such features can help identify crucial attributes (e.g. object-parts) of the sketched objects.

Journal ArticleDOI
TL;DR: This paper studies how multiple Gestalt rules can be encapsulated into a unified perceptual grouping framework for sketch generation, and shows that, by resolving the problem of Gestalt confliction, sketches more similar to human-made ones can be generated.

Proceedings ArticleDOI
07 Aug 2015
TL;DR: A sketch-based posing system for rigged 3D characters that allows artists to create custom sketch abstractions on top of a character's actual shape, and enables a new form of intuitive sketch- based posing in which the character designer has the freedom to prescribe the sketch abstraction that is most meaningful for the character.
Abstract: We propose a sketch-based posing system for rigged 3D characters that allows artists to create custom sketch abstractions on top of a character's actual shape. A sketch abstraction is composed of rigged curves that form an iconographic 2D representation of the character from a particular viewpoint. When provided with a new input sketch, our optimization system minimizes a nonlinear iterative closest point energy to find the rigging parameters that best align the character's sketch abstraction to the input sketch. A custom regularization term addresses the underconstrained nature of the problem to select favorable poses. Although our system supports arbitrary black-box rigs, we show how to optimize computations when rigging formulas and derivatives are available. We demonstrate our system's flexibility with examples showing different artist-designed sketch abstractions for both full body posing and the customization of individual components of a modular character. Finally, we show that simple sketch abstractions can be built on the fly by projecting a drawn curve onto the character's mesh. Redrawing the curve allows the user to dynamically pose the character. Taken together, our system enables a new form of intuitive sketch-based posing in which the character designer has the freedom to prescribe the sketch abstraction that is most meaningful for the character.
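
The optimization loop described above can be caricatured in a few lines: pose parameters (here just rotation, scale and translation of a template curve, standing in for a real character rig) are fit to an input sketch by minimizing closest-point residuals plus a small regularizer that prefers the rest pose. Everything below is an illustrative assumption, not the paper's solver.

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial import cKDTree

template = np.stack([np.linspace(0, 1, 50), np.zeros(50)], axis=1)
sketch = np.stack([np.linspace(0, 0.8, 60),
                   0.3 * np.linspace(0, 0.8, 60)], axis=1)  # drawn curve
tree = cKDTree(sketch)

def rig(params):
    """Toy 'rig': rotate, scale and translate the template curve."""
    theta, s, tx, ty = params
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    return s * template @ R.T + np.array([tx, ty])

def residuals(params):
    pts = rig(params)
    d, _ = tree.query(pts)            # closest-point distances (ICP term)
    reg = 0.1 * (params - np.array([0, 1, 0, 0]))   # prefer rest pose
    return np.concatenate([d, reg])

fit = least_squares(residuals, x0=np.array([0.0, 1.0, 0.0, 0.0]))
print(fit.x)                          # recovered pose parameters
```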

BookDOI
01 Jan 2015
TL;DR: This work tests the most common imputation methods used in the literature for filling missing records in the ADNI (Alzheimer's Disease Neuroimaging Initiative) data set, where missing data affect about 80% of the patients, and shows the importance of using imputation procedures to achieve higher accuracy and robustness in the classification.
Abstract: In real-world applications it is common to find data sets whose records contain missing values. As many data analysis algorithms are not designed to work with missing data, all variables associated with such records are generally removed from the analysis. A better alternative is to employ data imputation techniques to estimate the missing values using statistical relationships among the variables. In this work, we test the most common imputation methods used in the literature for filling missing records in the ADNI (Alzheimer's Disease Neuroimaging Initiative) data set, where missing data affect about 80% of the patients, making the removal of most of the data unwise. We measure the imputation error of the different techniques and then evaluate their impact on classification performance. We train support vector machine and random forest classifiers using all the imputed data, as opposed to a reduced set of samples having complete records, for the task of discriminating among different stages of Alzheimer's disease. Our results show the importance of using imputation procedures to achieve higher accuracy and robustness in the classification.
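
A compact scikit-learn sketch of the evaluation recipe: impute, then train the two classifier families the abstract mentions. Mean imputation and the random toy data stand in for the several imputation methods and the ADNI records compared in the chapter.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.3] = np.nan      # inject ~30% missing values
y = rng.integers(0, 3, size=200)           # toy disease-stage labels

for clf in (SVC(), RandomForestClassifier(n_estimators=100)):
    pipe = make_pipeline(SimpleImputer(strategy="mean"), clf)
    print(type(clf).__name__, cross_val_score(pipe, X, y, cv=5).mean())
```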

Proceedings Article
25 Jan 2015
TL;DR: Maestoso takes students' sketched input of quizzed concepts, relies on existing sketch and gesture recognition techniques to automatically recognize the input, and generates instructor-emulated feedback.
Abstract: Learning music theory not only has practical benefits for musicians, helping them write, perform, understand, and express music better, but also helps non-musicians improve critical thinking, math analytical skills, and music appreciation. However, current external tools applicable to learning music theory through writing, when human instruction is unavailable, are either limited in feedback, lacking a written modality, or assume an already strong familiarity with music theory concepts. In this paper, we describe Maestoso, an educational tool for novice learners to study music theory through sketching practice of quizzed music structures. Maestoso takes students' sketched input of quizzed concepts, relies on existing sketch and gesture recognition techniques to automatically recognize the input, and generates instructor-emulated feedback. From our evaluations, we demonstrate that Maestoso performs reasonably well at recognizing music structure elements and that novice students can comfortably grasp introductory music theory in a single session.

Proceedings ArticleDOI
01 Nov 2015
TL;DR: The general architecture of a modern OCR system, with details of each module, is discussed; Moore neighborhood tracing is applied to extract character boundaries, and chain codes are then used for feature extraction.
Abstract: Artificial intelligence, pattern recognition and computer vision have significant importance in the field of electronics and image processing. Optical character recognition (OCR) is one of the main aspects of pattern recognition and has evolved greatly since its beginning. OCR is a system which recognizes readable characters from optical data and converts them into digital form. Various methodologies have been developed for this purpose using different approaches. In this paper, the general architecture of a modern OCR system, with details of each module, is discussed. We apply Moore neighborhood tracing for extracting the boundaries of characters and then chain codes for feature extraction. In the classification stage for character recognition, an SVM is trained and applied to suitable examples.
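
A small sketch of a chain-code feature built on the traced boundary: each step between consecutive boundary pixels maps to one of 8 Freeman directions, which are histogrammed into a feature vector for the SVM. Reading the paper's feature as the standard Freeman chain code is an assumption, and the boundary (which Moore neighborhood tracing would supply) is taken as given.

```python
import numpy as np

# Freeman 8-direction codes for steps between neighbouring pixels.
DIRS = {(0, 1): 0, (-1, 1): 1, (-1, 0): 2, (-1, -1): 3,
        (0, -1): 4, (1, -1): 5, (1, 0): 6, (1, 1): 7}

def chain_code_histogram(boundary):
    """boundary: ordered list of (row, col) pixels, e.g. from Moore
    neighborhood tracing. Returns a normalized 8-bin direction histogram."""
    codes = [DIRS[(r2 - r1, c2 - c1)]
             for (r1, c1), (r2, c2) in zip(boundary, boundary[1:])]
    hist = np.bincount(codes, minlength=8).astype(float)
    return hist / max(hist.sum(), 1.0)

# Toy boundary: the outline of a 3x3 square, traversed clockwise.
square = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2),
          (2, 1), (2, 0), (1, 0), (0, 0)]
print(chain_code_histogram(square))    # feature vector for the SVM
```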

Proceedings ArticleDOI
07 Dec 2015
TL;DR: Object-Scene Convolutional Neural Networks (OS-CNNs) are used to perform event recognition from still images, and the approach secured the third place at the ICCV ChaLearn Looking at People (LAP) challenge 2015.
Abstract: Event recognition from still images is one of the most important problems for image understanding. However, compared with object recognition and scene recognition, event recognition has received much less research attention in the computer vision community. This paper addresses the problem of cultural event recognition in still images and focuses on applying deep learning methods to this problem. In particular, we utilize the successful architecture of Object-Scene Convolutional Neural Networks (OS-CNNs) to perform event recognition. OS-CNNs are composed of object nets and scene nets, which transfer the learned representations from models pre-trained on large-scale object and scene recognition datasets, respectively. We propose four types of scenarios to explore OS-CNNs for event recognition by treating them as either "end-to-end event predictors" or "generic feature extractors". Our experimental results demonstrate that the global and local representations of OS-CNNs are complementary to each other. Finally, based on our investigation of OS-CNNs, we come up with a solution for the cultural event recognition track at the ICCV ChaLearn Looking at People (LAP) challenge 2015. Our team secured the third place at this challenge, and our result is very close to the best performance.

Journal ArticleDOI
TL;DR: The sketching with words (SWW) technique is applied to design a system that can simulate a face sketch expert; different types of faces are generated after applying the 'fairly' and 'very' linguistic hedges to face components.
Abstract: The face sketch of a criminal may be one of the crucial pieces of evidence in catching the criminal. A face sketch is drawn by a sketch expert on the basis of an onlooker's statement about different parts of the human face, such as the forehead, eyes, nose, and chin. These statements are full of uncertainties, e.g. 'His eyes were not fairly small'. Since the precise interpretation of such natural language statements is a very difficult task, we need a system that can convert an imprecise face description into a complete face. We have therefore applied the sketching with words (SWW) technique to design a system that can simulate a face sketch expert. SWW is a methodology in which the objects of computation are fuzzy geometric objects, e.g. fuzzy lines, fuzzy circles, fuzzy triangles, and fuzzy parallels. These fuzzy objects (f-objects) are formalized by Zadeh's fuzzy geometry (f-geometry). SWW is inspired by computing with words and fuzzy geometry. Since the onlooker has to granulate the face into granule labels, the concept of the fuzzy granule is applied for face recognition. Different types of faces are generated after applying the 'fairly' and 'very' linguistic hedges to face components.
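
The 'fairly' and 'very' hedges have a standard fuzzy-set reading (Zadeh's dilation and concentration operators), which a few lines make concrete; the membership value here is purely illustrative.

```python
import numpy as np

def very(mu):    # concentration: sharpens a membership value
    return mu ** 2

def fairly(mu):  # dilation: softens a membership value
    return np.sqrt(mu)

mu_small_eyes = 0.6                   # membership of "eyes are small"
print(very(mu_small_eyes))            # 0.36  -> "very small"
print(fairly(mu_small_eyes))          # ~0.77 -> "fairly small"
```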

Journal ArticleDOI
12 Nov 2015-Sensors
TL;DR: A novel approach for hand gesture recognition with depth maps generated by the Microsoft Kinect sensor is proposed, using a variation of the CIPBR (convex invariant position based on RANSAC) algorithm and a hybrid classifier composed of dynamic time warping (DTW) and hidden Markov models (HMM), called the hybrid approach for gesture recognition with depth maps (HAGR-D).
Abstract: The hand is an important part of the body used to express information through gestures, and its movements can be used in dynamic gesture recognition systems based on computer vision, with practical applications in areas such as medicine, games and sign language. Although depth sensors have led to great progress in gesture recognition, hand gesture recognition is still an open problem because of its complexity, which is due to the large number of small articulations in a hand. This paper proposes a novel approach for hand gesture recognition with depth maps generated by the Microsoft Kinect sensor (Microsoft, Redmond, WA, USA), using a variation of the CIPBR (convex invariant position based on RANSAC) algorithm and a hybrid classifier composed of dynamic time warping (DTW) and hidden Markov models (HMM), called the hybrid approach for gesture recognition with depth maps (HAGR-D). The experiments show that the proposed model outperforms other algorithms presented in the literature on hand gesture recognition tasks, achieving a classification rate of 97.49% on the MSRGesture3D dataset and 98.43% on the RPPDI dynamic gesture dataset.
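
A minimal dynamic time warping distance, the DTW half of the hybrid classifier, used here for nearest-template gesture labeling; the HMM half of HAGR-D is omitted, and the template data are random placeholders.

```python
import numpy as np

def dtw(a, b):
    """Classic O(n*m) DTW between trajectories a: (n, d) and b: (m, d);
    returns the accumulated alignment cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Nearest-template classification: label a gesture by its closest
# stored exemplar under DTW.
rng = np.random.default_rng(5)
templates = {g: rng.normal(size=(20, 3)) for g in ("wave", "circle")}
query = rng.normal(size=(25, 3))
print(min(templates, key=lambda g: dtw(query, templates[g])))
```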

Proceedings ArticleDOI
06 Jun 2015
TL;DR: An embedded virtual mouse system using hand gesture recognition is proposed, and the results show that the system can operate well even in harsh environments.
Abstract: In the digital information age, daily life is inseparable from the human-computer interface (HCI). Human-computer interaction has a long history of becoming more intuitive. For human beings, hand gestures of different kinds are among the most intuitive and common forms of communication. However, vision-based hand gesture recognition is still a challenging problem. In this paper, an embedded virtual mouse system using hand gesture recognition is proposed. Several techniques are involved in the proposed system. Skin detection and motion detection methods are used to capture the region of interest and distinguish the foreground/background areas. A connected component labeling algorithm is used to identify the centroid of an object. Arm removal and the convex hull algorithm are used to recognize the hand area as well as the related gesture. The results show that our system can operate well even in some harsh environments.
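
A hedged OpenCV fragment for the convex-hull step: convexity defects between the hull and the hand contour approximate the gaps between fingers, a common way to turn the hull into a gesture label. The depth threshold is an assumption, and the contour is taken as already segmented.

```python
import cv2
import numpy as np

def count_fingers(contour):
    """Estimate extended fingers from convexity defects of a hand
    contour (deep defects correspond to the gaps between fingers)."""
    hull = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull)
    if defects is None:
        return 0
    # Each defect row holds (start, end, farthest point, depth * 256).
    deep = sum(1 for i in range(defects.shape[0])
               if defects[i, 0, 3] / 256.0 > 20.0)  # depth threshold, px
    return deep + 1 if deep else 0

# Usage with a real segmented frame `mask`:
#   contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
#                                  cv2.CHAIN_APPROX_SIMPLE)
#   n = count_fingers(max(contours, key=cv2.contourArea))
```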

Journal ArticleDOI
TL;DR: This work demonstrates that active learning technology can be used to reduce the amount of manual annotation required to achieve a target recognition accuracy, and shows that annotating a few, but carefully selected, examples can surpass the accuracies achievable with an equal number of arbitrarily selected examples.

Dissertation
01 Jan 2015
TL;DR: This thesis proposes a co-segmentation algorithm for segmenting humans out of videos and an evaluator that predicts whether the estimated poses are correct, proposes two new convolutional neural network architectures, and shows how optical flow can be employed in convolutional nets to further improve the predictions.
Abstract: This thesis presents new methods in two closely related areas of computer vision: human pose estimation, and gesture recognition in videos. In human pose estimation, we show that random forests can be used to estimate human pose in monocular videos. To this end, we propose a co-segmentation algorithm for segmenting humans out of videos, and an evaluator that predicts whether the estimated poses are correct or not. We further extend this pose estimator to new domains (with a transfer learning approach), and enhance its predictions by predicting the joint positions sequentially (rather than independently) in an image, and using temporal information in the videos (rather than predicting the poses from a single frame). Finally, we go beyond random forests, and show that convolutional neural networks can be used to estimate human pose even more accurately and efficiently. We propose two new convolutional neural network architectures, and show how optical flow can be employed in convolutional nets to further improve the predictions. In gesture recognition, we explore the idea of using weak supervision to learn gestures. We show that we can learn sign language automatically from signed TV broadcasts with subtitles by letting algorithms 'watch' the TV broadcasts and 'match' the signs with the subtitles. We further show that if even a small amount of strong supervision is available (as there is for sign language, in the form of sign language video dictionaries), this strong supervision can be combined with weak supervision to learn even better models.

Proceedings ArticleDOI
01 Dec 2015
TL;DR: Extensive experimental results show that the deep learning method is significantly superior to handcrafted features, and with the near-frontal pose constraint, human-level recognition accuracy is achievable.
Abstract: Recent automatic facial expression recognition research has focused on optimizing performance on a few databases that were collected under controlled pose and lighting conditions, and has produced nearly perfect accuracy. This paper explores the necessary characteristics of the training dataset, feature representations and machine learning algorithms for a system that operates reliably in more realistic conditions. A new database, the Real-world Affective Face Database (RAF-DB), is presented, which contains about 30,000 greatly diverse facial images from social networks. Crowdsourcing results suggest that the real-world expression recognition problem is a typical imbalanced multi-label classification problem, and that the balanced, single-label datasets currently used in the literature could potentially lead research into misleading algorithmic solutions. A deep learning architecture, DeepEmo, is proposed to address the real-world challenge of emotion recognition by learning high-level feature representations which are highly effective for discriminating realistic facial expressions. Extensive experimental results show that the deep learning method is significantly superior to handcrafted features, and that with the near-frontal pose constraint, human-level recognition accuracy is achievable.