
Showing papers on "Sketch recognition published in 2000"


Journal ArticleDOI
01 Jul 2000
TL;DR: The reading process has been widely studied, and researchers broadly agree that knowledge in different forms and at different levels plays a vital role; this is the underlying philosophy of the Devanagari document recognition system described in this work.
Abstract: The reading process has been widely studied and there is a general agreement among researchers that knowledge in different forms and at different levels plays a vital role. This is the underlying philosophy of the Devanagari document recognition system described in this work. The knowledge sources we use are mostly statistical in nature or in the form of a word dictionary tailored specifically for optical character recognition (OCR). We do not perform any reasoning on these. However, we explore their relative importance and role in the hierarchy. Some of the knowledge sources are acquired a priori by an automated training process while others are extracted from the text as it is processed. A complete Devanagari OCR system has been designed and tested with real-life printed documents of varying size and font. Most of the documents used were photocopies of the original. A performance of approximately 90% correct recognition is achieved.

132 citations


Journal ArticleDOI
TL;DR: A multi-layered architecture for sketch-based interaction within virtual environments, focused on table-like projection systems as human-centered output devices, with the aim of making sketching an integral part of the next-generation human–computer interface.

82 citations


Proceedings ArticleDOI
30 Jul 2000
TL;DR: This paper examines a multimodal meeting room system under development at Carnegie Mellon University that enables us to track, capture and integrate the important aspects of a meeting from people identification to meeting transcription.
Abstract: Face-to-face meetings usually encompass several modalities including speech, gesture, handwriting, and person identification. Recognition and integration of each of these modalities is important to create an accurate record of a meeting. However, each of these modalities presents recognition difficulties. Speech recognition must be speaker and domain independent, have low word error rates, and be close to real time to be useful. Gesture and handwriting recognition must be writer independent and support a wide variety of writing styles. Person identification has difficulty with segmentation in a crowded room. Furthermore, in order to produce the record automatically, we have to solve the assignment problem (who is saying what), which involves people identification and speech recognition. This paper examines a multimodal meeting room system under development at Carnegie Mellon University that enables us to track, capture and integrate the important aspects of a meeting from people identification to meeting transcription. Once a multimedia meeting record is created, it can be archived for later retrieval.

48 citations


Proceedings ArticleDOI
01 Jan 2000
TL;DR: Four architectures for gesture-based interaction between a human being and an autonomous mobile robot, using dynamic pattern matching, statistical classification, and neural networks, or hybrid combinations of them, are presented.
Abstract: Several systems for automatic gesture recognition have been developed using different strategies and approaches. In these systems the recognition engine is mainly based on three algorithms: dynamic pattern matching, statistical classification, and neural networks (NN). In this paper we present four architectures for gesture-based interaction between a human being and an autonomous mobile robot, using the above-mentioned techniques or a hybrid combination of them. Each of our gesture recognition architectures consists of a preprocessor and a decoder. Three different hybrid stochastic/connectionist architectures are considered. A template matching problem is dealt with by means of dynamic programming techniques; the strategy is to find the minimal distance between a continuous input feature sequence and the classes. Preliminary experiments with our baseline system achieved a recognition accuracy of up to 92%. All systems use input from a monocular color video camera and are user-independent, but they do not yet run in real time.
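The dynamic-programming template matching described above can be sketched as a small dynamic time warping (DTW) classifier. The gesture names and one-dimensional feature values below are hypothetical, and the paper's actual feature extraction is not reproduced:

```python
# Hedged sketch: template matching by dynamic programming, as the abstract
# outlines. Class names and feature values are made up for illustration.

def dtw_distance(seq, template):
    """Minimal alignment cost between two 1-D feature sequences,
    computed with the classic dynamic-programming recurrence."""
    n, m = len(seq), len(template)
    INF = float("inf")
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(seq[i - 1] - template[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def classify(seq, templates):
    """Pick the class whose template has minimal DTW distance."""
    return min(templates, key=lambda name: dtw_distance(seq, templates[name]))

templates = {"wave": [0, 1, 0, 1, 0], "point": [0, 1, 2, 3, 4]}
print(classify([0, 1, 0, 1, 1, 0], templates))  # closer to "wave"
```

The warping in the recurrence is what lets a continuous input sequence of varying speed still match a fixed-length class template.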

48 citations


Journal ArticleDOI
TL;DR: An emotion recognition algorithm based on a neural network is proposed, together with a method for collecting a large speech database containing emotions; the algorithm is applied to a computer agent that plays a character role in an interactive movie system.
Abstract: In this paper, we first study the recognition of emotions involved in human speech. We propose an emotion recognition algorithm based on a neural network and also propose a method to collect a large speech database that contains emotions. We carried out emotion recognition experiments based on the neural network trained using this database. An emotion recognition rate of approximately 50% was obtained in a speaker-independent mode for eight emotion states. We then tried to apply this emotion recognition algorithm to a computer agent that plays a character role in the interactive movie system we are developing. We propose to use emotion recognition as a key technology for an architecture of computer characters with both narrative-based and spontaneous interaction capabilities.

39 citations


Proceedings ArticleDOI
01 Sep 2000
TL;DR: A recognition system is presented that classifies four kinds of human interaction (shaking hands, pointing at the opposite person, standing hand-in-hand, and an intermediate/transitional state between them) with no parsing procedure for sequential data.
Abstract: This paper presents a recognition system that classifies four kinds of human interactions: shaking hands, pointing at the opposite person, standing hand-in-hand, and an intermediate/transitional state between them. Our system achieves recognition by applying the K-nearest neighbor classifier to the parametric human-interaction model, which describes the interpersonal configuration with multiple features from gray scale images (i.e., binary blob, silhouette contour, and intensity distribution). Unlike the algorithms that use temporal information about motion, our system independently classifies each frame by estimating the relative poses of the interacting persons. The system provides a tool to detect the initiation and the termination of an interaction with no parsing procedure for sequential data. Experimental results are presented and illustrated.
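The per-frame classification scheme might look like the following minimal sketch, assuming a two-dimensional feature vector and made-up interaction labels; the paper's parametric human-interaction model is far richer:

```python
# Hedged sketch: per-frame K-nearest-neighbour classification over a
# parametric feature vector, in the spirit of the abstract. The feature
# values and interaction labels below are hypothetical.

import math
from collections import Counter

def knn_classify(frame, training, k=3):
    """Label one frame independently: vote among the k nearest
    training feature vectors (Euclidean distance)."""
    nearest = sorted(training,
                     key=lambda ex: math.dist(frame, ex[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

training = [
    ((0.1, 0.9), "shaking-hands"), ((0.2, 0.8), "shaking-hands"),
    ((0.9, 0.1), "pointing"),      ((0.8, 0.2), "pointing"),
    ((0.5, 0.5), "transitional"),
]
print(knn_classify((0.15, 0.85), training))
```

Because each frame is labeled independently, a change of label from one frame to the next directly marks the initiation or termination of an interaction, with no sequential parsing.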

34 citations


Book ChapterDOI
TL;DR: Recognition techniques discussed for diagram recognition include blackboard systems, stochastic grammars, Hidden Markov Models, and graph grammars.
Abstract: Document image analysis is the study of converting documents from paper form to an electronic form that captures the information content of the document. Necessary processing includes recognition of document layout (to determine reading order, and to distinguish text from diagrams), recognition of text (called Optical Character Recognition, OCR), and processing of diagrams and photographs. The processing of diagrams has been an active research area for several decades. A selection of existing diagram recognition techniques are presented in this paper. Challenging problems in diagram recognition include (1) the great diversity of diagram types, (2) the difficulty of adequately describing the syntax and semantics of diagram notations, and (3) the need to handle imaging noise. Recognition techniques that are discussed include blackboard systems, stochastic grammars, Hidden Markov Models, and graph grammars.

30 citations


Journal ArticleDOI
TL;DR: In this paper, computational formulas for evaluating the recognition rates of parts and their combinations are derived, and a number of results are reported.

19 citations


Proceedings ArticleDOI
01 Jan 2000
TL;DR: A series of experiments on a previously described object recognition system investigates which design axes of such systems, if any, hold the greatest potential for improving performance, and concludes that the greatest leverage lies at the level of intermediate feature construction.

Abstract: Appearance-based object recognition systems are currently the most successful approach for dealing with 3D recognition of arbitrary objects in the presence of clutter and occlusion. However, no current system seems directly scalable to human performance levels in this domain. We describe a series of experiments on a previously described object recognition system that try to determine which design axes of such systems, if any, hold the greatest potential for improving performance. We look at the potential effect of different design modifications and conclude that the greatest leverage lies at the level of intermediate feature construction.

17 citations


Journal ArticleDOI
TL;DR: A new method is presented for user-independent gesture recognition from time-varying images, using relative motion-dependent feature extraction together with discriminant analysis and dynamically updated buffer structures to provide online learning/recognition abilities.

14 citations


Proceedings ArticleDOI
10 Sep 2000
TL;DR: Fuzzy relational adjacency grammars are used to provide a natural handling of fuzzy logic and spatial relation syntax in a single unified formalism, supporting document layout sketching in a simple way.

Abstract: We present a visual approach in which document layouts are hand-drawn compositions of simple geometric shapes. This approach is based on a grammatical method to support document design through sketch recognition which explicitly addresses visual ambiguity. We use fuzzy relational adjacency grammars to provide a natural handling of fuzzy logic and spatial relation syntax in a single unified formalism. Fuzzy relations enable us to replace spatial constraints such as "a is above b" or "a is parallel to c" by quantities that express a degree of uncertainty. Their use allows us to associate a "measure of goodness" with all data and with intermediate and final results. We developed a prototype application that supports document layout sketching in a simple way.
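The idea of replacing a hard constraint such as "a is above b" with a fuzzy degree can be sketched as follows. The membership function and the minimum-based combination are illustrative assumptions, not the paper's actual grammar machinery:

```python
# Hedged sketch: a hard spatial constraint becomes a fuzzy degree in
# [0, 1], and degrees are combined with a fuzzy AND (minimum) to give a
# "measure of goodness" for a candidate interpretation. The membership
# function below is an illustrative choice, not the paper's.

def above(a_y, b_y, scale=10.0):
    """Degree to which shape a (centre y = a_y) lies above shape b.
    Screen coordinates: smaller y means higher on the page."""
    return max(0.0, min(1.0, (b_y - a_y) / scale))

def goodness(*degrees):
    """Fuzzy AND: a composition is only as good as its weakest relation."""
    return min(degrees)

# Two candidate interpretations of a sketched three-shape stack:
print(goodness(above(10, 25), above(25, 40)))  # clearly stacked
print(goodness(above(10, 12), above(25, 40)))  # first relation is weak
```

Carrying these degrees through the parse, rather than accepting or rejecting constraints outright, is what lets the grammar rank ambiguous interpretations instead of failing on them.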

Journal ArticleDOI
TL;DR: The feature extraction methods investigated are oriented Gaussian filters, Gabor filters and oriented Laplacian of Gaussian (LoG) filters, which are shown to compare favourably with other techniques designed specifically for the two tasks.

Proceedings ArticleDOI
30 Jul 2000
TL;DR: A working system, inspired by infant language learning, that learns from untranscribed speech and images is presented; it explores the idea of learning from unannotated data by leveraging information across multiple modes of input.
Abstract: Human-computer interaction based on recognition of speech, gestures, and other natural modalities is on the rise. Recognition technologies are typically developed in a statistical framework and require large amounts of training data. The cost of collecting manually annotated data is usually the bottleneck in developing such systems. We explore the idea of learning from unannotated data by leveraging information across multiple modes of input. A working system inspired by infant language learning which learns from untranscribed speech and images is presented.

Journal ArticleDOI
TL;DR: The proposed method for user-independent gesture recognition from time-varying images uses relative-motion extraction and discriminant analysis for providing online learning/recognition abilities and is computationally inexpensive which allows real-time operation on a personal computer.
Abstract: We propose a new method for user-independent gesture recognition from time-varying images. The method uses relative-motion extraction and discriminant analysis for providing online learning/recognition abilities. Efficient and robust extraction of motion information is achieved. The method is computationally inexpensive which allows real-time operation on a personal computer. The performance of the proposed method has been tested with several data sets and good generalization abilities have been observed: it is robust to changes in background and illumination conditions, to users’ external appearance and changes in spatial location, and successfully copes with the non-uniformity of the performance speed of the gestures. No manual segmentation of any kind, or use of markers, etc. is necessary. Having the above-mentioned features, the method could be successfully used as a part of more refined human-computer interfaces.
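One inexpensive way to realize a relative-motion extraction step is simple frame differencing, sketched below on toy grayscale frames; the method's actual extraction and discriminant-analysis stages are not reproduced here:

```python
# Hedged sketch: motion information obtained by differencing consecutive
# frames and thresholding, which is independent of a static background.
# Frame values and the threshold are made up for illustration.

def motion_mask(prev, curr, thresh=20):
    """Per-pixel absolute difference between consecutive frames,
    thresholded to a binary motion mask."""
    return [[1 if abs(c - p) > thresh else 0
             for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]

frame1 = [[10, 10, 10],
          [10, 10, 10]]
frame2 = [[10, 90, 10],   # one region changed between frames
          [10, 95, 10]]
print(motion_mask(frame1, frame2))
```

A static background cancels in the difference, which is one reason motion-based features can be robust to background and illumination changes, as the abstract reports.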

Proceedings ArticleDOI
17 Oct 2000
TL;DR: The wavelet transform is used to generate contour-based features for hand gesture recognition in computer vision systems, building on the shape analysis tools introduced by R.M. Cesar Jr. and L. da F. Costa (1997).
Abstract: This paper discusses an ongoing project for hand gesture recognition in computer vision systems. The proposed approach is based on the shape analysis tools introduced by R.M. Cesar Jr. and L. da F. Costa (1997). More specifically, the wavelet transform is used to generate features for the recognition of hand gestures based on contours.
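A minimal sketch of wavelet-based contour features, assuming a one-level Haar transform of a hypothetical centroid-distance signature; the cited shape analysis tools are considerably richer:

```python
# Hedged sketch: one level of a Haar wavelet decomposition applied to a
# 1-D contour signature (e.g. distance from centroid sampled along the
# contour). The signature values below are made up for illustration.

def haar_step(signal):
    """One Haar decomposition level: pairwise averages (coarse shape)
    and pairwise differences (local detail)."""
    averages = [(signal[i] + signal[i + 1]) / 2
                for i in range(0, len(signal), 2)]
    details = [(signal[i] - signal[i + 1]) / 2
               for i in range(0, len(signal), 2)]
    return averages, details

# Hypothetical centroid-distance signature sampled along a hand contour:
contour = [5.0, 5.2, 6.1, 5.9, 9.8, 10.2, 6.0, 5.8]
coarse, detail = haar_step(contour)
print(coarse)  # overall shape at half the resolution
print(detail)  # local variation along the contour
```

The coarse coefficients summarize the overall silhouette while the detail coefficients localize sharp changes (such as finger-like protrusions), which is the kind of multiscale information a wavelet-based recognizer feeds to its classifier.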

01 Jan 2000
TL;DR: The lack of computer systems that can be easily used during the early stages of the architectural design process has been discussed for many years; some systems that allow users to sketch in digital 3D space without depending on sketch recognition have been developed.
Abstract: The lack of computer systems that can be easily used during the early stages of the architectural design process has been discussed for many years. The usual argument starts with the recognition that hand drawn sketches are an important tool in the early stage of both professional and student design because they can be used to visualise the designer's ideas quickly and have the flexibility to handle any shape the designer imagines. Research has then mostly focused on using computer based sketch recognition to directly produce three dimensional models from hand drawn sketches. However, sketch recognition still has certain problems that require the drawing action of users to be constrained in some way in order to be solved. If sketch recognition is still imperfect, the possibility of directly sketching within digital 3D space should be considered. Some systems allowing users to sketch in digital 3D space have been developed which do not depend on sketch recognition. Although Piranesi does not aim to support sketch design, it does allow the user to paint in the Z-buffer space, a unique idea termed "interactive rendering." SketchVRML tries to generate 3D geometrical data automatically from 2D hand drawn sketches by adding the depth value to the drawn lines according to the strength of line strokes. SketchBoX provides translucent surfaces in digital 3D space which can be glued onto existing objects or arranged anywhere in space. These surfaces have texture map data which can be modified by painting onto the texture. Transparent textures can be painted onto the surfaces to create see-through portions. Moderato also uses this technique to model a polygon's shape.


Journal ArticleDOI
01 Feb 2000
TL;DR: The main theme of the paper is the automatic recognition of hand-printed Arabic characters using machine learning.
Abstract: Character recognition systems can contribute tremendously to the advancement of the automation process and can improve the interaction between man and machine in many applications, including office automation, check verification and a large variety of banking, business and data entry applications. The main theme of the paper is the automatic recognition of hand-printed Arabic characters using machine learning. Conventional methods have relied on hand-constructed dictionaries which are tedious to construct and difficult to make tolerant to variation in writing styles. The advantages of machine learning are that it can generalize over the large degree of variation between writing styles and recognition rules can be constructed by example. The system was tested on a sample of handwritten characters from several individuals whose writing ranged from acceptable to poor in quality and the correct average recognition rate obtained using cross-validation was 89.65%.

01 Jan 2000
TL;DR: This thesis presents a model-based algorithm that tracks hand movements for the recognition of gestures that makes use of a ‘free moving’ visual approach, which is inexpensive compared to the cumbersome virtual reality data-gloves currently available.
Abstract: This thesis presents a model-based algorithm that tracks hand movements for the recognition of gestures. The system makes use of a ‘free moving’ visual approach, which is inexpensive compared to the cumbersome virtual reality data-gloves currently available. The approach taken in this thesis is twofold. First, a study of human-generated gestures is undertaken to determine whether gestures can in fact be used for communication and device control; the results also offer a few architectural guidelines for designing a recognition system. The second part discusses the development of the algorithm itself. This system does not make use of colour information; instead, edge detection is used to locate the various body features. Two categories of gestures are recognised: static (e.g. pointing) and motion (e.g. waving). A frame rate of 8 Hz has been achieved, and a total of 15 gestures can be recognised.

Proceedings ArticleDOI
01 Sep 2000
TL;DR: A distributed neural network architecture (DNNA) for object recognition is presented; a selection threshold is used to select, from a list of candidate objects, those that most resemble the objects in the image.
Abstract: A distributed neural network architecture (DNNA) for object recognition is presented. The proposed architecture is tested in two scenarios: occluded planar object recognition and face recognition. The DNNA is composed of several classifiers, each one with a standard ART2 neural network (ART2-NN) connected to a memory map (MM), a set of logical AND gates, an evidence register, and a set of comparators. In a first step, objects are described by a set of sub-feature vectors (SFVs); during the training stage, each SFV is fed to an ART2-NN to train it and to build its corresponding memory map (MM). During a second, indexing phase, a new image possibly containing the object is used to retrieve from the previously constructed MM the list of candidate objects that are in the image. A selection threshold is finally used to select from this list the objects that most resemble the objects in the image.
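The indexing-and-threshold step might be sketched as plain evidence accumulation; the memory-map contents and object names below are hypothetical, and the ART2 networks, AND gates, and comparators are not modelled:

```python
# Hedged sketch of the retrieval idea only: each matched sub-feature
# vector casts evidence for the objects it is known to belong to, and a
# selection threshold keeps the strongest candidates.

from collections import Counter

def index_candidates(observed_sfvs, memory_map, threshold):
    """Accumulate one unit of evidence per matched sub-feature vector,
    then keep objects whose evidence reaches the threshold."""
    evidence = Counter()
    for sfv in observed_sfvs:
        for obj in memory_map.get(sfv, ()):
            evidence[obj] += 1
    return {obj for obj, count in evidence.items() if count >= threshold}

# Hypothetical memory map: which objects each (quantised) SFV belongs to.
memory_map = {"edge-A": ["wrench", "pliers"],
              "hole-B": ["wrench"],
              "curve-C": ["pliers"]}
print(index_candidates(["edge-A", "hole-B"], memory_map, threshold=2))
```

Evidence voting of this kind degrades gracefully under occlusion: an object can still reach the threshold even when some of its sub-features are hidden, which matches the occluded-object scenario the paper tests.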

Proceedings ArticleDOI
03 Sep 2000
TL;DR: The method, based on linear combination, is simple, needs only a few learning samples, is able to distinguish objects with very similar patterns, and is more accurate than other conventional methods in the literature.
Abstract: Presents a method for the visualization, understanding, and recognition of artificial objects. The method, based on linear combination, is simple and needs only a few learning samples. Furthermore, it can strengthen the advantages of conventional methods while overcoming their drawbacks. It is also able to distinguish objects with very similar patterns and is more accurate than other conventional methods in the literature. Four experiments are included to demonstrate the method's simplicity and accuracy.
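A linear-combination recognizer in this spirit can be sketched by approximating a test pattern as a least-squares combination of each class's stored samples and picking the class with the smallest residual; the classes and feature vectors below are made up for illustration:

```python
# Hedged sketch: recognition by linear combination of a few learning
# samples. A test pattern is approximated as a*s1 + b*s2 over each
# class's two stored samples; the class with the smallest residual wins.
# Solving the 2x2 normal equations by hand keeps this dependency-free.

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def residual(pattern, s1, s2):
    """Least-squares residual of pattern against span{s1, s2}."""
    a11, a12, a22 = dot(s1, s1), dot(s1, s2), dot(s2, s2)
    b1, b2 = dot(s1, pattern), dot(s2, pattern)
    det = a11 * a22 - a12 * a12
    a = (b1 * a22 - b2 * a12) / det
    b = (a11 * b2 - a12 * b1) / det
    recon = [a * x + b * y for x, y in zip(s1, s2)]
    return sum((p - r) ** 2 for p, r in zip(pattern, recon))

# Two hypothetical classes, each described by two learning samples:
classes = {
    "box":  ([1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0]),
    "disc": ([1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0]),
}
test = [0.9, 0.1, 0.9, 0.1]
best = min(classes, key=lambda c: residual(test, *classes[c]))
print(best)
```

Because only a handful of samples per class enter the combination, such a method needs little training data, which is the property the abstract emphasizes.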

Proceedings ArticleDOI
13 Jul 2000
TL;DR: This paper is concerned with handwriting recognition, one of the most difficult recognition problems because of the diversity and lack of homogeneity of writing.
Abstract: A language is a set of symbols, signs, sounds, and so on. Each language has its own signs (Arabic, Japanese, Latin, ...) that people discern easily. In fact, one gives these symbols (isolated or grouped into words or sentences) senses or meanings. Language is a means of communication to describe our desires, our thoughts, our needs. Nevertheless, there are always basic references through which one communicates. These references are not fixed; they evolve with time, knowledge, and experience. Thus, machines "which understand human languages" have to recognize a given situation "in different sentences". A language can be written (handwriting recognition), spoken (signal processing), or expressed with gestures (computer vision) and facial expressions (face recognition). In this paper, we are interested in handwriting recognition, which is one of the most difficult problems because of the diversity and lack of homogeneity of writing. To solve this problem, we make extensive use of the notion of references.