
Showing papers on "Sketch recognition published in 2002"


01 Jan 2002
TL;DR: Besides a measurable speed advantage in drawing interfaces, users found the JavaSketchIt system more comfortable, natural, and intuitive to use than the competing product, as demonstrated by post-experiment inquiries.
Abstract: We present a visual approach to laying out static components of user interfaces as hand-drawn compositions of simple geometric shapes, based on sketch recognition. We defined a visual grammar using drawing data from target users, studying how people sketch interfaces and which combinations of shapes are most commonly used to define widgets. From these we built our grammar and implemented a prototype, JavaSketchIt, that allows creating user interfaces through hand-drawn geometric shapes, identified by a gesture recognizer. This prototype generates a Java interface, whose layout can be beautified using an a posteriori set of grammar rules (e.g. to align and group objects). To validate our approach, we conducted usability studies comparing it with a commercial system (JBuilder). Besides a measurable speed advantage in drawing interfaces, users found our system more comfortable, natural, and intuitive to use than the competing product, as demonstrated by post-experiment inquiries.

102 citations


01 Jan 2002
TL;DR: An algorithm for ink parsing that uses a statistical model to disambiguate and a declarative grammar for the language, generating a model from the grammar, and training the model on drawing examples is developed.
Abstract: In this paper we motivate a new technique for automatic recognition of hand-sketched digital ink. By viewing sketched drawings as utterances in a visual language, sketch recognition can be posed as an ambiguous parsing problem. On this premise we have developed an algorithm for ink parsing that uses a statistical model to disambiguate. Under this formulation, writing a new recognizer for a visual language is as simple as writing a declarative grammar for the language, generating a model from the grammar, and training the model on drawing examples. We evaluate the speed and accuracy of this approach for the sample domain of the SILK visual language and report positive initial results.

101 citations
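The parsing idea above — score every grammar interpretation of the ink statistically and keep the most probable — can be sketched as follows. The toy widget grammar, its production probabilities, and the shape likelihoods are invented for illustration; they are not SILK's actual visual language:

```python
import math

# Toy visual-language grammar: each production lists the shape sequence
# it expects plus a probability learned from drawing examples. Both the
# widgets and the numbers are invented for illustration.
GRAMMAR = {
    "button":   (["rectangle", "text"], 0.6),
    "checkbox": (["square", "text"],    0.4),
}

def parse(shape_likelihoods, grammar=GRAMMAR):
    """Statistically disambiguate a sketched symbol. Each element of
    `shape_likelihoods` maps candidate shape -> likelihood for one stroke
    group (as produced by some low-level recognizer). Every production is
    scored by log P(production) plus the summed log-likelihoods of its
    expected shapes; the best-scoring parse wins."""
    best, best_score = None, -math.inf
    for label, (shapes, p_prod) in grammar.items():
        if len(shapes) != len(shape_likelihoods):
            continue                      # production length must match
        score = math.log(p_prod) + sum(
            math.log(liks.get(shape, 1e-9))
            for shape, liks in zip(shapes, shape_likelihoods))
        if score > best_score:
            best, best_score = label, score
    return best
```

An ambiguous stroke group that looks almost equally like a rectangle or a square is then resolved by the trained production probabilities and the other groups in the drawing.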


01 Jan 2002
TL;DR: This work presents an architecture to support the development of robust recognition systems across multiple domains that maintains a separation between low-level shape information and high-level domain-specific context information, but uses the two sources of information together to improve recognition accuracy.
Abstract: People use sketches to express and record their ideas in many domains, including mechanical engineering, software design, and information architecture. Unfortunately, most computer programs cannot interpret free-hand sketches; designers transfer their sketches into computer design tools through menu-based interfaces. The few existing sketch recognition systems either tightly constrain the user’s drawing style or are fragile and difficult to construct. In previous work we found that domain knowledge can aid recognition. Here we present an architecture to support the development of robust recognition systems across multiple domains. Our architecture maintains a separation between low-level shape information and high-level domain-specific context information, but uses the two sources of information together to improve recognition accuracy.

78 citations


Journal ArticleDOI
TL;DR: The goal is to characterize the present state of artificial recognition technologies for these tasks, the influence of neuroscience on the design of these systems and the key challenges they face.
Abstract: How the brain recognizes complex patterns in the environment is a central, but little understood question in neuroscience. The problem is of great significance for a host of applications such as biometric-based access control, autonomous robots and content-based information management. Although some headway in these directions has been made, the current artificial systems do not match the robustness and versatility of their biological counterparts. Here I examine recognition tasks drawn from two different sensory modalities—face recognition and speaker/speech recognition. The goal is to characterize the present state of artificial recognition technologies for these tasks, the influence of neuroscience on the design of these systems and the key challenges they face.

67 citations


01 Jan 2002
TL;DR: It is argued that useful sketch recognition may be within the grasp of current research, if these requirements are addressed systematically and in concert.
Abstract: This paper discusses the problem of matching models of curvilinear configurations to hand-drawn sketches. It collects observations from our own recent research, which focused initially on the domain of sketched human stick figures in diverse postures, as well as related computer vision literature. Sketch recognition, i.e., labeling strokes in the input with the names of the model parts they depict, would be a key component of higher-level sketch understanding processes that reason about the recognized configurations. A sketch recognition technology must meet three main requirements. It must cope reliably with the pervasive variability of hand sketches, provide interactive performance, and be easily extensible to new configurations. We argue that useful sketch recognition may be within the grasp of current research, if these requirements are addressed systematically and in concert.

60 citations


Book ChapterDOI
08 Feb 2002
TL;DR: A new modeling and rendering system that enables users to construct 3D models with an interface that seems no different from sketching by hand, and that displays models in a sketch-like style, preserving the features of the user's strokes is proposed.
Abstract: We propose a new modeling and rendering system that enables users to construct 3D models with an interface that seems no different from sketching by hand, and that displays models in a sketch-like style, preserving the features of the user's strokes. We call this system 3D SKETCH. To reconstruct 3D objects from sketches, we limit the domain of renderable sketches and prepare a template for interpreting sketches. As long as a sketch can be matched to such a template, the system can reconstruct a mesh model from the sketch. The system collects information about strokes made, and uses that information for our rendering scheme.

60 citations


Proceedings ArticleDOI
20 Apr 2002
TL;DR: A preliminary experiment to assist the user in giving directions for urban navigation by combining partial results from unreliable speech recognition and unreliable visual recognition is described.
Abstract: Recognition technologies such as speech recognition and optical recognition are still, by themselves, not reliable enough for many practical uses in user interfaces. However, by combining input from several sources, each of which may be unreliable by itself, and with knowledge of a specific task and context that the user is engaged in, we might achieve enough recognition to provide useful results. We describe a preliminary experiment to assist the user in giving directions for urban navigation by combining partial results from unreliable speech recognition and unreliable visual recognition.

35 citations


Journal ArticleDOI
TL;DR: The MTC posterior estimator is based on a coordinated set of divide-and-conquer estimators that derive from a three-tiered architectural structure corresponding to individual members, teams, and the overall committee, designed to reduce modeling uncertainty.
Abstract: When building a complex pattern recognizer with high-dimensional input features, a number of selection uncertainties arise. Traditional approaches to resolving these uncertainties typically rely either on the researcher's intuition or performance evaluation on validation data, both of which result in poor generalization and robustness on test data. This paper describes a novel recognition technique called members to teams to committee (MTC), which is designed to reduce modeling uncertainty. In particular, the MTC posterior estimator is based on a coordinated set of divide-and-conquer estimators that derive from a three-tiered architectural structure corresponding to individual members, teams, and the overall committee. Basically, the MTC recognition decision is determined by the whole empirical posterior distribution, rather than a single estimate. This paper describes the application of the MTC technique to handwritten gesture recognition and multimodal system integration and presents a comprehensive analysis of the characteristics and advantages of the MTC approach.

32 citations
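The three-tiered pooling can be illustrated with a minimal sketch: member posterior estimates are averaged within teams, then across teams, so the decision rests on the pooled empirical distribution rather than any single estimate. This is a simplified stand-in for the paper's coordinated divide-and-conquer estimators:

```python
import numpy as np

def mtc_posterior(member_posteriors, teams):
    """Pool posterior estimates hierarchically: average members within
    each team, then average the team estimates at the committee level.
    `member_posteriors` is an (n_members, n_classes) array; `teams`
    lists the member indices belonging to each team."""
    team_estimates = np.stack(
        [member_posteriors[idx].mean(axis=0) for idx in teams])
    return team_estimates.mean(axis=0)   # committee-level distribution
```

The recognition decision is then the argmax of the committee-level distribution.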


01 Jan 2002
TL;DR: This paper provides an overview of six current pieces of work at the MIT AI Lab on the sketch recognition part of this overall goal, and the claim that interaction will be effortless only if the listener is smart: effortless interaction and invisible interfaces must be knowledge-based.
Abstract: The problem with software is not that it needs a good user interface, it needs to have no user interface. Interacting with software should — ideally — feel as natural, informal, rich, and easy as working with a human assistant. One key to this lies in enabling means of interacting with software that are similarly natural, informal, rich and easy. We are making it possible for people involved in design and planning tasks to sketch, gesture, and talk about their ideas (rather than type, point, and click), and have the computer system understand their messy freehand sketches, their casual gestures, and the fragmentary utterances that are part and parcel of such interaction. A second key lies in the appropriate use of each means of interaction. Our work to date has made it clear, for example, that different means are well suited to communicating different things: geometry is best sketched, behavior and rationale are best described in words and gestures. A third key lies in the claim that interaction will be effortless only if the listener is smart: effortless interaction and invisible interfaces must be knowledge-based. If it is to make sense of informal sketches, the listener has to understand something about the domain and something about how freehand sketches are drawn. This paper provides an overview of six current pieces of work at the MIT AI Lab on the sketch recognition part of this overall goal.

26 citations


Journal ArticleDOI
J. Park1
TL;DR: An adaptive handwritten word recognition method based on interaction between flexible character classification and deductive decision making is presented and the experimental result shows that the proposed method has advantages in producing valid answers using the same number of features as conventional methods.
Abstract: An adaptive handwritten word recognition method is presented. A recursive architecture based on interaction between flexible character classification and deductive decision making is developed. The recognition process starts from the initial coarse level using a minimum number of features, then increases the discrimination power by adding other features adaptively and recursively until the result is accepted by the decision maker. For the computational aspect of a feasible solution, a unified decision metric, recognition confidence, is derived from two measurements: pattern confidence, evaluation of absolute confidence using shape features, and lexical confidence, evaluation of the relative string dissimilarity in the lexicon. Practical implementation and experimental results in reading the handwritten words of the address components of US mail pieces are provided. Up to a 4 percent improvement in recognition performance is achieved compared to a nonadaptive method. The experimental result shows that the proposed method has advantages in producing valid answers using the same number of features as conventional methods.

26 citations
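The coarse-to-fine loop — classify with a minimal feature set, combine pattern confidence with lexical confidence, and add features until the decision maker accepts — can be sketched as below. `classify`, `lex_score`, and the acceptance threshold are hypothetical stand-ins for the paper's components:

```python
def recognize_word(word_image, lexicon, feature_sets, classify, lex_score,
                   threshold=0.9):
    """Coarse-to-fine word recognition: start with a minimal feature set
    and add features until the unified recognition confidence — pattern
    confidence times lexical confidence — is accepted by the decision
    maker. `classify` returns (candidate, pattern_confidence); `lex_score`
    measures string similarity against the lexicon. All names here are
    illustrative, not the paper's API."""
    for features in feature_sets:                 # ordered coarse -> fine
        candidate, pattern_conf = classify(word_image, features)
        confidence = pattern_conf * lex_score(candidate, lexicon)
        if confidence >= threshold:
            break                                 # decision maker accepts
    return candidate, confidence
```

If no feature set reaches the threshold, the loop falls through and returns the last (finest-level) answer as a best effort.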


Proceedings ArticleDOI
10 Dec 2002
TL;DR: Computer Assisted Visual Interactive Recognition (CAVIAR) draws on sequential pattern recognition, image database, expert systems, pen computing, and digital camera technology to recognize wildflowers and other families of similar objects more accurately than machine vision and faster than most laypersons.
Abstract: Computer Assisted Visual Interactive Recognition (CAVIAR) draws on sequential pattern recognition, image database, expert systems, pen computing, and digital camera technology. It is designed to recognize wildflowers and other families of similar objects more accurately than machine vision and faster than most laypersons. The novelty of the approach is that human perceptual ability is exploited through interaction with the image of the unknown object. The computer remembers the characteristics of all previously seen classes, suggests possible operator actions, and displays confidence scores based on already detected features. In one application, consisting of 80 test images of wildflowers, 10 laypersons averaged 80% recognition accuracy at 12 seconds per flower.

Proceedings ArticleDOI
11 Aug 2002
TL;DR: This work has developed a hand gesture recognition system, based on the shape analysis of static gestures, for human computer interaction purposes that uses modified Fourier descriptors for the classification of hand shapes in an interactive supervised way.
Abstract: We have developed a hand gesture recognition system, based on the shape analysis of static gestures, for human-computer interaction purposes. Our appearance-based recognition uses modified Fourier descriptors for the classification of hand shapes. As is common in the literature, such recognition systems consist of two phases: training and recognition. In our new practical approach, following the chosen appearance-based model, training and recognition are done in an interactive supervised way: the adaptation for untrained gestures is also solved by hand signals. Our experimental results with three different users are reported. Besides describing the recognition itself we demonstrate our interactive training method in a practical application.
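A minimal sketch of this kind of pipeline: Fourier descriptors of a closed hand contour, normalized for translation, scale, rotation, and start-point invariance, then matched against stored templates. The nearest-neighbour matcher and the normalization details are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def fourier_descriptors(contour, n_coeffs=16):
    """Invariant Fourier descriptors of a closed contour ((N, 2) array).

    Dropping the DC term removes translation, taking magnitudes removes
    rotation and start-point phase, and dividing by the first harmonic
    removes scale."""
    z = contour[:, 0] + 1j * contour[:, 1]   # boundary as complex samples
    coeffs = np.fft.fft(z)
    coeffs[0] = 0.0                          # translation invariance
    mags = np.abs(coeffs)                    # rotation/start-point invariance
    return mags[1:n_coeffs + 1] / mags[1]    # scale invariance

def nearest_template(descriptor, templates):
    """Nearest-neighbour classification against stored template descriptors."""
    return min(templates,
               key=lambda k: np.linalg.norm(descriptor - templates[k]))
```

Because the descriptor is invariant to where and how large the hand appears, a single stored template per gesture can already cover many poses of the same shape.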

Proceedings Article
01 Jan 2002
TL;DR: A new method is shown whereby, during the running phase of the static hand-gesture recognition system, users can interactively modify and learn hand gestures through gesture motion, improving the efficiency of the system.
Abstract: We have developed a static hand-gesture recognition system for human-computer interaction based on shape analysis. This appearance-based recognition uses modified Fourier descriptors for the classification of hand shapes. Such systems usually operate in two phases: training and running. We show a new method whereby, during the running phase, users can interactively modify and learn hand gestures through gesture motion, improving the efficiency of the system. With this interactive learning algorithm our system is able to adapt to similar gestures of other users or small changes of hand posture. We show a gesture recognition application applying these methods to the control of old-film restoration. ∗ Hand Recognition Demo can be downloaded from http://www.knt.vein.hu/staff/licsara/

Proceedings ArticleDOI
10 Dec 2002
TL;DR: Experimental results demonstrate the effectiveness of human-like recognition for identifying actions and the superior performance of the proposed system with respect to conventional action recognition systems.
Abstract: This paper proposes a human-like action recognition system which can output the result of human action recognition just as a human does. The system targets actions associated with regular human activity such as walking or lying down, and uses three human recognition characteristics: using specific features of an action to recognize that action; recognition of simultaneous actions; and summarization of recognition results over a short time interval. Experimental results demonstrate the effectiveness of human-like recognition for identifying actions and the superior performance of the proposed system with respect to conventional action recognition systems. Human-like recognition is expected to ensure smooth communication between humans and robots and to enhance support functionality.

Journal Article
TL;DR: The experimental results of the system show that using model matching based on the Hausdorff distance to realize vision-based static gesture recognition is feasible.
Abstract: With the development of advanced techniques of human-computer interaction (HCI), gesture recognition is becoming one of the key techniques of HCI. Due to some notable advantages of vision-based gesture recognition (VGR), e.g. more naturalness in HCI, VGR is now an active research topic in the fields of image processing, pattern recognition, computer vision and others. The method of model matching using the Hausdorff distance has the characteristics of low computing cost and strong adaptability. The system described in this paper applies the Hausdorff distance for the first time to visually recognize the Chinese finger alphabet (CFA) gestures (30 gestures in total), with edge pixels in the distance-transform space as recognition features. In order to improve the robustness of the system, a modified Hausdorff distance (MHD) has been proposed and applied in the recognition process. The average recognition rate of the system using MHD is up to 96.7% on the testing set. The experimental results show that using model matching based on the Hausdorff distance to realize vision-based static gesture recognition is feasible.
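The modified Hausdorff distance (here in the Dubuisson–Jain form, which replaces the directed max with a mean to reduce outlier sensitivity) can be sketched for point-set matching as follows; the template-matching wrapper is illustrative:

```python
import numpy as np

def directed_mhd(A, B):
    """Mean of nearest-neighbour distances from each point of A to set B."""
    # pairwise distances via broadcasting: shape (len(A), len(B))
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    return d.min(axis=1).mean()

def modified_hausdorff(A, B):
    """Modified Hausdorff distance between two 2D point sets: the max of
    the two directed mean distances. Averaging instead of taking the max
    per direction makes the measure robust to a few stray edge pixels."""
    return max(directed_mhd(A, B), directed_mhd(B, A))

def match_gesture(edge_points, templates):
    """Return the template label whose edge-point set is closest under MHD."""
    return min(templates,
               key=lambda k: modified_hausdorff(edge_points, templates[k]))
```

With edge pixels extracted from the hand region, recognition reduces to picking the gesture template with the smallest MHD.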

Proceedings ArticleDOI
10 Dec 2002
TL;DR: Evidence suggests that computer sketch recognition may be unnecessary, and that efforts should be directed toward improving the human factors aspects of current CAD software to better support the needs of conceptual design.
Abstract: Sketching is widely considered to be an essential activity during conceptual design, and many argue that CAD tools should be faithful to the sketching metaphor for conceptual design. However, CAD tools have progressed significantly in recent years, and there is growing experimental evidence that existing CAD tools can be as effective as sketching. Recent research in cognitive psychology supports the idea that the sketching metaphor is not necessarily ideal, and that a 3D geometric modeling metaphor might better support human cognitive processes. Informal experiments in CAD modeling of sample geometric shapes reported in the sketch recognition literature show that the two approaches are comparable. This evidence suggests that computer sketch recognition may be unnecessary, and that efforts should be directed toward improving the human factors aspects of current CAD software to better support the needs of conceptual design.

01 Jun 2002
TL;DR: A general framework for producing formative audio feedback for gesture recognition is presented, including the dynamic and semantic aspects of gestures, and Granular synthesis is used to present the audio display of the changing probabilities and observed states.
Abstract: A general framework for producing formative audio feedback for gesture recognition is presented, including the dynamic and semantic aspects of gestures. The belief states are probability density functions conditioned on the trajectories of the observed variables. We describe example implementations of gesture recognition based on Hidden Markov Models and a dynamic programming recognition algorithm. Granular synthesis is used to present the audio display of the changing probabilities and observed states.
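A minimal sketch of the belief-state computation: each gesture class is modeled by an HMM, the forward algorithm tracks state probabilities per class, and the normalized class posteriors at each step are the changing probabilities one would sonify. The discrete observations and the two-model setup in the test are invented for illustration:

```python
import numpy as np

def forward_step(alpha, A, b_obs):
    """One HMM forward-algorithm step: propagate the state probabilities
    alpha through transition matrix A, then weight by the likelihoods of
    the current observation."""
    return (alpha @ A) * b_obs

def gesture_posteriors(obs_seq, models, prior=None):
    """Running belief over gesture classes for a discrete observation
    sequence. `models` maps label -> (A, B, pi) with B[state, symbol].
    Returns the labels and, per time step, the normalized posterior
    (the belief state driving the audio display)."""
    labels = list(models)
    if prior is None:
        prior = np.full(len(labels), 1.0 / len(labels))
    # initialize each class's forward variable with the first observation
    alphas = {k: models[k][2] * models[k][1][:, obs_seq[0]] for k in labels}
    history = []
    for t, o in enumerate(obs_seq):
        if t > 0:
            for k in labels:
                A, B, _ = models[k]
                alphas[k] = forward_step(alphas[k], A, B[:, o])
        likelihood = np.array([alphas[k].sum() for k in labels])
        posterior = prior * likelihood
        history.append(posterior / posterior.sum())
    return labels, history
```

Feeding each time step's posterior vector to a granular synthesizer would yield the formative audio feedback described above: the sound sharpens as the belief concentrates on one gesture.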

01 Jan 2002
TL;DR: A part of this system that generates efficient bottom-up recognizers by compiling object descriptions is described, which will differ from existing architectures in many aspects, including a language for describing shapes, mechanisms for learning new shapes, and a blackboard based recognition architecture with top-down and bottom-up recognizers.
Abstract: The Problem: We use sketches as a medium for expressing ideas and saving thoughts. Sketching is especially common in early design as a means of communication, documentation and as a tool for stimulating thought. Despite the increasing availability of pen based PDAs and PCs, we still can’t interact with our devices via sketching as we do with people. As a group, we are building a generic multi-domain sketch recognition architecture to make computers sketch literate. This sketch recognition system will differ from existing architectures in many aspects, including a language for describing shapes, mechanisms for learning new shapes, and a blackboard based recognition architecture with top-down and bottom-up recognizers. Here we describe a part of this system that generates efficient bottom-up recognizers by compiling object descriptions.


Proceedings ArticleDOI
07 Oct 2002
TL;DR: A system to provide interaction between user and virtual humans using a data glove and an artificial neural network system responsible for the recognition of hand postures is presented.
Abstract: Interaction between human and computer has been used in a large scale in computer graphics and virtual reality. This paper presents a system to provide interaction between user and virtual humans. The system uses a data glove and an artificial neural network system responsible for the recognition of hand postures.

01 Jan 2002
TL;DR: A domain description language used to describe domain-specific information to a domain-independent sketch recognition system, primarily based on shape to ensure correlation between the drawn shape and the recognized shapes and to enable designers to draw the shapes as they would naturally.
Abstract: The Problem: Pervasive environments, complete with digital whiteboards and pocket PCs, have increasingly included applications with sketchable interfaces. Sketch recognition applications built for the Oxygen platform include Ligature [4], Tahuti [6], and Assist [1] / Assistance [9]. To date, sketch recognition systems have been domain-specific, with the recognition details of the domain hard-coded into the system. A domain-independent recognition system is advantageous since it may be used for several domains, increasing the flexibility and capabilities of a system. However, the system cannot identify the domain shapes if it doesn't know what they are. In order to properly recognize a sketch of a particular domain, domain-specific information must be supplied to the domain-independent recognition system. Motivation: We propose a domain description language used to describe domain-specific information to a domain-independent sketch recognition system. The language is primarily based on shape to ensure correlation between the drawn shapes and the recognized shapes, and to enable designers to draw the shapes as they would naturally. The language differs from other such languages because it can also be used to describe non-shape information, including display information, editing behavior, and drawing order. Previous Work: Shape description languages have been around for a long time [10]. These grammars have been studied widely within the field of architecture, and many systems are still built using shape grammars [5]. However, they have been developed for design generation rather than recognition, and don't provide for non-graphical information, such as stroke order, that may be helpful in recognition. Within the field of sketch recognition, there have been other attempts to create shape languages for sketch recognition. [8] use a language to model and recognize stick figures. The language currently is not hierarchical, making large objects cumbersome to describe.
[3] use fuzzy relational grammars and [2] use BNF grammars to describe shape information. Both lack the ability to describe non-shape domain information such as stroke order or direction and editing behavior information. Approach: The difficulties in determining the language's components and syntax include ensuring that the language allows all common helpful domain information to be specified. The language must also encourage and facilitate the creation of correct programs. For instance, to encourage the reuse of geometric shape definitions, the language distinguishes between geometric shape definitions (shapes usable in many domains) and domain shapes (shapes specific to a domain). The language also provides abstract shape definitions that describe a class of similar shapes to prevent rewriting of identical attributes.


Proceedings ArticleDOI
04 Nov 2002
TL;DR: Experiments show that the proposed SRG-based approach is both efficient and effective for online composite graphics recognition in sketch-based graphics input systems.
Abstract: A spatial relation graph (SRG) and its partial matching method are proposed for online composite graphics representation and recognition. A conditional partial permutation strategy is also proposed to reduce the computational cost of matching two SRGs, which is originally an NP-complete problem as graph isomorphism is. Experiments show that the proposed SRG-based approach is both efficient and effective for online composite graphics recognition in sketch-based graphics input systems.
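A brute-force sketch of SRG partial matching: enumerate injective mappings of model nodes onto sketch nodes and keep one that preserves primitive types and spatial relations. The paper's conditional partial permutation strategy exists precisely to prune this exponential search; the graph encoding and relation vocabulary here are invented for illustration:

```python
from itertools import permutations

def srg_match(model, sketch):
    """Find an injective mapping of model SRG nodes onto sketch SRG nodes.

    A graph is (types, rels): types[i] labels primitive i ('square',
    'arrow', ...), and rels[(i, j)] names the spatial relation between
    primitives i and j. The mapping must preserve both; extra sketch
    primitives and relations are simply ignored (partial matching).
    Plain enumeration here; the paper prunes this NP-complete search."""
    m_types, m_rels = model
    s_types, s_rels = sketch
    m, n = len(m_types), len(s_types)
    for perm in permutations(range(n), m):   # candidate injective mappings
        if any(m_types[i] != s_types[perm[i]] for i in range(m)):
            continue                          # primitive types must agree
        if all(s_rels.get((perm[i], perm[j])) == rel
               for (i, j), rel in m_rels.items()):
            return dict(enumerate(perm))
    return None
```

Recognizing a composite symbol in a sketch then amounts to finding a model SRG that embeds into the sketch's SRG despite extra, unrelated strokes.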

Proceedings ArticleDOI
26 Aug 2002
TL;DR: A dynamic grouping multi-class face recognition method is presented, which has knowledge-increasable ability and can solve the problems of large-class face recognition and dynamic extension of pattern classes.
Abstract: The paper presents a dynamic grouping multi-class face recognition method, which has knowledge-increasable ability and can solve the problems of large-class face recognition and dynamic extension of pattern classes. By adopting multiple classifiers working in parallel during training and dynamic grouping recognition, the method can not only speed up calculation and improve the recognition rate but also be extended easily and freely. Experimental results prove its reasonableness and feasibility.

Proceedings ArticleDOI
05 Aug 2002
TL;DR: In the recognition experiment on faces the system recognized both shape and position with sufficient accuracy; regarding generalization ability, the real-time two-D-SAN net system recognized scale changes of 80%–120% and view-plane rotation of the faces of up to ±4–5 degrees.
Abstract: The "two-D spreading associative neural network" (two-D-SAN net), which is constructed based on the spatial recognition system in the brain, not only recognizes the shape of an object irrespective of its position but also recognizes its position irrespective of its shape in the input pattern. The original two-D-SAN net performed this recognition by off-line processing using still picture files. We extended the original system to a real-time recognition system which performed the recognition by online processing using serial images from a video camera. We investigated and evaluated the recognition characteristics of the real-time two-D-SAN net system. In the recognition experiment on faces, the system recognized both shape and position with sufficient accuracy. Regarding generalization ability, the real-time two-D-SAN net system recognized scale changes of 80%–120% and view-plane rotation of the faces of up to ±4–5 degrees.

01 Jan 2002
TL;DR: The algorithm that has been derived uses neural-network-based feature detectors to identify local characteristic features of a flexible object and proves to be able to generalise to a considerable extent over instances that do not meet the requirements of the training set.
Abstract: The aim of the research described in this paper has been to take an investigative step towards the development of a general framework for object recognition. The algorithm that has been derived as a result of these explorations uses neural-network-based feature detectors to identify local characteristic features of a flexible object. Recognition is a result of finding a configuration of features detected in a given image that closely resembles the structure of one of a set of known instances of the object. The experiments show that the described approach applied to the object class 'horse' produces good recognition results for instances that meet the requirements of the training set. Furthermore, the method proves to be able to generalise to a considerable extent over instances that do not meet these requirements.

Dissertation
01 Jan 2002
TL;DR: Reinforcement learning is applied both to automatic template generation from a model image and to template matching within the input image, and experiment results showed that the proposed set of algorithms are fast, efficient, and potentially robust.
Abstract: Object recognition, a branch of pattern recognition, is to identify and localize one or more objects in a given scene. We have to determine what is present and where it is within the input image. Although great achievements have been made during the last decades, currently existing object recognition techniques have shortcomings like unreliability and inefficiency, general inadaptability, manual template marking heavily influenced by human factors, inability to recognize an object without a model, and so on. Any recognition problem can be formulated as a searching process and has to be guided in a controlled manner. All search problems involve optimization, so object recognition requires optimization and control techniques. Reinforcement learning is learning how to behave given a situation and possible actions to maximize the total expected reward in the long run, and therefore needs to be optimized. Most pattern recognition techniques do not combine reinforcement learning for feature understanding. In this dissertation, reinforcement learning is applied both to automatic template generation from a model image and to template matching within the input image. The newly designed affine parameter estimation algorithm provides reliable results based on information contained at all feature point locations. The points are extracted in the scale-space using isophote curvature extreme points, which are invariant to affine transformations. The affine parameter estimation algorithm is applicable to any kind of translations, rotations, and scales, and moderate occlusions and deformations of the object to be recognized. Experiment results showed that the proposed set of algorithms is fast, efficient, and potentially robust. The automatic template generation algorithm, an efficient contour tracing one in gray-level images, can also be used in object recognition without a model.
This is a new research field, and a great amount of future work needs to be done before an intelligent recognition system, as efficient as the human vision system, can be developed.

01 Jan 2002
TL;DR: UML-type diagrams are selected because they are a de facto standard for depicting software applications, and many of the symbols used in class diagrams are quite similar, and hence, offer an interesting challenge for sketch recognition.
Abstract: tion may be voiced during a software design meeting. We can capture the spoken and visual software design meeting information by videotaping the meeting and any white-boards used. By indexing these videos, we make it easy to retrieve the videotaped information without watching the entire video from start to finish. Motivation: We want to allow software design meetings to continue as they are, with software designers discussing the design and drawing free-hand sketches of these designs on a white-board. Using our system, designers can sketch naturally, as we place few requirements on the sketcher. We recognize and interpret these diagrams using sketch recognition. Because the diagrams are interpreted, we provide natural editing capabilities to the designers, allowing the users to edit their original strokes in an intuitive way. For instance, the designer can drag their drawn class from the center and move all of the strokes used to draw the class as well as stretch and skew the strokes used to create an attached arrow. The interpreted diagrams are used to automatically generate stub code using a software engineering tool. Software design meetings are videotaped to capture visual and spoken design information unobtrusively. When drawn items are interpreted, we use these understood sketch events to index the videotape of the software design meeting. We decided to design our application as a Metaglue agent since the Metaglue agent architecture provides support for multi-modal interactions through speech, gesture, and graphical user interfaces [2]. The Metaglue agent architecture also provides mechanisms for resource discovery and management which allows us to use available video agents or screen capture agents in a Metaglue supported room. We have selected UML-type diagrams because they are a de facto standard for depicting software applications.
Within UML [1] we focused on class diagrams, first because of their central role in describing program structure, and second because many of the symbols used in class diagrams are quite similar, and hence, offer an interesting challenge for sketch recognition. We added several symbols for agent-design since many of the applications created in the Intelligent Room [6] of the MIT AI Lab are