Error Correction Techniques for Handwriting, Speech, and Other Ambiguous or Error Prone Systems


Jennifer Mankoff & Gregory D. Abowd
GVU Center & College of Computing
Georgia Institute of Technology, Atlanta, GA, USA
+1 404 894 7512
jmankoff@cc.gatech.edu, abowd@cc.gatech.edu
http://www.cc.gatech.edu/fce/pendragon

ABSTRACT

Interfaces which support natural inputs such as handwriting and speech are becoming more prevalent, and this is a desirable trend. However, these recognition-based interface techniques are error prone. Despite research efforts to improve recognition rates, a certain amount of error will never be removed. Suitable research efforts should attend to the problem of correction techniques for these error prone techniques. Humans have developed countless ways to correct errors in understanding or clarify ambiguous statements. It is time for interface designers to focus on ways for computers to do the same. We present a survey of the design, implementation, and study of interfaces for correcting error prone input technologies. Previous work by others and our own research into flexible pen-based note-taking environments grounds our research into interface techniques for handling errors in recognition systems.

KEYWORDS: handwriting and speech recognition, interface design, error handling

1 INTRODUCTION

1.1 Motivating the Problem

Computer interfaces which support more natural human forms of communication (e.g. handwriting, speech, and gestures) are beginning to supplement or replace elements of the GUI paradigm. These interfaces are lauded for their low learning curves and their ability to support tasks such as authoring and drawing without drastically changing their structure. Additionally, they can be used by people with disabilities that make the traditional mouse and keyboard less accessible.

Unfortunately, these new interfaces come with a new set of problems: they make mistakes. When errors occur, the initial reaction of system designers is to try to eliminate them, for example by improving recognition accuracy. This is often a difficult task: Buskirk & LaLomia (1995) found that an improvement of 5-10% is necessary before the majority of people will even notice a difference in a speech recognition system.

GVU Tech Report GIT-GVU???

Worse yet, eliminating errors may not be possible. Even humans make mistakes when dealing with these same forms of communication. As an example, consider handwriting recognition. Even the most expert handwriting recognizers (humans) can have a recognition accuracy as low as 54% when looking at word fragments without the benefit of their context (Schomaker, 1994). Human accuracy increases to 88% for cursive handwriting (Schomaker, 1994), and 96.8% for printed handwriting (Frankish et al., 1995), but it is never perfect. This evidence all points to the conclusion that computer handwriting recognition will never be perfect.

Computer-based recognizers are even more error prone than humans. The data they start with is often less fine-grained than that which humans are able to sense. They have less processing power. And variables such as vocal fatigue can cause usage data to differ significantly from training data, causing reduced recognition accuracy over time in speech recognition systems (Frankish et al., 1992).

On the other hand, recognition accuracy is not the only determinant for user satisfaction. Both the complexity of error recovery dialogues (Zajicek & Hewitt, 1990), and the amount gained for the effort (Frankish et al., 1995), affect user satisfaction. For example, Frankish found that users were less frustrated by recognition errors when the task was to enter a command in a form than when they were writing journal entries. He suggests that this is because the pay-back for entering a single word in the case of a command is much larger than in a paragraph of a journal entry when compared with the effort of entering the word.

Error handling is not a new problem. In fact, it is endemic to the design of computer systems which attempt to mimic human abilities. Research in the area of error handling for recognition technologies must assume that errors will occur, and then answer questions about the best ways to deal with them. The goal of this paper is to present a survey of existing research in discovering and correcting errors in recognition based interfaces.

1.2 Dening The Area

Our survey has identified five key research areas for error handling of recognition-based interfaces.

Error reduction

Error reduction involves research into improving recognition technology in order to eliminate or reduce errors. It has been the focus of extensive research, and could easily be the subject of a whole paper on its own. Evidence suggests that its holy grail, the elimination of errors, is probably not achievable. And big improvements (5-10%) are required before users even notice a difference (Buskirk & LaLomia, 1995). Because of these facts, we have chosen not to address error reduction in this paper.

Error discovery

Before either the system or the user can take any action related to a given error, one of them has to know that the error has occurred. The system may be told of an error through user input, and can help the user to find errors through its output. In addition, system designers have used three techniques to automate error discovery: thresholding, rules, and historical statistics.

Error correction techniques

Just as the user interface is the only way one party can inform the other that an error has occurred, it is also the only way that the user can correct an error. We found that current error handling techniques fall into three main categories: choosing a default, encouraging less ambiguous input, and mimicking natural human correction strategies.

Validation of techniques

Validation goes hand in hand with research into error correction techniques. Validation is the only way to determine the effectiveness of different designs. Our survey uncovered research into theoretical issues such as how to compare techniques, and practical results such as which techniques are effective.

Toolkit level support

Toolkits provide reusable components and are most useful when a class of common, similar problems exists. Interfaces for error handling would benefit tremendously from a toolkit which could be used and re-used every time an error prone situation arose. In addition to interface widgets, a toolkit would need to support complete reversibility, and keep track of multiple potential interpretations at once.

In addition to surveying existing work, we are building a platform to test strategies for dealing with segmentation errors, handwriting recognition errors, and gesture recognition errors (see Figure 1). Our system, called PenPad, supports handwriting recognition in the context of personal note-taking. Our motivation for this application is to support note taking and document creation in situations when typing is not an option. This includes mobile settings, and users with repetitive stress injuries or other disabilities which make keyboard typing difficult.

Figure 1: PenPad's user interface. The words "Penpad", "around", "both the", "all", and "potential" were all recognized correctly. The darker the word, the surer the recognizer is of this. The word "interpretations" was recognized incorrectly. When the user moves the mouse over this word, five alternatives are displayed, shown in the blow-up. The words "ink, and" were originally incorrect, but the user was able to select them from a similar set of five potential choices.

The rest of this paper describes the results of our survey. We discuss research in each of the last four sub-areas mentioned above: error discovery, error correction techniques, validation of techniques, and toolkit level support.

2 ERROR DISCOVERY

Before the system can support error recovery in any way, or the user can handle an error, one or the other needs to know that an error has occurred. The user interface is a conduit through which the system and user can pass information. User input can notify the system of an error (and correct it, described in more detail in the next section). And it is through visual or oral feedback that the system helps the user to identify errors.

The system can also try to determine when it has made a mistake without the user's help, either through thresholding (Baber & Hone, 1993; Poon et al., 1995; Brennan & Hulteen, 1995), a rule base (Baber & Hone, 1993; Davis, 1979), or historical statistics (Marx & Schmandt, 1994).

2.1 User input to help the system find errors

In the most common approaches to notification, the user explicitly indicates the presence of an error by, for example, clicking on a word, or saying a special keyword. Many speech and handwriting recognition systems use this approach. Three well known examples are the PalmPilot™, DragonDictate™, and the Apple MessagePad™. For example, when the user clicks on a word in the Apple MessagePad™, a menu of alternative interpretations appears.

In cases where there is no special interface for notification or correction, user action may still help the system to discover errors. For example, if the user deletes a word and enters a new one, the system may infer that an error has occurred by matching the deleted word to the new one.
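The delete-and-retype inference can be sketched in a few lines of Python. This is only an illustration under assumed details: the function name looks_like_correction, the use of difflib.SequenceMatcher as the matching method, and the 0.5 similarity cutoff are hypothetical choices, not taken from any of the surveyed systems.

```python
from difflib import SequenceMatcher


def looks_like_correction(deleted_word, new_word, min_similarity=0.5):
    """Guess whether typing `new_word` after deleting `deleted_word`
    was a correction of a misrecognized word (hypothetical heuristic)."""
    if deleted_word == new_word:
        # Identical text: nothing was corrected.
        return False
    # ratio() is 1.0 for identical strings and 0.0 when nothing matches.
    similarity = SequenceMatcher(
        None, deleted_word.lower(), new_word.lower()
    ).ratio()
    # Assumption: similar-looking words suggest a recognition error,
    # while unrelated words suggest the user simply changed their mind.
    return similarity >= min_similarity


print(looks_like_correction("interpolations", "interpretations"))
print(looks_like_correction("dog", "interpretations"))
```

As the end of this section notes, such an inference is itself error prone: a user who replaces one word with a similar-looking synonym would be misread as correcting the recognizer.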

2.2 System output to help the user find errors

There is a plethora of hidden information available to the system designer which can help users to identify errors. The likelihood that something is correct, the history of values an item has had, other possible values it could have, and the user's original input are just a few of the non application-specific ones. Our survey shows that designer after designer has found it beneficial to reveal some of this hidden information to the user (Brennan & Hulteen, 1995; Davis, 1979; Goldberg & Goodisman, 1991; Igarashi et al., 1997; Kurtenbach et al., 1994; Rhodes & Starner, 1996). Two of the most common pieces of information to display are the probability of correctness (called certainty in this paper), and multiple alternatives.

Figure 2: Pictures of two user interfaces, adapted from a paper about drawing understanding (A, left) (Igarashi et al., 1997), and pen input (B, right) (Goldberg & Goodisman, 1991).

An example of a system which shows information about certainty is the PenPad system. The probability of correctness is displayed through color. For example, the typewritten word "PenPad" is lighter (less certain) than the corresponding words "ink, and" in Figure 1. Figure 2 shows two example systems which display multiple alternatives. The first (Figure 2A) is a drawing understanding system designed by Igarashi et al. (1997). The bold line represents the system's current top guess. The dotted lines represent potential alternatives, and the plain line is a past accepted guess. Figure 2B shows a character recognition system designed by Goldberg & Goodisman (1991). The larger character is the system's top choice; the two smaller letters are the second and third most likely possibilities. In both systems, the user can click on an alternative to tell the system that its default choice should be changed. In both systems, if the user continues input as normal, they are implicitly accepting the default choice. Interestingly, although Igarashi had success with this approach in his drawing-understanding system, Goldberg and Goodisman found that it required too great a cognitive overhead to be effective in their character recognition system.

Both certainty and the display of multiple alternatives can also be achieved in an audio-only setting, as demonstrated by Brennan & Hulteen (1995). They base their approach on linguistic research showing that humans reveal positive and negative evidence as they converse. Positive evidence is output which confirms that the listener has heard the speaker correctly. For example, the listener may spell back a name which has just been dictated to them. Negative evidence is output which somehow reveals that the listener (in this case, the recognition system) is not sure they have understood the speaker correctly. Examples are repeating the speaker's sentence and replacing the questionable word with a pause, or simply saying "Huh?" Negative evidence can also be used to display multiple alternatives. So, for example, the system may say "call John or Jane?" in response to a user's request. Brennan and Hulteen built a sophisticated response system using both techniques. They make use of positive and negative evidence, and they limit the display of alternatives based on a contextual analysis of the likelihood of correctness.

Another setting in which multiple alternatives are commonly displayed is word prediction (Alm et al., 1992; Greenberg et al., 1995). Word prediction is often used to support communication and productivity for people with disabilities which make typing, and in some cases even using a mouse, very difficult. As the user types each letter, the system retrieves a list of words which are the most likely completions of what has been typed so far. Often there are a large number of potential completions, and many are displayed at some distance from the actual input on screen.
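A minimal sketch of the retrieval step in word prediction follows. It assumes that "most likely" is approximated by how often the user has typed each word before; the function name predict_completions and the sample history are hypothetical, and real systems such as those cited may use richer language models.

```python
from collections import Counter


def predict_completions(prefix, word_counts, max_results=5):
    """Return up to `max_results` words starting with `prefix`,
    most frequent first (ties broken alphabetically)."""
    candidates = [word for word in word_counts if word.startswith(prefix)]
    candidates.sort(key=lambda word: (-word_counts[word], word))
    return candidates[:max_results]


# Hypothetical usage history: how often each word was typed before.
history = Counter({"the": 50, "then": 12, "there": 20, "this": 30, "tree": 3})

print(predict_completions("th", history, max_results=3))
# prints ['the', 'this', 'there']
```

Capping the list with max_results is one way to limit how many completions must be displayed and scanned.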

2.3 Thresholding

Many error prone systems return some measure of the probability that each result is correct when they return the result. This probability represents the confidence of the interpretation. The resultant probabilities can be compared to a threshold. When they fall below the threshold, the system assumes an error has occurred. When they fall above it, the assumption is that no error has occurred. Most systems set this threshold to zero, meaning they never assume that there has been a mistake. Some systems may set it to one, meaning they always assume they are wrong (e.g., word prediction), and other systems try to determine a reasonable threshold based on statistics or other means (Poon et al., 1995; Brennan & Hulteen, 1995; Baber & Hone, 1993).
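Thresholding as described above reduces to a single comparison per result. The sketch below is illustrative only: the function name flag_probable_errors and the sample recognizer output are assumptions, not part of any cited system.

```python
def flag_probable_errors(results, threshold):
    """Return the interpretations whose confidence falls below `threshold`.

    `results` is a list of (interpretation, confidence) pairs, with
    confidence assumed to lie in [0, 1].
    """
    return [text for text, confidence in results if confidence < threshold]


# Hypothetical recognizer output for a short spoken command.
results = [("call", 0.95), ("John", 0.40), ("now", 0.80)]

print(flag_probable_errors(results, 0.0))  # threshold zero: nothing is flagged
print(flag_probable_errors(results, 0.5))  # only the doubtful word is flagged
print(flag_probable_errors(results, 1.0))  # threshold one: everything is flagged
```

With a strict "below" comparison, a result reported with a confidence of exactly 1.0 would escape a threshold of one; a system that must always assume it is wrong would need to compare with "less than or equal" instead.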

2.4 Rules

Baber & Hone (1993) suggest using a rule base to determine when errors may have occurred. This can prove to be more sophisticated than either statistics or thresholding since it allows the use of context in determining whether an error has occurred. An example rule might be:

When the user has just written 'for (', lower the probability of correctness for any alternatives to the next word they write which are not members of the set of variable names currently in scope.

This goes beyond simple statistics because it uses knowledge about the context in which a word has been written to detect errors.
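The example rule can be written down directly. The following is a sketch under stated assumptions: alternatives arrive as (word, probability) pairs, "lowering" a probability is modeled as multiplying it by a penalty factor of 0.1, and the function name apply_for_loop_rule is hypothetical. A real rule base would hold many such rules.

```python
def apply_for_loop_rule(previous_text, alternatives, names_in_scope, penalty=0.1):
    """Lower the probability of alternatives that are not in-scope variable
    names when the user has just written 'for ('.

    `alternatives` is a list of (word, probability) pairs. Returns a new
    list sorted by (possibly adjusted) probability, highest first.
    """
    if not previous_text.rstrip().endswith("for ("):
        # The rule does not apply in this context; keep probabilities as-is.
        return sorted(alternatives, key=lambda alt: -alt[1])
    adjusted = [
        (word, prob if word in names_in_scope else prob * penalty)
        for word, prob in alternatives
    ]
    return sorted(adjusted, key=lambda alt: -alt[1])


# Hypothetical recognizer alternatives for the word written after 'for ('.
alternatives = [("in", 0.6), ("i", 0.3), ("is", 0.1)]
ranked = apply_for_loop_rule("for (", alternatives, names_in_scope={"i", "n"})
print([word for word, _ in ranked])
# prints ['i', 'in', 'is']
```

Here the recognizer's original top guess, "in", is demoted because only "i" is a variable name in scope.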

2.5 Historical Statistics

When error prone systems do not return a measure of probability, or when the estimates of probability may be wrong, new probabilities can be generated by doing a statistical analysis of historical data about when and where the system makes mistakes. This task itself benefits from good error discovery. A historical analysis can help to increase the accuracy of both thresholding and rules. For example, Marx & Schmandt (1994) compiled speech data about which letters were misrecognized as "e", with what frequencies, and used them as a list of potential alternatives whenever the speech recognizer returned "e". They did the same for each letter of the alphabet.

The example below shows pen data for "e" generated by the first author by repeating each letter of the alphabet 25 times in a PalmPilot™. The first column represents the letter that was written; the other columns show which letters the PalmPilot™ Graffiti™ recognizer returned. Only letters which were mistaken for "e" are shown.

original   top guess   other guesses
e          e (100%)
k          k (72%)     l (16%), e (8%), s (4%)
l          l (80%)     c (17%), e (3%)

This sort of matrix is called a confusion matrix because it shows potential correct answers that the system may have confused with its returned answer. In this way, historical statistics may provide a default probability of correctness for a given answer. More sophisticated analyses can help in the creation of better rules or the choice of when to apply certain rules.
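A confusion matrix of this kind can be built from a log of (written, recognized) pairs and then queried for the likely alternatives to a returned answer. The sketch below is an illustration, not the procedure used by Marx & Schmandt or by the authors: the function names and the sample counts are hypothetical, and the shares it reports are only meaningful if the log reflects how often each letter is actually written.

```python
from collections import Counter, defaultdict


def build_confusion_matrix(observations):
    """Count how often each written letter produced each recognized letter.

    `observations` is an iterable of (written, recognized) pairs.
    Returns {written: Counter({recognized: count})}.
    """
    matrix = defaultdict(Counter)
    for written, recognized in observations:
        matrix[written][recognized] += 1
    return matrix


def alternatives_for(recognized, matrix):
    """List (written_letter, share) pairs for every letter that was ever
    recognized as `recognized`, most frequent source first."""
    counts = {
        written: row[recognized]
        for written, row in matrix.items()
        if row[recognized] > 0
    }
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [(written, count / total) for written, count in ranked]


# Hypothetical log: each letter written 25 times (counts are made up).
log = (
    [("e", "e")] * 25
    + [("k", "k")] * 18 + [("k", "l")] * 4 + [("k", "e")] * 2 + [("k", "s")]
    + [("l", "l")] * 20 + [("l", "c")] * 4 + [("l", "e")]
)
matrix = build_confusion_matrix(log)
print(alternatives_for("e", matrix))
```

For this log, a returned "e" was written as "e" in 25 of 28 cases, as "k" in 2, and as "l" in 1, so "k" and "l" are the alternatives worth offering.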

Although error discovery is a necessary component of error handling interfaces, it has a stigma associated with it: the task of error discovery is itself error prone. Rules, thresholding, and historical statistics may all be wrong. Even when the user's explicit actions are observed, the system may incorrectly infer that an error has occurred. Only when the user's action is to explicitly notify the system of an error can we be sure that an error really has occurred in the user's eyes. In other words, all of the approaches mentioned may create a new source of errors, leading to a cascade of error handling issues.

3 ERROR CORRECTION TECHNIQUES

Once a mistake has been identified, the system can take action to correct it, or ask the user's help in correcting it (through some sort of error handling interface). Alternatively, the system can support error handling in an integrated fashion. For example, the interactive beautification system shown in Figure 2A displays alternatives after every stroke. The same interface also supports notification: if the user selects an alternative, the system can infer that the original default was wrong and the alternative is correct.

Most of the tasks being supported require the selection of a single correct interpretation of user input (one exception to this is search engines, which may have multiple correct responses). One important choice facing the designer of error handling techniques is how active the system should be in selecting this interpretation. Essentially, the designer must choose whether to accept the most certain choice by default, or to wait for user confirmation. The first part of this section discusses where each choice has shown up in the literature, and why. The remaining parts discuss two commonly used techniques for error handling: encouraging less ambiguous input, and mimicking natural human correction strategies.

3.1 Choosing a Default

The number of answers returned by an error prone system is often larger than the number of answers expected by the user. This leaves the interface designer with the choice of selecting none of the answers, or selecting one (or more) of the answers as "correct" by default. For example, the drawing understanding system mentioned above selects one line by default (shown bold in Figure 2A) (Igarashi et al., 1997). The interface designer should use information about the probability of correctness and the overhead for correcting a mistaken choice of default to decide when it is appropriate to choose a default. In the case of the drawing understanding system, the interface is designed so that the user does no more work when the system selects a default than when it doesn't. And if the system selects the correct choice, the user does less work (since they don't have to select it themselves before they continue drawing).
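One way to make this trade-off concrete is to compare expected user effort with and without a default. The cost model below is an assumption introduced for illustration, not a method from the surveyed literature: effort is measured in abstract units, and the names should_choose_default, select_cost, and fix_cost are hypothetical.

```python
def should_choose_default(p_correct, select_cost, fix_cost):
    """Return True when accepting the top choice by default has lower
    expected user effort than always asking the user to select.

    p_correct   -- probability that the top choice is right (0..1)
    select_cost -- effort to pick an answer when no default is chosen
    fix_cost    -- effort to undo a wrong default and pick the right answer
    """
    # Assumed model: a correct default costs the user nothing,
    # a wrong default costs fix_cost.
    expected_with_default = (1.0 - p_correct) * fix_cost
    return expected_with_default < select_cost


# Fixing a wrong default costs no more than selecting by hand
# (as in the drawing system): a default pays off.
print(should_choose_default(p_correct=0.6, select_cost=1, fix_cost=1))

# The top choice is often wrong and expensive to undo: better not to default.
print(should_choose_default(p_correct=0.3, select_cost=1, fix_cost=5))
```

Under this model, a default is never worse when fixing a mistaken default costs the same as selecting manually, which matches the design of the drawing understanding system described above.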

An example of a system which does well to select nothing by default is Rhodes & Starner's (1996) remembrance agent. The remembrance agent retrieves documents based on their relevance to the current text in an editor. Rather than immediately displaying the most relevant document, it has a small permanent window where it shows a single line from each of three potentially interesting documents. Actually selecting a document and displaying it would be far more invasive, difficult to correct, and often not what the user wants. Even if the system has found relevant documents, the user may not want to be interrupted in order to read them.

Word prediction systems also demonstrate why the designer may choose not to select a default. If, for example, the system assumes its top prediction is correct, it will insert it. But word prediction is a particularly difficult task in which the top choice is often wrong. And it will most likely take more keystrokes for the user to delete the mistake and continue typing than it would to have simply typed the whole word out in the first place, especially if similar mistakes happen automatically after every character typed.

Even when it is appropriate to choose a default for the user, this choice may be wrong, and because of this the user interface needs to support error correction. One way to support this is to display alternatives from which the user can select a correct choice. Another approach is to unobtrusively provide ways to change the default without necessarily displaying alternatives. For example, Goldberg & Goodisman (1991) suggest using a simple gesture (a tap) to select the next choice. As another example, consider the Tivoli system in which some inputs are interpreted as gestures and others simply as ink to be drawn on the screen (Moran et al., 1997). If a user draws a gesture which could trigger an action, such as "move", the system by default assumes that the action is intended (and not simply drawing on the screen). However, if the user doesn't follow through (by selecting an object to move in this case), Moran et al. automatically undo it, replacing it instead with its alternate interpretation as plain ink.

3.2 Encouraging Less Ambiguous Input

Certain modes of input are known to be less error prone than others (compare typing to handwriting recognition), and there are times when it is appropriate to make use of this fact. For example, Suhm found that recognition accuracy actually decreases by 10-65% during this sort of error repair in a speech recognition system (Suhm, 1997). One option is for the computer to offer a less ambiguous input method as an alternative. This technique has been used effectively in the Apple MessagePad™, as well as for speech input (Marx & Schmandt, 1994), pen input (Goldberg & Goodisman, 1991), and a mixture of the two (Suhm et al., 1996b).

Alternatively, an interface designer may choose to encourage a less error prone input from the outset. For example, the designers of the PalmPilot™ chose to use a unistroke alphabet (Goldberg & Richardson, 1993). It is easier to recognize unistrokes than to recognize handwriting because there is no possibility of segmentation errors, since each letter is exactly one stroke (pen down to pen up). In another example, Goldberg & Goodisman (1991) suggest using on-screen marks (boxes) to reduce segmentation errors and discourage cursive handwriting.

Several researchers have made use of a human's tendency to mimic the output of whatever they are communicating with. Zoltan-Ford (1991) found that people will mimic sentence structures of the computer's responses, something that helps to make natural language processing easier. Kurtenbach et al. (1994) investigated the use of crib sheets which display gestures for a user to copy. The user can request an animation of a command by clicking on its picture on the crib sheet. Crib sheets have also been found to successfully improve recognition in a character recognition system (Wolf, 1990).

3.3 Mimicking Natural Human Correction Strategies

Although computers are a major source of errors, humans also make mistakes. Both experience and research
