Author

S. Janet

Bio: S. Janet is an academic researcher who has contributed to research on handwriting recognition and data exchange. The author has an h-index of 1 and has co-authored 1 publication receiving 428 citations.

Papers
Proceedings Article
09 Oct 1994
TL;DR: This paper reports the status of the UNIPEN project of data exchange and recognizer benchmarks, started two years earlier to propose and implement solutions to the growing need for handwriting samples for online handwriting recognizers used by pen-based computers.
Abstract: We report the status of the UNIPEN project of data exchange and recognizer benchmarks started two years ago at the initiative of the International Association of Pattern Recognition (Technical Committee 11). The purpose of the project is to propose and implement solutions to the growing need of handwriting samples for online handwriting recognizers used by pen-based computers. Researchers from several companies and universities have agreed on a data format, a platform of data exchange and a protocol for recognizer benchmarks. The online handwriting data of concern may include handprint and cursive from various alphabets (including Latin and Chinese), signatures and pen gestures. These data will be compiled and distributed by the Linguistic Data Consortium. The benchmarks will be arbitrated by the US National Institute of Standards and Technology. We give a brief introduction to the UNIPEN format. We explain the protocol of data exchange and benchmarks.

437 citations


Cited by
Journal Article
TL;DR: The nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms are described.
Abstract: Handwriting has persisted as a means of communication and of recording information in day-to-day life even with the introduction of new technologies. Given its ubiquity in human transactions, machine recognition of handwriting has practical significance, as in reading handwritten notes in a PDA, postal addresses on envelopes, amounts on bank checks, handwritten fields in forms, etc. This overview describes the nature of handwritten language, how it is transduced into electronic data, and the basic concepts behind written language recognition algorithms. Both the online case (which pertains to the availability of trajectory data during writing) and the off-line case (which pertains to scanned images) are considered. Algorithms for preprocessing, character and word recognition, and performance with practical systems are indicated. Other fields of application, such as signature verification, writer authentication, and handwriting learning tools, are also considered.

2,653 citations

Journal Article
TL;DR: The FERET evaluation procedure is an independently administered test of face-recognition algorithms to allow a direct comparison between different algorithms and to assess the state of the art in face recognition.

2,494 citations

Journal Article
TL;DR: This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies, significantly outperforming a state-of-the-art HMM-based system.
Abstract: Recognizing lines of unconstrained handwritten text is a challenging task. The difficulty of segmenting cursive or overlapping characters, combined with the need to exploit surrounding context, has led to low recognition rates for even the best current recognizers. Most recent progress in the field has been made either through improved preprocessing or through advances in language modeling. Relatively little work has been done on the basic recognition algorithms. Indeed, most systems rely on the same hidden Markov models that have been used for decades in speech and handwriting recognition, despite their well-known shortcomings. This paper proposes an alternative approach based on a novel type of recurrent neural network, specifically designed for sequence labeling tasks where the data is hard to segment and contains long-range bidirectional interdependencies. In experiments on two large unconstrained handwriting databases, our approach achieves word recognition accuracies of 79.7 percent on online data and 74.1 percent on offline data, significantly outperforming a state-of-the-art HMM-based system. In addition, we demonstrate the network's robustness to lexicon size, measure the individual influence of its hidden layers, and analyze its use of context. Last, we provide an in-depth discussion of the differences between the network and HMMs, suggesting reasons for the network's superior performance.

1,686 citations

Journal Article
TL;DR: This paper describes a database of handwritten English sentences based on the Lancaster-Oslo/Bergen corpus; the database is expected to be particularly useful for recognition tasks where linguistic knowledge beyond the lexicon level is used.
Abstract: In this paper we describe a database that consists of handwritten English sentences. It is based on the Lancaster-Oslo/Bergen (LOB) corpus. This corpus is a collection of texts that comprise about one million word instances. The database includes 1,066 forms produced by approximately 400 different writers. A total of 82,227 word instances out of a vocabulary of 10,841 words occur in the collection. The database consists of full English sentences. It can serve as a basis for a variety of handwriting recognition tasks. However, it is expected that the database would be particularly useful for recognition tasks where linguistic knowledge beyond the lexicon level is used, because this knowledge can be automatically derived from the underlying corpus. The database also includes a few image-processing procedures for extracting the handwritten text from the forms and the segmentation of the text into lines and words.

1,254 citations

Book
14 Jan 2010
TL;DR: In this article, the authors present a glossary for language analysis and understanding in the context of spoken language input and output technologies, and evaluate their work with a set of annotated corpora.
Abstract:
1. Spoken language input: Ronald Cole, Victor Zue, Wayne Ward, Melvyn J. Hunt, Richard M. Stern, Renato De Mori, Fabio Brugnara, Salim Roukos, Sadaoki Furui and Patti Price
2. Written language input: Joseph Mariani, Sargur N. Srihari, Rohini K. Srihari, Richard G. Casey, Abdel Belaid, Claudie Faure, Eric Lecolinet, Isabelle Guyon, Colin Warwick and Rejean Plamondon
3. Language analysis and understanding: Annie Zaenen, Hans Uszkoreit, Fred Karlsson, Lauri Karttunen, Antonio Sanfilippo, Stephen F. Pulman, Fernando Pereira and Ted Briscoe
4. Language generation: Hans Uszkoreit, Eduard Hovy, Gertjan van Noord, Gunter Neumann and John Bateman
5. Spoken output technologies: Ronald Cole, Yoshinori Sagisaka, Christophe d'Alessandro, Jean-Sylvain Lienard, Richard Sproat, Kathleen R. McKeown and Johanna D. Moore
6. Discourse and dialogue: Hans Uszkoreit, Barbara Grosz, Donia Scott, Hans Kamp, Phil Cohen and Egidio Giachin
7. Document processing: Annie Zaenen, Per-Kristian Halvorsen, Donna Harman, Peter Schauble, Alan Smeaton, Paul Jacobs, Karen Sparck Jones, Robert Dale, Richard H. Wojcik and James E. Hoard
8. Multilinguality: Annie Zaenen, Martin Kay, Christian Boitet, Christian Fluhr, Alexander Waibel, Yeshwant K. Muthusamy and A. Lawrence Spitz
9. Multimodality: Joseph Mariani, James L. Flanagan, Gerard Ligozat, Wolfgang Wahlster, Yacine Bellik, Alan J. Goldschen, Christian Benoit, Dominic W. Massaro and Michael M. Cohen
10. Transmission and storage: Victor Zue, Isabel Trancoso, Bishnu S. Atal, Nikil S. Jayant and Dirk Van Compernolle
11. Mathematical methods: Ronald Cole, Hans Uszkoreit, Steve Levinson, John Makhoul, Aravind Joshi, Herve Bourlard, Nelson Morgan, Ronald M. Kaplan and John Bridle
12. Language resources: Ronald Cole, Antonio Zampolli, Eva Ejerhed, Ken Church, Lori Lamel, Ralph Grishman, Nicoletta Calzolari, Christian Galinski and Gerhard Budin
13. Evaluation: Joseph Mariani, Lynette Hirschman, Henry S. Thompson, Beth Sundheim, John Hutchins, Ezra Black, Margaret King, David S. Pallett, Adrian Fourcin, Louis C. W. Pols, Sharon Oviatt, Herman J. M. Steeneken and Junichi Kanai
Glossary. Citation index. Index.

569 citations