Author

Jeff Czorapinski

Bio: Jeff Czorapinski is a researcher at Lockheed Martin Corporation who has contributed to research on annotation and Unicode. The author has an h-index of 1 and has co-authored 1 publication, which has received 26 citations.

Topics: Annotation, Unicode, Optical character recognition, Image file formats, XML

Papers

Proceedings Article

TRUEVIZ: a groundtruth/metadata editing and visualizing toolkit for OCR

Tapas Kanungo¹, Chang Ha Lee¹, Jeff Czorapinski², Ivan Bella²

University of Maryland, College Park¹, Lockheed Martin Corporation²

21 Dec 2000

TL;DR: In this article, the authors describe TrueViz, a tool for visualizing and editing groundtruth/metadata for OCR, which is implemented in the Java programming language and works on various platforms including Windows and Unix.

Abstract: Tools for visualizing and creating groundtruth and metadata are crucial for document image analysis research. In this paper, we describe TrueViz, a tool for visualizing and editing groundtruth/metadata for OCR. TrueViz is implemented in the Java programming language and works on various platforms, including Windows and Unix. TrueViz reads and stores groundtruth/metadata in XML format, and reads a corresponding image stored in TIFF image file format. A multilingual text editing, display, and search module based on the Unicode representation for text is also provided. This software is being made available free of charge to researchers.

26 citations

Cited by

Proceedings Article

Beyond guidelines: what can we learn from the visual information seeking mantra?

Brock Craft¹, Paul Cairns¹

University College London¹

06 Jul 2005

TL;DR: A need for empirical validation of the mantra and for a method, such as design patterns, to inform a holistic approach to visualisation design is indicated.

Abstract: The field of information visualization offers little methodological guidance to practitioners who seek to design novel systems. Though many sources describe the foundations of the domain, few discuss practical methods for solving visualization problems. One frequently cited guideline to design is the "Visual information-seeking mantra", proposed by Shneiderman in 1996. Although often used to inform the design of information visualization systems, it is unclear what use this has been for visualization designers. We reviewed the current literature that references the mantra, noting what authors have found useful about it and why they cite it. The results indicate a need for empirical validation of the mantra and for a method, such as design patterns, to inform a holistic approach to visualisation design.

98 citations

Patent

System and method of specifying image document layout definition

Steven J. Simske¹, Malgorzata M. Sturgill¹

Hewlett-Packard¹

03 Oct 2003

TL;DR: A system and method of processing an image, comprising receiving a definition of at least one region in the image, where the region definition has a location specification and a type specification.

Abstract: A system and method of processing an image comprises receiving a definition of at least one region in the image, where the region definition has a location specification and a type specification. The method further comprises displaying the boundaries of the at least one defined region according to its type specification, receiving a definition of a visible area in the image, the visible area definition having a specification of margins around the image, generating an image layout definition comprising the region definition and the visible area definition, and saving the image layout definition. The image layout definition may also be used as a template to conform image documents to a specified layout.

75 citations

Report

Groundtruth generation and document image degradation

Gang Zi

02 May 2005

TL;DR: A system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives; the resulting images are used for training and evaluating Optical Character Recognition (OCR) systems.

Abstract: The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to be able to rapidly generate data in new languages and scripts, without the need to develop specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several often-encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate the effectiveness.

67 citations

Journal Article

A fast technique for comparing graph representations with applications to performance evaluation

Daniel P. Lopresti¹, Gordon Wilfong²

Lehigh University¹, Alcatel-Lucent²

01 Apr 2003 - International Journal on Document Analysis and Recognition

TL;DR: This paper presents a formalism showing that graph probing provides a lower bound on the true edit distance between two graphs, and examines in detail the graph probing paradigm first put forth in the context of table understanding and later extended to HTML-coded Web pages.

Abstract: Finding efficient, effective ways to compare graphs arising from recognition processes with their corresponding ground-truth graphs is an important step toward more rigorous performance evaluation. In this paper, we examine in detail the graph probing paradigm we first put forth in the context of our work on table understanding and later extended to HTML-coded Web pages. We present a formalism showing that graph probing provides a lower bound on the true edit distance between two graphs. From an empirical standpoint, the results of two simulation studies and an experiment using scanned pages show that graph probing correlates well with the latter measure. Moreover, our technique is very fast; graphs with tens or hundreds of thousands of vertices can be compared in mere seconds. Ease of implementation, scalability, and speed of execution make graph probing an attractive alternative for graph comparison.

53 citations

Journal Article

CVC-FP and SGT: a new database for structural floor plan analysis and its groundtruthing tool

Lluís-Pere de las Heras¹, Oriol Ramos Terrades¹, Sergi Robles¹, Gemma Sánchez¹

Autonomous University of Barcelona¹

01 Mar 2015 - International Journal on Document Analysis and Recognition

TL;DR: This paper presents a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations, together with a groundtruthing tool, the SGT tool, that allows this sort of information to be specified in a natural manner.

Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet the lack of conveniently annotated, freely accessible databases has hindered progress in some areas, such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for the architectural objects and their structural relations. To construct this database, we have implemented a groundtruthing tool, the SGT tool, that allows this sort of information to be specified in a natural manner. This tool has been made for general-purpose groundtruthing: it allows users to define their own object classes and properties, offers multiple labeling options, supports cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.

52 citations
