scispace - formally typeset
Author

Jeff Czorapinski

Bio: Jeff Czorapinski is an academic researcher from Lockheed Martin Corporation. The author has contributed to research in the topics Annotation and Unicode. The author has an h-index of 1 and has co-authored 1 publication, which has received 26 citations.

Papers
Proceedings Article
21 Dec 2000
TL;DR: In this article, the authors describe TrueViz, a tool for visualizing and editing groundtruth/metadata for OCR, which is implemented in the Java programming language and works on various platforms including Windows and Unix.
Abstract: Tools for visualizing and creating groundtruth and metadata are crucial for document image analysis research. In this paper, we describe TrueViz, a tool for visualizing and editing groundtruth/metadata for OCR. TrueViz is implemented in the Java programming language and works on various platforms, including Windows and Unix. TrueViz reads and stores groundtruth/metadata in XML format and reads the corresponding image stored in TIFF image file format. A multilingual text editing, display, and search module based on the Unicode representation of text is also provided. This software is being made available free of charge to researchers.

26 citations
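To make the XML-plus-Unicode workflow concrete, here is a minimal sketch of loading zone groundtruth from XML and searching its text. The element and attribute names (`Page`, `Zone`, `Word`, `x`/`y`/`w`/`h`) and the helpers `load_zones` and `search` are hypothetical, chosen for illustration; this is not the actual TrueViz schema or code. Search compares NFC-normalized strings so that a precomposed character and its decomposed equivalent match.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical groundtruth layout for illustration only; NOT the real TrueViz schema.
SAMPLE = """<Page image="page001.tif">
  <Zone id="z1" x="10" y="20" w="200" h="40">
    <Word>Hello</Word>
    <Word>caf&#233;</Word>
  </Zone>
  <Zone id="z2" x="10" y="80" w="200" h="40">
    <Word>مرحبا</Word>
  </Zone>
</Page>"""


def load_zones(xml_text):
    """Parse the page's image reference and its zones (id, bounding box, words)."""
    root = ET.fromstring(xml_text)
    zones = []
    for zone in root.iter("Zone"):
        box = tuple(int(zone.get(key)) for key in ("x", "y", "w", "h"))
        words = [word.text or "" for word in zone.iter("Word")]
        zones.append({"id": zone.get("id"), "box": box, "words": words})
    return root.get("image"), zones


def search(zones, query):
    """Return ids of zones containing the query, comparing NFC-normalized Unicode text."""
    needle = unicodedata.normalize("NFC", query)
    return [
        zone["id"]
        for zone in zones
        if any(needle in unicodedata.normalize("NFC", word) for word in zone["words"])
    ]
```

For example, `search(zones, "cafe\u0301")` finds zone `z1` even though the query uses a combining accent, because both sides are normalized before comparison.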


Cited by
Proceedings Article
06 Jul 2005
TL;DR: A need for empirical validation of the mantra and for a method, such as design patterns, to inform a holistic approach to visualisation design is indicated.
Abstract: The field of information visualization offers little methodological guidance to practitioners who seek to design novel systems. Though many sources describe the foundations of the domain, few discuss practical methods for solving visualization problems. One frequently cited design guideline is the "Visual information-seeking mantra", proposed by Shneiderman in 1996. Although the mantra is often used to inform the design of information visualization systems, it is unclear how useful it has been to visualization designers. We reviewed the current literature that references the mantra, noting what authors have found useful about it and why they cite it. The results indicate a need for empirical validation of the mantra and for a method, such as design patterns, to inform a holistic approach to visualisation design.

98 citations

Patent
03 Oct 2003
TL;DR: A system and method of processing an image comprises receiving a definition of at least one region in the image, where the region definition has a location specification and a type specification.
Abstract: A system and method of processing an image comprises receiving a definition of at least one region in the image, where the region definition has a location specification and a type specification. The method further comprises displaying the boundaries of the at least one defined region according to its type specification, receiving a definition of a visible area in the image, the visible area definition having a specification of margins around the image, generating an image layout definition comprising the region definition and the visible area definition, and saving the image layout definition. The image layout definition may also be used as a template to conform image documents to a specified layout.

75 citations
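The layout definition described in the patent abstract (regions with a location and a type, plus a visible area given as margins) maps naturally onto a few record types. The sketch below is a hypothetical illustration, not the patented implementation: the class names, the pixel units, the example type labels, and the `regions_outside` check are all assumptions added here.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Region:
    # Location specification: top-left corner plus size, in pixels (assumed units).
    x: int
    y: int
    width: int
    height: int
    # Type specification, e.g. "text" or "graphic" (labels are hypothetical).
    region_type: str


@dataclass
class VisibleArea:
    # Margins around the image, in pixels.
    left: int
    top: int
    right: int
    bottom: int


@dataclass
class ImageLayout:
    """An image layout definition: a visible-area definition plus region definitions."""
    visible_area: VisibleArea
    regions: List[Region] = field(default_factory=list)

    def visible_box(self, image_width: int, image_height: int) -> Tuple[int, int, int, int]:
        """Return (x0, y0, x1, y1) of the area left after removing the margins."""
        va = self.visible_area
        return (va.left, va.top, image_width - va.right, image_height - va.bottom)

    def regions_outside(self, image_width: int, image_height: int) -> List[Region]:
        """List regions that do not lie entirely within the visible area."""
        x0, y0, x1, y1 = self.visible_box(image_width, image_height)
        return [
            r for r in self.regions
            if r.x < x0 or r.y < y0 or r.x + r.width > x1 or r.y + r.height > y1
        ]
```

A check like `regions_outside` is one plausible way a saved layout could serve as a template: regions that fall outside the template's visible area flag a document that does not conform.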

Report
02 May 2005
TL;DR: A system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives; the resulting images are used for training and evaluating Optical Character Recognition (OCR) systems.
Abstract: The problem of generating synthetic data for the training and evaluation of document analysis systems has been widely addressed in recent years. With the increased interest in processing multilingual sources, however, there is a tremendous need to be able to rapidly generate data in new languages and scripts without developing specialized systems. We have developed a system that uses the language support of the MS Windows operating system, combined with custom print drivers, to render TIFF images simultaneously with Windows Enhanced Metafile directives. The metafile information is parsed to generate zone, line, word, and character ground truth, including location, font information, and content, in any language supported by Windows. The resulting images can be physically or synthetically degraded by our degradation modules and used for training and evaluating Optical Character Recognition (OCR) systems. Our document image degradation methodology incorporates several frequently encountered types of noise at the page and pixel levels. Examples of OCR evaluation and synthetically degraded document images are given to demonstrate its effectiveness.

67 citations
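As a rough illustration of pixel-level degradation, the sketch below flips each pixel of a binary image independently with probability `p`. This generic flip noise is a stand-in chosen for illustration and is not the degradation model described in the report; the function name `flip_noise`, the list-of-rows image representation, and the optional `seed` parameter are assumptions.

```python
import random
from typing import List, Optional


def flip_noise(image: List[List[int]], p: float, seed: Optional[int] = None) -> List[List[int]]:
    """Return a copy of a binary image (rows of 0/1) with each pixel flipped with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    # A private RNG makes degradations reproducible when a seed is given.
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so p=0 never flips and p=1 always flips.
    return [[1 - px if rng.random() < p else px for px in row] for row in image]
```

Seeding the generator matters for training and evaluation data: the same clean page and seed always yield the same degraded page, so OCR results can be compared across runs.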

Journal Article
TL;DR: This paper presents a formalism showing that graph probing provides a lower bound on the true edit distance between two graphs, and examines in detail the graph probing paradigm first put forth in the context of table understanding and later extended to HTML-coded Web pages.
Abstract: Finding efficient, effective ways to compare graphs arising from recognition processes with their corresponding ground-truth graphs is an important step toward more rigorous performance evaluation. In this paper, we examine in detail the graph probing paradigm we first put forth in the context of our work on table understanding and later extended to HTML-coded Web pages. We present a formalism showing that graph probing provides a lower bound on the true edit distance between two graphs. From an empirical standpoint, the results of two simulation studies and an experiment using scanned pages show that graph probing correlates well with the latter measure. Moreover, our technique is very fast; graphs with tens or hundreds of thousands of vertices can be compared in mere seconds. Ease of implementation, scalability, and speed of execution make graph probing an attractive alternative for graph comparison.

53 citations
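The lower-bound idea behind graph probing can be made concrete with a simplified probe. The sketch below is an illustrative assumption, not the probe definitions from the paper: it summarizes an undirected, vertex-labeled graph as a histogram of (label, degree) signatures and compares two graphs by the L1 distance between their histograms. Under unit-cost edits (insert or delete an isolated vertex, insert or delete an edge, relabel a vertex), one edit moves at most two signatures between bins, so it changes this distance by at most 4; one quarter of the probe distance therefore never exceeds the edit distance. Computing the histograms takes time linear in the graph size, which is why probing is fast.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

# A graph here is (vertex -> label mapping, iterable of undirected edges).
Graph = Tuple[Dict[Hashable, str], Iterable[Tuple[Hashable, Hashable]]]


def probe_signatures(graph: Graph) -> Counter:
    """Histogram of (vertex label, degree) pairs for an undirected graph."""
    vertices, edges = graph
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Counter returns 0 for vertices with no incident edges.
    return Counter((label, degree[v]) for v, label in vertices.items())


def probe_distance(g1: Graph, g2: Graph) -> int:
    """L1 distance between the two signature histograms."""
    p1, p2 = probe_signatures(g1), probe_signatures(g2)
    return sum(abs(p1[key] - p2[key]) for key in set(p1) | set(p2))
```

For instance, deleting one edge from a three-vertex path gives a probe distance of 2, consistent with the bound 2 / 4 <= 1 edit.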

Journal Article
TL;DR: This paper presents a floor plan database, named CVC-FP, that is annotated for architectural objects and their structural relations, together with a groundtruthing tool, the SGT tool, that allows this kind of information to be specified in a natural manner.
Abstract: Recent results on structured learning methods have shown the impact of structural information in a wide range of pattern recognition tasks. In the field of document image analysis, there is long experience with structural methods for the analysis and information extraction of multiple types of documents. Yet the lack of conveniently annotated, freely accessible databases has hindered progress in some areas, such as technical drawing understanding. In this paper, we present a floor plan database, named CVC-FP, that is annotated for architectural objects and their structural relations. To construct this database, we implemented a groundtruthing tool, the SGT tool, that allows this kind of information to be specified in a natural manner. This tool was designed for general-purpose groundtruthing: it allows users to define their own object classes and properties, supports multiple labeling options, enables cooperative work, and provides user and version control. Finally, we have collected some of the recent work on floor plan interpretation and present a quantitative benchmark for this database. Both the CVC-FP database and the SGT tool are freely released to the research community to ease comparisons between methods and boost reproducible research.

52 citations