Metadata and data structures for the historical newspaper digital library

doi:10.1145/319950.319971

Proceedings Article

- pp 147-153

TLDR

In this paper, the authors examine metadata and data-structure issues for the Historical Newspaper Digital Library. They propose frameworks for the logical structure and physical layout of newspapers, along with metadata relevant to the image processing and to the historians who will use the collection.

Abstract:

We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize and then do OCR and linguistic processing on several years' worth of historical newspapers. Newspapers are very complex information objects, so developing a rich description of their content is challenging. In addition to frameworks for the logical structure and physical layout, we propose metadata relevant to the image processing and to the historians who will use this collection. Finally, we consider how the metadata infrastructure might be managed as it evolves with improved text processing capabilities and how an infrastructure might be developed to support a community of users.
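
As a rough illustration of the distinction the abstract draws between physical layout and logical structure, the following minimal Python sketch models a page as physical zones (regions of the scanned image carrying OCR output) that logical articles refer to by identifier. Every class and field name here (`Zone`, `Article`, `Page`, `ocr_confidence`, and so on) is hypothetical and chosen for illustration; none is taken from the paper, whose actual schema may differ.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Zone:
    """Physical layout: a rectangular region on the scanned page image."""
    zone_id: str
    bbox: Tuple[int, int, int, int]  # (x, y, width, height) in pixels
    ocr_text: str = ""
    ocr_confidence: float = 0.0  # hypothetical image-processing metadata, 0.0 to 1.0


@dataclass
class Article:
    """Logical structure: an article made of one or more zones in reading order."""
    headline: str
    zone_ids: List[str] = field(default_factory=list)


@dataclass
class Page:
    page_number: int
    zones: List[Zone] = field(default_factory=list)
    articles: List[Article] = field(default_factory=list)

    def article_text(self, article: Article) -> str:
        """Join the OCR text of the article's zones, following its reading order."""
        by_id = {zone.zone_id: zone for zone in self.zones}
        return "\n".join(by_id[zid].ocr_text for zid in article.zone_ids)


# Made-up example data: one article spanning a headline zone and a body zone.
page = Page(
    page_number=1,
    zones=[
        Zone("z1", (0, 0, 400, 60), "CITY NEWS", 0.95),
        Zone("z2", (0, 60, 400, 500), "The council met on Monday.", 0.80),
    ],
    articles=[Article("CITY NEWS", ["z1", "z2"])],
)
print(page.article_text(page.articles[0]))
```

Keeping zones and articles separate means the OCR text attached to a zone could be replaced as text processing improves without touching the article grouping, which is one way to read the abstract's point about metadata that evolves over time.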

Citations

Book

Reading and Writing the Electronic Book

Catherine C. Marshall

TL;DR: This book begins with a brief overview of the history of electronic books, including the social and technical forces that have shaped their development, and takes a closer look at the sociality of reading: how people read in groups and how they share what they read.

Journal Article

The architecture of TrueViz: a groundTRUth/metadata editing and VIsualiZing ToolKit

Chang Ha Lee, +1 more

- 01 Mar 2003 -

Pattern Recognition

TL;DR: TrueViz is implemented in the Java programming language and runs on various platforms, including Windows and Unix. It reads and stores groundtruth/metadata in XML format and reads the corresponding image stored in TIFF image file format.

Proceedings Article

The challenge of Virginia Banks: an evaluation of named entity analysis in a 19th-century newspaper collection

Gregory Crane, +1 more

TL;DR: This paper evaluates automatic extraction of ten named entity classes from a 19th century newspaper, the Civil War years of the Richmond Times Dispatch, digitized with IMLS support by the University of Richmond, and suggests the kinds of knowledge sources that digital libraries need to assemble as part of their machine readable reference collections to support named entity identification as a core service.

An Annotated Bibliography on Temporal and Evolution Aspects in the World Wide Web

Fabio Grandi

TL;DR: This bibliography collects references on the handling of time and evolution issues in World Wide Web research, following several earlier bibliographies on time-varying information.

Patent

Method and system for forming a hyperlink reference and embedding the hyperlink reference within an electronic version of a paper

Billy P. Taylor

TL;DR: This patent describes a method for storing a version of a mass-produced printed paper and forming a reference within that version; the reference is associated with an operation and with at least a portion of the version.

References

Book

Handbook of Character Recognition and Document Image Analysis

Horst Bunke, +1 more

TL;DR: Contents include Arabic character recognition (A. Amin), automatic reading of braille documents (Antonacopoulos), and techniques for improving OCR results.

Journal Article

Pink Panther: a Complete Environment for Ground-Truthing and Benchmarking Document Page Segmentation

Berrin Yanikoglu, +1 more

- 01 Sep 1998 -

Pattern Recognition

TL;DR: A new approach for the automatic evaluation of document page segmentation algorithms that is region-based: segmentation quality is assessed by comparing the segmentation output, described as a set of regions, to the corresponding ground-truth.

Book

The Art of Editing

Floyd K. Baskette, +1 more

Book

The newspaper designer's handbook

Tim Harrower

TL;DR: This book takes a hands-on approach to newspaper design techniques, from basic page layout to complex infographics, emphasizing the importance of a fundamental yet often overlooked aspect of design, and includes a new section on the newspaper design report card.

Journal Article

An approach to a digital library of newspapers

María José Aramburu Cabo, +1 more

- 01 Sep 1997 -

Information Processing and Management

TL;DR: A new application for retrieving news from a large electronic bank of newspapers is intended to manage past issues of newspapers in such a way that users are able to draw up chronicles and trends about reported topics.

Related Papers (5)

The WebBook and the Web Forager: an information workspace for the World-Wide Web

Fractal computer user centerface with zooming capability

Style sheets for publishing system

Regulating access to digital content

System for generating a custom formatted hypertext document by using a personal profile to retrieve hierarchical documents