Proceedings Article

Metadata and data structures for the historical newspaper digital library

TLDR
In this paper, the authors examine metadata and data-structure issues for the Historical Newspaper Digital Library and propose frameworks for the logical structure and physical layout of newspapers, along with metadata relevant to the image processing and to the historians who will use this collection.
Abstract
We examine metadata and data-structure issues for the Historical Newspaper Digital Library. This project proposes to digitize several years' worth of historical newspapers and then apply OCR and linguistic processing to them. Newspapers are very complex information objects, so developing a rich description of their content is challenging. In addition to frameworks for the logical structure and physical layout, we propose metadata relevant to the image processing and to the historians who will use this collection. Finally, we consider how the metadata infrastructure might be managed as it evolves with improved text processing capabilities and how an infrastructure might be developed to support a community of users.

