SciSpace (formerly Typeset)
Topic

Historical document

About: Historical document is a research topic. Over its lifetime, 448 publications have been published on this topic, receiving 5,300 citations. The topic is also known as: Historical documents.


Papers
Proceedings Article | DOI
09 Jun 2010
TL;DR: A new document image binarization technique that segments the text from badly degraded historical document images by using local thresholds that are estimated from the detected high contrast pixels within a local neighborhood window.
Abstract: This paper presents a new document image binarization technique that segments the text from badly degraded historical document images. The proposed technique makes use of the image contrast, defined by the local image maximum and minimum. Compared with the image gradient, the image contrast evaluated by the local maximum and minimum has the useful property of being more tolerant of uneven illumination and other types of document degradation such as smear. Given a historical document image, the proposed technique first constructs a contrast image and then detects the high-contrast image pixels, which usually lie around the text stroke boundary. The document text is then segmented by using local thresholds that are estimated from the detected high-contrast pixels within a local neighborhood window. The proposed technique has been tested on the dataset used in the recent Document Image Binarization Contest (DIBCO) 2009, and experiments show its superior performance.

239 citations
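
The contrast-based pipeline described in this abstract can be sketched in plain Python. This is a minimal, hedged illustration, not the authors' implementation: images are assumed to be lists of lists of grayscale values (0 to 255), the fixed `contrast_cut` is a stand-in for whatever global threshold the authors apply to the contrast image, and the decision rule (at least `n_min` high-contrast pixels in the window, and an intensity no greater than their mean plus half their standard deviation) is our reading of the method, so treat it as an assumption. All function and parameter names are hypothetical.

```python
import math


def _window(img, y, x, r):
    """Coordinates of the (2r+1)x(2r+1) window around (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    return [(j, i)
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def local_contrast(img, r=1, eps=1e-8):
    """Contrast image C = (max - min) / (max + min + eps) over each local window."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            vals = [img[j][i] for j, i in _window(img, y, x, r)]
            hi, lo = max(vals), min(vals)
            row.append((hi - lo) / (hi + lo + eps))
        out.append(row)
    return out


def binarize(img, contrast_cut=0.3, r_contrast=1, r_window=2, n_min=3):
    """Return a map with 1 for text pixels and 0 for background (hypothetical parameters)."""
    c = local_contrast(img, r_contrast)
    # High-contrast pixels: a fixed cut stands in for the authors' global threshold.
    high = [[v >= contrast_cut for v in row] for row in c]
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            # Intensities of the high-contrast pixels inside the neighbourhood window.
            vals = [img[j][i] for j, i in _window(img, y, x, r_window) if high[j][i]]
            if len(vals) < n_min:
                row.append(0)  # too few stroke-boundary pixels nearby: background
                continue
            mean = sum(vals) / len(vals)
            std = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            row.append(1 if img[y][x] <= mean + std / 2 else 0)
        out.append(row)
    return out


img = [[40 if x == 2 else 200 for x in range(5)] for _ in range(5)]
for row in binarize(img):
    print(row)  # each row prints [0, 0, 1, 0, 0]
```

On this synthetic 5x5 image with a dark stroke in the middle column, only that column is marked as text. The neighbourhood window has to be chosen relative to the stroke width: on perfectly clean synthetic images, a background pixel whose window contains only background-side high-contrast pixels (zero spread, intensity equal to their mean) would pass the `<=` test and be misclassified.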

Patent
09 Apr 2009
TL;DR: The sentimental significance of a group of historical documents related to a topic is assessed with respect to a change in an extrinsic metric for the topic, and a unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance.
Abstract: The sentimental significance of a group of historical documents related to a topic is assessed with respect to a change in an extrinsic metric for the topic. A unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance, and the group of documents is inserted into a historical document sentiment vector space for the topic. Action areas in the vector space are defined from the locations of action documents, and a singular sentiment vector may be created that describes the cumulative action area. Newly published documents are sentiment-scored by semantically comparing them to documents in the space and/or to the singular sentiment vector. The sentiment scores for the newly published documents are supplemented by human sentiment assessment of the documents, and a sentiment time decay factor is applied to the supplemented sentiment score of each newly published document. User queries are received, and a set of sentiment-ranked documents with the highest age-adjusted sentiment scores is returned.

226 citations
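
The scoring and ranking steps in this patent abstract can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented method: documents are assumed to already carry semantic vectors, the singular sentiment vector is taken to be the centroid of the action-document vectors, "semantic comparison" is cosine similarity, the human assessment is blended in with a fixed weight, and the time decay is exponential with a half-life. The abstract specifies none of these choices, all names (`Doc`, `human_weight`, `half_life_days`) are hypothetical, and query matching is omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Doc:
    doc_id: str
    vector: List[float]                  # semantic vector (assumed precomputed)
    age_days: float                      # time since publication
    human_score: Optional[float] = None  # optional human sentiment assessment


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def singular_sentiment_vector(action_vectors):
    """Assumption: summarize the action area as the centroid of action-document vectors."""
    n = len(action_vectors)
    return [sum(col) / n for col in zip(*action_vectors)]


def age_adjusted_score(doc, sentiment_vec, human_weight=0.5, half_life_days=30.0):
    machine = cosine(doc.vector, sentiment_vec)  # semantic comparison (assumed cosine)
    if doc.human_score is not None:
        # Supplement the machine score with the human assessment (fixed-weight blend).
        score = (1 - human_weight) * machine + human_weight * doc.human_score
    else:
        score = machine
    decay = 0.5 ** (doc.age_days / half_life_days)  # assumed exponential time decay
    return score * decay


def rank(docs, sentiment_vec, top_k=10, **kwargs):
    """Return the ids of the top_k documents by age-adjusted sentiment score."""
    scored = sorted(docs, key=lambda d: age_adjusted_score(d, sentiment_vec, **kwargs),
                    reverse=True)
    return [d.doc_id for d in scored[:top_k]]


sv = singular_sentiment_vector([[1.0, 0.0], [0.8, 0.2]])
docs = [Doc("fresh", [1.0, 0.1], 0), Doc("old", [1.0, 0.1], 60),
        Doc("off-topic", [0.0, 1.0], 0)]
print(rank(docs, sv))  # ['fresh', 'old', 'off-topic']
```

The two on-topic documents differ only in age, so the 60-day-old one drops to a quarter of the fresh one's score (two half-lives) but still outranks the off-topic document.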

Proceedings Article | DOI
18 Sep 2011
TL;DR: A segmentation-free word spotting method for heterogeneous document image collections, built on a patch-based framework in which patches are represented by a bag-of-visual-words model powered by SIFT descriptors.
Abstract: In this paper, we present a segmentation-free word spotting method that is able to deal with heterogeneous document image collections. We propose a patch-based framework where patches are represented by a bag-of-visual-words model powered by SIFT descriptors. The feature vectors are then refined by applying latent semantic indexing. The proposed method performs well on both handwritten and typewritten historical document images, and we have also tested it on documents written in non-Latin scripts.

150 citations

Book Chapter | DOI
08 Sep 2004
TL;DR: A novel digital image binarization scheme for low-quality historical documents that enables efficient further exploitation of their content and outperforms current state-of-the-art adaptive thresholding techniques.
Abstract: Historical document collections are a valuable resource for human history. This paper proposes a novel digital image binarization scheme for low-quality historical documents that allows their content to be exploited further in an efficient way. The proposed scheme consists of five distinct steps: a pre-processing procedure using a low-pass Wiener filter, a rough estimation of foreground regions using Niblack’s approach, a background surface calculation by interpolating neighboring background intensities, a thresholding step that combines the calculated background surface with the original image, and finally a post-processing step to improve the quality of text regions and preserve stroke connectivity. The proposed methodology works well even on historical manuscripts with poor quality, shadows, nonuniform illumination, low contrast, large signal-dependent noise, smear and strain. Tests on numerous low-quality historical manuscripts show that our methodology outperforms current state-of-the-art adaptive thresholding techniques.

139 citations
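
Steps 2 to 4 of the five-step scheme above (Niblack rough foreground, background surface interpolation, thresholding against the background surface) can be sketched in plain Python. This is a simplified, hedged sketch, not the authors' method: the Wiener pre-filter and the post-processing step are omitted, the background surface at a rough-foreground pixel is taken as the plain mean of nearby background pixels, and the paper's threshold is replaced by a constant fraction `q` of the average surface-minus-image distance over the rough foreground. The window radii, `q`, and all function names are hypothetical; `k = -0.2` is a commonly used Niblack value.

```python
import math


def _window(img, y, x, r):
    """Coordinates of the (2r+1)x(2r+1) window around (y, x), clipped to the image."""
    h, w = len(img), len(img[0])
    return [(j, i)
            for j in range(max(0, y - r), min(h, y + r + 1))
            for i in range(max(0, x - r), min(w, x + r + 1))]


def niblack(img, r=2, k=-0.2):
    """Step 2, rough foreground: 1 where I < local mean + k * local std."""
    out = []
    for y in range(len(img)):
        row = []
        for x in range(len(img[0])):
            vals = [img[j][i] for j, i in _window(img, y, x, r)]
            m = sum(vals) / len(vals)
            s = math.sqrt(sum((v - m) ** 2 for v in vals) / len(vals))
            row.append(1 if img[y][x] < m + k * s else 0)
        out.append(row)
    return out


def binarize(img, r_nib=2, k=-0.2, r_bg=3, q=0.6):
    h, w = len(img), len(img[0])
    s = niblack(img, r_nib, k)
    bg = [img[y][x] for y in range(h) for x in range(w) if not s[y][x]]
    fallback = sum(bg) / len(bg) if bg else 255.0
    # Step 3, background surface: keep background pixels, and fill rough-foreground
    # pixels with the mean of the background pixels in their neighbourhood.
    surf = [[float(img[y][x]) for x in range(w)] for y in range(h)]
    for y in range(h):
        for x in range(w):
            if s[y][x]:
                vals = [img[j][i] for j, i in _window(img, y, x, r_bg) if not s[j][i]]
                surf[y][x] = sum(vals) / len(vals) if vals else fallback
    # Step 4, thresholding: foreground where the surface exceeds the image by more
    # than d. Simplification: d = q * (average surface-minus-image over rough foreground).
    fg = [(y, x) for y in range(h) for x in range(w) if s[y][x]]
    if not fg:
        return [[0] * w for _ in range(h)]
    delta = sum(surf[y][x] - img[y][x] for y, x in fg) / len(fg)
    d = max(q * delta, 0.0)
    return [[1 if surf[y][x] - img[y][x] > d else 0 for x in range(w)] for y in range(h)]


img = [[40 if x == 3 else 200 for x in range(7)] for _ in range(7)]
for row in binarize(img):
    print(row)  # each row prints [0, 0, 0, 1, 0, 0, 0]
```

On this synthetic image with a uniform background and a dark stroke in one column, only the stroke column survives. For real manuscripts, the omitted Wiener pre-filter and post-processing, and the paper's own threshold in place of the constant `q * delta`, would need to be restored.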


Network Information
Related Topics (5)
Vocabulary: 44.6K papers, 941.5K citations, 67% related
Sentence: 41.2K papers, 929.6K citations, 65% related
Image segmentation: 79.6K papers, 1.8M citations, 64% related
Grammar: 33.8K papers, 767.6K citations, 64% related
Natural language: 31.1K papers, 806.8K citations, 64% related
Performance Metrics
Number of papers in the topic in previous years
Year  Papers
2023  4
2022  22
2021  24
2020  35
2019  48
2018  38