SciSpace (formerly Typeset)
Topic

Document layout analysis

About: Document layout analysis is a research topic. Over its lifetime, 1,462 publications have been published on this topic, receiving 34,021 citations.


Papers
Proceedings Article
01 Dec 2015
TL;DR: The proposed stepped-line layout is a new technique for improving the efficiency of eye movements during reading without increasing cognitive load; it builds on the observation that readers' eyes tend to fixate on every phrase when reading Japanese text.
Abstract: We propose a new electronic text format with a stepped-line layout to optimize viewing position and to improve the efficiency of reading Japanese text. Generally, the reader's eyes try to fixate on every phrase while reading Japanese text. To date, no method has been proposed to optimize the fixation position while reading. In spaced text such as English, space characters provide boundary information for eye movements; in Japanese text, however, inserting spaces between phrases decreases reading speed. In the stepped-line text format proposed in this report, a text line is segmented and stepped down between phrases, and line breaks are placed between phrases. To evaluate the effect of the stepped-line layout on reading efficiency, we measured reading speeds and eye movements for both the new layout and a conventional straight-line layout. Reading speed with the stepped-line layout is approximately 13% faster than with the straight-line layout, and the number of fixations is approximately 11% lower. This is primarily achieved by a reduction in the number of regressions and an increase in forward saccade length. Moreover, 91% of participants did not experience illegibility or incongruousness when reading the stepped-line layout, suggesting that it improves the efficiency of eye movements during reading without increasing cognitive load.

5 citations
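The stepped-line format in the first abstract can be illustrated with a small layout routine. This is a minimal sketch under stated assumptions, not the authors' implementation: the text is assumed to be already segmented into phrases (Japanese phrase segmentation itself is out of scope), widths are counted in character cells, and the step counter is assumed to reset at each new line. The names `stepped_layout`, `PlacedPhrase` and `to_pixels` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PlacedPhrase:
    text: str
    line: int  # index of the display line the phrase is placed on
    x: int     # horizontal offset of the phrase, in character cells
    step: int  # how many steps the phrase is shifted down within its line


def stepped_layout(phrases: list[str], max_width: int) -> list[PlacedPhrase]:
    """Place pre-segmented phrases so that each successive phrase on a line
    is stepped down by one more step, and lines break only between phrases.
    (Hypothetical helper; step reset per line is an assumption.)"""
    placed: list[PlacedPhrase] = []
    line, x, step = 0, 0, 0
    for phrase in phrases:
        width = len(phrase)
        # Never split a phrase: if it would overflow a non-empty line,
        # move it to the start of a new line and reset the step.
        if x > 0 and x + width > max_width:
            line, x, step = line + 1, 0, 0
        placed.append(PlacedPhrase(phrase, line, x, step))
        x += width
        step += 1
    return placed


def to_pixels(p: PlacedPhrase, char_w: float, line_h: float, step_h: float):
    """Convert a placement to (x, y) pixel coordinates of the phrase's top-left corner."""
    return (p.x * char_w, p.line * line_h + p.step * step_h)
```

For example, `stepped_layout(["今日は", "とても", "良い", "天気です"], max_width=8)` places the first three phrases on line 0 at steps 0, 1 and 2 and wraps the fourth phrase to the start of line 1. The step height in `to_pixels` is a free design parameter here; the abstract does not state the value used in the study.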

Book Chapter
08 Jan 1998
TL;DR: This paper proposes a categorization method, based on the classification-and-verification paradigm, that divides various kinds of documents into appropriate document types in a stepwise manner.
Abstract: In knowledge-based document image understanding, it is important to distinguish the layout structures of individual documents exactly, so that an adaptable document model can be used. Document models characterized heuristically by application-specific layout structures, however, are not always applicable to every document. In this paper, we propose a method for categorizing various kinds of documents. Based on the classification-and-verification paradigm, our method divides various kinds of documents into appropriate document types in a stepwise manner. First, a classification procedure divides the given documents using rough features of the documents; then a verification procedure is applied to the globally categorized document sets using detailed features.

5 citations
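The two-stage categorization in the second abstract (classify with rough features, then verify with detailed features) can be sketched as follows. The document types, the two coarse features (text-area ratio and column count), and the verification rules are all hypothetical placeholders, since the chapter does not specify them; nearest-prototype ranking is used only as one simple choice for the classification stage.

```python
import math

# Hypothetical coarse prototypes: (fraction of page covered by text, column count).
COARSE_PROTOTYPES = {
    "letter": (0.3, 1.0),
    "journal_article": (0.7, 2.0),
    "form": (0.5, 1.0),
}

# Hypothetical detailed checks applied in the verification stage.
VERIFIERS = {
    "letter": lambda d: d["has_signature_block"],
    "journal_article": lambda d: d["has_abstract"],
    "form": lambda d: d["num_boxes"] >= 5,
}


def classify(coarse: tuple[float, float]) -> list[str]:
    """Classification stage: rank all types by distance to their coarse prototype."""
    return sorted(COARSE_PROTOTYPES,
                  key=lambda t: math.dist(coarse, COARSE_PROTOTYPES[t]))


def categorize(coarse: tuple[float, float], detailed: dict, top_k: int = 2):
    """Verification stage: accept the best-ranked of the top_k candidates whose
    detailed check passes; return None if every candidate is rejected."""
    for doc_type in classify(coarse)[:top_k]:
        if VERIFIERS[doc_type](detailed):
            return doc_type
    return None
```

For instance, a page whose coarse features lie closest to the `form` prototype, but which has a signature block and few boxes, fails the form check and is accepted as a `letter` in the verification stage; if no candidate passes, the page is left uncategorized.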

Journal Article
TL;DR: This paper formalizes the concept of document recognition and presents a High Level Document Recognition method, along with the experience gained in developing and using a number of implementations of the method.
Abstract: Document recognition is a task in which a document in its physical presentation format is transformed into a structured, author-oriented model of the document. The presentation format can be bitmaps of document pages, a description of the document in a Page Description Language (PDL), or an encoding of the document in a printer or graphics language. The structured model is a format that allows additions to the document, manipulation of the document, and reformatting of its layout and output appearance.
Fully automatic document recognition is not possible in general, for the same reason that computer programs cannot be de-translated automatically. However, it is possible to develop a man-assisted, semi-automatic document recognition method. This method uses two passes. The first pass is completely automatic; it produces a document format called the Interactive Document Model. The Interactive Document Model comprises recognized typesetting and descriptive structures together with derived ODA logical and layout structures for the document. The model generated in the first pass is sufficient for most purposes and applications. If it is not acceptable, however, the user can enter the second pass and interactively edit the logical structure.
This paper has three objectives. The first is to formalize the concept of document recognition. The second is to subdivide the problem of document recognition and classify it into a number of subproblems, each dealing with a different aspect of the problem. The third is to introduce a problem which we wish to solve, and then to present a High Level Document Recognition method and our experience in developing and using a number of implementations of the method.

5 citations
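The two-pass structure described in the third abstract (an automatic first pass, then optional interactive editing of the logical structure) can be sketched in a few lines. This is a toy illustration, not the paper's High Level Document Recognition method: it assumes each text block carries only a font size, infers logical roles from that single typesetting cue, and represents the user's second-pass edits as a hypothetical `{block_index: role}` mapping.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Block:
    text: str
    font_size: float
    role: str = "unknown"  # logical role, filled in by the passes below


def automatic_pass(blocks: list[Block]) -> list[Block]:
    """Pass 1 (fully automatic): derive each block's logical role from a single
    typesetting cue, its font size relative to the most common (body) size."""
    body = Counter(b.font_size for b in blocks).most_common(1)[0][0]
    largest = max(b.font_size for b in blocks)
    for b in blocks:
        if b.font_size == largest and largest > body:
            b.role = "title"
        elif b.font_size > body:
            b.role = "heading"
        elif b.font_size == body:
            b.role = "paragraph"
        else:
            b.role = "note"
    return blocks


def interactive_pass(blocks: list[Block], corrections: dict[int, str]) -> list[Block]:
    """Pass 2 (optional): apply the user's edits to the logical structure."""
    for index, role in corrections.items():
        blocks[index].role = role
    return blocks
```

The design mirrors the split in the abstract: the first pass always runs and is often good enough, while the second pass only touches the blocks the user chooses to correct.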

Journal Article
TL;DR: This paper describes a prototype system that allows readers to view an electronic text in multiple simultaneous views, providing insight at several levels of granularity (including a reading view), combined with a number of tools for manipulating the text.
Abstract: This paper describes a prototype system that allows readers to view an electronic text in multiple simultaneous views, providing insight at several different levels of granularity, including a reading view. This prospect display is combined with a number of tools for manipulating the text, for example by highlighting sections of interest for a particular task. The result is a powerful approach to working with electronic text for various purposes: sample scenarios are outlined involving directors reading scripts, students studying novels, and second-language learners familiarizing themselves with grammatical constructions.
Introduction
Digital text offers software developers and designers the opportunity to provide readers with a variety of new perceptual experiences and possibilities for action that have simply not been available through printed texts (Bork, 1983). An obvious example is the widespread adoption of digital texts connected by hyperlinks, identified by many theorists as a significant change in the way people are able to interact with the written word (Bolter, 1991; Landow, 1994, etc.). However, many other new affordances of digital text remain to be identified, developed and studied. One of these possible new affordances is the ability to have text or layout features change over time (Chang et al., 1988; Ford et al., 1997). In kinetic text research, traditionally static design elements such as font, size, leading, color and placement can all be used dynamically to achieve layout effects that were previously available only in non-interactive media such as film (Lee et al., 2002). This project extends research in hypertext and kinetic text theory to provide readers with a text document display that combines simultaneous prospect (an overview of the entire text) and detail views, with related tools.
Much as architectural blueprints allow the person reading them to get a sense of an entire building or some key feature, such as the wiring or the ventilation, allowing readers to see an entire text at once (that is, providing text prospect) has perceptual advantages. These advantages, which we will explore in this paper, are not available in cases where the text can only be accessed sequentially. The system also includes related tools that allow the reader to carry out new kinds of actions that would not otherwise be available. From hypertext theory comes the concept of associated text elements, where interaction with one text moves the reader into a related text. However, zooming through prospect views differs from a hypertextual implementation in that there are no predefined links between views. Hypertext is also predicated on the concept of connecting lexia or individual documents, so that following a link has the effect of visually replacing the source text with the destination text. In this project the text is treated as a stable whole and presented so as to minimize interruptions to the reader's literary engagement with the text (Miall, 1999). Kinetic text theory contributes the notion of a system where text characteristics change as a way of responding to reader interests. In this case, the reader has the ability to identify the portion of the whole text that will display in the reading view. There is also the capacity to highlight specific passages in the entire text, by selecting the features from a set of choices that derive from the tagging available in the document. 
Finally, in cases where this system has been integrated with related digital reading tools, additional kinetic features may be possible, as in the Watching the Script prototype (Ruecker et al., 2004), where the reader views the script by watching it scroll at various character positions on stage.
[Figure 1]
The Multi-level Document Visualization Prototype
In the Multi-level Document Visualization prototype that we have developed, the prospect view indexes a fisheye reading view, where a segment of text of about a dozen lines is shown at full size, while adjacent text is displayed as increasingly smaller lines of microtext (Small, 1996; Furnas, 1986; Bederson, 2000). …

5 citations
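The fisheye reading view described in the fourth abstract (about a dozen full-size lines, with neighboring text shrinking into microtext) can be approximated by assigning each line a font size that decays with its distance from the reading window. The geometric decay, the 12-point full size, and the 1-point floor are illustrative assumptions; Furnas (1986) formulates fisheye views more generally via a degree-of-interest function, which this sketch does not reproduce. The function name `fisheye_sizes` is hypothetical.

```python
def fisheye_sizes(num_lines: int, window_start: int, window_len: int = 12,
                  full_size: float = 12.0, decay: float = 0.8,
                  min_size: float = 1.0) -> list[float]:
    """Return a font size per line: lines inside the reading window are shown
    at full size; outside it, size shrinks geometrically with distance from
    the window, but never below the microtext floor min_size."""
    window_end = window_start + window_len - 1
    sizes = []
    for i in range(num_lines):
        if i < window_start:
            distance = window_start - i
        elif i > window_end:
            distance = i - window_end
        else:
            distance = 0
        sizes.append(max(min_size, full_size * decay ** distance))
    return sizes
```

Moving the reading view then amounts to recomputing the sizes with a new `window_start`; the prospect view itself could reuse the same list at a much smaller scale.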

Proceedings Article
08 Feb 2015
TL;DR: The proposed method first extracts footnotes and figure captions and then matches them with their corresponding references within a document; it also feeds the results of the matching process back into the identification process to further improve the algorithm's accuracy.
Abstract: Cross-references, such as footnotes, endnotes, figure/table captions, and references, are a common and useful type of page element that further explains the corresponding entities in the target document. In this paper, we focus on cross-reference identification in PDF documents and present a robust method, using the identification of footnotes and figure references as a case study. The proposed method first extracts footnotes and figure captions, and then matches them with their corresponding references within a document. A number of novel features within a PDF document, namely page layout, font information, and lexical and linguistic features of cross-references, are utilized for the task. Clustering is adopted to handle features that are stable within one document but vary across different kinds of documents, so that the identification process adapts to the document type. In addition, the method leverages results from the matching process to provide feedback to the identification process and further improve its accuracy. Preliminary experiments on real document sets show that the proposed method is promising for identifying cross-references in PDF documents.

5 citations
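The extract-then-match procedure in the last abstract can be sketched for figure captions and footnotes. This is a simplified stand-in, not the authors' method: instead of clustering, it takes the most common font size in the document as the body size (a crude per-document adaptation), treats smaller-font lines in the bottom 20% of the page that start with a number or `*` as footnotes, and matches captions to references with regular expressions. The `Line` type, its fields, and the thresholds are hypothetical, and the paper's feedback loop from matching back to identification is omitted.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    font_size: float
    y: float  # vertical position as a fraction of page height (0 = top, 1 = bottom)


CAPTION_RE = re.compile(r"^(?:Figure|Fig\.)\s*(\d+)")  # caption at line start
REF_RE = re.compile(r"\b(?:Figure|Fig\.)\s*(\d+)")     # in-text figure reference
FOOTNOTE_RE = re.compile(r"^(?:\d+|\*)\s")             # footnote marker at line start


def identify(lines: list[Line], bottom_zone: float = 0.8):
    """Extraction step: find figure captions and footnote lines."""
    # Per-document adaptation: the most common font size is taken as body text.
    body_size = Counter(ln.font_size for ln in lines).most_common(1)[0][0]
    captions: dict[str, int] = {}
    footnotes: list[int] = []
    for idx, ln in enumerate(lines):
        m = CAPTION_RE.match(ln.text)
        if m:
            captions[m.group(1)] = idx
        elif (ln.font_size < body_size and ln.y >= bottom_zone
              and FOOTNOTE_RE.match(ln.text)):
            footnotes.append(idx)
    return captions, footnotes


def match_figure_refs(lines: list[Line], captions: dict[str, int]) -> dict[str, list[int]]:
    """Matching step: map each captioned figure number to the lines citing it."""
    caption_lines = set(captions.values())
    refs: dict[str, list[int]] = {num: [] for num in captions}
    for idx, ln in enumerate(lines):
        if idx in caption_lines:
            continue
        for num in REF_RE.findall(ln.text):
            if num in refs:
                refs[num].append(idx)
    return refs
```

A body sentence that happens to begin with "Figure 3 shows ..." would be misread as a caption by this sketch, which is the kind of error that richer layout and font features are meant to avoid.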


Network Information
Related Topics (5)
Feature extraction: 111.8K papers, 2.1M citations, 82% related
Feature (computer vision): 128.2K papers, 1.7M citations, 82% related
Object detection: 46.1K papers, 1.3M citations, 81% related
Image segmentation: 79.6K papers, 1.8M citations, 80% related
Convolutional neural network: 74.7K papers, 2M citations, 79% related
Performance Metrics
Number of papers on this topic in previous years:
Year   Papers
2023   5
2022   19
2021   34
2020   19
2019   14
2018   9