Author

Xuhong Li

Bio: Xuhong Li is an academic researcher from New Jersey Institute of Technology. The author has contributed to research on the topics of image segmentation and document clustering. The author has an h-index of 1 and has co-authored 1 publication, which has received 24 citations.

Papers
Proceedings Article
20 Sep 1999
TL;DR: A domain-independent automatic document image understanding system with learning ability is described; it applies two learning methodologies: learning from experience and an enhanced perceptron learning algorithm.
Abstract: Document image processing begins at the OCR phase with the difficulty of automatic document analysis and understanding. Most existing systems only do well in their specific application domains. In this paper, we describe a domain-independent automatic document image understanding system with learning ability. A segmentation method based on "logical closeness" is proposed. A novel and natural representation of document layout structure, a directed weight graph (DWG), is described. To classify a given document, a string representation matching algorithm is applied first, instead of comparing all the sample graphs. A frame template and a document type hierarchy (DTH) are used to represent the document's logical structure and the hierarchical relationships among these frame templates, respectively. In this paper, two learning methodologies are applied: learning from experience and an enhanced perceptron learning algorithm.

24 citations


Cited by
Journal Article
TL;DR: The authors present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents, using all information sources, from geometric features and spatial relations to textual features and content.
Abstract: We present a document analysis system able to assign logical labels and extract the reading order in a broad set of documents. All information sources, from geometric features and spatial relations to the textual features and content, are employed in the analysis. To deal effectively with these information sources, we define a document representation general and flexible enough to represent complex documents. To handle such a broad document class, it uses generic document knowledge only, which is identified explicitly. The proposed system integrates components based on computer vision, artificial intelligence, and natural language processing techniques. The system is fully implemented and experimental results on heterogeneous collections of documents for each component and for the entire system are presented.

140 citations

01 Jan 2000
TL;DR: This paper presents a hybrid and comprehensive approach to document structure analysis that makes use of layout as well as textual features of a given document to express fuzzy matched rules of an underlying rule base.
Abstract: Document image processing is a crucial process in office automation and begins at the 'OCR' phase with difficulties in document 'analysis' and 'understanding'. This paper presents a hybrid and comprehensive approach to document structure analysis. It is hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are the base for potential conditions, which in turn are used to express fuzzy matched rules of an underlying rule base.

99 citations

Journal Article
TL;DR: This paper presents a hybrid and comprehensive approach to document structure analysis that makes use of layout as well as textual features of a given document, which allows an easy adaptation to specific domains with their specific logical objects.
Abstract: Document image processing is a crucial process in office automation and begins at the 'OCR' phase with difficulties in document 'analysis' and 'understanding'. This paper presents a hybrid and comprehensive approach to document structure analysis. It is hybrid in the sense that it makes use of layout (geometrical) as well as textual features of a given document. These features are the base for potential conditions, which in turn are used to express fuzzy matched rules of an underlying rule base. Rules can be formulated based on features which might be observed within one specific layout object. However, rules can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows an easy adaptation to specific domains with their specific logical objects, the system contains domain-independent markup algorithms for common objects (e.g., lists).

41 citations

01 Jan 2002
TL;DR: This chapter defines an abstract propositional formal language for expressing qualitative spatial relations among document objects, so that document encoding rules can be stated formally.
Abstract: Formal languages can also serve as document encoding languages, for instance, first-order logic. The syntax and semantics are the usual ones for first-order logic, taking special care in giving adequate semantics to spatial relations and predicates. A final example of a general document encoding rule stated informally in natural language is the following: "in the Western culture, documents are usually read top-bottom and left-right." (7.1) A problem of stating rules in natural language is ambiguity. In fact, we do not know if one should interpret the "and" as commutative or not. Should one first go top-bottom and then left-right? Or should one apply any of the two interchangeably? It is not possible to say from the rule merely stated in natural language. In the next section, we define an abstract propositional formal language to express qualitative spatial relations among document objects to formally express document encoding rules.

7.3.2 Relations adequate for documents

Considering relations adequate for documents and their components requires a preliminary formalization step. This consists of regarding a document as a formal model. At this level of abstraction a document is a tuple 〈D, R, l〉 of document objects D, a binary relation R, and a labeling function l. Each document object d ∈ D consists of the coordinates of its bounding box (defined as the smallest rectangle containing all elements of that object): D = {d | d = 〈id, x1, y1, x2, y2〉}, where id is an identifier of the document object and (x1, y1), (x2, y2) represent the upper-left corner and the lower-right corner of the bounding box of the document object. In addition, we consider the logical labeling information.
Given a set of labels L, logical labeling is a function l, typically injective, from document objects to labels: l : D → L. In the following, we consider an instance of such a model where the set of relations R is the set of bidimensional Allen relations and where the set of labels L is {title, body of text, figure, caption, footer, header, page number, graphics}. We shall refer to this model as a spatial [bidimensional Allen] model. Bidimensional Allen relations consist of 13 × 13 relations: the product of Allen's 13 interval relations [Allen, 1983; van Benthem, 1983b] on two orthogonal axes. (Consider an inverted coordinate system for each document with origin (0,0) in the upper-left corner. The x axis spans horizontally, increasing to the right, while the y axis spans vertically, increasing towards the bottom.) Each relation r ∈ A is a tuple of Allen interval relations of the form: precedes, meets, overlaps, starts, during, finishes, equals, and precedes_i, meets_i, overlaps_i, starts_i, during_i, finishes_i. We shall refer to the set of Allen bidimensional relations simply as A and to the propositional language over bidimensional Allen relations as L in the remainder of the chapter. Since Allen relations are jointly exhaustive and pairwise disjoint, so is A. This implies that given any two document objects there is one and only one A relation holding among them.
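
As a rough illustration of the model in this abstract, the following Python sketch computes the bidimensional Allen relation between two bounding boxes by classifying their x-extents and y-extents separately. The names `DocObject`, `allen_1d`, and `allen_2d` are hypothetical and do not come from the chapter, and the interval definitions are the standard ones for Allen's 13 relations; the sketch assumes every box has positive width and height.

```python
from typing import NamedTuple, Tuple


class DocObject(NamedTuple):
    """Hypothetical document object <id, x1, y1, x2, y2> (upper-left, lower-right)."""
    id: str
    x1: float
    y1: float
    x2: float
    y2: float


def allen_1d(a1: float, a2: float, b1: float, b2: float) -> str:
    """Return the Allen relation of interval [a1, a2] with respect to [b1, b2]."""
    if not (a1 < a2 and b1 < b2):
        raise ValueError("intervals must have positive length")
    if a2 < b1:
        return "precedes"
    if a2 == b1:
        return "meets"
    if b2 < a1:
        return "precedes_i"
    if b2 == a1:
        return "meets_i"
    # From here on the two intervals share more than a single point.
    if a1 == b1 and a2 == b2:
        return "equals"
    if a1 == b1:
        return "starts" if a2 < b2 else "starts_i"
    if a2 == b2:
        return "finishes" if a1 > b1 else "finishes_i"
    if a1 > b1 and a2 < b2:
        return "during"
    if a1 < b1 and a2 > b2:
        return "during_i"
    # Only proper partial overlap remains.
    return "overlaps" if a1 < b1 else "overlaps_i"


def allen_2d(a: DocObject, b: DocObject) -> Tuple[str, str]:
    """Bidimensional relation: (relation on the x axis, relation on the y axis)."""
    return (allen_1d(a.x1, a.x2, b.x1, b.x2),
            allen_1d(a.y1, a.y2, b.y1, b.y2))


# Illustrative boxes: a title directly above a body block of the same width.
title = DocObject("title", 10, 10, 200, 30)
body = DocObject("body", 10, 40, 200, 300)
print(allen_2d(title, body))  # ('equals', 'precedes')
```

Because the one-dimensional cases are checked in an order that makes them mutually exclusive and exhaustive, each pair of boxes receives exactly one of the 13 × 13 pairs, matching the property stated for A.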

31 citations

Proceedings Article
03 Aug 2003
TL;DR: The characteristics of data, knowledge, and information are described in order to explain their synergetic interweaving, and the inherent complexity of the sub-problems of document understanding is structured.
Abstract: In this paper I will try to explain the nature of document understanding in all of its dimensions. Therefore I will first describe the characteristics of data, knowledge, and information in order to describe their synergetic interweaving. After that I will try to structure the inherent complexity of sub-problems of document understanding which may not be solved serially, but rather are attributes of individual documents. Thus, this paper focuses on system engineering challenges. However, I will show some recent work done on the different topics and give some insights into the individual techniques we chose at DFKI.

25 citations