A document classification and extraction system with learning ability
Citations
Cites methods from "A document classification and extra..."
...This file, including its text, font and layout attributes, can be used to conduct further analysis on the text and to find the "structured" parts of the document [5]-[11]....
[...]
Cites methods from "A document classification and extra..."
...In general, they may be grouped into statistical methods [14, 18, 26], neural nets [16, 31, 51], decision trees [25, 32], and rule learning techniques [10, 20, 33]....
[...]
References
"A document classification and extra..." refers methods in this paper
...The rich text information, including the text with its font and attributes, can be used to conduct further analysis on the text and to find the "structured" parts of the document [5-9]....
[...]
"A document classification and extra..." refers background or methods in this paper
...The usefulness and efficiency of a text processing system can be improved greatly by converting normal text representations into a new form adapted better to computer manipulation [1,2,4]....
[...]
...Definition 4.2: Document Type Hierarchy (DTH). A Frame Template is used to keep a record of logical meanings for a document....
[...]
...We organize frame templates of document types as a hierarchical structure, which is called a document type hierarchy (DTH) [1,2,4], based upon the generalization and specialization relations among the frame templates and the inheritance properties between them....
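The snippets do not show how frame templates are stored. As a rough illustration only, the Python sketch below assumes each template is a named list of slots (the logical fields a document type records) with a single parent link to a more general type. A specialized type inherits every slot of its ancestors, and `is_a` tests the generalization relation. The class name `FrameTemplate`, the slot names, and the single-inheritance choice are all hypothetical, not taken from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FrameTemplate:
    """Hypothetical frame template: a document type plus the logical slots it records."""
    name: str
    slots: List[str] = field(default_factory=list)
    parent: Optional[FrameTemplate] = None  # more general document type (assumed single parent)

    def all_slots(self) -> List[str]:
        """Slots inherited from every ancestor (most general first), then this type's own slots."""
        inherited = self.parent.all_slots() if self.parent else []
        return inherited + [s for s in self.slots if s not in inherited]

    def is_a(self, other: FrameTemplate) -> bool:
        """True if this template is `other` or specializes it somewhere up the hierarchy."""
        node: Optional[FrameTemplate] = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Hypothetical three-node hierarchy: "document" generalizes "letter" and "memo".
document = FrameTemplate("document", ["title", "date"])
letter = FrameTemplate("letter", ["sender", "receiver"], parent=document)
memo = FrameTemplate("memo", ["subject"], parent=document)

print(letter.all_slots())  # ['title', 'date', 'sender', 'receiver']
```

Walking parent links keeps inheritance lookups simple. A real DTH might also keep child lists so that classification can descend from general to specific types.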
[...]
...The task of segmentation [1] is to separate the original document image file into several rectangular areas, also called blocks....
[...]