Showing papers on "Document processing" published in 1998

Open Access

Journal Article

Bidyut B. Chaudhuri¹, Umapada Pal¹

01 Mar 1998-Pattern Recognition

TL;DR: A complete optical character recognition (OCR) system for printed Bangla, the fourth most popular script in the world, is presented, and an extension of the work to Devnagari, the third most popular script in the world, is discussed.

381 citations

Journal Article

The Indexing and Retrieval of Document Images

David Doermann¹

University of Maryland, College Park¹

01 Jun 1998-Computer Vision and Image Understanding

TL;DR: A survey is provided of methods that researchers have developed to access and manipulate document images without the need for complete and accurate conversion.

319 citations

Journal Article

Scalable feature selection, classification and signature generation for organizing large text databases into hierarchical topic taxonomies

Soumen Chakrabarti¹, Byron Dom¹, Rakesh Agrawal¹, Prabhakar Raghavan¹

IBM¹

01 Aug 1998

TL;DR: An automatic system is described that starts with a small sample of the corpus in which topics have been assigned by hand, and then updates the database with new documents as the corpus grows, assigning topics to these new documents with high speed and accuracy.

Abstract: We explore how to organize large text databases hierarchically by topic to aid better searching, browsing and filtering. Many corpora, such as internet directories, digital libraries, and patent databases, are manually organized into topic hierarchies, also called taxonomies. Similar to indices for relational data, taxonomies make search and access more efficient. However, the exponential growth in the volume of on-line textual information makes it nearly impossible to maintain such taxonomic organization for large, fast-changing corpora by hand. We describe an automatic system that starts with a small sample of the corpus in which topics have been assigned by hand, and then updates the database with new documents as the corpus grows, assigning topics to these new documents with high speed and accuracy. To do this, we use techniques from statistical pattern recognition to efficiently separate the feature words, or discriminants, from the noise words at each node of the taxonomy. Using these, we build a multilevel classifier. At each node, this classifier can ignore the large number of “noise” words in a document. Thus, the classifier has a small model size and is very fast. Owing to the use of context-sensitive features, the classifier is very accurate. As a by-product, we can compute for each document a set of terms that occur significantly more often in it than in the classes to which it belongs. We describe the design and implementation of our system, stressing how to exploit standard, efficient relational operations like sorts and joins. We report on experiences with the Reuters newswire benchmark, the US patent database, and web document samples from Yahoo!. We discuss applications where our system can improve searching and filtering capabilities.

292 citations
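
The abstract says discriminant feature words are separated from noise words at each taxonomy node using statistical pattern recognition, but it does not state the criterion. The Python sketch below assumes Fisher's discriminant as that criterion. Under this assumption, a term scores highly when its mean frequency differs strongly between sibling classes relative to how much it varies inside each class. The function names, the input layout (class label mapped to a list of term-frequency dicts), and the toy data are illustrative assumptions, not the authors' implementation.

```python
from itertools import combinations
from statistics import mean, pvariance


def fisher_index(term, docs_by_class):
    """Score a term by Fisher's discriminant: spread of class means over within-class scatter.

    docs_by_class maps a class label to a list of documents, each a dict term -> relative frequency.
    """
    # Relative frequency of the term in every document, grouped by class.
    freqs = {c: [d.get(term, 0.0) for d in docs] for c, docs in docs_by_class.items()}
    means = {c: mean(v) for c, v in freqs.items()}
    # Between-class separation: squared differences of class means over all pairs of classes.
    between = sum((means[a] - means[b]) ** 2 for a, b in combinations(freqs, 2))
    # Within-class scatter: mean squared deviation from each class mean, summed over classes.
    within = sum(pvariance(v, means[c]) for c, v in freqs.items())
    if within == 0:
        return float("inf") if between > 0 else 0.0
    return between / within


def select_features(docs_by_class, k):
    """Keep the k highest-scoring terms as discriminants; all other terms are treated as noise."""
    vocab = {t for docs in docs_by_class.values() for d in docs for t in d}
    # Descending score, ties broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-fisher_index(t, docs_by_class), t))[:k]


docs = {
    "sports": [{"goal": 0.2, "the": 0.1}, {"goal": 0.3, "the": 0.1}],
    "finance": [{"stock": 0.25, "the": 0.1}, {"stock": 0.2, "the": 0.1}],
}
print(select_features(docs, 2))  # ['stock', 'goal']
```

In this sketch, "the" scores zero because it is equally frequent in both classes, so it is dropped as noise. Keeping only the top-k terms per node is what would let each node's classifier stay small.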

Patent

User-defined search template for extracting information from documents

Takashi Saito, Yasushi Abe, Tsukasa Kochi

13 Aug 1998

TL;DR: A document processing system includes a database generation unit for generating predetermined bases from documents and a user-defined search template generator for generating a user-defined search template, which is used to extract a predetermined set of information from substantially similar documents.

Abstract: The document processing system includes a database generation unit for generating predetermined bases from documents and a user-defined search template generation unit for generating a user-defined search template, which is used to extract a predetermined set of information from substantially similar documents. The user-defined search templates are efficiently generated without any intervention from technical support personnel.

101 citations

Patent

Digital camera and document processing system using the digital camera

Hideo Honma¹

Canon Inc.¹

08 Dec 1998

TL;DR: A document is divided into blocks, each block is sensed by a CCD, and perspective correction is performed on the image data of each of the images obtained by sensing the document block by block.

Abstract: A digital camera that performs accurate document reading and is used in a document processing system. In the document processing system, a document is divided into blocks, each block is sensed by a CCD, and perspective correction is performed on the image data of each of the images obtained by sensing the document block by block. An OCR process is performed on the corrected image data to convert it to text data. The text data converted from each of the images is combined into a single text, and the combined text is output for printing.

85 citations

Journal Article

Classification of machine-printed and handwritten texts using character block layout variance

Kuo Chin Fan¹, Liang Shen Wang¹, Yin Tien Tu¹

National Central University¹

01 Sep 1998-Pattern Recognition

TL;DR: A method is presented for classifying text as machine-printed or handwritten, automatically identifying the type of text segmented from a document image to facilitate the later optical character recognition task.

78 citations

Patent

Document processor and document processing method

Yasuto Ishitani, 康人石谷

27 Feb 1998

TL;DR: The patent addresses the problem of generating a structured document from a printed document of multiple pages, placing document logic elements other than body text, such as graphs and tables, in appropriate positions.

Abstract: PROBLEM TO BE SOLVED: To generate a structured document, such as an XML (Extensible Markup Language) or HTML (HyperText Markup Language) document, from a printed document consisting of multiple pages, placing document logic elements other than body text, such as graphs and tables, in appropriate positions. SOLUTION: The device analyzes the layout of the document image corresponding to the printed document with a layout analyzing part 11 to extract paragraph areas and graph areas, and segments the characters in the paragraph areas so that a character recognizing part 12 can recognize them. Given the character recognition result and the layout analysis result, a document logic element extracting part 13 extracts document logic element areas from the paragraph areas, and a reading order setting part 14 sets the reading order of the document logic element areas and of the graph/table areas. A document structure analyzing part 16 then extracts the document structure by grouping the document logic element areas and the graph/table areas, and the structured document is generated by changing where the areas corresponding to document logic elements other than body text appear in the document structure and passing the result to a document output part 17. COPYRIGHT: (C)2004,JPO

66 citations

A Robust Practical Text Summarization

Tomek Strzalkowski¹, Jin Wang, G. Bowden Wise

General Electric¹

01 Jan 1998

TL;DR: SummarizerTool, a prototype implemented in Java, is described, together with its applications in various document processing tasks involving news reports, government documents, and even court records.

Abstract: We present an automated method of generating human-readable summaries from text documents such as news, technical reports, government documents, and even court records. Our approach exploits an empirical observation that much of the written text displays certain regularities of organization and style, which we call the Discourse Macro Structure (DMS). A summary is therefore created to reflect the components of a given DMS. In order to produce a coherent and readable summary, we select continuous, well-formed passages from the source document and assemble them into a mini-document within a DMS template. In this paper we describe the SummarizerTool, a Java-implemented prototype, and its applications in various document processing tasks.

64 citations

Journal Article

Speeding up Chinese character recognition in an automatic document reading system

Yi-Hong Tseng¹, Chi-Chang Kuo¹, Hsi-Jian Lee¹

National Chiao Tung University¹

01 Nov 1998-Pattern Recognition

TL;DR: Two techniques for speeding up character recognition are presented: a candidate-cluster selection module and a modified branch-and-bound detail-matching module. Both are integrated into a Windows-based document reading system that provides a user-friendly environment.

48 citations
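
The TL;DR names two speed-ups without detailing them. As a rough illustration only, the Python sketch below combines two generic steps. The first is a candidate-cluster step that compares the input feature vector with cluster centroids and keeps the nearest few. The second is partial-distance pruning, a common branch-and-bound style trick that abandons a template as soon as its running distance reaches the best match found so far. The data layout, the `n_candidates` parameter, and the squared Euclidean distance are assumptions. This is not the paper's modified branch-and-bound algorithm.

```python
def bounded_sq_distance(x, y, bound):
    """Sum squared differences, giving up as soon as the sum reaches bound (the best so far)."""
    total = 0.0
    for a, b in zip(x, y):
        total += (a - b) ** 2
        if total >= bound:
            return None  # pruned: this template cannot beat the current best match
    return total


def recognize(feature, clusters, n_candidates=1):
    """Two-stage matching: choose the nearest cluster centroids, then match only their members."""
    # Stage 1: candidate-cluster selection by squared distance to each centroid.
    ranked = sorted(clusters, key=lambda c: sum((a - b) ** 2 for a, b in zip(feature, c["centroid"])))
    best_label, best = None, float("inf")
    # Stage 2: detailed matching, with pruning, against templates in the candidate clusters.
    for cluster in ranked[:n_candidates]:
        for label, template in cluster["members"]:
            d = bounded_sq_distance(feature, template, best)
            if d is not None:  # not pruned, so d < best is guaranteed
                best_label, best = label, d
    return best_label, best


clusters = [
    {"centroid": [0.0, 0.0], "members": [("A", [0.0, 1.0]), ("B", [1.0, 0.0])]},
    {"centroid": [10.0, 10.0], "members": [("C", [10.0, 10.0])]},
]
label, dist = recognize([0.9, 0.1], clusters)
print(label)  # B
```

The cluster step bounds how many templates are examined at all. The pruning step bounds how much of each remaining template is compared.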

Patent

User-controlled document processing

Itzhak Pomerantz, Emanuel Menczer, Cohen Ram

27 Feb 1998

TL;DR: A system and method are described for selectively encrypting documents, employing a document generator and a user-controlled document encryptor that encrypts user-selected portions of a document generated on the document generator.

Abstract: A system and method for selectable encryption of documents, employing a document generator and a user-controlled document encryptor operative to encrypt user-selected portions of a document generated on the document generator.

45 citations

Patent

Method of processing documents in an image-based document processing system and an apparatus therefor

Hui Wu¹, Stewart B. Kelland¹, Neil P. Boyd¹

NCR Corporation¹

01 Jul 1998

TL;DR: A method of processing documents in an image-based document processing system is proposed that associates recognition results from a primary source results list with corresponding results from a secondary source results list.

Abstract: A method of processing documents in an image-based document processing system associates recognition results from a primary source results list with corresponding recognition results from a secondary source results list, to improve assistance to an operator during operation of the system. The method comprises the steps of:
(a) scanning a first type of document to obtain scanned data representative thereof;
(b) scanning a second type of document to obtain scanned data representative thereof;
(c) processing scanned data representative of the first type of document to provide recognition results associated with the first type of document;
(d) processing scanned data representative of the second type of document to provide recognition results associated with the second type of document;
(e) storing recognition results associated with the first type of document in a primary list;
(f) storing recognition results associated with the second type of document in a secondary list;
(g) comparing recognition results from the primary list with recognition results from the secondary list to determine if an exact match occurs, and thereby associating a first set of recognition results from the primary list with a first set of recognition results from the secondary list; and
(h) when an exact match fails to occur in step (g), comparing recognition results from the primary list with recognition results from the secondary list to determine if an approximate match occurs, and thereby associating a second set of recognition results from the primary list with a second set of recognition results from the secondary list.
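
Steps (g) and (h) amount to a two-pass association: exact matches first, then approximate matches only for the results still unmatched. The Python sketch below makes three assumptions that the patent text does not specify. Recognition results are treated as strings, `difflib.SequenceMatcher.ratio()` serves as the similarity measure, and the 0.8 threshold is arbitrary.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    return SequenceMatcher(None, a, b).ratio()


def associate(primary, secondary, threshold=0.8):
    """Pair primary-list results with secondary-list results: exact matches first, then approximate."""
    remaining = list(secondary)
    pairs, no_exact = [], []
    # Step (g): exact matches.
    for p in primary:
        if p in remaining:
            remaining.remove(p)
            pairs.append((p, p, "exact"))
        else:
            no_exact.append(p)
    # Step (h): for results without an exact match, try the most similar unused secondary result.
    for p in no_exact:
        best = max(remaining, key=lambda s: similarity(p, s), default=None)
        if best is not None and similarity(p, best) >= threshold:
            remaining.remove(best)
            pairs.append((p, best, "approximate"))
    return pairs


primary = ["12345678", "ACME CORP", "00991"]
secondary = ["ACME C0RP", "12345678", "55555"]
for pair in associate(primary, secondary):
    print(pair)
# ('12345678', '12345678', 'exact')
# ('ACME CORP', 'ACME C0RP', 'approximate')
```

The OCR-style confusion of "O" with "0" is recovered by the approximate pass. "00991" stays unpaired because nothing left in the secondary list is similar enough.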

Patent

Transmission document editing device, a server device in a communication document processing system, and a computer-readable record medium that stores the function thereof

Toshihiro Hishida¹, Hidetaka Ohto¹

Panasonic¹

05 Oct 1998

TL;DR: A transmission document editing device creates, from a document described in a markup language, a transmission document to be transmitted to a variety of mobile communication terminals; a document content storage unit stores the document, which includes a plurality of document elements.

Abstract: A transmission document editing device edits, from a document described in a markup language, a transmission document to be transmitted to a variety of mobile communication terminals. A document content storage unit stores a document including a plurality of document elements to be transmitted. A device input/output information storage unit stores a plurality of pieces of device input/output information that indicate the document elements to be transmitted for each of a plurality of types of mobile communication terminal. A transmission document creation unit creates a transmission document including the document and the plurality of pieces of device input/output information.

Proceedings Article

Automatic processing of document annotations

Jacob Stevens¹, Andrew H. Gee¹, Christopher R. Dance²

University of Cambridge¹, Xerox²

16 Sep 1998

TL;DR: A system is described that reliably establishes correspondences between printed words and their electronic counterparts without performing optical character recognition; the approach might have interesting applications in document database retrieval, since it allows an electronic document to be indexed by a printed version of itself.

Abstract: A common authoring technique involves making annotations on a printed draft and then typing the corrections into a computer at a later date. In this paper, we describe a system that goes some way towards automating this process. The author simply passes the annotated documents through a sheetfeed scanner and then brings up the electronic document in a text editor. The system then works out where the annotated words are and allows the author to skip from one annotation to the next at the touch of a key. At the heart of the system lies a procedure for reliably establishing correspondences between printed words and their electronic counterparts, without performing optical character recognition. This procedure might have interesting applications in document database retrieval, since it allows an electronic document to be indexed by a printed version of itself.

Patent

Apparatus and method for processing various form documents to meet respective form, and recording medium storing a program to execute the process

Katsumata Yutaka, Takayuki Matsui, Kazutoshi Yamoto, Hirata Masaki, Kazumi Yamaoka, Asano Ryoei

27 Mar 1998

TL;DR: A form document processing apparatus for entering data described in form documents is described. It includes a main processing section that performs predetermined processing operations on entered form data, and an auxiliary processing section that, on request from the main processing section, performs a specific processing operation determined by the contents of the form data.

Abstract: There is disclosed a form document processing apparatus for entry of data described in form documents. The form document processing apparatus includes a main processing section for performing predetermined processing operations for entered form data, and an auxiliary processing section which, upon receipt of a request from the main processing section, performs, in an auxiliary manner, among the processing operations to be performed by the main processing section, a specific processing operation determined by the contents of form data. Therefore, specific processing operations determined by the contents of form data can be performed without a need to develop a program for each type of task.

Proceedings Article

Automatic detection of italic, bold and all-capital words in document images

Bidyut B. Chaudhuri, Utpal Garain

16 Aug 1998

TL;DR: A statistical study reveals that the detection of italic, bold and all-capital words may play a key role in automatic information retrieval from documents and can be used to improve the recognition accuracy of a text recognition system.

Abstract: We propose simple and fast algorithms for detecting italic, bold and all-capital words without performing actual character recognition. We present a statistical study which reveals that the detection of such words may play a key role in automatic information retrieval from documents. Moreover, detection of italic words can be used to improve the recognition accuracy of a text recognition system. A considerable number of document images have been tested, our algorithms give accurate results on all the tested images, and the algorithms are very easy to implement.

Patent

Document processing method and system, and computer-readable recording medium having document processing program recorded therein

Yuki Aoyama¹, Yukie Takita¹, Toru Takahashi¹, Yukio Hoshi¹

Hitachi¹

06 Jul 1998

TL;DR: A computer-implemented method and system are described for processing documents, such as structured documents, in which items like terms, names and department affiliations are managed as shared information, so that consistent wording and modifications are automatically and easily reflected in all documents.

Abstract: A computer-implemented method and system for processing a document, such as a structured document, in which information such as a term, a name and a department affiliation is used as shared information, so that word consistency and modifications can be automatically and easily reflected in all documents. In the document processing method, a shared information editing program edits shared information frequently described in a plurality of documents, a shared information storage program stores the edited shared information in a secondary memory, a shared information list-up program lists up the shared information for each information type, a structured document editing program edits a structured document to describe a link to shared information selected from the listed-up shared information, a structured document storage program stores the structured document in the secondary memory, and a structured document output program reads out the shared information and the structured document from the secondary memory and embeds the contents of the shared information in the structured document for display or printout.
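
The key idea is that documents store links to shared entries rather than copies, and the output program resolves the links when a document is displayed or printed. A minimal Python sketch follows. The `{{shared:type/key}}` link syntax, the nested dictionary used as the shared-information store, the function names, and the example values are all hypothetical.

```python
import re

# Hypothetical link syntax: {{shared:<type>/<key>}} refers to one entry in the shared store.
LINK = re.compile(r"\{\{shared:(\w+)/(\w+)\}\}")


def list_up(shared):
    """List the shared-information keys for each information type."""
    return {info_type: sorted(entries) for info_type, entries in shared.items()}


def render(document, shared):
    """Embed the current shared values in place of the links (the display/printout step)."""
    return LINK.sub(lambda m: shared[m.group(1)][m.group(2)], document)


shared = {"name": {"mgr": "H. Tanaka"}, "dept": {"rd": "R&D Division"}}
doc = "Approved by {{shared:name/mgr}} ({{shared:dept/rd}})."
print(render(doc, shared))  # Approved by H. Tanaka (R&D Division).
shared["dept"]["rd"] = "Research Division"  # a single edit to the shared store
print(render(doc, shared))  # Approved by H. Tanaka (Research Division).
```

Because the value is resolved at output time, editing the shared entry once changes every document that links to it. A link to a missing entry raises `KeyError` in this sketch.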

Journal Article

Symbolic Compression and Processing of Document Images

Omid Kia¹, David Doermann¹, Azriel Rosenfeld¹, Rama Chellapa¹

University of Maryland, College Park¹

01 Jun 1998-Computer Vision and Image Understanding

TL;DR: A novel encoding scheme is provided that facilitates scalable lossy compression and progressive transmission and supports document image analysis in the compressed domain, together with a class of document image understanding tasks that operate on the compressed representation.

Patent

Speech recognition method and system for recognizing single or un-correlated Chinese characters

Donald T. Tang¹, Li Qin Shen¹, Xiao Jin Zhu¹

IBM¹

28 Aug 1998

TL;DR: A Chinese speech recognition method and system for single or uncorrelated Chinese characters is presented. The method uses various types of Character Description Language (CDL) to describe the characters to be input, and the system uses a CDL-grammar-directed speech recognizer to accept CDL descriptions spoken by the user.

Abstract: A Chinese speech recognition (SR) method and system for single or uncorrelated Chinese characters. The method uses various types of Character Description Language (CDL) to describe the single or uncorrelated Chinese characters to be input. The SR system uses a CDL-grammar-directed speech recognizer to accept CDL descriptions input by voice. Based on the analysis performed by the CDL parser, the character generator produces the corresponding character. Therefore, single or uncorrelated Chinese characters can be recognized reliably out of context.

Patent

Displaying multiple document abstracts in a single hyperlinked abstract, and their modified source documents

Kenji Ono¹, Hideki Hirakawa¹, Kazuo Sumita¹

Toshiba¹

30 Jan 1998

TL;DR: A computerized document processing apparatus for creating an abstract includes document storage for a computerized document, keyword storage, and an abstract creation section that creates an abstract by extracting at least one character string containing a stored keyword from the stored document.

Abstract: A computerized document processing apparatus for creating an abstract includes document storage for storing a computerized document, keyword storage for storing keywords, an abstract creation section for creating an abstract by extracting at least a character string containing a keyword stored in the keyword storage section from the computerized document stored in the document storage section, a document modification section for modifying the computerized document to link the keyword in the computerized document with the same keyword in the abstract, and a display section for displaying the abstract and the modified document that is linked with the abstract. The modified document is displayed when the linked keyword in the abstract is selected.
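
A minimal Python sketch of the abstract-and-link idea follows, built on three assumptions:
- The abstract is made of whole sentences that contain a keyword.
- Links are HTML anchors (`<a id=...>` in the document, `<a href=...>` in the abstract).
- Only the first occurrence of each keyword is linked.

Sentence splitting on `.`, `!` and `?` and the `kwN` anchor names are simplifying, hypothetical choices.

```python
import re


def link_first(text, keywords, template):
    """Wrap the first occurrence of each keyword with template, in one regex pass over text."""
    if not keywords:
        return text
    # Longest keywords first, so overlapping keywords prefer the longer match.
    pattern = re.compile("|".join(re.escape(k) for k in sorted(keywords, key=len, reverse=True)))
    seen = set()

    def replace(match):
        k = match.group(0)
        if k in seen:
            return k  # later occurrences stay plain
        seen.add(k)
        return template.format(i=keywords.index(k), k=k)

    return pattern.sub(replace, text)


def make_abstract(document, keywords):
    """Abstract = sentences containing a keyword; keywords in it link to anchors in the document."""
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())
    abstract = " ".join(s for s in sentences if any(k in s for k in keywords))
    linked_abstract = link_first(abstract, keywords, '<a href="#kw{i}">{k}</a>')
    linked_document = link_first(document, keywords, '<a id="kw{i}">{k}</a>')
    return linked_abstract, linked_document


doc = "OCR converts page images to text. The weather was fine. Search uses OCR output."
abstract, linked = make_abstract(doc, ["OCR"])
print(abstract)  # <a href="#kw0">OCR</a> converts page images to text. Search uses OCR output.
```

Doing the replacement in a single regex pass avoids accidentally matching keywords inside anchor markup that was already inserted.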

Proceedings Article

A recognition-based Arabic optical character recognition system

A. Cheung¹, Mohammed Bennamoun, Neil W. Bergmann

Queensland University of Technology¹

11 Oct 1998

TL;DR: A recognition-based Arabic OCR system is proposed that consists of image acquisition, preprocessing, segmentation, character fragmentation, combination of character fragments, feature extraction, and classification.

Abstract: Optical character recognition systems improve human-machine interaction and are widely used in many government and commercial departments. After forty years of intensive research, OCR systems for most scripts are well developed; however, this is not yet the case for Arabic script. Since Arabic is a popular script, Arabic OCR systems should have great commercial value. Thus a recognition-based Arabic OCR system is proposed in this paper. It consists of image acquisition, preprocessing, segmentation, character fragmentation, combination of character fragments, feature extraction, and classification. A signal is fed back to improve and determine the segmentation/recognition result. The system has been implemented and achieves 90% recognition accuracy at a recognition rate of 20 characters per second.

Patent

Document processing apparatus for adding predetermined design types to an original document

Keiichi Imamura¹

Casio¹

27 Jan 1998

TL;DR: A preselected decoration is applied to document data by a CPU that analyzes the structure of the whole document in units of document structural elements and extracts a predetermined structural element from the analyzed elements as the design element to be decorated.

Abstract: In a document processing apparatus equipped with a computer program storage medium, a preselected decoration is applied to document data. A CPU analyzes the structure of the overall document in units of document structural elements, and extracts a predetermined structural element from these analyzed structural elements as a design element to be decorated. Then, the CPU searches a table stored in a RAM based on an attribute of this design element. This table stores fixed decoration information for each attribute of the design elements. The CPU retrieves the decoration information corresponding to the attribute of the extracted design element, and then decorates the design element based on this decoration information. As a result, a predetermined decoration can be applied to the document data.
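
The core of the apparatus is a fixed table that maps structural-element attributes to decoration information. The Python sketch below assumes a deliberately naive structure analysis: the first line is the title, and lines ending in a colon are headings. The `DECORATIONS` table and its fields are hypothetical. A real implementation would use a proper layout or markup analysis.

```python
# Hypothetical decoration table: fixed decoration information per structural-element attribute.
DECORATIONS = {
    "title": {"font_size": 18, "bold": True},
    "heading": {"font_size": 14, "underline": True},
}


def analyze(lines):
    """Toy structure analysis (an assumption): first line is the title, lines ending in ':' are headings."""
    elements = []
    for i, line in enumerate(lines):
        if i == 0:
            attribute = "title"
        elif line.endswith(":"):
            attribute = "heading"
        else:
            attribute = "body"
        elements.append((attribute, line))
    return elements


def decorate(elements, table=DECORATIONS):
    """Look up each element's attribute in the table; elements without an entry stay undecorated."""
    return [(attribute, text, table.get(attribute, {})) for attribute, text in elements]


for element in decorate(analyze(["Quarterly Report", "Summary:", "Sales rose."])):
    print(element)
```

Keeping the decoration rules in a table, rather than in code, means changing a design only requires editing table entries.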

Multivalent Documents: A New Model for Digital Documents

Robert Wilensky, Thomas A. Phelps

13 Mar 1998

TL;DR: The multivalent document model enables one to better use digital documents for tasks in which paper documents are still otherwise superior to digital documents, such as annotating someone else's document.

Abstract: "Multivalent documents" is a model of documents that addresses some of the shortcomings one currently encounters when manipulating documents in digital form. In the multivalent document model, a document is composed of distributed data and program resources, called layers and behaviors, respectively. The model exposes virtually all aspects of document processing to behaviors, and provides the means to compose these components into a single coherent document. Behaviors allow the model to be highly extensible, including the capability to be extended to work with arbitrary document formats. We have implemented the model in Java, and developed behaviors that support multiple document types (scanned page images, HTML, and ASCII) and a number of different user-interface metaphors (e.g., "lenses" and "Notemarks"). The multivalent document model enables one to better use digital documents for tasks in which paper documents are still otherwise superior to digital documents, such as annotating someone else's document. We have shown how the model is naturally conducive to realizing powerful forms of distributed, open annotation by implementing a variety of annotation types, some familiar and some novel.

Journal Article

The function of documents

David Doermann¹, Ehud Rivlin², Azriel Rosenfeld¹

University of Maryland, College Park¹, Technion – Israel Institute of Technology²

01 Aug 1998-Image and Vision Computing

TL;DR: The authors introduce the concept of document functionality, which attempts to describe the roles of documents and their components in the process of transferring information, and demonstrate how functional descriptions can be used to reverse-engineer the intentions of the author, to navigate in document space, and to provide important contextual information to aid in interpretation.

Patent

User interface identification and service tags for a document processing system

Leigh L. Klotz¹, Glen W. Petrie¹, Robert S. Bauer¹, Daniel Davies¹, Julia A. Craig¹

Xerox¹

13 Nov 1998

TL;DR: A tag-based user interface scheme for digitizing and processing hardcopy documents uses a sticker that includes a printed data code representing a user identity code and a service code.

Abstract: A tag-based user interface scheme for digitizing and processing hardcopy documents utilizes a sticker that includes a printed data code representative of a user identity code and a service code. When the sticker is applied to a hardcopy document and scanned, the sticker is located, the data code is parsed, and a desired service is performed based upon the information stored in the data code.
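
Once the sticker has been located and its data code decoded, handling it reduces to parsing two fields and dispatching to a service. The Python sketch below skips image processing entirely. It assumes the decoded payload is a string of the form `<user id>:<service code>`. That format, the service codes, and the `SERVICES` table are hypothetical.

```python
# Hypothetical services keyed by service code; each receives the user identity and page count.
SERVICES = {
    "FAX": lambda user, pages: f"fax {pages} page(s) for {user}",
    "MAIL": lambda user, pages: f"e-mail {pages} page(s) to {user}",
}


def handle_scan(data_code, pages):
    """Parse the decoded data code and run the requested service for that user."""
    user_id, sep, service_code = data_code.partition(":")
    if not sep:
        raise ValueError(f"malformed data code {data_code!r}")
    service = SERVICES.get(service_code)
    if service is None:
        raise ValueError(f"unknown service code {service_code!r}")
    return service(user_id, pages)


print(handle_scan("user42:MAIL", 3))  # e-mail 3 page(s) to user42
```

Because the user identity travels on the page itself, the scanner needs no login step before performing the service.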

Patent

Document conversion using an intermediate computer which retrieves and stores position information on document data

Naoko Ito¹

NEC¹

08 May 1998

TL;DR: A document processing apparatus is implemented in a client/server system so that a document conversion function can be added or modified without modifying either the client or the server. It consists of a client that requests acquisition or storage of a document, a server that manages transfer and storage of the document, a network connecting the client and the server, and a proxy server on the network that relays interaction between them.

Abstract: A document processing apparatus implemented in a client/server system so that a document conversion function can be added or modified without modifying either the client or the server. The apparatus comprises a client requesting acquisition or storage of a document, a server performing management such as transfer and storage of the document, a network connecting the client and the server, and a proxy server on the network relaying interaction between the client and the server. The proxy server has a document data conversion section that converts the document based on its structure.

Journal Article

Machine-printed character recognition revisited: re-application of recent advances in handwritten character recognition research

Ahmad Fuad Rezaur Rahman¹, Michael Fairhurst¹

University of Kent¹

24 Aug 1998-Image and Vision Computing

TL;DR: This paper demonstrates how recent progress in the area of multiple-expert classification can be exploited to provide new approaches to the processing of printed data.
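
The TL;DR does not say which combination scheme is used. Plurality voting is one of the simplest multiple-expert methods and is shown here only to illustrate the general idea of combining several classifiers. The tie-breaking rule and the toy experts are assumptions.

```python
from collections import Counter


def combine_experts(experts, sample):
    """Plurality vote over several classifiers; ties go to the label that appears first in expert order."""
    votes = [expert(sample) for expert in experts]
    counts = Counter(votes)
    top = max(counts.values())
    return next(label for label in votes if counts[label] == top)


# Three hypothetical "experts" that disagree on an ambiguous glyph.
experts = [lambda s: "O", lambda s: "0", lambda s: "0"]
print(combine_experts(experts, sample=None))  # 0
```

More elaborate schemes weight each expert by its measured reliability, but the interface stays the same: several independent decisions in, one decision out.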

Patent

Document processing system and document processing method, and recording medium

Seiichiro Hayashi, Sadaichi Irimiya, 貞一入宮, 誠一郎林

27 Oct 1998

TL;DR: A secret area is designated in the document information via an area designation part 12, and the part of the document information in the designated area is enciphered by an encipherment part 13 to obtain enciphered information.

Abstract: PROBLEM TO BE SOLVED: To allow secret document information to be processed quickly and properly, to make clear which part of the document information is secret, and to allow the providing side to manage the secret portions properly. SOLUTION: A secret area is designated in the document information via an area designation part 12, and the part of the document information in the designated area is enciphered by an encipherment part 13 to obtain enciphered information. In a management information generation part 14, the address information of the designated area and the key information used for encipherment are stored in a management table, and management information is generated. The part 14 then replaces the designated area in the text of the document information with the information enciphered by the part 13, and transmits or stores the enciphered information together with the management information.
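
The solution encrypts only the designated area and records that area's address and key in a management table, leaving the rest of the document readable. The Python sketch below mirrors that flow under two assumptions. First, areas are byte offsets into the UTF-8 text. Second, the cipher is a toy SHA-256 keystream XOR, chosen only because it preserves length (so the recorded offsets stay valid) and is available in the standard library. It is not secure and is not the patent's cipher; a real system would use a vetted authenticated cipher.

```python
import hashlib
import secrets


def xor_range(buf, start, end, key):
    """XOR buf[start:end] in place with a keystream derived from key.

    Toy keystream (SHA-256 over key + counter) for illustration only; NOT a secure cipher.
    """
    stream = b""
    counter = 0
    while len(stream) < end - start:
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    for i in range(start, end):
        buf[i] ^= stream[i - start]


def encipher_areas(text, areas):
    """Encipher each designated (start, end) byte range; record address and key in a management table."""
    buf = bytearray(text.encode("utf-8"))
    table = []
    for start, end in areas:
        key = secrets.token_bytes(16)
        xor_range(buf, start, end, key)
        table.append({"start": start, "end": end, "key": key})
    return bytes(buf), table


def decipher_areas(data, table):
    """Restore the original text: XOR with the same keystream undoes the encipherment."""
    buf = bytearray(data)
    for entry in table:
        xor_range(buf, entry["start"], entry["end"], entry["key"])
    return buf.decode("utf-8")


doc = "Employee: A. Example. Salary: 95000 USD."
start = doc.index("95000")
protected, table = encipher_areas(doc, [(start, start + 5)])
print(protected[:start])                        # the undesignated prefix is unchanged
print(decipher_areas(protected, table) == doc)  # True
```

Only the holder of the management table can restore the designated area, which is the property the patent relies on.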

The MANICURE document processing system

K. Taghva, Allen Condit, J. Borsack, J. Kilburg, Changshi Wu, J. Gilbreth

01 Jan 1998

TL;DR: MANICURE is a document processing system that provides integrated facilities for creating electronic forms of printed materials; its functionalities and their implementation are described.

Abstract: MANICURE is a document processing system that provides integrated facilities for creating electronic forms of printed materials. In this paper the functionalities supported by MANICURE and their implementations are described. In particular, we provide information on specific modules dealing with automatic detection and correction of OCR errors and automatic markup of logical components of the text. We further show that the various text formats produced by MANICURE can be used by web browsers and/or be manipulated by search routines to highlight the requested information on document images.

Patent

Transmission document editing device and reception document processor and server device for communication document processing system and computer readable recording medium for storing the same functions

Toshihiro Hishida, Hidetaka Oto, 英隆大戸, 利浩菱田

22 Sep 1998

TL;DR: A transmission document editing device edits a general document described in a markup language into a transmission document to be transmitted to various mobile communication terminals, and a simulation operation executing part 213 obtains the equipment input and output information of a designated terminal from the transmission document, selects the document elements that satisfy its selection condition, and generates display data.

Abstract: PROBLEM TO BE SOLVED: To comprehensively edit transmission documents to be transmitted to each kind of mobile communication terminal. SOLUTION: A transmission document editing device edits a general document described in a markup language into a transmission document to be transmitted to various mobile communication terminals. A document content temporary storing part 201 stores a document made up of plural document elements to be transmitted. An equipment input and output information storing part 202 stores equipment input and output information, including the selection condition of the document elements, for each kind of terminal. A transmission document generating part 208 generates the transmission document, in which the plural pieces of equipment input and output information are added to the general document. A simulation operation executing part 213 obtains the equipment input and output information of the designated terminal from the transmission document, selects the document elements that satisfy the selection condition, and generates display data. COPYRIGHT: (C)1999,JPO
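
The selection step can be sketched as a filter. Each terminal type carries a selection condition, and the display data for a designated terminal keeps only the elements that satisfy it. In the Python sketch below, the condition is assumed to be a set of element kinds a terminal can present. The `DEVICE_INFO` table, the element kinds, and the function names are hypothetical.

```python
# Hypothetical device input/output information: element kinds each terminal type can present.
DEVICE_INFO = {
    "pager": {"text"},
    "phone": {"text", "link"},
    "pda": {"text", "link", "image"},
}


def build_transmission_document(elements, device_info=DEVICE_INFO):
    """Bundle the general document's elements with the per-terminal selection conditions."""
    return {"elements": list(elements), "devices": device_info}


def simulate(transmission_document, terminal):
    """Select the elements that satisfy the designated terminal's condition and produce display data."""
    allowed = transmission_document["devices"][terminal]
    return [text for kind, text in transmission_document["elements"] if kind in allowed]


doc = build_transmission_document([("text", "Weather: sunny"), ("image", "map.gif"), ("link", "Details")])
print(simulate(doc, "pager"))  # ['Weather: sunny']
print(simulate(doc, "pda"))    # ['Weather: sunny', 'map.gif', 'Details']
```

A single transmission document thus carries enough information to be rendered for every terminal type, which is what lets the editing device preview each one.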

Book Chapter

Spotting Topics with the Singular Value Decomposition

Charles Nicholas¹, Randall Dahlberg

University of Maryland, Baltimore County¹

29 Mar 1998-Lecture Notes in Computer Science

TL;DR: It is shown how the matrices produced by the SVD calculation can be interpreted, allowing us to spot patterns of characters that indicate particular topics in a corpus.

Abstract: The singular value decomposition, or SVD, has been studied in the past as a tool for detecting and understanding patterns in a collection of documents. We show how the matrices produced by the SVD calculation can be interpreted, allowing us to spot patterns of characters that indicate particular topics in a corpus. A test collection, consisting of two days of AP newswire traffic, is used as a running example.
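
The abstract does not say how the matrix is built, so the Python sketch below assumes rows are character trigrams and columns are documents. It computes singular triplets by power iteration, which is adequate for toy sizes; a real system would use a numerical library. Each found component is removed by deflation so the next one can be computed. The idea illustrated is that the largest and smallest entries of a later left singular vector may point at n-grams characteristic of different topics. All function names are hypothetical.

```python
import math
from collections import Counter


def ngram_matrix(docs, n=3):
    """Rows are character n-grams, columns are documents, entries are raw counts."""
    counts = [Counter(d[i:i + n] for i in range(len(d) - n + 1)) for d in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[g] for c in counts] for g in vocab]


def top_singular_triplet(A, iters=200):
    """Leading (u, sigma, v) of matrix A, found by power iteration on A^T A."""
    m, n = len(A), len(A[0])
    v = [1.0 + j for j in range(n)]  # non-uniform start, less likely to be orthogonal to the target
    for _ in range(iters):
        av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(A[i][j] * av[i] for i in range(m)) for j in range(n)]  # w = A^T A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            return [0.0] * m, 0.0, [0.0] * n
        v = [x / norm for x in w]
    av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(m)]
    sigma = math.sqrt(sum(x * x for x in av))
    return [x / sigma for x in av], sigma, v


def deflate(A, u, sigma, v):
    """Subtract the found component, A - sigma * u v^T, so the next call finds the next component."""
    return [[A[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(u))]


def component_ends(vocab, u, k=3):
    """n-grams with the largest and with the smallest weights in left singular vector u."""
    order = sorted(range(len(vocab)), key=lambda i: u[i])
    return [vocab[i] for i in order[::-1][:k]], [vocab[i] for i in order[:k]]


docs = ["stock prices rose", "stock market prices", "football goal scored", "goal in football"]
vocab, A = ngram_matrix(docs)
u1, s1, v1 = top_singular_triplet(A)
u2, s2, v2 = top_singular_triplet(deflate(A, u1, s1, v1))
print(component_ends(vocab, u2))
```

With raw counts, the first component mostly reflects overall n-gram frequency, which is why the sketch inspects the second component for topic-specific patterns.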