Jan O Pedersen

Researcher at Xerox

Publications -  19
Citations -  1475

Jan O Pedersen is an academic researcher at Xerox. The author has contributed to research in topics: Electronic document & Cluster analysis. The author has an h-index of 11 and has co-authored 19 publications that have received 1475 citations.

Papers
Patent

An iterative technique for phrase query formation and an information retrieval system employing same

TL;DR: An information retrieval system and method are provided in which an operator inputs one or more query words, which are used to determine a search key for searching a corpus of documents. The system returns each match between the search key and the corpus as a phrase containing the word data matching the query word(s), the next adjacent non-stop (content) word, and all intervening stop words between the matching data and that non-stop word.
Patent

Hardcopy lossless data storage and communications for electronic document processing systems

TL;DR: Machine-readable electronic-domain definitions of hardcopy documents, and/or of part or all of the transforms that are performed to produce and reproduce such hardcopy documents, are encoded in codes that are printed on such documents, thereby permitting the electronic-domain descriptions of such documents and/or such transforms to be recovered more robustly and reliably when the information carried by such documents is transformed from the hardcopy domain to the electronic domain.
Patent

Method and apparatus for information access employing overlapping clusters

TL;DR: A method and apparatus are presented for document-clustering-based browsing of a corpus of documents, and more particularly for the use of overlapping clusters to improve recall.
Patent

User query generate search results that rank set of servers where ranking is based on comparing content on each server with user query, frequency at which content on each server is altered using web crawler in a search engine

TL;DR: A system, computer-readable medium, and method are provided for searching for recently altered documents on the World Wide Web, in which a server to be searched or crawled by a Web crawler is selected based on a user-selected ranking.
Patent

Method and apparatus for automatic document summarization

TL;DR: Sentences are classified by the use of words categorized as stop words and vanish words, and are scored based on the number of stop words in the sentence and on the strings of connected stop words, called stop-word runs.