Patent

Information management and retrieval

Weeks Richard
TLDR
A method and apparatus for extracting key terms from a data set: identify a first set of word groups, each of one or more words, that occur more than once in the data set, then remove from this first set a second set of word groups that are sub-strings of longer word groups in the first set.
Abstract
A method and apparatus are provided for extracting key terms from a data set, the method including the steps of identifying a first set of one or more word groups of one or more words that occur more than once in the data set, and removing from this first set a second set of word groups that are sub-strings of longer word groups in the first set. The remaining word groups are key terms. Each word group is weighted according to its frequency of occurrence within the data set. The weighting of any word group may be increased by the frequency of any sub-string of words occurring in the second set, with each weighting then divided by the number of words in the word group. This weighting process operates to determine the order of occurrence of the word groups. Prefixes and suffixes are also removed from each word in the data set. This produces a neutral form of each word so that the weighting values are prefix and suffix independent.
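
The steps in the abstract can be sketched in Python as follows. This is a minimal illustration of one reading of the abstract, not the patented implementation: the affix lists, the `max_len` cap on word-group length, the regular-expression tokenizer, and all function names are assumptions introduced here for the example.

```python
import re
from collections import Counter

# Hypothetical affix lists for illustration; the abstract does not specify them.
PREFIXES = ("un", "re")
SUFFIXES = ("ing", "ed", "s")


def neutral_form(word):
    """Lower-case a word and strip at most one listed prefix and one listed suffix."""
    word = word.lower()
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) > len(prefix) + 2:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            word = word[:-len(suffix)]
            break
    return word


def is_sub_group(short, long):
    """True if `short` is a strictly shorter contiguous run of words inside `long`."""
    n = len(short)
    return n < len(long) and any(
        long[i:i + n] == short for i in range(len(long) - n + 1)
    )


def extract_key_terms(text, max_len=4):
    """Return (word_group, weight) pairs sorted by descending weight."""
    words = [neutral_form(w) for w in re.findall(r"[a-z]+", text.lower())]

    # Count every word group of 1..max_len consecutive words.
    counts = Counter(
        tuple(words[i:i + n])
        for n in range(1, max_len + 1)
        for i in range(len(words) - n + 1)
    )

    # First set: word groups that occur more than once.
    first = {group: c for group, c in counts.items() if c > 1}

    # Second set: members of the first set contained in a longer member.
    second = {g for g in first if any(is_sub_group(g, other) for other in first)}

    # Key terms are what remains. Each weight is the term's own frequency plus
    # the frequencies of its sub-groups in the second set, divided by its length.
    weights = {}
    for group, freq in first.items():
        if group in second:
            continue
        boost = sum(first[s] for s in second if is_sub_group(s, group))
        weights[group] = (freq + boost) / len(group)

    return sorted(weights.items(), key=lambda item: item[1], reverse=True)
```

For example, in the text "key term extraction finds key term candidates and key term weights", the groups "key", "term", and "key term" each occur three times. The two single words are sub-strings of "key term" and are removed, leaving "key term" with weight (3 + 3 + 3) / 2 = 4.5.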

