Patent

Adaptively weighted, partitioned context edit distance string matching

TLDR
In this patent, a pattern is partitioned into context and value components, and candidate matches for each component are identified by calculating an edit distance between that component and each potentially matching set of symbols (substring) within the string.
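The candidate-identification step can be illustrated with a standard approximate substring matching recurrence: a semi-global (Sellers-style) edit distance. The patent summary does not say how the edit distances are computed, so the following is a minimal Python sketch under that assumption. It zeroes the first row of the dynamic-programming table so a match may begin anywhere in the string, and it reports, for every end position, the lowest-cost substring ending there. The function names `substring_edit_distances` and `best_matches` are hypothetical.

```python
def substring_edit_distances(pattern: str, text: str) -> list[tuple[int, int, int]]:
    """For every end position j, return (start, j, cost): the lowest edit distance
    between pattern and any substring text[start:j] ending at j."""
    n = len(text)
    prev_cost = [0] * (n + 1)        # row 0: an empty pattern matches at any j for free,
    prev_start = list(range(n + 1))  # so the match may begin anywhere in the text
    for i in range(1, len(pattern) + 1):
        cost = [i] + [0] * n         # column 0: delete all i pattern symbols
        start = [0] * (n + 1)
        for j in range(1, n + 1):
            sub = prev_cost[j - 1] + (pattern[i - 1] != text[j - 1])
            dele = prev_cost[j] + 1  # pattern symbol left unmatched
            ins = cost[j - 1] + 1    # extra text symbol inside the window
            cost[j] = min(sub, dele, ins)
            # Carry the window's start position along the chosen path.
            if cost[j] == sub:
                start[j] = prev_start[j - 1]
            elif cost[j] == dele:
                start[j] = prev_start[j]
            else:
                start[j] = start[j - 1]
        prev_cost, prev_start = cost, start
    return [(prev_start[j], j, prev_cost[j]) for j in range(n + 1)]


def best_matches(pattern: str, text: str) -> list[tuple[int, int, int]]:
    """Keep only the candidate windows with the lowest edit distance."""
    candidates = substring_edit_distances(pattern, text)
    lowest = min(c[2] for c in candidates)
    return [c for c in candidates if c[2] == lowest]


print(best_matches("abc", "zzabczz"))  # [(2, 5, 0)]
```

Tracking the start index alongside each cell means the winning substring (`text[start:end]`) can be recovered without a separate traceback pass. This costs one extra array per row.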
Abstract
A system and method for examining a string of symbols and identifying portions of the string that match a predetermined pattern using adaptively weighted, partitioned context edit distances. A pattern is partitioned into context and value components, and candidate matches for each component are identified by calculating an edit distance between that component and each potentially matching set of symbols (substring) within the string. One or more candidate matches having the lowest edit distances are selected as matches for the pattern. The weighting of each component match may be adapted to optimize the pattern matching; in one embodiment, the context components may be heavily weighted to obtain matches for a value whose corresponding pattern is not well defined. In one embodiment, an edit distance matrix is evaluated for each of a prefix component, a value component, and a suffix component of a pattern. Evaluating the prefix matrix provides a basis for identifying indicators of the beginning of a value window, while evaluating the suffix matrix provides a basis for identifying the alignment of the end of the value window. The value within the value window can then be evaluated via the value matrix to determine a corresponding value match score.
