
Showing papers on "Approximate string matching published in 1997"


Book
01 Jan 1997
TL;DR: In this book, the authors introduce suffix trees and their uses, cover core string edits, alignments and dynamic programming for sequence comparison, and show how these core problems can be extended.
Abstract: Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms 2. Exact matching: classical comparison-based methods 3. Exact matching: a deeper look at classical methods 4. Semi-numerical string matching Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees 6. Linear time construction of suffix trees 7. First applications of suffix trees 8. Constant time lowest common ancestor retrieval 9. More applications of suffix trees Part III. Inexact Matching, Sequence Alignment and Dynamic Programming: 10. The importance of (sub)sequence comparison in molecular biology 11. Core string edits, alignments and dynamic programming 12. Refining core string edits and alignments 13. Extending the core problems 14. Multiple string comparison: the Holy Grail 15. Sequence database and their uses: the motherlode Part IV. Currents, Cousins and Cameos: 16. Maps, mapping, sequencing and superstrings 17. Strings and evolutionary trees 18. Three short topics 19. Models of genome-level mutations.

3,904 citations


Book
29 May 1997
TL;DR: This tutorial jumps right into the meat of the book without dragging you through the basic concepts of programming.
Abstract: 1. Off-Line Serial Exact String Searching 2. Off-Line Parallel Exact String Searching 3. On-Line String Searching 4. Serial Computations of Levenshtein Distances 5. Parallel Computations of Levenshtein Distances 6. Approximate String Searching 7. Dynamic Programming: Special Cases 8. Shortest Common Superstrings 9. Two Dimensional Matching 10. Suffix Tree Data Structures for Matrices 11. Tree Pattern Matching

307 citations


Patent
25 Jan 1997
TL;DR: In this paper, a document browser for electronic filing systems, which supports pen-based markup and annotation, is described, where the user may electronically write notes (60-64) anywhere on a page (32, 38) and then later search for those notes using the approximate ink matching (AIM) technique.
Abstract: In summary there is disclosed a document browser for electronic filing systems, which supports pen-based markup and annotation. The user may electronically write notes (60-64) anywhere on a page (32, 38) and then later search for those notes using the approximate ink matching (AIM) technique. The technique segments (104) the user-drawn strokes, extracts (108) and vector quantizes (112) features contained in those strokes. An edit distance comparison technique (118) is used to query the database (120), rendering the system capable of performing approximate or partial matches to achieve fuzzy search capability.

299 citations
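As a rough illustration of the comparison step in the AIM approach above (the stroke segmentation, feature extraction and vector quantization are not shown, and the data below is hypothetical), an edit-distance query over sequences of codebook indices might look like this sketch:

```python
def edit_distance(a, b):
    """Levenshtein distance between two sequences of VQ codebook indices."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i]
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur.append(min(prev[j] + 1,          # delete from a
                           cur[j - 1] + 1,       # insert from b
                           prev[j - 1] + cost))  # substitute / match
        prev = cur
    return prev[-1]

# Rank stored ink annotations by distance to the query strokes' code sequence.
query_codes = [3, 17, 17, 5]                                 # hypothetical VQ indices
database = {"note-1": [3, 17, 5], "note-2": [9, 2, 2, 11]}   # hypothetical entries
ranked = sorted(database, key=lambda k: edit_distance(query_codes, database[k]))
print(ranked)   # ['note-1', 'note-2']
```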



Journal ArticleDOI
TL;DR: This paper examines string block edit distance, in which two strings A and B are compared by extracting collections of substrings and placing them into correspondence, and shows that several variants are NP-complete while giving polynomial-time algorithms for solving the remainder.

118 citations



Journal Article
TL;DR: An O(n^4 log n) time algorithm is shown for the pattern matching problem for strings which are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation.
Abstract: We investigate the time complexity of the pattern matching problem for strings which are succinctly described in terms of straight-line programs, in which the constants are symbols and the only operation is the concatenation. Most strings of descriptive size n are of exponential length with respect to n. We show an O(n^4 log n) time algorithm for this problem. The crucial point in our algorithm is the succinct representation of all periods of a (possibly long) string described in this manner. We also show a (rather straightforward) result that a very simple extension of the pattern-matching problem for shortly described strings is NP-complete.

91 citations
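The straight-line programs in the entry above describe a string by rules that are either single characters or concatenations of earlier rules, so a program of size n can denote a string of length exponential in n. A minimal sketch of the representation, computing the described length without expanding the string (illustrative only, not the paper's O(n^4 log n) matching algorithm):

```python
# A straight-line program: each rule is a single character or the
# concatenation of two previously defined rules.
slp = [
    ("char", "a"),   # X0 = "a"
    ("char", "b"),   # X1 = "b"
    ("cat", 0, 1),   # X2 = X0 X1 = "ab"
    ("cat", 2, 2),   # X3 = X2 X2 = "abab"
    ("cat", 3, 3),   # X4 = X3 X3 = "abababab"  (length doubles with each rule)
]

def described_length(slp):
    """Length of the string denoted by the last rule, without expanding it."""
    lengths = []
    for rule in slp:
        if rule[0] == "char":
            lengths.append(1)
        else:
            _, left, right = rule
            lengths.append(lengths[left] + lengths[right])
    return lengths[-1]

print(described_length(slp))   # 8, from a program with only 5 rules
```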


Patent
Lauri Karttunen
16 May 1997
TL;DR: In this paper, a processor-implemented method of modifying a string of a regular language, which includes at least two symbols and two predetermined substrings, is described; the processor replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.
Abstract: A processor-implemented method of modifying a string of a regular language that includes at least two symbols and at least two predetermined substrings is disclosed. Upon receipt of the string, the processor determines an initial position within the string of a substring matching one of the preselected substrings. To make this determination, the processor matches symbols of the string either starting from the left and proceeding to the right or starting from the right and proceeding to the left. After identifying the initial position, the processor then selects either the longest or the shortest of the preselected substrings. The processor then replaces the matching substring with the string of the lower language associated with the selected preselected substring and outputs the modified string.

86 citations
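As a loose illustration of the directional, longest-match replacement described above (the patent realizes this with finite-state transducers over regular languages; the direct scan below is only a sketch, and the mapping is hypothetical):

```python
def replace_leftmost_longest(s, mapping):
    """Scan left to right; at each position apply the longest matching
    replacement from `mapping` (a dict of substring -> replacement)."""
    out, i = [], 0
    while i < len(s):
        best = None
        for key in mapping:
            if s.startswith(key, i) and (best is None or len(key) > len(best)):
                best = key
        if best is None:
            out.append(s[i])      # no preselected substring starts here
            i += 1
        else:
            out.append(mapping[best])
            i += len(best)
    return "".join(out)

print(replace_leftmost_longest("abcd", {"ab": "X", "abc": "Y"}))   # "Yd" (longest match wins)
```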


Patent
23 Jul 1997
TL;DR: In this paper, a dictionary based data compression and decompression system is proposed, where, in the compressor, when a partial string W and a character C are matched in the dictionary, a new string is entered into the dictionary with C as an extension character on the string PW where P is the string corresponding to the last output compressed code signal.
Abstract: A dictionary based data compression and decompression system where, in the compressor, when a partial string W and a character C are matched in the dictionary, a new string is entered into the dictionary with C as an extension character on the string PW where P is the string corresponding to the last output compressed code signal. An update string is entered into the compression dictionary for each input character that is read and matched. The updating is immediate and interleaved with the character-by-character matching of the current string. The update process continues until the longest match is found in the dictionary. The code of the longest matched string is output in a string matching cycle. If a single character or multi-character string "A" exists in the dictionary, the string AAA . . . A is encoded in two compressed code signals regardless of the string length. This encoding results in an unrecognized code signal at the decompressor. The decompressor, in response to an unrecognized code signal, enters update strings into the decompressor dictionary in accordance with the recovered string corresponding to the previously received code signal, the unrecognized code signal, the extant code of the decompressor and the number of characters in the previously recovered string.

85 citations
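The patent above is a variant of dictionary compression in the LZ78/LZW family; its distinctive points are the immediate, interleaved dictionary updates and the two-code encoding of runs, both of which can make the decompressor see a code it has not yet defined. The sketch below is the classic LZW scheme, including the standard handling of such an unrecognized code, and is not the patent's exact update rule:

```python
def lzw_compress(data):
    """Classic LZW; the patented scheme differs in when dictionary updates occur."""
    dictionary = {chr(c): c for c in range(256)}
    next_code, w, out = 256, "", []
    for ch in data:
        wc = w + ch
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = next_code
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decompress(codes):
    dictionary = {c: chr(c) for c in range(256)}
    next_code = 256
    w = chr(codes[0])
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                    # unrecognized code: it can only denote
            entry = w + w[0]     # the previous string plus its first character
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

codes = lzw_compress("AAAAAAAA")           # a run compresses to a handful of codes
assert lzw_decompress(codes) == "AAAAAAAA"
```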


Proceedings ArticleDOI
01 Jan 1997
TL;DR: The notion of approximate word matching is introduced, and it is shown how it can be used to improve detection and categorization of variant forms in bibliographic entries.
Abstract: As more online databases are integrated into digital libraries, the issue of quality control of the data becomes increasingly important, especially as it relates to the effective retrieval of information. The need to discover and reconcile variant forms of strings in bibliographic entries, i.e., authority work, will become more difficult. Spelling variants, misspellings, and transliteration differences will all increase the difficulty of retrieving information. Approximate string matching has traditionally been used to help with this problem. In this paper we introduce the notion of approximate word matching and show how it can be used to improve detection and categorization of variant forms.

63 citations


Patent
30 Jul 1997
TL;DR: In this article, the authors proposed a data matching mechanism for string replication compression using a dictionary of data, which can be used to find a sequence of data in a data buffer (e.g. looking for a particular series of words, letters, or numbers in an online document).
Abstract: Efficiencies in searching and matching information in a computer system are achieved using embodiments of the invention. The invention can be used, for example, to build and utilize a dictionary of data for string replication compression. The data matching mechanism can also be applied to other situations where it is necessary to find a sequence of data in a data buffer (e.g. looking for a particular series of words, letters, or numbers in an online document). As a result of processing a current string using the data dictionary, it is possible to find a previously-processed dictionary string that has the greatest number of initial characters in common with the current string, and a location at which the current string can be inserted into the dictionary tree. A count field is used to improve the speed of searching for matched strings.
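As a loose illustration of the dictionary-tree idea above (the patent's count field and node layout are not reproduced; this is a generic trie sketch), finding the stored string with the greatest number of initial characters in common with the current string is a longest-prefix walk:

```python
class TrieNode:
    def __init__(self):
        self.children = {}   # character -> TrieNode
        self.count = 0       # how many stored strings pass through this node

def insert(root, s):
    node = root
    for ch in s:
        node.count += 1
        node = node.children.setdefault(ch, TrieNode())
    node.count += 1

def longest_prefix_match(root, s):
    """Length of the longest prefix of s shared with any stored string."""
    node, depth = root, 0
    for ch in s:
        if ch not in node.children:
            break                # s could be inserted below the node reached here
        node = node.children[ch]
        depth += 1
    return depth

root = TrieNode()
for word in ["stringency", "stride", "strum"]:
    insert(root, word)
print(longest_prefix_match(root, "strict"))   # 4 ("stri" is shared with "stride")
```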

01 Jan 1997
TL;DR: This paper presents an analysis of the performance of the system using different search criteria involving melodic contour, musical intervals and rhythm; tests were carried out using both exact and approximate string matching.
Abstract: This paper describes a system designed to retrieve melodies from a database on the basis of a few notes sung into a microphone. The system first accepts acoustic input from the user, transcribes it into common music notation, then searches a database of 9400 folk tunes for those containing the sung pattern, or patterns similar to the sung pattern; retrieval is ranked according to the closeness of the match. The paper presents an analysis of the performance of the system using different search criteria involving melodic contour, musical intervals and rhythm; tests were carried out using both exact and approximate string matching. Approximate matching used a dynamic programming algorithm designed for comparing musical sequences. Current work focuses on developing a faster algorithm.
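One of the search criteria mentioned above is melodic contour. As a toy illustration (not the system's transcription or matching code), a pitch sequence can be reduced to a contour string of U/D/S symbols and then handed to any exact or approximate string matcher:

```python
def contour(pitches):
    """Reduce a sequence of MIDI pitch numbers to a U/D/S contour string."""
    out = []
    for prev, cur in zip(pitches, pitches[1:]):
        out.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(out)

# A sung query and a database tune can then be compared by exact or
# approximate matching of their contour strings.
print(contour([60, 62, 62, 59, 64]))   # "USDU"
```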

Book ChapterDOI
06 Aug 1997
TL;DR: Two new algorithms for on-line multiple approximate string matching are presented, both extensions of previous algorithms that search for a single pattern: the first superimposes bit-parallel simulations of non-deterministic automata built from the patterns and uses the result as a filter, while the second partitions each pattern into sub-patterns that are searched with no errors using a fast exact multipattern search algorithm.
Abstract: We present two new algorithms for on-line multiple approximate string matching. These are extensions of previous algorithms that search for a single pattern. The single-pattern version of the first one is based on the simulation with bits of a non-deterministic finite automaton built from the pattern and using the text as input. To search for multiple patterns, we superimpose their automata, using the result as a filter. The second algorithm partitions the pattern in sub-patterns that are searched with no errors, with a fast exact multipattern search algorithm. To handle multiple patterns, we search the sub-patterns of all of them together. The average running time achieved is in both cases O(n) for moderate error level, pattern length and number of patterns. They adapt (with higher costs) to the other cases. However, the algorithms differ in speed and thresholds of usefulness. We analyze theoretically when each algorithm should be used, and show experimentally that they are faster than previous solutions in a wide range of cases.
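The second algorithm relies on the standard filtering observation that if a pattern is split into k+1 pieces, any occurrence with at most k errors must contain one piece verbatim. A minimal single-pattern sketch of that filter (the multipattern variant and the bit-parallel NFA superimposition are not shown):

```python
def partition_filter_candidates(text, pattern, k):
    """Candidate windows for occurrences of `pattern` with at most k errors.

    Splits the pattern into k+1 pieces; an occurrence with <= k differences
    must contain at least one piece exactly, so exact hits of the pieces act
    as a filter and only the surrounding windows need DP verification.
    Assumes 0 < k < len(pattern).
    """
    m = len(pattern)
    step = m // (k + 1)
    windows = []
    for i in range(k + 1):
        off = i * step
        piece = pattern[off: off + step] if i < k else pattern[off:]
        pos = text.find(piece)
        while pos != -1:
            lo = max(0, pos - off - k)
            hi = min(len(text), pos - off + m + k)
            windows.append((lo, hi))   # verify text[lo:hi] with an edit-distance routine
            pos = text.find(piece, pos + 1)
    return windows

print(partition_filter_candidates("approximate stting matching", "string", 1))
```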

Book
29 May 1997
TL;DR: This chapter focuses on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance.
Abstract: In the previous chapters, we discussed problems involving an exact match of string patterns. We now turn to problems involving similar but not necessarily exact pattern matches. There are a number of similarity or distance measures, and many of them are special cases or generalizations of the Levenshtein metric. The problem of evaluating the measure of string similarity has numerous applications, including one arising in the study of the evolution of long molecules such as proteins. In this chapter, we focus on the problem of evaluating a longest common subsequence, which is expressively equivalent to the simple form of the Levenshtein distance.
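The equivalence mentioned above is the identity d(A, B) = |A| + |B| - 2·|LCS(A, B)| for the insert/delete-only form of the Levenshtein distance. A sketch of the standard dynamic program for the LCS length:

```python
def lcs_length(a, b):
    """Length of a longest common subsequence, by the standard DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if x == y else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

a, b = "surgery", "survey"
L = lcs_length(a, b)                       # 5 ("surey")
indel_distance = len(a) + len(b) - 2 * L   # insert/delete-only edit distance = 3
```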

Journal ArticleDOI
01 Apr 1997
TL;DR: This paper presents a learning-automaton-based solution to string taxonomy that utilizes the Object Migrating Automaton, whose power in clustering objects and images has previously been reported.
Abstract: A typical syntactic pattern recognition (PR) problem involves comparing a noisy string with every element of a dictionary, X. The problem of classification can be greatly simplified if the dictionary is partitioned into a set of subdictionaries. In this case, the classification can be hierarchical: the noisy string is first compared to a representative element of each subdictionary, and the closest match within that subdictionary is subsequently located. Indeed, the entire problem of subdividing a set of strings into subsets where each subset contains "similar" strings has been referred to as the "String Taxonomy Problem". To our knowledge there is no reported solution to this problem. In this paper we present a learning-automaton-based solution to string taxonomy. The solution utilizes the Object Migrating Automaton, whose power in clustering objects and images has been reported. The power of the scheme for string taxonomy has been demonstrated using random strings and garbled versions of string representations of fragments of macromolecules.

Proceedings Article
08 Jul 1997
TL;DR: In this paper, a stochastic model for string-edit distance is proposed, which is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
Abstract: In many applications, it is necessary to determine the similarity of two strings. A widely-used notion of string similarity is the edit distance: the minimum number of insertions, deletions, and substitutions required to transform one string into the other. In this report, we provide a stochastic model for string-edit distance. Our stochastic model allows us to learn a string-edit distance function from a corpus of examples. We illustrate the utility of our approach by applying it to the difficult problem of learning the pronunciation of words in conversational speech. In this application, we learn a string-edit distance with nearly one-fifth the error rate of the untrained Levenshtein distance. Our approach is applicable to any string classification problem that may be solved using a similarity function against a database of labeled prototypes.
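The report learns the operation costs from data; as a generic illustration of where such learned costs plug in (the cost values below are invented, and this is not the paper's stochastic transduction model), a weighted edit distance looks like this:

```python
def weighted_edit_distance(a, b, sub_cost, ins_cost, del_cost):
    """Edit distance with arbitrary per-symbol operation costs."""
    m, n = len(a), len(b)
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + del_cost(a[i - 1])
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost(b[j - 1])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(dp[i - 1][j] + del_cost(a[i - 1]),
                           dp[i][j - 1] + ins_cost(b[j - 1]),
                           dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]))
    return dp[m][n]

# Illustrative costs only: confusable phones are cheaper to substitute.
confusable = {("m", "n"), ("n", "m"), ("t", "d"), ("d", "t")}
dist = weighted_edit_distance(
    "tog", "dog",
    sub_cost=lambda x, y: 0.0 if x == y else (0.3 if (x, y) in confusable else 1.0),
    ins_cost=lambda y: 1.0,
    del_cost=lambda x: 1.0,
)
print(dist)   # 0.3, versus 1.0 under the unweighted Levenshtein distance
```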

Proceedings ArticleDOI
11 Jun 1997-Sequence
TL;DR: This work considers the problem of finding the longest common subsequence of two strings, and develops significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems.
Abstract: Measuring the similarity between two strings, through such standard measures as Hamming distance, edit distance, and longest common subsequence, is one of the fundamental problems in pattern matching. We consider the problem of finding the longest common subsequence of two strings. A well-known dynamic programming algorithm computes the longest common subsequence of strings X and Y in O(|X|·|Y|) time. We develop significantly faster algorithms for a special class of strings which emerge frequently in pattern matching problems. A string S is run-length encoded if it is described as an ordered sequence of pairs (σ, i), each consisting of an alphabet symbol σ and an integer i. Each pair corresponds to a run in S consisting of i consecutive occurrences of σ. For example, the string aaaabbbbcccabbbbcc can be encoded as a⁴b⁴c³a¹b⁴c². Such a run-length encoded string can be significantly shorter than the expanded string representation. Indeed, run-length coding serves as a popular image compression technique, since many classes of images, such as binary images in facsimile transmission, typically contain large patches of identically-valued pixels.
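For reference, the run-length encoding that the faster algorithms above operate on, reproducing the example from the abstract:

```python
from itertools import groupby

def run_length_encode(s):
    """Encode a string as (symbol, run-length) pairs."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

print(run_length_encode("aaaabbbbcccabbbbcc"))
# [('a', 4), ('b', 4), ('c', 3), ('a', 1), ('b', 4), ('c', 2)]  i.e. a⁴b⁴c³a¹b⁴c²
```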

Patent
Robert Walter Schreiber
10 Nov 1997
TL;DR: In this paper, a system and method for computer-aided heuristic adaptive attribute matching is described, which comprises a server for receiving a status message and for further processing of the status message according to the following steps: (i) preparing a candidate list of candidates; (ii) preparing a search list of search attributes; (iii) eliminating non-matching candidates; and (iv) selecting a matching candidate.
Abstract: A system and method for computer-aided heuristic adaptive attribute matching are disclosed. A system for computer-aided heuristic adaptive attribute matching comprises a server for receiving a status message and for further processing of the status message according to the following steps: (i) preparing a candidate list of the candidates; (ii) preparing a search list of search attributes; (iii) eliminating non-matching candidates; and, (iv) selecting a matching candidate. A method for computer-aided heuristic adaptive attribute matching in accordance with the invention comprises four steps. Those steps are: (1) preparing a candidate list comprising a plurality of candidates; (2) preparing a search list comprising at least one search attribute; (3) fuzzy matching at least one known attribute to the search attribute responsive to more than one candidate existing; and (4) returning a result of the fuzzy matching.

Proceedings ArticleDOI
18 Dec 1997
TL;DR: Three algorithms for string matching on reconfigurable mesh architectures are presented, and the first algorithm finds the exact matching between T and P in O(1) time on a 2-dimensional RMESH of size (n-m+1)×m.
Abstract: The string matching problem has received much attention over the years due to its importance in various applications such as text/file comparison, DNA sequencing, search engines, and spelling correction. Especially with the introduction of search engines dealing with the tremendous amount of textual information presented on the world wide web and the research on DNA sequencing, this problem deserves special attention, and any algorithmic or hardware improvements to speed up the process will benefit these important applications. In this paper, we present three algorithms for string matching on reconfigurable mesh architectures. Given a text T of length n and a pattern P of length m, the first algorithm finds the exact matching between T and P in O(1) time on a 2-dimensional RMESH of size (n-m+1)×m. The second algorithm finds the approximate matching between T and P in O(k) time on a 2D RMESH, where k is the maximum edit distance between T and P. The third algorithm allows only the replacement operation in the calculation of the edit distance and finds an approximate matching between T and P in constant time on a 3D RMESH.

Proceedings ArticleDOI
01 Jul 1997
TL;DR: A new approach to measuring the similarity of 3D curves is presented, based on an extension of the classical string edit distance that allows the possibility to use strings, where each element can be a vector rather than a single symbol.
Abstract: In this paper a new approach to measuring the similarity of 3D curves is presented. This approach is based on an extension of the classical string edit distance in two ways. The first extension is the possibility to use strings, where each element can be a vector rather than a single symbol, while the second extension is the use of fuzzy set based cost functions in the edit distance computation. These two extensions allow us to tackle various problems, that can't be solved by means of "classical" string edit distance.
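A small illustration of the first extension above, strings whose elements are vectors: only the substitution cost changes, since it must compare two feature vectors rather than two symbols. The particular cost shape below is illustrative only and is not the paper's fuzzy-set formulation:

```python
import math

def vector_sub_cost(u, v, scale=1.0):
    """Substitution cost between two feature vectors (e.g. local curve descriptors).

    A smooth, bounded cost in [0, 1): small geometric differences are cheap,
    large ones approach the cost of a full mismatch.
    """
    d = math.dist(u, v)
    return 1.0 - math.exp(-d / scale)

# Plugged into any weighted edit-distance routine (such as the one sketched
# earlier), this lets two "strings" of 3D curve descriptors be aligned just
# like ordinary symbol strings.
print(vector_sub_cost((0.0, 0.0, 1.0), (0.1, 0.0, 1.0)))   # small cost for a small difference
```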

Journal ArticleDOI
TL;DR: A simple recursive, memoized version of the Knuth-Morris-Pratt string matching algorithm is given, along with a proof of correctness and worst-case analysis.
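For reference, the conventional iterative form of Knuth-Morris-Pratt (failure function plus a single scan); the paper's contribution is a recursive, memoized derivation of this behaviour, which is not reproduced here:

```python
def kmp_search(text, pattern):
    """All start positions of `pattern` in `text` (conventional iterative KMP)."""
    # failure[i]: length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it.
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k

    hits, k = [], 0
    for i, ch in enumerate(text):
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            hits.append(i - k + 1)
            k = failure[k - 1]
    return hits

print(kmp_search("abababca", "abab"))   # [0, 2]
```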

Proceedings Article
01 Jan 1997
TL;DR: In this article, a detailed description of the simulation of nondeterministic finite automata (NFA) for approximate string matching using bit parallelism is presented, and modifications of the Shift-Or algorithm are designed for approximate string matching under the generalized Levenshtein distance and for exact and approximate sequence matching.
Abstract: We present a detailed description of the simulation of nondeterministic finite automata (NFA) for approximate string matching. This simulation uses bit parallelism; the algorithm used is called the Shift-Or algorithm. Using this simulation of the NFA by the Shift-Or algorithm, we design a modification of the Shift-Or algorithm for approximate string matching using the generalized Levenshtein distance, and a modification for exact and approximate sequence matching.
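A minimal sketch of the exact-matching Shift-Or kernel that the described extensions build on (the generalized-Levenshtein and sequence-matching modifications are not shown):

```python
def shift_or_search(text, pattern):
    """Exact matching with the bit-parallel Shift-Or algorithm."""
    m = len(pattern)
    all_ones = (1 << m) - 1
    # mask[c] has a 0 bit at position i iff pattern[i] == c
    mask = {}
    for i, c in enumerate(pattern):
        mask[c] = mask.get(c, all_ones) & ~(1 << i)

    state = all_ones   # bit i is 0 iff pattern[:i+1] matches a suffix of the text read so far
    hits = []
    for pos, c in enumerate(text):
        state = ((state << 1) | mask.get(c, all_ones)) & all_ones
        if not state & (1 << (m - 1)):
            hits.append(pos - m + 1)
    return hits

print(shift_or_search("annual banana", "ana"))   # [8, 10]
```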

Book ChapterDOI
30 Jun 1997
TL;DR: This paper includes the swap operation, which interchanges two adjacent characters, in the set of allowable edit operations, and presents an O(t min(m,n))-time algorithm for the extended edit distance problem, where t is the edit distance between the given strings.
Abstract: Most research on the edit distance problem and the k-differences problem considered the set of edit operations consisting of changes, deletions, and insertions. In this paper we include the swap operation that interchanges two adjacent characters into the set of allowable edit operations, and we present an O(t min(m,n))-time algorithm for the extended edit distance problem, where t is the edit distance between the given strings, and an O(kn)-time algorithm for the extended k-differences problem. That is, we add swaps into the set of edit operations without increasing the time complexities of previous algorithms that consider only changes, deletions, and insertions for the edit distance and k-differences problems.
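For comparison, the plain O(mn) dynamic program extended with adjacent swaps (the restricted Damerau-Levenshtein recurrence); this is not the paper's O(t min(m,n)) or O(kn) algorithm:

```python
def edit_distance_with_swaps(a, b):
    """Edit distance with changes, deletions, insertions and adjacent swaps."""
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # change / match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)   # adjacent swap
    return dp[m][n]

print(edit_distance_with_swaps("acb", "abc"))   # 1 (one swap), versus 2 without swaps
```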

Proceedings ArticleDOI
18 Aug 1997
TL;DR: A fast approximate string matching method that uses a portion of the characters of a word together with a distance pattern, so that existing index techniques can be applied, and that achieves high recall even for poorly recognized texts.
Abstract: This paper presents a fast approximate string matching method. In constructing information spaces such as digital libraries, we have to collect a vast amount of information and convert it into uniformly organized data. Since much of the information must be converted from various media automatically, the space contains garbled text of varying accuracy. To utilize these texts, we need to satisfy three requirements: high recall, high precision and a fast matching process. In order to satisfy these requirements, we have been developing a two-phase matching system. The presented method is used for fast, high-recall candidate word selection in the first phase. The key idea of the method is to use a portion of the characters of a word and a distance pattern, so that existing index techniques can be applied. By experiments, we confirm that the presented method achieves high recall even for poorly recognized texts.


Proceedings ArticleDOI
20 Oct 1997
TL;DR: The systolic solution for approximate string matching is modified and extended for the OCS problem in this paper, and the architecture presented here can also be used to determine the minimum edit distance, the Longest Common Subsequence (LCS) and its length.
Abstract: The string matching problem arises in many fields of text analysis, image analysis and speech recognition. The computationally intensive nature of string matching makes it a candidate for VLSI implementation. Most of the existing algorithms and architectures for string matching consider strings that are from a finite alphabet set. The Optimal Correspondence of String Subsequence (OCS) problem, on the other hand, considers strings from an infinite alphabet set. This paper describes the design of a linear systolic array VLSI architecture for the OCS problem. The systolic solution for approximate string matching is modified and extended for the OCS problem in this paper. The architecture presented here can also be used to determine the minimum edit distance, the Longest Common Subsequence (LCS) and its length. The systolic architecture was simulated and verified using the Cadence design tools.

Journal ArticleDOI
TL;DR: An effective string matching algorithm has been developed which can tolerate common types of distortions, e.g. connected strokes, missing/extra strokes, and variations in writing sequence.


Proceedings Article
01 Jan 1997
TL;DR: In this paper, a new family of single keyword pattern matching algorithms is presented, which can be used to do a minimal number of match attempts within the input string (by maintaining as much information as possible from each match attempt).
Abstract: Even though the field of pattern matching has been well studied, there are still many interesting algorithms to be discovered. In this paper, we present a new family of single keyword pattern matching algorithms. We begin by deriving a common ancestor algorithm, which naïvely solves the problem. Through a series of correctness preserving predicate strengthenings, and implementation choices, we derive efficient variants of this algorithm. This paper also presents one of the first algorithms which could be used to do a minimal number of match attempts within the input string (by maintaining as much information as possible from each match attempt). Keywords: Single keyword pattern matching, Shift distances, Match attempts, Reusing match information, Predicate strengthening and weakening, D.1.4, E.1, F.2.2, G.2.2
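The "common ancestor" in such derivations is essentially the naive matcher that tries every alignment of the keyword; the derived variants strengthen its invariants so that match information can be reused to skip alignments. A sketch of that starting point:

```python
def naive_match(text, keyword):
    """The naive 'common ancestor': try every alignment of the keyword."""
    hits = []
    for i in range(len(text) - len(keyword) + 1):
        # Compare character by character; smarter variants reuse the
        # information gathered here to obtain larger shift distances.
        if all(text[i + j] == keyword[j] for j in range(len(keyword))):
            hits.append(i)
    return hits

print(naive_match("hishershey", "she"))   # [2, 6]
```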

Journal ArticleDOI
Xindong Wu
TL;DR: This paper describes the fuzzy matching techniques implemented in the HCV (Version 2.0) software, and presents a hybrid interpretation mechanism which combines fuzzy matching with probability estimation.
Abstract: When applying rules produced by induction from training examples to a test example, there are three possible cases that demand different actions: (i) no match; (ii) single match; and (iii) multiple match. Existing techniques for dealing with the first and third cases are exclusively based on probability estimation. However, when there are continuous attributes in the example space, and if these attributes have been discretized into intervals before induction, fuzzy interpretation of the discretized intervals at deduction time could be very valuable. This paper describes the fuzzy matching techniques implemented in the HCV (Version 2.0) software, and presents a hybrid interpretation mechanism which combines fuzzy matching with probability estimation. Experimental results of the HCV (Version 2.0) software with different interpretation techniques are provided on a number of data sets from the University of California at Irvine Repository of Machine Learning Databases.
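As a generic illustration of fuzzy interpretation of a discretized interval (the membership functions actually used in HCV (Version 2.0) are not reproduced here), a value just outside an interval can still match a rule condition to a degree:

```python
def interval_membership(x, low, high, slope=0.5):
    """Fuzzy degree to which value x matches the interval [low, high].

    Inside the interval the degree is 1; outside, it falls off linearly over
    a margin of width slope * (high - low).  Purely illustrative; assumes
    high > low.
    """
    margin = slope * (high - low)
    if low <= x <= high:
        return 1.0
    gap = (low - x) if x < low else (x - high)
    return max(0.0, 1.0 - gap / margin)

# A test example with value 31 still partially matches a rule condition
# whose continuous attribute was discretized into the interval [20, 30].
print(interval_membership(31, 20, 30))   # 0.8
```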