Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared within this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm & fuzzy string-matching algorithm.


Papers
Proceedings Article
08 Jul 2009
TL;DR: In trials on twenty-five instances of the closest string problem with alphabets ranging in size from 2 to 30, the algorithm that used the data-based representation of candidate strings consistently returned the best results, and its advantage increased with the sizes of the test instances' alphabets.
Abstract: Given a set of strings S of equal lengths over an alphabet σ, the closest string problem seeks a string over σ whose maximum Hamming distance to any of the given strings is as small as possible. A data-based coding of strings for evolutionary search represents candidate closest strings as sequences of indexes of the given strings. The string such a chromosome represents consists of the symbols in the corresponding positions of the indexed strings. A genetic algorithm using this coding was compared with two GAs that encoded candidate strings directly as strings over σ. In trials on twenty-five instances of the closest string problem with alphabets ranging in size from 2 to 30, the algorithm that used the data-based representation of candidate strings consistently returned the best results, and its advantage increased with the sizes of the test instances' alphabets.

11 citations
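
The data-based coding described in this abstract can be illustrated with a minimal Python sketch. It covers only the decoding step and the fitness measure (maximum Hamming distance), not the genetic algorithm itself. The function names `decode`, `hamming`, and `max_hamming` and the sample strings are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence


def decode(chromosome: Sequence[int], strings: Sequence[str]) -> str:
    """Decode a data-based chromosome (hypothetical helper).

    Gene i is an index into `strings`; position i of the candidate
    takes the symbol found at position i of the indexed string.
    """
    return "".join(strings[idx][pos] for pos, idx in enumerate(chromosome))


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def max_hamming(candidate: str, strings: Sequence[str]) -> int:
    """Objective of the closest string problem: the largest Hamming
    distance from the candidate to any given string (smaller is better)."""
    return max(hamming(candidate, s) for s in strings)


# Illustrative data only, not from the paper's test instances.
strings = ["ACGT", "AGGT", "ACGA"]
chromosome = [0, 1, 2, 0]          # one index per position
candidate = decode(chromosome, strings)
print(candidate, max_hamming(candidate, strings))
```

One consequence of this coding is that every decoded candidate uses, at each position, a symbol that actually occurs there in some input string, so the search space does not grow with the alphabet size.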

Proceedings Article
11 Mar 2002
TL;DR: This work investigates distributed matchmaking within a multi-agent system in which agents communicate in a peer-to-peer fashion with a limited set of neighbors, considers several possible matching functions, and shows that their support is proportional to the spread of categories tolerable.
Abstract: We investigate distributed matchmaking within a multi-agent system in which agents communicate in a peer-to-peer fashion with a limited set of neighbors. We compare the performance of a system with synchronized time to that of systems using several different models of continuous time. We find little difference between the two, indicating that the ordering of events does not play a part in computation. We also compare a system in which matches are made deterministically between discrete task categories to one in which task matches are made non-deterministically between continuous task categories. We consider several possible matching functions and show that their support is proportional to the spread of categories tolerable. This holds for matching probabilities as low as 0.01. We further show that the matching function's 'height' relates to the speed at which the system finds matches. For instance, we show that for a triangular matching function, doubling the probability of each service matching results in about a 1.6 times speedup.

11 citations

Book Chapter
20 Oct 2007
TL;DR: An action analysis method based on robust string matching using dynamic programming that allows for large pose and appearance changes, is robust to background clutter, and can accommodate spatio-temporal behavior variations amongst different subjects while achieving high discriminability between different behaviors is presented.
Abstract: This paper presents an action analysis method based on robust string matching using dynamic programming. Similar to matching text sequences, atomic actions based on semantic and structural features are first detected and coded as spatio-temporal characters or symbols. These symbols are subsequently concatenated to form a unique set of strings for each action. A similarity metric using the longest common subsequence algorithm is employed to robustly match action strings of variable length. A dynamic programming method with polynomial computational complexity and linear space complexity is implemented. An effective learning scheme based on similarity metric embedding is developed to deal with matching strings of variable length. Our proposed method works with a limited amount of training data and exhibits desirable generalization properties. Moreover, it can be naturally extended to detect compound behaviors and events. Experimental evaluation on our own and a commonly used data set demonstrates that our method allows for large pose and appearance changes, is robust to background clutter, and can accommodate spatio-temporal behavior variations amongst different subjects while achieving high discriminability between different behaviors.

11 citations
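
The longest common subsequence (LCS) matching with linear space mentioned in this abstract can be sketched as follows. This is a generic two-row dynamic program, not the paper's implementation. The normalization in `lcs_similarity` (dividing by the longer length) and the action labels in the usage line are assumptions chosen for illustration.

```python
from typing import Hashable, Sequence


def lcs_length(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Length of the longest common subsequence of a and b.

    Runs in O(len(a) * len(b)) time and keeps only two DP rows,
    so extra space is linear in the shorter sequence.
    """
    if len(b) > len(a):
        a, b = b, a                      # keep the DP rows short
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Similarity in [0, 1]; normalizing by the longer length is an
    assumption here, one of several possible choices."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 1.0


# Sequences of symbols need not be characters; hypothetical action labels:
print(lcs_similarity(["walk", "turn", "sit"], ["walk", "sit"]))
```

Because the comparison works on any hashable symbols, the same routine handles strings of different lengths without padding or alignment in advance.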

Journal Article
TL;DR: It is proved that exact parameterized matching on trees can be computed in linear time for alphabets in an O(n)-size integer range, and in time O(n log m) in general, where n is the tree size and m the pattern length, and these bounds are optimal in the comparison model.

11 citations
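
For readers unfamiliar with parameterized matching, the sketch below shows the underlying notion on plain strings only: two strings parameterized-match if a one-to-one renaming of symbols turns one into the other. It does not implement the tree algorithm from the paper. The function names are hypothetical, and the sketch assumes every symbol is a renameable parameter.

```python
from typing import List


def prev_encode(s: str) -> List[int]:
    """Replace each symbol by the distance to its previous occurrence
    (0 if it has not occurred before)."""
    last = {}
    out = []
    for i, ch in enumerate(s):
        out.append(i - last[ch] if ch in last else 0)
        last[ch] = i
    return out


def p_match(a: str, b: str) -> bool:
    """True if some bijection on symbols maps a onto b.

    Two equal-length strings have the same encoding exactly when
    such a renaming exists (assuming all symbols are parameters).
    """
    return len(a) == len(b) and prev_encode(a) == prev_encode(b)


print(p_match("abab", "xyxy"))   # same repetition pattern
print(p_match("abab", "xyyx"))   # different repetition pattern
```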

Book Chapter
TL;DR: Two efficient approximate techniques for measuring dissimilarities between cyclic patterns are presented, inspired by the quadratic-time algorithm proposed by Bunke and Buhler, achieving even more accurate solutions.
Abstract: Two efficient approximate techniques for measuring dissimilarities between cyclic patterns are presented. They are inspired by the quadratic-time algorithm proposed by Bunke and Buhler. The first technique completes pseudoalignments built by the Bunke and Buhler algorithm (BBA), obtaining full alignments between cyclic patterns. The edit cost of the minimum-cost alignment is given as an upper-bound estimation of the exact cyclic edit distance, which results in a more accurate bound than the lower one obtained by BBA. The second technique uses both bounds to compute a weighted average, achieving even more accurate solutions. Weights come from minimizing the sum of squared relative errors with respect to exact distance values on a training set of string pairs. Experiments were conducted on both artificial and real data to demonstrate the capabilities of the new techniques in both accuracy and quadratic computing time.

11 citations
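
As a point of reference for the bounds discussed in this abstract, the sketch below computes a cyclic edit distance by brute force: it tries every rotation of one string and keeps the smallest ordinary edit distance. This is not the Bunke and Buhler algorithm or either of the paper's techniques. It assumes unit costs for insertion, deletion, and substitution, and it takes roughly cubic time, which is the cost the quadratic approximations aim to avoid.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs, using two DP rows."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # delete x
                            curr[j - 1] + 1,          # insert y
                            prev[j - 1] + (x != y)))  # match or substitute
        prev = curr
    return prev[-1]


def cyclic_edit_distance(a: str, b: str) -> int:
    """Smallest edit distance between any rotation of a and b.

    Brute-force baseline: len(a) full edit-distance computations.
    """
    if not a:
        return len(b)
    return min(edit_distance(a[k:] + a[:k], b) for k in range(len(a)))


print(edit_distance("abcd", "cdab"), cyclic_edit_distance("abcd", "cdab"))
```

The edit distance of any single rotation is an upper bound on this value, which is why completing one good alignment yields an upper-bound estimate.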


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39