Topic

Approximate string matching

About: Approximate string matching is a research topic. Over its lifetime, 1,903 publications have appeared on this topic, receiving 62,352 citations. The topic is also known as: fuzzy string-searching algorithm and fuzzy string-matching algorithm.


Papers
Patent
Akagi Takuma
31 Jul 2000
TL;DR: This patent compares each character of a first character string with each character of a second character string, votes into a matrix whose two sides correspond to the characters of the first character string and the characters of the second character string, and calculates values of the voting result for the components arranged along each oblique (diagonal) direction of the matrix.
Abstract: This invention compares each character of a first character string with each character of a second character string, votes into a matrix whose two sides correspond to the characters of the first character string and the characters of the second character string, and calculates values of the voting result for the respective components arranged in an oblique direction of the matrix. The matching result is determined based on the calculated values of the voting result. As a result, a high-speed and highly precise matching process that is noise-resistant and takes the character arrangement into consideration can be attained.

14 citations
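
The diagonal-voting idea in the patent abstract above can be illustrated with a minimal Python sketch. This is not the patented method: the function names `diagonal_votes` and `best_diagonal` are hypothetical, and scoring each diagonal by a plain count of matching character pairs is an assumption made only for illustration.

```python
from collections import Counter


def diagonal_votes(first: str, second: str) -> Counter:
    """Cast one vote per matching character pair, keyed by diagonal offset.

    Cell (i, j) of the comparison matrix lies on the diagonal j - i, so
    matches that preserve character order accumulate on the same diagonal.
    """
    votes = Counter()
    for i, a in enumerate(first):
        for j, b in enumerate(second):
            if a == b:
                votes[j - i] += 1
    return votes


def best_diagonal(first: str, second: str):
    """Return (offset, votes) for the strongest diagonal, or (None, 0).

    Ties are broken in favour of the offset closest to zero (an assumption).
    """
    votes = diagonal_votes(first, second)
    if not votes:
        return None, 0
    offset, count = max(votes.items(), key=lambda kv: (kv[1], -abs(kv[0])))
    return offset, count


if __name__ == "__main__":
    # "abc" appears in "xabc" shifted right by one position.
    print(best_diagonal("abc", "xabc"))
```

A stray or misrecognized character only removes one vote from the dominant diagonal, which is one way to read the abstract's claim of noise resistance.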

Journal Article
TL;DR: The Crochemore-Perrin constant-space O(n)-time string-matching algorithm is extended to run in optimal O(n/α) time and even in real-time, achieving a factor α speedup over traditional algorithms that examine each character individually.

14 citations

Proceedings Article
10 Jul 2011
TL;DR: An extension to widely used ASM algorithms is proposed to detect the name aliases that arise as a result of transliteration, and the experimental evaluation shows that the proposed extension increases the accuracy of the basic algorithms to a considerable level.
Abstract: This paper focuses on the problem of alias detection based on orthographic variations of Arabic names. Alias detection is the process of identifying different variants of the same name. To detect aliases based on orthographic variations, approximate string matching (ASM) algorithms, which measure the similarity between two strings (i.e., the name and the alias), are widely used. ASM algorithms work well for detecting various types of orthographic variation, but there is still a need for techniques that detect correct aliases of Arabic names that occur due to the translation of Arabic names into English. An extension to widely used ASM algorithms is proposed to detect the name aliases that arise as a result of transliteration. This paper aims to improve the accuracy of the basic ASM algorithms in order to detect correct aliases. The experimental evaluation shows that the proposed extension increases the accuracy of the basic algorithms to a considerable level.

14 citations

Journal Article
TL;DR: Harry is a small tool specifically designed for measuring the similarity of strings and implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel.
Abstract: Comparing strings and assessing their similarity is a basic operation in many application domains of machine learning, such as in information retrieval, natural language processing and bioinformatics. The practitioner can choose from a large variety of available similarity measures for this task, each emphasizing different aspects of the string data. In this article, we present Harry, a small tool specifically designed for measuring the similarity of strings. Harry implements over 20 similarity measures, including common string distances and string kernels, such as the Levenshtein distance and the Subsequence kernel. The tool has been designed with efficiency in mind and allows for multi-threaded as well as distributed computing, enabling the analysis of large data sets of strings. Harry supports common data formats and thus can interface with analysis environments, such as Matlab, Pylab and Weka.

14 citations
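
The Levenshtein distance named in the Harry abstract above can be sketched in a few lines of Python. This is a textbook dynamic-programming version shown for illustration only; it is not Harry's implementation or interface, and the function name `levenshtein` is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn `a` into `b`."""
    # Keep the shorter string as the row to use less memory.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] holds the distance between the processed prefix of `a`
    # and the first j characters of `b`.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))  # prints 3
```

Only two rows of the table are kept at a time, so memory grows with the shorter string while running time remains proportional to the product of the two lengths.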


Network Information
Related Topics (5)
Server: 79.5K papers, 1.4M citations, 81% related
Cluster analysis: 146.5K papers, 2.9M citations, 80% related
Scheduling (computing): 78.6K papers, 1.3M citations, 79% related
Network packet: 159.7K papers, 2.2M citations, 78% related
Optimization problem: 96.4K papers, 2.1M citations, 78% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    8
2022    30
2021    32
2020    30
2019    48
2018    39