Open Access Journal Article

Fast approximate string matching with finite automata

Mans Hulden
01 Sep 2009
Issue 43, pp. 57-64
Abstract
We present a fast algorithm for finding approximate matches of a string in a finite-state automaton, given some metric of similarity. The algorithm can be adapted to use a variety of metrics for determining the distance between two words.
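The abstract leaves the search procedure unspecified. As an illustration only, the following Python sketch finds the words accepted by a deterministic automaton that lie within a cost bound of an input string. It uses a uniform-cost (Dijkstra-style) best-first search over (input position, state, output) triples, which is A* with a zero heuristic. The metric is pluggable through the hypothetical `sub_cost`, `ins_cost`, and `del_cost` callables, which default to unit-cost Levenshtein distance. The dictionary-based DFA encoding and the names `trie` and `approx_lookup` are assumptions made for this sketch. It is not the paper's algorithm and makes no attempt to reproduce the paper's own optimizations.

```python
import heapq
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical DFA encoding (an assumption of this sketch): state -> {symbol: next_state}.
Transitions = Dict[int, Dict[str, int]]


def unit_sub(a: str, b: str) -> float:
    """Levenshtein substitution cost: free for a match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def unit_indel(_: str) -> float:
    """Levenshtein insertion/deletion cost (must be > 0 to guarantee termination)."""
    return 1.0


def trie(words: List[str]) -> Tuple[Transitions, Set[int]]:
    """Build a small acyclic DFA (a trie) accepting exactly `words`; start state is 0."""
    delta: Transitions = {0: {}}
    finals: Set[int] = set()
    for w in words:
        s = 0
        for ch in w:
            nxt = delta[s].get(ch)
            if nxt is None:
                nxt = len(delta)
                delta[nxt] = {}
                delta[s][ch] = nxt
            s = nxt
        finals.add(s)
    return delta, finals


def approx_lookup(
    word: str,
    delta: Transitions,
    start: int,
    finals: Set[int],
    max_cost: float,
    sub_cost: Callable[[str, str], float] = unit_sub,
    ins_cost: Callable[[str], float] = unit_indel,
    del_cost: Callable[[str], float] = unit_indel,
) -> List[Tuple[str, float]]:
    """Return (accepted word, cost) pairs with cost <= max_cost, cheapest first.

    Uniform-cost search over nodes (cost, input position, DFA state, output so far).
    Costs must be non-negative; positive insertion costs keep it finite on cyclic DFAs.
    """
    n = len(word)
    heap: List[Tuple[float, int, int, str]] = [(0.0, 0, start, "")]
    expanded: Set[Tuple[int, int, str]] = set()
    results: List[Tuple[str, float]] = []

    def push(cost: float, i: int, state: int, out: str) -> None:
        if cost <= max_cost:  # prune anything over budget
            heapq.heappush(heap, (cost, i, state, out))

    while heap:
        cost, i, state, out = heapq.heappop(heap)
        if (i, state, out) in expanded:
            continue  # this node was already popped at a lower or equal cost
        expanded.add((i, state, out))
        if i == n and state in finals:
            results.append((out, cost))  # goals are popped in nondecreasing cost order
        edges = delta.get(state, {})
        if i < n:
            # Drop word[i] without consuming an automaton symbol.
            push(cost + del_cost(word[i]), i + 1, state, out)
            # Match or substitute word[i] with an arc label.
            for sym, nxt in edges.items():
                push(cost + sub_cost(word[i], sym), i + 1, nxt, out + sym)
        # Emit an arc label that has no counterpart in the input.
        for sym, nxt in edges.items():
            push(cost + ins_cost(sym), i, nxt, out + sym)
    return results


if __name__ == "__main__":
    delta, finals = trie(["cat", "car", "cart", "dog"])
    print(approx_lookup("cas", delta, 0, finals, max_cost=1.0))
```

Swapping in a different cost function, for example a substitution cost of 0.5 for mismatched symbols, changes which words fall within the bound and how they are ranked without touching the search itself. This is one way a single procedure can serve several distance metrics.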

Citations
Book Chapter

HFST—Framework for Compiling and Applying Morphologies

TL;DR: HFST–Helsinki Finite-State Technology offers a path from language descriptions to efficient language applications in key environments and operating systems, and provides an opportunity to exchange transducers between different software providers to get the best out of each finite-state library.
Proceedings Article

Correcting noisy OCR: context beats confusion

John Evershed et al.
TL;DR: This work describes a system for automatic post-OCR text correction of digital collections of historical texts; it uses a "noisy channel" approach and shows good improvements in word error rate.
Proceedings Article

Arabic Word Generation and Modelling for Spell Checking

TL;DR: This work creates an open-source, large-coverage word list for Arabic containing 9,000,000 fully inflected surface words, together with a character-based trigram language model that approximates knowledge about permissible character clusters in Arabic, yielding a novel method for detecting spelling errors.

Finite-State Spell-Checking with Weighted Language and Error Models

TL;DR: This paper uses a freely available open-source implementation of Finnish morphology, made with traditional finite-state morphology tools, and demonstrates rapid building of Northern Sámi and English spell checkers from tools and resources available from the Internet.
Proceedings Article

Effective Spell Checking Methods Using Clustering Algorithms

TL;DR: This work presents a novel approach to spell checking based on dictionary clustering, which combines anomalous pattern initialization with partitioning around medoids (PAM), along with an English misspelling list compiled from real examples extracted from the Birkbeck spelling error corpus.
References
Journal Article

A Formal Basis for the Heuristic Determination of Minimum Cost Paths

TL;DR: How heuristic information from the problem domain can be incorporated into a formal mathematical theory of graph searching is described and an optimality property of a class of search strategies is demonstrated.
Book

The Design and Analysis of Computer Algorithms

TL;DR: This text introduces the basic data structures and programming techniques often used in efficient algorithms, and covers the use of lists, push-down stacks, queues, trees, and graphs.
Journal Article

Depth-First Search and Linear Graph Algorithms

TL;DR: The value of depth-first search or "backtracking" as a technique for solving problems is illustrated by two examples: an improved version of an algorithm for finding the strongly connected components of a directed graph, and an algorithm for finding the biconnected components of an undirected graph.
Book

Heuristics : intelligent search strategies for computer problem solving

TL;DR: This book presents, characterizes, and analyzes problem-solving strategies that are guided by heuristic information.
Book

Finite State Morphology

TL;DR: This volume is a practical guide to finite-state theory and the affiliated programming languages lexc and xfst, and readers will learn how to write tokenizers, spelling checkers, and especially morphological analyzer/generators for words in English, French, Finnish, Hungarian and other languages.