Approximate String Joins in a Database (Almost) for Free -- Erratum

doi:10.7916/D8M90HHN

TLDR

This paper develops a technique for building approximate string join capabilities on top of commercial databases by exploiting facilities already available in them, and demonstrates experimentally the benefits of the technique over the direct use of UDFs.

About:

The article was published on 2003-01-01 and is currently open access. It has received 543 citations to date. It focuses on the topics String (computer science) and Joins.

Citations

Journal ArticleDOI

Duplicate Record Detection: A Survey

Ahmed K. Elmagarmid, +2 more

- 01 Jan 2007 -

IEEE Transactions on Knowledge and Data ...

TL;DR: This paper presents an extensive set of duplicate detection algorithms that can detect approximately duplicate records in a database and covers similarity metrics that are commonly used to detect similar field entries.

Proceedings ArticleDOI

Robust and fast similarity search for moving object trajectories

Lei Chen, +2 more

TL;DR: Analysis and comparison of EDR with other popular distance functions, such as Euclidean distance, Dynamic Time Warping (DTW), Edit distance with Real Penalty (ERP), and Longest Common Subsequences (LCSS), indicate that EDR is more robust than Euclidean distance, DTW, and ERP, and is on average 50% more accurate than LCSS.

Proceedings ArticleDOI

Interactive deduplication using active learning

Sunita Sarawagi, +1 more

TL;DR: This work presents the design of a learning-based deduplication system that uses a novel method of interactively discovering challenging training pairs using active learning and investigates various design issues that arise in building a system to provide interactive response, fast convergence, and interpretable output.

Book

Data Matching: Concepts and Techniques for Record Linkage, Entity Resolution, and Duplicate Detection

Peter Christen

TL;DR: Data matching (also known as record or data linkage, entity resolution, object identification, or field matching) is the task of identifying, matching, and merging records that correspond to the same entities from several databases or even within one database.

References

Journal ArticleDOI

Identification of common molecular subsequences.

Temple F. Smith, +1 more

- 25 Mar 1981 -

Journal of Molecular Biology

TL;DR: This letter extends the heuristic homology algorithm of Needleman & Wunsch (1970) to find a pair of segments, one from each of two long sequences, such that there is no other pair of segments with greater similarity (homology).

Journal ArticleDOI

A guided tour to approximate string matching

Gonzalo Navarro

- 01 Mar 2001 -

ACM Computing Surveys

TL;DR: This work surveys the current techniques to cope with the problem of string matching that allows errors, and focuses on online searching and mostly on edit distance, explaining the problem and its relevance, its statistical behavior, its history and current developments, and the central ideas of the algorithms.

Journal ArticleDOI

Approximate string-matching with q-grams and maximal matches

Esko Ukkonen

TL;DR: Two string distance functions that are computable in linear time give a lower bound for the edit distance (in the unit cost model), which leads to fast hybrid algorithms for edit-distance-based string matching.

Proceedings Article

Near Neighbor Search in Large Metric Spaces

Sergey Brin

TL;DR: A data structure called a GNAT (Geometric Near-neighbor Access Tree) is introduced to solve the problem of finding approximate matches in a large database. It is based on the philosophy that the data structure should act as a hierarchical geometrical model of the data, as opposed to a simple decomposition of the data that does not use its intrinsic geometry.

Proceedings Article