Semantic text similarity using corpus-based word similarity and string similarity

doi:10.1145/1376815.1376819

Journal Article

Aminul Islam, +1 more

- 24 Jul 2008 -

ACM Transactions on Knowledge Discovery ...

- Vol. 2, Iss: 2, pp 10

TLDR

A method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence string matching algorithm is presented.

Abstract:

We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
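The abstract names a normalized Longest Common Subsequence as the string-similarity component but does not give the formula on this page. The following is a minimal sketch of one common normalization, assumed here for illustration: the squared LCS length divided by the product of the two string lengths. The function names `lcs_length` and `normalized_lcs` are hypothetical, and the sketch does not cover the paper's further modifications or the corpus-based word similarity.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (row-by-row DP)."""
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the subsequence found for both prefixes minus one char.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the better of the two neighbouring cells.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: LCS length squared over the product of lengths.

    Returns a value in [0, 1]; empty input is treated as similarity 0.
    """
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return (n * n) / (len(a) * len(b))


if __name__ == "__main__":
    print(normalized_lcs("similarity", "similar"))
```

Squaring the LCS length keeps the score within [0, 1] while penalizing pairs in which a short string merely happens to be contained in a much longer one.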

Citations

Journal Article

A Survey of Text Similarity Approaches

Wael Hassan Gomaa, +1 more

- 18 Apr 2013 -

International Journal of Computer Applic...

TL;DR: This survey partitions existing work on text similarity into three approaches: string-based, corpus-based, and knowledge-based similarity. It also presents examples of combinations of these approaches.

Proceedings Article

Short Text Similarity with Word Embeddings

Tom Kenter, +1 more

TL;DR: This work proposes to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings. It derives multiple types of meta-features from the comparison of the word vectors for short text pairs and from the vector means of their respective word embeddings.

Book

语义学引论 = Linguistic Semantics

John Lyons, +1 more

Proceedings Article

TakeLab: Systems for Measuring Semantic Text Similarity

Frane Šarić, +4 more

TL;DR: The two systems for determining the semantic similarity of short texts, submitted to SemEval 2012 Task 6, ranked in the top 5 on the three overall evaluation metrics used.

Proceedings Article

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

Mohammad Taher Pilehvar, +2 more

TL;DR: This work presents a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents, and leverages a common probabilistic representation over word senses in order to compare different types of linguistic data.

References

Bleu: a Method for Automatic Evaluation of Machine Translation

WordNet : an electronic lexical database

A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge.

Introduction to WordNet: An On-line Lexical Database

An introduction to latent semantic analysis

Related Papers (5)

Verb semantics and lexical selection

An introduction to latent semantic analysis

An Information-Theoretic Definition of Similarity

WordNet: a lexical database for English

WordNet : an electronic lexical database