Journal Article

Semantic text similarity using corpus-based word similarity and string similarity

Abstract
We present a method for measuring the semantic similarity of texts using a corpus-based measure of semantic word similarity and a normalized and modified version of the Longest Common Subsequence (LCS) string matching algorithm. Existing methods for computing text similarity have focused mainly on either large documents or individual words. We focus on computing the similarity between two sentences or two short paragraphs. The proposed method can be exploited in a variety of applications involving textual knowledge representation and knowledge discovery. Evaluation results on two different data sets show that our method outperforms several competing methods.
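The abstract names the two ingredients (a corpus-based word similarity and a normalized, modified LCS) but not their formulas. The Python below is a minimal sketch under stated assumptions, not the authors' method. `normalized_lcs` assumes the normalization |LCS|² / (|a|·|b|), and none of the paper's further LCS modifications are reproduced. `exact_match_sim`, `text_similarity`, the weight `alpha`, and the best-match averaging are hypothetical stand-ins for the corpus-based measure and the paper's combination formula.

```python
from typing import Callable, List

WordSim = Callable[[str, str], float]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (classic dynamic programming)."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous prefix of a
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1  # extend the diagonal match
            else:
                curr[j] = max(prev[j], curr[j - 1])  # drop a char from a or b
        prev = curr
    return prev[-1]


def normalized_lcs(a: str, b: str) -> float:
    """Assumed normalization: |LCS|^2 / (|a| * |b|), which lies in [0, 1]."""
    if not a or not b:
        return 0.0
    n = lcs_length(a, b)
    return n * n / (len(a) * len(b))


def exact_match_sim(w1: str, w2: str) -> float:
    """Hypothetical placeholder for a corpus-based semantic word similarity."""
    return 1.0 if w1 == w2 else 0.0


def text_similarity(t1: List[str], t2: List[str],
                    word_sim: WordSim = exact_match_sim,
                    alpha: float = 0.5) -> float:
    """Hypothetical aggregation (not the paper's formula): score each word pair as a
    weighted sum of semantic and string similarity, then average the best match of
    every word in both directions."""
    if not t1 or not t2:
        return 0.0

    def pair(w1: str, w2: str) -> float:
        return alpha * word_sim(w1, w2) + (1 - alpha) * normalized_lcs(w1, w2)

    best_12 = sum(max(pair(w1, w2) for w2 in t2) for w1 in t1) / len(t1)
    best_21 = sum(max(pair(w2, w1) for w1 in t1) for w2 in t2) / len(t2)
    return (best_12 + best_21) / 2


if __name__ == "__main__":
    print(normalized_lcs("abcd", "acbd"))  # LCS length 3, so 9 / 16 = 0.5625
    print(text_similarity(["the", "cat"], ["the", "cat"]))  # identical texts give 1.0
```

The string term lets near-identical spellings (for example, inflections or typos) earn partial credit even when the semantic placeholder scores them 0. In practice `exact_match_sim` would be swapped for a corpus-based measure, and `alpha = 0.5` is an arbitrary choice.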


Citations
Journal Article

A Survey of Text Similarity Approaches

TL;DR: This survey discusses existing work on text similarity by partitioning it into three approaches (string-based, corpus-based, and knowledge-based similarity) and presents examples of combining these similarities.
Proceedings Article

Short Text Similarity with Word Embeddings

TL;DR: This work moves from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings. It derives multiple types of meta-features from comparing the word vectors of short text pairs and from the vector means of their respective word embeddings.
Proceedings Article

TakeLab: Systems for Measuring Semantic Text Similarity

TL;DR: The two systems for determining the semantic similarity of short texts submitted to SemEval 2012 Task 6 ranked in the top 5 on the three overall evaluation metrics used.
Proceedings Article

Align, Disambiguate and Walk: A Unified Approach for Measuring Semantic Similarity

TL;DR: This work presents a unified approach to semantic similarity that operates at multiple levels, all the way from comparing word senses to comparing text documents, and leverages a common probabilistic representation over word senses in order to compare different types of linguistic data.