A Survey of Text Similarity Approaches
Citations
2 citations
Cites background from "A Survey of Text Similarity Approac..."
...The use of n-gram frequencies as input features to authorship attribution models has been proposed by References [11, 50] and as a simple method capable of capturing both lexical content and local context....
[...]
...Inverse document frequency, introduced by Reference [57] as term specificity, is a method of weighting term frequency values by the rarity of terms across all documents in the corpus....
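The weighting described in this excerpt can be illustrated with a small sketch. It assumes the common logarithmic form idf(t) = log(N / df(t)), which the excerpt itself does not specify; the function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def inverse_document_frequency(docs):
    """Map each term to log(N / df), where df counts the documents containing it.

    The log(N / df) form is an assumption; other smoothing variants exist.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        # Count each term at most once per document.
        doc_freq.update(set(tokens))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def tf_idf(tokens, idf):
    """Weight raw term counts in one document by the corpus-level IDF."""
    tf = Counter(tokens)
    return {term: count * idf.get(term, 0.0) for term, count in tf.items()}


# Hypothetical toy corpus, already tokenized.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
idf = inverse_document_frequency(corpus)
weights = tf_idf(corpus[0], idf)
```

In this toy corpus, "the" occurs in every document and so receives zero weight, while "mat" occurs in only one and receives the largest weight.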
[...]
...The origins of this influence are easy to understand when one imagines the challenges of presenting a coherent curriculum of philosophical study to students over a period of 15 or more years of systematic religious training (cf. Reference [10] for an example curriculum)....
[...]
...In Reference [19], a distinction is made between string-based, corpus-based, and knowledge-based similarity metrics....
[...]
...Reference [50] describes a tradeoff in setting the n-gram order, n, which denotes the length of n-grams to examine....
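A minimal sketch of how the n-gram order changes the resulting features, assuming character-level n-grams (the excerpt does not say whether characters or words are meant); the function name is hypothetical.

```python
from collections import Counter


def char_ngram_counts(text, n):
    """Count overlapping character n-grams of length n in text."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


sample = "banana"
bigrams = char_ngram_counts(sample, 2)   # shorter n-grams: fewer, more frequent types
trigrams = char_ngram_counts(sample, 3)  # longer n-grams: more context, sparser counts
```

Raising n captures more local context per feature but spreads the same text over more distinct n-grams, which is one way to picture the tradeoff.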
[...]
2 citations
Cites methods from "A Survey of Text Similarity Approac..."
...We choose six most common distance metrics for entity resolution, including three character-based metrics, which are Q-Gram, Jaro and Levenshtein and three token-based metrics, which are Overlap Coefficient, Cosine and Jaccard....
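To make the character-based versus token-based distinction concrete, here is a sketch of one metric from each group: Levenshtein distance and Jaccard similarity. These are textbook formulations, not the citing paper's implementation, and the function names are hypothetical.

```python
def levenshtein(a, b):
    """Character-based: minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def jaccard(tokens_a, tokens_b):
    """Token-based: shared distinct tokens divided by all distinct tokens."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty inputs
    return len(a & b) / len(a | b)


edit_distance = levenshtein("kitten", "sitting")
token_similarity = jaccard("a b c".split(), "b c d".split())
```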
[...]
...We use the Overlap Coefficient [3] to compute the similarity of record pairs, and the similarity result is as follows: the similarities of matching pairs m1 and m2 are 0.9 and 1 respectively, and the similarities of nonmatching pairs n1-n4 are 0.35, 0.33, 0.54 and 0.54, respectively....
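For reference, the Overlap Coefficient is usually defined as the size of the intersection divided by the size of the smaller set. The sketch below assumes whitespace tokenization and uses hypothetical records; it does not reproduce the record pairs or scores quoted in the excerpt.

```python
def overlap_coefficient(tokens_a, tokens_b):
    """Shared distinct tokens divided by the size of the smaller token set."""
    a, b = set(tokens_a), set(tokens_b)
    if not a or not b:
        return 0.0  # convention chosen here for empty inputs
    return len(a & b) / min(len(a), len(b))


# Hypothetical records, not taken from the cited paper.
subset_pair = overlap_coefficient("john smith".split(), "john a smith".split())
partial_pair = overlap_coefficient("john smith".split(), "jane smith".split())
```

Because the denominator is the smaller set, a record whose tokens are fully contained in the other scores 1 even when the longer record has extra tokens.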
[...]
2 citations
Cites background from "A Survey of Text Similarity Approac..."
...Other than PEG, C-rater, E-rater, and Latent Semantic Analysis (LSA) are some other applications developed for automatic essay assessment [2][3]....
[...]
2 citations
References
13,049 citations
11,844 citations
10,500 citations
"A Survey of Text Similarity Approac..." refers background in this paper
...Dice’s coefficient is defined as twice the number of common terms in the compared strings divided by the total number of terms in both strings [11]....
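The definition in this excerpt translates directly into a few lines of code. This sketch assumes that "terms" means distinct whitespace-separated tokens; the function name and sample inputs are hypothetical.

```python
def dice_coefficient(tokens_a, tokens_b):
    """Twice the number of common terms divided by the total terms in both inputs."""
    a, b = set(tokens_a), set(tokens_b)
    if not a and not b:
        return 1.0  # convention chosen here for two empty inputs
    return 2 * len(a & b) / (len(a) + len(b))


# Two common terms ("the", "cat"), 3 + 4 = 7 terms in total.
score = dice_coefficient("the cat sat".split(), "the cat ran away".split())
```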
[...]
10,262 citations
"A Survey of Text Similarity Approac..." refers background in this paper
...It is useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context [8]....
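The excerpt appears to describe local sequence alignment. As an illustration only, the sketch below computes a local alignment score in the style of the Smith-Waterman algorithm; the excerpt does not name the method, and the scoring values (match +2, mismatch -1, gap -1) and function name are assumptions.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best score of any local alignment between a and b (Smith-Waterman style).

    Scores are clamped at zero, so an alignment may start and end anywhere.
    The default scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, 1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


# The shared region "hello" is found despite dissimilar surrounding context.
shared_region = local_alignment_score("xxxhelloyyy", "zzhellozz")
no_overlap = local_alignment_score("abc", "xyz")
```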
[...]
6,014 citations
"A Survey of Text Similarity Approac..." refers methods in this paper
...The GLSA approach can combine any kind of similarity measure on the space of terms with any suitable method of dimensionality reduction....
[...]
...LSA assumes that words that are close in meaning will occur in similar pieces of text....
[...]
...Latent Semantic Analysis (LSA) [15] is the most popular technique of Corpus-Based similarity....
[...]
...Generalized Latent Semantic Analysis (GLSA) [16] is a framework for computing semantically motivated term and document vectors....
[...]
...Mining the web for synonyms: PMI-IR versus LSA on TOEFL....
[...]