A Survey of Text Similarity Approaches
Citations
1 citation
Cites background or methods from "A Survey of Text Similarity Approac..."
...• Overlap coefficient is very similar to Dice’s coefficient; however, if one document is a subset of the other document, we will consider the similarity as a full match [41]....
[...]
...In addition, there are character-based similarity approaches such as the Longest Common Substring (LCS) algorithm, which considers the similarity between two strings as the length of the longest contiguous chain of characters common to both strings, or N-grams, where the similarity is defined as the count of the common N-grams in two strings over the maximal number of N-grams in the two strings [9, 41]....
[...]
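The two character-based measures quoted above can be sketched as follows. This is a minimal illustration, not the implementation used by the survey or the citing paper: the function names are hypothetical, the N-gram measure is assumed to count distinct character N-grams (sets rather than multisets), and N defaults to 2.

```python
def longest_common_substring_length(a: str, b: str) -> int:
    """Length of the longest contiguous run of characters shared by a and b."""
    best = 0
    # prev[j] holds the length of the common suffix ending at the previous
    # character of a and the j-th character of b.
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def char_ngrams(s: str, n: int = 2) -> set:
    """Distinct character N-grams of s (assumption: duplicates are ignored)."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(a: str, b: str, n: int = 2) -> float:
    """Common N-grams divided by the larger of the two N-gram counts."""
    grams_a, grams_b = char_ngrams(a, n), char_ngrams(b, n)
    largest = max(len(grams_a), len(grams_b))
    if largest == 0:
        return 0.0  # neither string is long enough to contain an N-gram
    return len(grams_a & grams_b) / largest
```

For example, "similarity" and "dissimilar" share the substring "similar", so the first function returns 7 for that pair. Counting repeated N-grams (multisets) would be an equally reasonable reading of the definition and would change the scores for strings with repeated patterns.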
...• Dice’s coefficient is computed as twice the number of common terms in two documents over the total number of terms in both documents [30, 41]....
[...]
...• Jaccard similarity is defined as the number of common terms divided by the number of the unique terms in the documents [41, 55]...
[...]
...• Matching coefficient is a vector-based scheme that counts the terms for which both document vectors are non-zero [41]....
[...]
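The term-based coefficients quoted above (overlap, Dice, Jaccard, and matching) can be compared side by side in a short sketch. It assumes each document is reduced to its set of distinct lowercase, whitespace-separated terms; the helper names are hypothetical and the tokenization is a simplification, not something specified by the quoted sources.

```python
def terms(text: str) -> set:
    """Distinct lowercase terms of a document (assumed whitespace tokenization)."""
    return set(text.lower().split())


def matching_coefficient(a: set, b: set) -> int:
    """Number of terms present (non-zero) in both documents."""
    return len(a & b)


def dice_coefficient(a: set, b: set) -> float:
    """Twice the common terms over the total number of terms in both documents."""
    total = len(a) + len(b)
    return 2 * len(a & b) / total if total else 0.0


def jaccard_similarity(a: set, b: set) -> float:
    """Common terms over the unique terms across both documents."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def overlap_coefficient(a: set, b: set) -> float:
    """Common terms over the smaller document; a subset counts as a full match."""
    smallest = min(len(a), len(b))
    return len(a & b) / smallest if smallest else 0.0


doc_a = terms("the cat sat")
doc_b = terms("the cat sat on the mat")
scores = {
    "matching": matching_coefficient(doc_a, doc_b),
    "dice": dice_coefficient(doc_a, doc_b),
    "jaccard": jaccard_similarity(doc_a, doc_b),
    "overlap": overlap_coefficient(doc_a, doc_b),
}
```

Here the first document's terms are a subset of the second's, so the overlap coefficient reaches 1.0 while Dice (0.75) and Jaccard (0.6) stay below a full match, which is the distinction the first excerpt draws.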
References
13,049 citations
11,844 citations
10,500 citations
"A Survey of Text Similarity Approac..." refers to background in this paper
...Dice’s coefficient is defined as twice the number of common terms in the compared strings divided by the total number of terms in both strings [11]....
[...]
10,262 citations
"A Survey of Text Similarity Approac..." refers to background in this paper
...It is useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context [8]....
[...]
6,014 citations
"A Survey of Text Similarity Approac..." refers to methods in this paper
...The GLSA approach can combine any kind of similarity measure on the space of terms with any suitable method of dimensionality reduction....
[...]
...LSA assumes that words that are close in meaning will occur in similar pieces of text....
[...]
...Latent Semantic Analysis (LSA) [15] is the most popular technique of Corpus-Based similarity....
[...]
...Generalized Latent Semantic Analysis (GLSA) [16] is a framework for computing semantically motivated term and document vectors....
[...]
...Mining the Web for Synonyms: PMI-IR versus LSA on TOEFL....
[...]