scispace - formally typeset
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as: semantic relatedness.


Papers
Patent
18 Jan 2012
TL;DR: This patent proposes a method of evaluating the semantic relatedness of terms. The method comprises providing a plurality of text segments; calculating, using a processor, a weight for each of the text segments; calculating the prevalence of the co-appearance of each of a plurality of term pairs in the text segments; and evaluating the relatedness between the members of each pair according to a combination of the respective prevalence and the weight of each text segment in which a co-appearance occurs.
Abstract: A method of evaluating the semantic relatedness of terms. The method comprises providing a plurality of text segments; calculating, using a processor, a plurality of weights, each for a different one of the plurality of text segments; calculating the prevalence of the co-appearance of each of a plurality of pairs of terms in the plurality of text segments; and evaluating the semantic relatedness between the members of each pair according to a combination of the respective prevalence and the weight of each of the text segments in which a co-appearance of the pair occurs.

130 citations
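The weighted co-appearance scheme in the patent abstract above can be illustrated with a minimal Python sketch. The patent does not state how segment weights are computed or how prevalence and weights are combined, so both are assumptions here: the hypothetical `segment_weight` gives shorter segments more weight, and the score is the summed weight of the segments in which a pair co-appears, normalized by the total weight of all segments (so it grows with both prevalence and segment weight).

```python
from collections import defaultdict
from itertools import combinations


def segment_weight(segment):
    """Hypothetical weight: a co-appearance in a short segment counts for more."""
    distinct = set(segment)
    return 1.0 / len(distinct) if distinct else 0.0


def relatedness(segments):
    """Score each unordered term pair by the weighted prevalence of its co-appearances.

    segments: list of token lists, one list per text segment.
    Returns a dict mapping (term_a, term_b), with term_a < term_b, to a score in [0, 1].
    """
    weights = [segment_weight(seg) for seg in segments]
    total = sum(weights)
    if total == 0:
        return {}
    co_weight = defaultdict(float)
    for seg, w in zip(segments, weights):
        # Every unordered pair of distinct terms in a segment co-appears once there.
        for pair in combinations(sorted(set(seg)), 2):
            co_weight[pair] += w
    # Assumed combination: weighted co-appearance mass relative to all segments.
    return {pair: cw / total for pair, cw in co_weight.items()}


segments = [["cat", "dog"], ["cat", "dog", "pet"], ["car", "road"]]
scores = relatedness(segments)
```

With these three segments, "cat" and "dog" co-appear in two segments and therefore outscore pairs that co-appear only once, such as "cat" and "pet".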

01 Jan 2002
TL;DR: This thesis presents techniques and algorithms for automatically performing various stages of the reconstruction of proto-languages from surviving cognates, and introduces a method of identifying cognates directly from the vocabularies of related languages on the basis of phonetic and semantic similarity.
Abstract: Genetically related languages originate from a common proto-language. In the absence of historical records, proto-languages have to be reconstructed from surviving cognates, that is, words that existed in the proto-language and are still present in some form in its descendants. Language reconstruction methods have so far been largely based on informal and intuitive criteria. In this thesis, I present techniques and algorithms for performing various stages of the reconstruction process automatically. The thesis is divided into three main parts that correspond to the principal steps of language reconstruction. The first part presents a new algorithm for the alignment of cognates, which is sufficiently general to align any two phonetic strings that exhibit some affinity. The second part introduces a method of identifying cognates directly from the vocabularies of related languages on the basis of phonetic and semantic similarity. The third part describes an approach to the determination of recurrent sound correspondences in bilingual wordlists by inducing models similar to those developed for statistical machine translation. The proposed solutions are firmly grounded in computer science and incorporate recent advances in computational linguistics, articulatory phonetics, and bioinformatics. The applications of the new techniques are not limited to diachronic phonology, but extend to other areas of computational linguistics, such as machine translation.

130 citations
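To make the idea of aligning two strings concrete, here is a minimal Python sketch of standard global alignment by dynamic programming (Needleman-Wunsch). It is not the thesis's alignment algorithm: it compares plain characters with assumed unit scores (match +1, mismatch -1, gap -1) instead of phonetic features, and `align` is a hypothetical helper name.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align strings a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(x, y):
        return match if x == y else mismatch

    # score[i][j] is the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # pair two segments
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]
```

For example, `align("ac", "abc")` inserts a gap to pair up the shared segments. A phonetically informed aligner would replace `sub` with a similarity score between sounds, which is where articulatory phonetics enters.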

Journal Article
TL;DR: This work introduces MedSim, a novel approach for ranking candidate genes for a particular disease based on functional comparisons involving the Gene Ontology, which uses functional annotations of known disease genes for assessing the similarity of diseases as well as the disease relevance of candidate genes.
Abstract: Motivation: Many hereditary human diseases are polygenic, resulting from sequence alterations in multiple genes. Genomic linkage and association studies are commonly performed for identifying disease-related genes. Such studies often yield lists of up to several hundred candidate genes, which have to be prioritized and validated further. Recent studies discovered that genes involved in phenotypically similar diseases are often functionally related on the molecular level. Results: Here, we introduce MedSim, a novel approach for ranking candidate genes for a particular disease based on functional comparisons involving the Gene Ontology. MedSim uses functional annotations of known disease genes for assessing the similarity of diseases as well as the disease relevance of candidate genes. We benchmarked our approach with genes known to be involved in 99 diseases taken from the OMIM database. Using artificial quantitative trait loci, MedSim achieved excellent performance with an area under the ROC curve of up to 0.90 and a sensitivity of over 70% at 90% specificity when classifying gene products according to their disease relatedness. This performance is comparable or even superior to that of related methods in the field, albeit using less and thus more easily accessible information. Availability: MedSim is offered as part of our FunSimMat web service (http://www.funsimmat.de). Contact: mario.albrecht@mpi-inf.mpg.de Supplementary information: Supplementary data are available at Bioinformatics online.

130 citations
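The general idea of ranking candidate genes by functional similarity to known disease genes, as described in the MedSim abstract above, can be sketched in a few lines of Python. This is a simplified stand-in, not MedSim itself: each gene is assumed to be annotated with a set of Gene Ontology term identifiers, the Jaccard index replaces MedSim's functional similarity measures, and a candidate's score is its best match to any known disease gene. The gene names and term identifiers below are hypothetical placeholders.

```python
def jaccard(x, y):
    """Jaccard index of two annotation sets (0.0 when both are empty)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def rank_candidates(known, candidates):
    """Rank candidate genes by their maximum similarity to any known disease gene.

    known, candidates: dict mapping gene name -> set of GO term identifiers.
    Returns a list of (gene, score) pairs, highest score first.
    """
    scores = {
        gene: max((jaccard(terms, k_terms) for k_terms in known.values()), default=0.0)
        for gene, terms in candidates.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical annotations; real ones would come from a GO annotation file.
known = {"KNOWN1": {"termA", "termB"}}
candidates = {"CAND1": {"termA", "termB", "termC"}, "CAND2": {"termD"}}
ranking = rank_candidates(known, candidates)
```

Taking the maximum rather than the mean rewards a candidate that closely resembles even one known disease gene, which suits polygenic diseases whose genes may serve different functions; the choice is an assumption of this sketch.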

Journal Article
TL;DR: It was found that naming latencies for both object and action words were modulated by the semantic similarity between the exemplars in each block, providing evidence in both domains of graded semantic effects.

129 citations

Book Chapter
11 Oct 2005
TL;DR: The method relies solely on the structure of a conceptual network and eliminates the need for performing additional corpus analysis and can be easily applied to compute semantic relatedness based on alternative conceptual networks, e.g. in the domain of life sciences.
Abstract: We present a new method for computing the semantic relatedness of concepts. The method relies solely on the structure of a conceptual network and eliminates the need for performing additional corpus analysis. The network structure is employed to generate artificial conceptual glosses. They replace proper textual definitions written by humans and are processed by a dictionary-based metric of semantic relatedness [1]. We implemented the metric on the basis of GermaNet, the German counterpart of WordNet, and evaluated the results on a German dataset of 57 word pairs rated by human subjects for their semantic relatedness. Our approach can be easily applied to compute semantic relatedness based on alternative conceptual networks, e.g. in the domain of life sciences.

129 citations
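The idea of generating artificial glosses from network structure, described in the book chapter above, can be illustrated with a minimal Python sketch. It assumes that a concept's pseudo-gloss is the set of concepts reachable within a fixed number of edges, and it scores relatedness by the size of the overlap of two glosses (a Lesk-style count). The chapter's exact gloss-generation procedure and the dictionary-based metric it cites are not reproduced here; `artificial_gloss`, `gloss_overlap`, and the toy English network are hypothetical.

```python
from collections import deque


def artificial_gloss(graph, concept, radius=2):
    """Collect all concepts within `radius` edges of `concept` (breadth-first search)."""
    seen = {concept}
    frontier = deque([(concept, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == radius:
            continue  # do not expand beyond the radius
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return seen


def gloss_overlap(graph, c1, c2, radius=2):
    """Relatedness as the number of concepts shared by two artificial glosses."""
    return len(artificial_gloss(graph, c1, radius) & artificial_gloss(graph, c2, radius))


# Toy undirected network (adjacency lists), standing in for GermaNet or WordNet.
graph = {
    "dog": ["canine"],
    "wolf": ["canine"],
    "canine": ["dog", "wolf", "carnivore"],
    "carnivore": ["canine", "animal"],
    "animal": ["carnivore"],
    "car": ["vehicle"],
    "vehicle": ["car"],
}
```

Here "dog" and "wolf" share four concepts within two edges, while "dog" and "car" share none. Because only the network is consulted, swapping in a different conceptual network requires no change to the code.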


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Unsupervised learning: 22.7K papers, 1M citations (83% related)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Web service: 57.6K papers, 989K citations (82% related)
Performance
Metrics
Number of papers in the topic in previous years:

Year   Papers
2023   202
2022   522
2021   641
2020   837
2019   866
2018   787