scispace - formally typeset
Search or ask a question
Topic

Semantic similarity

About: Semantic similarity is a research topic. Over the lifetime, 14605 publications have been published within this topic receiving 364659 citations. The topic is also known as: semantic relatedness.


Papers
More filters
Proceedings ArticleDOI
Chao Jiang1, Mounica Maddela, Wuwei Lan1, Yang Zhong1, Wei Xu1 
05 May 2020
TL;DR: This paper proposed a neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents, but also utilizes a neural sentence pair model to capture semantic similarity for text simplification.
Abstract: The success of a text simplification system heavily depends on the quality and quantity of complex-simple sentence pairs in the training corpus, which are extracted by aligning sentences between parallel articles. To evaluate and improve sentence alignment quality, we create two manually annotated sentence-aligned datasets from two commonly used text simplification corpora, Newsela and Wikipedia. We propose a novel neural CRF alignment model which not only leverages the sequential nature of sentences in parallel documents but also utilizes a neural sentence pair model to capture semantic similarity. Experiments demonstrate that our proposed approach outperforms all the previous work on monolingual sentence alignment task by more than 5 points in F1. We apply our CRF aligner to construct two new text simplification datasets, Newsela-Auto and Wiki-Auto, which are much larger and of better quality compared to the existing datasets. A Transformer-based seq2seq model trained on our datasets establishes a new state-of-the-art for text simplification in both automatic and human evaluation.

96 citations

Journal ArticleDOI
TL;DR: The data suggest that the visual world paradigm can, together with other methodologies, converge on the evidence that may help adjudicate between different theoretical accounts of the psychological semantics, and provide further evidence that language-mediated eye movements to objects in the concurrent visual environment are driven by semantic similarity rather than all-or-none categorical knowledge.

96 citations

Journal ArticleDOI
TL;DR: An R-based software package that can be used to detect disease-driven gene modules, and to annotate the modules for functions and pathways, and can reflect the modular characteristic of disease related genes and promote the understanding of the complex pathogenesis of diseases.
Abstract: The construction of the Disease Ontology (DO) has helped promote the investigation of diseases and disease risk factors. DO enables researchers to analyse disease similarity by adopting semantic similarity measures, and has expanded our understanding of the relationships between different diseases and to classify them. Simultaneously, similarities between genes can also be analysed by their associations with similar diseases. As a result, disease heterogeneity is better understood and insights into the molecular pathogenesis of similar diseases have been gained. However, bioinformatics tools that provide easy and straight forward ways to use DO to study disease and gene similarity simultaneously are required. We have developed an R-based software package (DOSim) to compute the similarity between diseases and to measure the similarity between human genes in terms of diseases. DOSim incorporates a DO-based enrichment analysis function that can be used to explore the disease feature of an independent gene set. A multilayered enrichment analysis (GO and KEGG annotation) annotation function that helps users explore the biological meaning implied in a newly detected gene module is also part of the DOSim package. We used the disease similarity application to demonstrate the relationship between 128 different DO cancer terms. The hierarchical clustering of these 128 different cancers showed modular characteristics. In another case study, we used the gene similarity application on 361 obesity-related genes. The results revealed the complex pathogenesis of obesity. In addition, the gene module detection and gene module multilayered annotation functions in DOSim when applied on these 361 obesity-related genes helped extend our understanding of the complex pathogenesis of obesity risk phenotypes and the heterogeneity of obesity-related diseases. DOSim can be used to detect disease-driven gene modules, and to annotate the modules for functions and pathways. The DOSim package can also be used to visualise DO structure. DOSim can reflect the modular characteristic of disease related genes and promote our understanding of the complex pathogenesis of diseases. DOSim is available on the Comprehensive R Archive Network (CRAN) or http://bioinfo.hrbmu.edu.cn/dosim .

96 citations

Journal ArticleDOI
TL;DR: A new similarity measure is defined that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy to re-weight term frequency in the VSM.
Abstract: This paper presents a new knowledge-based vector space model (VSM) for text clustering. In the new model, semantic relationships between terms (e.g., words or concepts) are included in representing text documents as a set of vectors. The idea is to calculate the dissimilarity between two documents more effectively so that text clustering results can be enhanced. In this paper, the semantic relationship between two terms is defined by the similarity of the two terms. Such similarity is used to re-weight term frequency in the VSM. We consider and study two different similarity measures for computing the semantic relationship between two terms based on two different approaches. The first approach is based on the existing ontologies like WordNet and MeSH. We define a new similarity measure that combines the edge-counting technique, the average distance and the position weighting method to compute the similarity of two terms from an ontology hierarchy. The second approach is to make use of text corpora to construct the relationships between terms and then calculate their semantic similarities. Three clustering algorithms, bisecting k-means, feature weighting k-means and a hierarchical clustering algorithm, have been used to cluster real-world text data represented in the new knowledge-based VSM. The experimental results show that the clustering performance based on the new model was much better than that based on the traditional term-based VSM.

95 citations


Network Information
Related Topics (5)
Web page
50.3K papers, 975.1K citations
84% related
Graph (abstract data type)
69.9K papers, 1.2M citations
84% related
Unsupervised learning
22.7K papers, 1M citations
83% related
Feature vector
48.8K papers, 954.4K citations
83% related
Web service
57.6K papers, 989K citations
82% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023202
2022522
2021641
2020837
2019866
2018787