scispace - formally typeset

Semantic similarity

About: Semantic similarity is a research topic. Over its lifetime, 14,605 publications have been published on this topic, receiving 364,659 citations. The topic is also known as semantic relatedness.


Papers
Journal Article
TL;DR: This work considered the impact of number of features, number of senses, semantic neighborhood density, imageability, and body–object interaction across five visual word recognition tasks: standard lexical decision, go/no-go lexical decision, speeded pronunciation, progressive demasking, and semantic classification.
Abstract: There is considerable evidence (e.g., Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008) that semantically rich words, which are associated with relatively more semantic information, are recognized faster across different lexical processing tasks. The present study extends this earlier work by providing the most comprehensive evaluation to date of semantic richness effects on visual word recognition performance. Specifically, using regression analyses to control for the influence of correlated lexical variables, we considered the impact of contextual dispersion, number of features, number of senses, semantic neighborhood density, imageability, and body-object interaction across five visual word recognition tasks: standard lexical decision, go/no-go lexical decision, speeded pronunciation, semantic classification, and progressive demasking. Semantic richness effects could be reliably detected in all tasks of lexical processing, indicating that semantic representations, particularly their imaginal and featural aspects, play a fundamental role in visual word recognition. However, there was also evidence that the strength of certain richness effects could be flexibly and adaptively modulated by task demands, consistent with an intriguing interplay between task-specific mechanisms and differentiated semantic processing.

124 citations

Proceedings Article
16 May 2016
TL;DR: A unified probabilistic generative model, User-Community-Geo-Topic (UCGT), is proposed to simulate the generative process of communities as a result of network proximities, spatiotemporal co-occurrences and semantic similarity.
Abstract: Social community detection is a growing field of interest in the area of social network applications, and many approaches have been developed, including graph partitioning, latent space models, block models and spectral clustering. Most existing work focuses purely on network structure information, which is often sparse, noisy and lacking in interpretability. To improve the accuracy and interpretability of community discovery, we propose to infer users' social communities by incorporating their spatiotemporal data and semantic information. Technically, we propose a unified probabilistic generative model, User-Community-Geo-Topic (UCGT), to simulate the generative process of communities as a result of network proximities, spatiotemporal co-occurrences and semantic similarity. With a well-designed multi-component model structure and a parallel inference implementation that leverages the power of multicores and clusters, our UCGT model is expressive while remaining efficient and scalable to large and growing geo-social networking data. We deploy UCGT in two application scenarios of user behavior prediction: check-in prediction and social interaction prediction. Extensive experiments on two large-scale geo-social networking datasets show that UCGT achieves better performance than existing state-of-the-art comparison methods.

124 citations

Book Chapter
30 Aug 2011
TL;DR: A proper metric to quantify process similarity based on behavioral profiles is introduced, grounded in the Jaccard coefficient and leverages behavioral relations between pairs of process model activities.
Abstract: With the increasing influence of Business Process Management, large process model repositories have emerged in enterprises and public administrations. Their effective utilization requires meaningful and efficient capabilities to search for models that go beyond text-based search or folder navigation, e.g., search by similarity. Existing measures for process model similarity are often not applicable for efficient similarity search, as they lack metric properties. In this paper, we introduce a proper metric to quantify process similarity based on behavioral profiles. It is grounded in the Jaccard coefficient and leverages behavioral relations between pairs of process model activities. The metric is successfully evaluated with respect to how well it approximates human similarity assessments.

123 citations
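To make the Jaccard grounding concrete, here is a minimal sketch that assumes each behavioral profile can be flattened into a set of (activity, relation, activity) triples. The relation labels "order", "exclusive" and "interleaving" and the example activities are hypothetical, and the paper's actual aggregation over relation types may differ. The useful property is that the Jaccard distance 1 − J satisfies the metric axioms, which is what makes it suitable for metric-based similarity search.

```python
def jaccard_similarity(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a: set, b: set) -> float:
    """1 - Jaccard similarity; this distance is a proper metric on finite sets."""
    return 1.0 - jaccard_similarity(a, b)


# Hypothetical behavioral profiles, each a set of
# (activity_x, relation, activity_y) triples. The relation labels are
# assumptions for illustration, not the paper's exact notation.
model_a = {
    ("receive", "order", "check"),
    ("check", "order", "ship"),
    ("ship", "exclusive", "cancel"),
}
model_b = {
    ("receive", "order", "check"),
    ("check", "order", "ship"),
    ("ship", "interleaving", "invoice"),
}

sim = jaccard_similarity(model_a, model_b)
dist = jaccard_distance(model_a, model_b)
print(f"similarity={sim:.2f}, distance={dist:.2f}")
```

Here the two profiles share two of their four distinct triples, so the similarity and the distance are both 0.50. Treating every relation type as a member of one flat set is a simplifying design choice; a per-relation weighting would be a natural refinement.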

Proceedings Article
01 Apr 2001
TL;DR: This paper introduces and describes the use of the concept of an information unit, which can be viewed as a logical Web document consisting of multiple physical pages treated as one atomic retrieval unit, and presents an algorithm to efficiently retrieve information units.
Abstract: Since the WWW encourages hypertext and hypermedia document authoring (e.g., HTML or XML), Web authors tend to create documents that are composed of multiple pages connected with hyperlinks or frames. A Web document may be authored in multiple ways, such as (1) all information in one physical page, or (2) a main page and the related information in separate linked pages. Existing Web search engines, however, return only physical pages. In this paper, we introduce and describe the use of the concept of an information unit, which can be viewed as a logical Web document consisting of multiple physical pages treated as one atomic retrieval unit. We present an algorithm to efficiently retrieve information units. Our algorithm can perform progressive query processing over a Web index by considering both document semantic similarity and link structures. Experimental results on synthetic graphs and real Web data show the effectiveness and usefulness of the proposed information unit retrieval technique.

123 citations

Journal Article
TL;DR: This work proposes several approaches for sentence‐level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus.
Abstract: Motivation: The amount of information available in textual format is rapidly increasing in the biomedical domain. Therefore, natural language processing (NLP) applications are becoming increasingly important to facilitate the retrieval and analysis of these data. Computing the semantic similarity between sentences is an important component in many NLP tasks, including text retrieval and summarization. A number of approaches have been proposed for semantic sentence similarity estimation for generic English. However, our experiments showed that such approaches do not effectively cover biomedical knowledge and produce poor results for biomedical text. Methods: We propose several approaches for sentence-level semantic similarity computation in the biomedical domain, including string similarity measures and measures based on the distributed vector representations of sentences learned in an unsupervised manner from a large biomedical corpus. In addition, ontology-based approaches are presented that utilize general and domain-specific ontologies. Finally, a supervised regression-based model is developed that effectively combines the different similarity computation metrics. A benchmark data set consisting of 100 sentence pairs from the biomedical literature is manually annotated by five human experts and used for evaluating the proposed methods. Results: The experiments showed that the supervised semantic sentence similarity computation approach obtained the best performance (0.836 correlation with gold standard human annotations) and improved over the state-of-the-art domain-independent systems by up to 42.6% in terms of the Pearson correlation metric. Availability and implementation: A web-based system for biomedical semantic sentence similarity computation, the source code, and the annotated benchmark data set are available at: http://tabilab.cmpe.boun.edu.tr/BIOSSES/ . Contact: gizemsogancioglu@gmail.com or arzucan.ozgur@boun.edu.tr.

123 citations
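The evaluation described above compares system similarity scores against human gold scores using the Pearson correlation. The following sketch shows that pipeline under clearly stated assumptions. It scores each pair with the cosine similarity of simple bag-of-words count vectors, a stand-in for the learned sentence vectors the paper uses. It then correlates those scores with gold ratings. The sentence pairs and their 0–4 gold scores are made up for illustration and are not BIOSSES items.

```python
import math
from collections import Counter


def bow(sentence: str) -> Counter:
    """Lower-cased bag-of-words counts (a stand-in for learned sentence vectors)."""
    return Counter(sentence.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v.get(token, 0) for token, count in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for constant input")
    return cov / (sx * sy)


# Toy sentence pairs with made-up gold scores on a 0-4 scale
# (illustrative only; these are NOT items from the BIOSSES benchmark).
pairs = [
    ("the protein binds the receptor", "the protein binds to the receptor", 3.8),
    ("mutations activate the kinase", "the kinase is activated by mutations", 3.0),
    ("cells were cultured overnight", "the patient reported mild fever", 0.2),
]

predicted = [cosine(bow(a), bow(b)) for a, b, _ in pairs]
gold = [score for _, _, score in pairs]
print([round(p, 3) for p in predicted])
print(round(pearson(predicted, gold), 3))
```

Because Pearson correlation is invariant to linear rescaling, the cosine scores in [0, 1] need not be mapped onto the 0–4 annotation scale before evaluation. Note that a purely lexical measure like this one misses paraphrases such as "activate" versus "activated", which is part of the motivation for vector-based and ontology-based measures.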


Network Information
Related Topics (5)
Web page: 50.3K papers, 975.1K citations (84% related)
Graph (abstract data type): 69.9K papers, 1.2M citations (84% related)
Unsupervised learning: 22.7K papers, 1M citations (83% related)
Feature vector: 48.8K papers, 954.4K citations (83% related)
Web service: 57.6K papers, 989K citations (82% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    202
2022    522
2021    641
2020    837
2019    866
2018    787