
Showing papers on "Semantic similarity" published in 2011


Journal ArticleDOI
18 Jul 2011-PLOS ONE
TL;DR: REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures.
Abstract: Outcomes of high-throughput biological experiments are typically interpreted by statistical testing for enriched gene functional categories defined by the Gene Ontology (GO). The resulting lists of GO terms may be large and highly redundant, and thus difficult to interpret. REVIGO is a Web server that summarizes long, unintelligible lists of GO terms by finding a representative subset of the terms using a simple clustering algorithm that relies on semantic similarity measures. Furthermore, REVIGO visualizes this non-redundant GO term set in multiple ways to assist in interpretation: multidimensional scaling and graph-based visualizations accurately render the subdivisions and the semantic relationships in the data, while treemaps and tag clouds are also offered as alternative views. REVIGO is freely available at http://revigo.irb.hr/.
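As an illustration of the general idea (a minimal sketch, not REVIGO's exact algorithm): keep the most significant term of each group of semantically similar terms and drop the redundant ones. The p-values and similarity scores below are made up.

```python
# Greedy redundancy reduction over a GO-term list: walk terms from most to
# least significant and keep a term only if it is not too similar to any
# already-kept term. The similarity values stand in for a real semantic
# similarity measure (e.g. Resnik or Lin over the GO graph).

def reduce_terms(terms, pvalues, sim, threshold=0.9):
    kept = []
    for term in sorted(terms, key=lambda t: pvalues[t]):  # most significant first
        if all(sim(term, k) < threshold for k in kept):
            kept.append(term)
    return kept

# Toy data: two of the three terms are near-synonymous.
toy_sim = {
    frozenset(("GO:0006915", "GO:0012501")): 0.95,  # apoptotic process vs. programmed cell death
    frozenset(("GO:0006915", "GO:0008219")): 0.80,
    frozenset(("GO:0012501", "GO:0008219")): 0.80,
}
sim = lambda a, b: 1.0 if a == b else toy_sim.get(frozenset((a, b)), 0.0)
pvals = {"GO:0006915": 1e-8, "GO:0012501": 1e-6, "GO:0008219": 1e-3}
print(reduce_terms(list(pvals), pvals, sim))  # ['GO:0006915', 'GO:0008219']
```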

4,919 citations


Proceedings ArticleDOI
28 Mar 2011
TL;DR: This paper proposes a new semantic relatedness model, Temporal Semantic Analysis (TSA), which captures temporal information by representing word semantics as a vector of concepts, each modeled as a time series over a corpus of temporally-ordered documents.
Abstract: Computing the degree of semantic relatedness of words is a key functionality of many language applications such as search, clustering, and disambiguation. Previous approaches to computing semantic relatedness mostly used static language resources, while essentially ignoring their temporal aspects. We believe that a considerable amount of relatedness information can also be found in studying patterns of word usage over time. Consider, for instance, a newspaper archive spanning many years. Two words such as "war" and "peace" might rarely co-occur in the same articles, yet their patterns of use over time might be similar. In this paper, we propose a new semantic relatedness model, Temporal Semantic Analysis (TSA), which captures this temporal information. The previous state of the art method, Explicit Semantic Analysis (ESA), represented word semantics as a vector of concepts. TSA uses a more refined representation, where each concept is no longer scalar, but is instead represented as time series over a corpus of temporally-ordered documents. To the best of our knowledge, this is the first attempt to incorporate temporal evidence into models of semantic relatedness. Empirical evaluation shows that TSA provides consistent improvements over the state of the art ESA results on multiple benchmarks.
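To make the temporal intuition concrete, here is a minimal sketch (not the full TSA model, which represents words over concept time series rather than raw counts): describe each word by its usage counts per time bin in an archive and correlate the resulting series. The counts are hypothetical.

```python
# Compare two words by the similarity of their usage patterns over time:
# words that rarely co-occur can still have highly correlated temporal
# profiles (e.g. "war" and "peace" in a news archive).

from math import sqrt

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0

# Hypothetical yearly article counts mentioning each word.
usage = {
    "war":    [120, 340, 310, 90, 60, 400, 380],
    "peace":  [100, 300, 280, 80, 70, 350, 330],
    "cheese": [55, 50, 52, 54, 49, 51, 53],
}
print(pearson(usage["war"], usage["peace"]))   # close to 1: similar temporal pattern
print(pearson(usage["war"], usage["cheese"]))  # low: unrelated pattern
```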

482 citations


Proceedings Article
23 Jun 2011
TL;DR: A novel discriminative training method that projects the raw term vectors into a common, low-dimensional vector space, which not only outperforms existing state-of-the-art approaches, but also achieves high accuracy at low dimensions and is thus more efficient.
Abstract: Traditional text similarity measures consider each term similar only to itself and do not model semantic relatedness of terms. We propose a novel discriminative training method that projects the raw term vectors into a common, low-dimensional vector space. Our approach operates by finding the optimal matrix to minimize the loss of the pre-selected similarity function (e.g., cosine) of the projected vectors, and is able to efficiently handle a large number of training examples in the high-dimensional space. Evaluated on two very different tasks, cross-lingual document retrieval and ad relevance measure, our method not only outperforms existing state-of-the-art approaches, but also achieves high accuracy at low dimensions and is thus more efficient.

298 citations


Journal ArticleDOI
TL;DR: This analysis leads to the conclusion that SSC is more constructive and has higher locality than SAC, NSM and SC; these are believed to be the main reasons for the improved performance of SSC.
Abstract: We investigate the effects of semantically-based crossover operators in genetic programming, applied to real-valued symbolic regression problems. We propose two new relations derived from the semantic distance between subtrees, known as semantic equivalence and semantic similarity. These relations are used to guide variants of the crossover operator, resulting in two new crossover operators: semantics aware crossover (SAC) and semantic similarity-based crossover (SSC). SAC, which was introduced and studied previously, is included here for comparison and analysis. SSC extends SAC by more closely controlling the semantic distance between subtrees to which crossover may be applied. The new operators were tested on several real-valued symbolic regression problems and compared with standard crossover (SC), context aware crossover (CAC), Soft Brood Selection (SBS), and No Same Mate (NSM) selection. The experimental results on the problems examined show that, with computational effort measured by the number of function node evaluations, only SSC and SBS were significantly better than SC, and that SSC was often better than SBS. Further experiments were also conducted to analyse the sensitivity of performance to the parameter settings of SSC. This analysis leads to the conclusion that SSC is more constructive and has higher locality than SAC, NSM and SC; we believe these are the main reasons for the improved performance of SSC.
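As a concrete illustration of the core test behind SSC (a minimal sketch, not the full GP system): a crossover swap is accepted only if the sampled semantic distance between the exchanged subtrees lies between a lower and an upper bound. Subtrees are modeled here simply as Python callables of one variable; the sample points and bounds are illustrative choices.

```python
# Sampled semantic distance between two subtrees: mean absolute difference
# of their outputs on a set of random input points. SSC rejects swaps whose
# distance is too small (semantically equivalent subtrees) or too large
# (semantically unrelated subtrees).

import random

def semantic_distance(subtree_a, subtree_b, points):
    return sum(abs(subtree_a(x) - subtree_b(x)) for x in points) / len(points)

def ssc_accepts(subtree_a, subtree_b, points, lower=1e-4, upper=0.4):
    return lower < semantic_distance(subtree_a, subtree_b, points) < upper

random.seed(0)
points = [random.uniform(-1, 1) for _ in range(20)]
print(ssc_accepts(lambda x: x * x, lambda x: x * x + 1e-9, points))  # False: equivalent
print(ssc_accepts(lambda x: x * x, lambda x: x * x + 0.1, points))   # True: similar but not equal
print(ssc_accepts(lambda x: x * x, lambda x: 100 * x, points))       # False: too dissimilar
```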

259 citations


Journal ArticleDOI
TL;DR: This paper analyzes ontology-based approaches for IC computation and proposes several improvements aimed at better capturing the semantic evidence modelled in the ontology for the particular concept.
Abstract: The information content (IC) of a concept provides an estimation of its degree of generality/concreteness, a dimension which enables a better understanding of a concept's semantics. As a result, IC has been successfully applied to the automatic assessment of the semantic similarity between concepts. In the past, IC has been estimated as the probability of appearance of concepts in corpora. However, the applicability and scalability of this method are hampered by corpora dependency and data sparseness. More recently, some authors proposed IC-based measures using taxonomical features extracted from an ontology for a particular concept, obtaining promising results. In this paper, we analyse these ontology-based approaches for IC computation and propose several improvements aimed at better capturing the semantic evidence modelled in the ontology for the particular concept. Our approach has been evaluated and compared with related works (both corpus-based and ontology-based ones) when applied to the task of semantic similarity estimation. Results obtained for a widely used benchmark show that our method enables similarity estimations which are better correlated with human judgements than those of related works.
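For a concrete reference point, one widely cited intrinsic IC formulation (due to Seco, Veale and Hayes; not necessarily the exact variant proposed in the paper above) estimates IC purely from the number of hyponyms a concept has in the taxonomy:

```python
# Intrinsic information content: concepts with many hyponyms are general and
# uninformative (IC near 0), leaf concepts are maximally specific (IC = 1).
# No corpus counts are needed, which avoids the sparseness problem mentioned
# above. The constant below is the approximate size of the WordNet 3.0 noun
# taxonomy, used only as an example.

from math import log

def intrinsic_ic(num_hyponyms, total_concepts):
    return 1.0 - log(num_hyponyms + 1) / log(total_concepts)

TOTAL_WORDNET_NOUNS = 82115
print(intrinsic_ic(0, TOTAL_WORDNET_NOUNS))      # 1.0 for a leaf concept
print(intrinsic_ic(600, TOTAL_WORDNET_NOUNS))    # mid-range for a moderately general concept
print(intrinsic_ic(82114, TOTAL_WORDNET_NOUNS))  # 0.0 for the root
```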

256 citations


Journal ArticleDOI
TL;DR: A new measure based on the exploitation of the taxonomical structure of a biomedical ontology is proposed; using SNOMED CT as the input ontology, the evaluation shows that it outperforms most of the previous measures while avoiding, at the same time, some of their limitations.

239 citations


Book ChapterDOI
29 May 2011
TL;DR: It is shown that the adoption of Semantic Web standards can provide added value for lexicon models by supporting a rich axiomatization of linguistic categories that can be used to constrain the usage of the model and to perform consistency checks.
Abstract: There are a large number of ontologies currently available on the Semantic Web. However, in order to exploit them within natural language processing applications, more linguistic information than can be represented in current Semantic Web standards is required. Further, there are a large number of lexical resources available representing a wealth of linguistic information, but this data exists in various formats and is difficult to link to ontologies and other resources. We present a model we call lemon (Lexicon Model for Ontologies) that supports the sharing of terminological and lexicon resources on the Semantic Web as well as their linking to the existing semantic representations provided by ontologies. We demonstrate that lemon can succinctly represent existing lexical resources and that, in combination with standard NLP tools, we can easily generate new lexica for domain ontologies according to the lemon model. We demonstrate that by combining generated and existing lexica we can collaboratively develop rich lexical descriptions of ontology entities. We also show that the adoption of Semantic Web standards can provide added value for lexicon models by supporting a rich axiomatization of linguistic categories that can be used to constrain the usage of the model and to perform consistency checks.

229 citations


Proceedings Article
19 Jun 2011
TL;DR: This work combines several graph alignment features with lexical semantic similarity measures using machine learning techniques and shows that the student answers can be more accurately graded than if the semantic measures were used in isolation.
Abstract: In this work we address the task of computer-assisted assessment of short student answers. We combine several graph alignment features with lexical semantic similarity measures using machine learning techniques and show that the student answers can be more accurately graded than if the semantic measures were used in isolation. We also present a first attempt to align the dependency graphs of the student and the instructor answers in order to make use of a structural component in the automatic grading of student answers.

219 citations


Journal ArticleDOI
TL;DR: This work proposes an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a web search engine for two words, and proposes a novel pattern extraction algorithm and a pattern clustering algorithm that significantly improves the accuracy in a community mining task.
Abstract: Measuring the semantic similarity between words is an important component in various tasks on the web such as relation extraction, community mining, document clustering, and automatic metadata extraction. Despite the usefulness of semantic similarity measures in these applications, accurately measuring semantic similarity between two words (or entities) remains a challenging task. We propose an empirical method to estimate semantic similarity using page counts and text snippets retrieved from a web search engine for two words. Specifically, we define various word co-occurrence measures using page counts and integrate those with lexical patterns extracted from text snippets. To identify the numerous semantic relations that exist between two given words, we propose a novel pattern extraction algorithm and a pattern clustering algorithm. The optimal combination of page counts-based co-occurrence measures and lexical pattern clusters is learned using support vector machines. The proposed method outperforms various baselines and previously proposed web-based semantic similarity measures on three benchmark data sets showing a high correlation with human ratings. Moreover, the proposed method significantly improves the accuracy in a community mining task.
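A minimal sketch of page-count-based co-occurrence scores of the kind the paper combines with lexical patterns (the counts, the index size N and the cutoff below are illustrative, not the paper's exact settings):

```python
# Two simple co-occurrence scores computed from search-engine page counts:
# a Jaccard-style overlap and pointwise mutual information. h_p denotes the
# page count for query P; h_pq the count for the conjunctive query "P AND Q".

from math import log2

N = 1e10      # assumed number of indexed pages
CUTOFF = 5    # treat very small conjunctive counts as unreliable

def web_jaccard(h_p, h_q, h_pq):
    return 0.0 if h_pq <= CUTOFF else h_pq / (h_p + h_q - h_pq)

def web_pmi(h_p, h_q, h_pq):
    return 0.0 if h_pq <= CUTOFF else log2((h_pq / N) / ((h_p / N) * (h_q / N)))

# Hypothetical counts for "car", "automobile" and their conjunction.
h_car, h_auto, h_both = 2.1e8, 9.0e6, 4.5e6
print(web_jaccard(h_car, h_auto, h_both))
print(web_pmi(h_car, h_auto, h_both))
```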

218 citations


31 Jul 2011
TL;DR: This paper presents a novel approach for automatic detection of semantic change of words based on distributional similarity models and shows that the method obtains good results with respect to a reference ranking produced by human raters.
Abstract: This paper presents a novel approach for automatic detection of semantic change of words based on distributional similarity models. We show that the method obtains good results with respect to a reference ranking produced by human raters. The evaluation also analyzes the performance of frequency-based methods, comparing them to the similarity method proposed.
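A minimal sketch of the distributional idea (not the paper's exact model): build a context-word count vector for the target word in two time periods and use one minus their cosine similarity as a change score. The co-occurrence counts below are made up.

```python
# Distributional detection of semantic change: a word whose typical context
# words differ strongly between an older and a newer corpus receives a high
# change score.

from math import sqrt

def cosine(u, v):
    dot = sum(u.get(k, 0) * v.get(k, 0) for k in set(u) | set(v))
    nu = sqrt(sum(x * x for x in u.values()))
    nv = sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

# Hypothetical context counts for "gay" in corpora from two different decades.
old = {"happy": 40, "cheerful": 25, "bright": 15, "rights": 1}
new = {"rights": 30, "marriage": 22, "lesbian": 18, "happy": 5}
print(round(1 - cosine(old, new), 3))  # high score: strong distributional change
```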

195 citations


Proceedings ArticleDOI
20 Jun 2011
TL;DR: The insights gained from analysis enable building a novel distance function between images assessing whether they are from the same basic-level category, which goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet.
Abstract: Many computer vision approaches take for granted positive answers to questions such as “Are semantic categories visually separable?” and “Is visual similarity correlated to semantic similarity?”. In this paper, we study experimentally whether these assumptions hold and show parallels to questions investigated in cognitive science about the human visual system. The insights gained from our analysis enable building a novel distance function between images assessing whether they are from the same basic-level category. This function goes beyond direct visual distance as it also exploits semantic similarity measured through ImageNet. We demonstrate experimentally that it outperforms purely visual distances.

Proceedings ArticleDOI
22 Jun 2011
TL;DR: A model-theoretical approach for semantic data compression and reliable semantic communication is investigated and it is shown that Shannon's source and channel coding theorems have semantic counterparts.
Abstract: This paper studies methods of quantitatively measuring semantic information in communication. We review existing work on quantifying semantic information, then investigate a model-theoretical approach for semantic data compression and reliable semantic communication. We relate our approach to the statistical measurement of information by Shannon, and show that Shannon's source and channel coding theorems have semantic counterparts.

Journal ArticleDOI
TL;DR: It is found that an information-theoretical redefinition of well-known semantic measures and similarity coefficients, together with an intrinsic estimation of concept IC, results in noticeable improvements in their accuracy, yielding new semantic similarity measures expressed in terms of concept Information Content.

Journal Article
TL;DR: To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, this work develops graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.
Abstract: In many applications involving multi-media data, the definition of similarity between items is integral to several key tasks, including nearest-neighbor retrieval, classification, and recommendation. Data in such regimes typically exhibits multiple modalities, such as acoustic and visual content of video. Integrating such heterogeneous data to form a holistic similarity space is therefore a key challenge to be overcome in many real-world applications. We present a novel multiple kernel learning technique for integrating heterogeneous data into a single, unified similarity space. Our algorithm learns an optimal ensemble of kernel transformations which conform to measurements of human perceptual similarity, as expressed by relative comparisons. To cope with the ubiquitous problems of subjectivity and inconsistency in multi-media similarity, we develop graph-based techniques to filter similarity measurements, resulting in a simplified and robust training procedure.

Proceedings Article
07 Aug 2011
TL;DR: A novel method is introduced for measuring semantic relatedness using semantic profiles constructed from salient encyclopedic features, built on the notion that the meaning of a word can be characterized by the salient concepts found in its immediate context.
Abstract: This paper introduces a novel method for measuring semantic relatedness using semantic profiles constructed from salient encyclopedic features. The model is built on the notion that the meaning of a word can be characterized by the salient concepts found in its immediate context. In addition to being computationally efficient, the new model has superior performance and remarkable consistency when compared to both knowledge-based and corpus-based state-of-the-art semantic relatedness models.

Journal ArticleDOI
TL;DR: It is argued that the amount of perceptual and other semantic information that can be learned from purely distributional statistics has been underappreciated and that future focus should be on understanding the cognitive mechanisms humans use to integrate the two sources.
Abstract: Since their inception, distributional models of semantics have been criticized as inadequate cognitive theories of human semantic learning and representation. A principal challenge is that the representations derived by distributional models are purely symbolic and are not grounded in perception and action; this challenge has led many to favor feature-based models of semantic representation. We argue that the amount of perceptual and other semantic information that can be learned from purely distributional statistics has been underappreciated. We compare the representations of three feature-based and nine distributional models using a semantic clustering task. Several distributional models demonstrated semantic clustering comparable with clustering based on feature-based representations. Furthermore, when trained on child-directed speech, the same distributional models perform as well as sensorimotor-based feature representations of children's lexical semantic knowledge. These results suggest that, to a large extent, information relevant for extracting semantic categories is redundantly coded in perceptual and linguistic experience. Detailed analyses of the semantic clusters of the feature-based and distributional models also reveal that the models make use of complementary cues to semantic organization from the two data streams. Rather than conceptualizing feature-based and distributional models as competing theories, we argue that future focus should be on understanding the cognitive mechanisms humans use to integrate the two sources.

Journal ArticleDOI
01 Apr 2011
TL;DR: The results show that SyMSS outperforms state-of-the-art methods in terms of rank correlation with human intuition, thus proving the importance of syntactic information in sentence semantic similarity computation.
Abstract: Sentence and short-text semantic similarity measures are becoming an important part of many natural language processing tasks, such as text summarization and conversational agents. This paper presents SyMSS, a new method for computing short-text and sentence semantic similarity. The method is based on the notion that the meaning of a sentence is made up of not only the meanings of its individual words, but also the structural way the words are combined. Thus, SyMSS captures and combines syntactic and semantic information to compute the semantic similarity of two sentences. Semantic information is obtained from a lexical database. Syntactic information is obtained through a deep parsing process that finds the phrases in each sentence. With this information, the proposed method measures the semantic similarity between concepts that play the same syntactic role. Psychological plausibility is added to the method by using previous findings about how humans weight different syntactic roles when computing semantic similarity. The results show that SyMSS outperforms state-of-the-art methods in terms of rank correlation with human intuition, thus proving the importance of syntactic information in sentence semantic similarity computation.

Journal ArticleDOI
TL;DR: A model to relate keywords based on their semantic relationship and define similarity functions to quantify the similarity between a pair of users is developed and it is concluded that direct friends are more similar than any other user pair.
Abstract: How do two people become friends? What role does homophily play in bringing two people closer to help them forge friendship? Is the similarity between two friends different from the similarity between any two people? How does the similarity between a friend of a friend compare to similarity between direct friends? In this work, our goal is to answer these questions. We study the relationship between semantic similarity of user profile entries and the social network topology. A user profile in an on-line social network is characterized by its profile entries. The entries are termed as user keywords. We develop a model to relate keywords based on their semantic relationship and define similarity functions to quantify the similarity between a pair of users. First, we present a ‘forest model’ to categorize keywords across multiple categorization trees and define the notion of distance between keywords. Second, we use the keyword distance to define similarity functions between a pair of users. Third, we analyze a set of Facebook data according to the model to determine the effect of homophily in on-line social networks. Based on our evaluations, we conclude that direct friends are more similar than any other user pair. However, the more striking observation is that except for direct friends, similarities between users are approximately equal, irrespective of the topological distance between them.

Journal ArticleDOI
TL;DR: Semantic saliency maps of real-world scenes, based on the semantic similarity of scene objects to the currently fixated object or the search target, are generated and reveal a preference for transitions to objects that were semantically similar to the currently inspected one.

Book ChapterDOI
30 Aug 2011
TL;DR: A proper metric to quantify process similarity based on behavioral profiles is introduced; it is grounded in the Jaccard coefficient and leverages behavioral relations between pairs of process model activities.
Abstract: With the increasing influence of Business Process Management, large process model repositories emerged in enterprises and public administrations. Their effective utilization requires meaningful and efficient capabilities to search for models that go beyond text based search or folder navigation, e.g., by similarity. Existing measures for process model similarity are often not applicable for efficient similarity search, as they lack metric features. In this paper, we introduce a proper metric to quantify process similarity based on behavioral profiles. It is grounded in the Jaccard coefficient and leverages behavioral relations between pairs of process model activities. The metric is successfully evaluated towards its approximation of human similarity assessment.
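A minimal sketch of the underlying idea (a Jaccard coefficient over behavioral relations shared by two process models; not the paper's exact metric), with a behavioral profile modeled here as a set of (activity, relation, activity) triples:

```python
# Jaccard-style similarity of two process models represented by their
# behavioral relations, e.g. strict order ('->') and exclusiveness ('+').

def profile_similarity(profile_a, profile_b):
    union = profile_a | profile_b
    return len(profile_a & profile_b) / len(union) if union else 1.0

model_1 = {("register", "->", "check"), ("check", "->", "decide"),
           ("approve", "+", "reject")}
model_2 = {("register", "->", "check"), ("check", "->", "decide"),
           ("approve", "->", "archive")}
print(profile_similarity(model_1, model_2))  # 2 shared relations / 4 distinct = 0.5
```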

Journal ArticleDOI
TL;DR: This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content that performs better than the traditional edge-counting approach.
Abstract: This article presents a measure of semantic similarity in an IS-A taxonomy based on the notion of shared information content. Experimental evaluation against a benchmark set of human similarity judgments demonstrates that the measure performs better than the traditional edge-counting approach. The article presents algorithms that take advantage of taxonomic similarity in resolving syntactic and semantic ambiguity, along with experimental results demonstrating their effectiveness.
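The measure (the information content of the most informative common subsumer) is available, for example, through NLTK's WordNet interface; the snippet below assumes the 'wordnet' and 'wordnet_ic' NLTK data packages are installed.

```python
# Shared-information-content similarity: sim(c1, c2) = IC(lcs(c1, c2)),
# where IC is estimated from corpus frequencies (here the Brown corpus).

from nltk.corpus import wordnet as wn
from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')
dog, cat, car = wn.synset('dog.n.01'), wn.synset('cat.n.01'), wn.synset('car.n.01')

print(dog.res_similarity(cat, brown_ic))  # high: the shared subsumer (carnivore) is specific
print(dog.res_similarity(car, brown_ic))  # low: the shared subsumer is very general
```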

Journal ArticleDOI
TL;DR: A method is proposed in which the quality of the resulting taxonomy is assessed manually and its coverage is compared automatically with ResearchCyc, one of the largest manually created ontologies, and with the lexical database WordNet, showing that the taxonomy compares favorably in quality and coverage with broad-coverage manually created resources.

Patent
01 Feb 2011
TL;DR: In this paper, the authors provide ontology mapping algorithms and concept weighting algorithms that create accurate semantic tags that can be used to improve enterprise content management, and search for better knowledge management and collaboration.
Abstract: Systems and methods are disclosed that perform automated semantic tagging. Automated semantic tagging produces semantically linked tags for a given text content. Embodiments provide ontology mapping algorithms and concept weighting algorithms that create accurate semantic tags that can be used to improve enterprise content management, and search for better knowledge management and collaboration.

Journal ArticleDOI
TL;DR: This work introduces a framework to specify the semantics of similarity, and discusses similarity-based information retrieval paradigms as well as their implementation in web-based user interfaces for geographic information retrieval to demonstrate the applicability of the framework.
Abstract: Similarity measures have a long tradition in fields such as information retrieval, artificial intelligence, and cognitive science. Within the last years, these measures have been extended and reused to measure semantic similarity; i.e., for comparing meanings rather than syntactic differences. Various measures for spatial applications have been developed, but a solid foundation for answering what they measure; how they are best applied in information retrieval; which role contextual information plays; and how similarity values or rankings should be interpreted is still missing. It is therefore difficult to decide which measure should be used for a particular application or to compare results from different similarity theories. Based on a review of existing similarity measures, we introduce a framework to specify the semantics of similarity. We discuss similarity-based information retrieval paradigms as well as their implementation in web-based user interfaces for geographic information retrieval to demonstrate the applicability of the framework. Finally, we formulate open challenges for similarity research.

Journal ArticleDOI
TL;DR: This communication provides an introduction, an example, and pointers to relevant software, and summarizes the choices that can be made by the analyst, so that visualization (“semantic mapping”) is made more accessible.

Journal ArticleDOI
TL;DR: It was found that number of features and contexts consistently facilitated word recognition but that the effects of semantic neighborhood density and number of associates were less robust, findings which point to how the effects of different semantic dimensions are selectively and adaptively modulated by task-specific demands.
Abstract: Evidence from large-scale studies (Pexman, Hargreaves, Siakaluk, Bodner, & Pope, 2008) suggests that semantic richness, a multidimensional construct reflecting the extent of variability in the information associated with a word's meaning, facilitates visual word recognition. Specifically, recognition is better for words that (1) have more semantic neighbors, (2) possess referents with more features, and (3) are associated with more contexts. The present study extends Pexman et al. (2008) by examining how two additional measures of semantic richness, number of senses and number of associates (Pexman, Hargreaves, Edwards, Henry, & Goodyear, 2007), influence lexical decision, speeded pronunciation, and semantic classification performance, after controlling for an array of lexical and semantic variables. We found that number of features and contexts consistently facilitated word recognition but that the effects of semantic neighborhood density and number of associates were less robust. Words with more senses also elicited faster lexical decisions but less accurate semantic classifications. These findings point to how the effects of different semantic dimensions are selectively and adaptively modulated by task-specific demands.

Journal ArticleDOI
TL;DR: An overview of the Watson system, a Semantic Web search engine providing various functionalities not only to find and locate ontologies and semantic data online, but also to explore the content of these semantic documents.
Abstract: In this tool report, we present an overview of the Watson system, a Semantic Web search engine providing various functionalities not only to find and locate ontologies and semantic data online, but also to explore the content of these semantic documents. Beyond the simple facade of a search engine for the Semantic Web, we show that the availability of such a component brings new possibilities in terms of developing semantic applications that exploit the content of the Semantic Web. Indeed, Watson provides a set of APIs containing high-level functions for finding, exploring and querying semantic data and ontologies that have been published online. Thanks to these APIs, new applications have emerged that connect activities such as ontology construction, matching, sense disambiguation and question answering to the Semantic Web, developed by our group and others. In addition, we describe Watson as an unprecedented research platform for the study of the Semantic Web, and of formalised knowledge in general.

Proceedings ArticleDOI
24 Jul 2011
TL;DR: Two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR) are presented.
Abstract: This paper presents two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, large amounts of query-title pairs are constructed from clickthrough data; two latent semantic models are learned from this data. One is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the documents. The semantic representation is language independent and learned from query-title pairs, with the assumption that a query and its paired titles share the same distribution over semantic topics. The other is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which is used to map from term vectors into semantic space, is learned discriminatively such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than that between the query and the titles of other documents which have no clicks for that query. These models are evaluated on the Web search task using a real world data set. Results show that they significantly outperform their corresponding baseline models, which are state-of-the-art.

Journal ArticleDOI
TL;DR: Experimental results show that the proposed approach outperforms TF-IDF in cases where the amount of training data is small or the content of documents is focused on well-defined categories, and that it compares favorably with two previous studies.
Abstract: Traditional term weighting schemes in text categorization, such as TF-IDF, only exploit the statistical information of terms in documents. Instead, in this paper, we propose a novel term weighting scheme by exploiting the semantics of categories and indexing terms. Specifically, the semantics of categories are represented by senses of terms appearing in the category labels as well as the interpretation of them by WordNet. Also, the weight of a term is correlated to its semantic similarity with a category. Experimental results on three commonly used data sets show that the proposed approach outperforms TF-IDF in the cases that the amount of training data is small or the content of documents is focused on well-defined categories. In addition, the proposed approach compares favorably with two previous studies.
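A minimal sketch of the idea of tying a term's weight to its semantic similarity with a category (not the paper's exact weighting scheme): here the similarity is the best WordNet Wu-Palmer score between any sense of the term and any sense of the category label. It requires NLTK's WordNet data, and the example terms are arbitrary.

```python
# Scale a raw term frequency by the term's WordNet similarity to the category
# label, so terms semantically close to the category get larger weights.

from nltk.corpus import wordnet as wn

def term_category_similarity(term, category):
    scores = [s1.wup_similarity(s2) or 0.0
              for s1 in wn.synsets(term)
              for s2 in wn.synsets(category)]
    return max(scores, default=0.0)

def semantic_weight(tf, term, category):
    return tf * term_category_similarity(term, category)

print(semantic_weight(3, "goalkeeper", "sport"))
print(semantic_weight(3, "inflation", "sport"))
```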

Book ChapterDOI
28 Jun 2011
TL;DR: This work focuses on the investigation of a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation.
Abstract: Linked Data brings the promise of incorporating a new dimension to the Web where the availability of Web-scale data can determine a paradigmatic transformation of the Web and its applications. However, together with its opportunities, Linked Data brings inherent challenges in the way users and applications consume the available data. Users consuming Linked Data on the Web, or on corporate intranets, should be able to search and query data spread over potentially a large number of heterogeneous, complex and distributed datasets. Ideally, a query mechanism for Linked Data should abstract users from the representation of data. This work focuses on the investigation of a vocabulary-independent natural language query mechanism for Linked Data, using an approach based on the combination of entity search, a Wikipedia-based semantic relatedness measure and spreading activation. The combination of these three elements in a query mechanism for Linked Data is a new contribution in this space. The Wikipedia-based relatedness measure addresses limitations of existing works which are based on similarity measures or term expansion based on WordNet. Experimental results using the query mechanism to answer 50 natural language queries over DBPedia achieved a mean reciprocal rank of 61.4%, an average precision of 48.7% and average recall of 57.2%, answering 70% of the queries.