Showing papers on "Semantic similarity published in 2007"

PDF

Open Access

Proceedings Article•

Computing semantic relatedness using Wikipedia-based explicit semantic analysis

[...]

Evgeniy Gabrilovich¹, Shaul Markovitch¹•Institutions (1)

Technion – Israel Institute of Technology¹

06 Jan 2007

TL;DR: This work proposes Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia that results in substantial improvements in correlation of computed relatedness scores with human judgments.

...read moreread less

Abstract: Computing semantic relatedness of natural language texts requires access to vast amounts of common-sense and domain-specific world knowledge. We propose Explicit Semantic Analysis (ESA), a novel method that represents the meaning of texts in a high-dimensional space of concepts derived from Wikipedia. We use machine learning techniques to explicitly represent the meaning of any text as a weighted vector of Wikipedia-based concepts. Assessing the relatedness of texts in this space amounts to comparing the corresponding vectors using conventional metrics (e.g., cosine). Compared with the previous state of the art, using ESA results in substantial improvements in correlation of computed relatedness scores with human judgments: from r = 0.56 to 0.75 for individual words and from r = 0.60 to 0.72 for texts. Importantly, due to the use of natural concepts, the ESA model is easy to explain to human users.

...read moreread less

2,285 citations

Journal Article•DOI•

A new method to measure the semantic similarity of GO terms

[...]

James Z. Wang¹, Zhidian Du¹, Rapeeporn Payattakool¹, Philip S. Yu¹, Chin-Fu Chen¹ - Show less +1 more•Institutions (1)

Clemson University¹

22 Mar 2007-Bioinformatics

TL;DR: A novel method to encode a GO term's semantics into a numeric value by aggregating the semantic contributions of their ancestor terms in the GO graph is proposed and, in turn, an algorithm is designed to measure the semantic similarity of GO terms.

...read moreread less

Abstract: Motivation: Although controlled biochemical or biological vocabularies, such as Gene Ontology (GO) ( http://www.geneontology.org), address the need for consistent descriptions of genes in different data sources, there is still no effective method to determine the functional similarities of genes based on gene annotation information from heterogeneous data sources. Results: To address this critical need, we proposed a novel method to encode a GO term's semantics (biological meanings) into a numeric value by aggregating the semantic contributions of their ancestor terms (including this specific term) in the GO graph and, in turn, designed an algorithm to measure the semantic similarity of GO terms. Based on the semantic similarities of GO terms used for gene annotation, we designed a new algorithm to measure the functional similarity of genes. The results of using our algorithm to measure the functional similarities of genes in pathways retrieved from the saccharomyces genome database (SGD), and the outcomes of clustering these genes based on the similarity values obtained by our algorithm are shown to be consistent with human perspectives. Furthermore, we developed a set of online tools for gene similarity measurement and knowledge discovery. Availability: The online tools are available at: http://bioinformatics.clemson.edu/G-SESAME Contact: jzwang@cs.clemson.edu Supplementary information: http://bioinformatics.clemson.edu/Publication/Supplement/gsp.htm

...read moreread less

1,067 citations

Journal Article•DOI•

Supervised Learning of Semantic Classes for Image Annotation and Retrieval

[...]

Gustavo Carneiro¹, Antoni B. Chan², Pedro J. Moreno³, Nuno Vasconcelos•Institutions (3)

Princeton University¹, University of California, San Diego², Google³

01 Mar 2007-IEEE Transactions on Pattern Analysis and Machine Intelligence

TL;DR: The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost and to be fairly robust to parameter tuning.

...read moreread less

Abstract: A probabilistic formulation for semantic image annotation and retrieval is proposed. Annotation and retrieval are posed as classification problems where each class is defined as the group of database images labeled with a common semantic label. It is shown that, by establishing this one-to-one correspondence between semantic labels and semantic classes, a minimum probability of error annotation and retrieval are feasible with algorithms that are 1) conceptually simple, 2) computationally efficient, and 3) do not require prior semantic segmentation of training images. In particular, images are represented as bags of localized feature vectors, a mixture density estimated for each image, and the mixtures associated with all images annotated with a common semantic label pooled into a density estimate for the corresponding semantic class. This pooling is justified by a multiple instance learning argument and performed efficiently with a hierarchical extension of expectation-maximization. The benefits of the supervised formulation over the more complex, and currently popular, joint modeling of semantic label and visual feature distributions are illustrated through theoretical arguments and extensive experiments. The supervised formulation is shown to achieve higher accuracy than various previously published methods at a fraction of their computational cost. Finally, the proposed method is shown to be fairly robust to parameter tuning

...read moreread less

962 citations

Journal Article•DOI•

Dependency-Based Construction of Semantic Space Models

[...]

Sebastian Padó¹, Mirella Lapata•Institutions (1)

University of Edinburgh¹

01 Jun 2007-Computational Linguistics

TL;DR: This article presents a novel framework for constructing semantic spaces that takes syntactic relations into account, and introduces a formalization for this class of models, which allows linguistic knowledge to guide the construction process.

...read moreread less

Abstract: Traditionally, vector-based semantic space models use word co-occurrence counts from large corpora to represent lexical meaning. In this article we present a novel framework for constructing semantic spaces that takes syntactic relations into account. We introduce a formalization for this class of models, which allows linguistic knowledge to guide the construction process. We evaluate our framework on a range of tasks relevant for cognitive science and natural language processing: semantic priming, synonymy detection, and word sense disambiguation. In all cases, our framework obtains results that are comparable or superior to the state of the art.

...read moreread less

696 citations

Proceedings Article•DOI•

Measuring Semantic Similarity between Words Using Web Search Engines

[...]

Danushka Bollegala¹, Yutaka Matsuo², Mitsuru Ishizuka¹•Institutions (2)

University of Tokyo¹, National Institute of Advanced Industrial Science and Technology²

01 Jan 2007

TL;DR: A robust semantic similarity measure that uses the information available on the Web to measure similarity between words or entities and a novel approach to compute semantic similarity using automatically extracted lexico-syntactic patterns from text snippets is proposed.

...read moreread less

Abstract: Semantic similarity measures play important roles in information retrieval and Natural Language Processing. Previous work in semantic web-related applications such as community mining, relation extraction, automatic meta data extraction have used various semantic similarity measures. Despite the usefulness of semantic similarity measures in these applications, robustly measuring semantic similarity between two words (or entities) remains a challenging task. We propose a robust semantic similarity measure that uses the information available on the Web to measure similarity between words or entities. The proposed method exploits page counts and text snippets returned by a Web search engine. We deflne various similarity scores for two given words P and Q, using the page counts for the queries P, Q and P AND Q. Moreover, we propose a novel approach to compute semantic similarity using automatically extracted lexico-syntactic patterns from text snippets. These difierent similarity scores are integrated using support vector machines, to leverage a robust semantic similarity measure. Experimental results on Miller-Charles benchmark dataset show that the proposed measure outperforms all the existing web-based semantic similarity measures by a wide margin, achieving a correlation coe‐cient of 0:834. Moreover, the proposed semantic similarity measure signiflcantly improves the accuracy (F-measure of 0:78) in a community mining task, and in an entity disambiguation task, thereby verifying the capability of the proposed measure to capture semantic similarity using web content.

...read moreread less

601 citations

Journal Article•DOI•

Measures of semantic similarity and relatedness in the biomedical domain

[...]

Ted Pedersen¹, Serguei V. S. Pakhomov², Siddharth Patwardhan³, Christopher G. Chute²•Institutions (3)

University of Minnesota¹, University of Rochester², University of Utah³

01 Jun 2007-Journal of Biomedical Informatics

TL;DR: There is a role both for more flexible measures of relatedness based on information derived from corpora, as well as for measures that rely on existing ontological structures.

...read moreread less

572 citations

Proceedings Article•

Deriving a large scale taxonomy from Wikipedia

[...]

Simone Paolo Ponzetto, Michael Strube

22 Jul 2007

TL;DR: A large scale taxonomy containing a large amount of subsumption is derived using methods based on connectivity in the network and lexicosyntactic matching to label the semantic relations between categories in Wikipedia.

...read moreread less

Abstract: We take the category system in Wikipedia as a conceptual network. We label the semantic relations between categories using methods based on connectivity in the network and lexicosyntactic matching. As a result we are able to derive a large scale taxonomy containing a large amount of subsumption, i.e. isa, relations. We evaluate the quality of the created resource by comparing it with ResearchCyc, one of the largest manually annotated ontologies, as well as computing semantic similarity between words in benchmarking datasets.

...read moreread less

502 citations

Proceedings Article•DOI•

A model for enriching trajectories with semantic geographical information

[...]

Luis Otavio Alvares¹, Vania Bogorny¹, Bart Kuijpers¹, José Antônio Fernandes de Macêdo², Bart Moelans¹, Alejandro A. Vaisman³ - Show less +2 more•Institutions (3)

University of Hasselt¹, École Polytechnique Fédérale de Lausanne², University of Buenos Aires³

07 Nov 2007

TL;DR: This paper proposes a data preprocessing model to add semantic information to trajectories in order to facilitate trajectory data analysis in different application domains and shows that the query complexity for the semantic analysis of trajectories will be significantly reduced.

...read moreread less

Abstract: The collection of moving object data is becoming more and more common, and therefore there is an increasing need for the efficient analysis and knowledge extraction of these data in different application domains. Trajectory data are normally available as sample points, and do not carry semantic information, which is of fundamental importance for the comprehension of these data. Therefore, the analysis of trajectory data becomes expensive from a computational point of view and complex from a user's perspective. Enriching trajectories with semantic geographical information may simplify queries, analysis, and mining of moving object data. In this paper we propose a data preprocessing model to add semantic information to trajectories in order to facilitate trajectory data analysis in different application domains. The model is generic enough to represent the important parts of trajectories that are relevant to the application, not being restricted to one specific application. We present an algorithm to compute the important parts and show that the query complexity for the semantic analysis of trajectories will be significantly reduced with the proposed model.

...read moreread less

434 citations

Journal Article•DOI•

Semantic Modeling of Natural Scenes for Content-Based Image Retrieval

[...]

Julia Vogel¹, Bernt Schiele²•Institutions (2)

University of British Columbia¹, Technische Universität Darmstadt²

21 Apr 2007-International Journal of Computer Vision

TL;DR: A novel image representation is presented that renders it possible to access natural scenes by local semantic description by using a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking.

...read moreread less

Abstract: In this paper, we present a novel image representation that renders it possible to access natural scenes by local semantic description. Our work is motivated by the continuing effort in content-based image retrieval to extract and to model the semantic content of images. The basic idea of the semantic modeling is to classify local image regions into semantic concept classes such as water, rocks, or foliage. Images are represented through the frequency of occurrence of these local concepts. Through extensive experiments, we demonstrate that the image representation is well suited for modeling the semantic content of heterogenous scene categories, and thus for categorization and retrieval. The image representation also allows us to rank natural scenes according to their semantic similarity relative to certain scene categories. Based on human ranking data, we learn a perceptually plausible distance measure that leads to a high correlation between the human and the automatically obtained typicality ranking. This result is especially valuable for content-based image retrieval where the goal is to present retrieval results in descending semantic similarity from the query.

...read moreread less

433 citations

Proceedings Article•

Using Semantic Roles to Improve Question Answering

[...]

Dan Shen¹, Mirella Lapata²•Institutions (2)

Saarland University¹, University of Edinburgh²

01 Jun 2007

TL;DR: This work introduces a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm and views semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching.

...read moreread less

Abstract: Shallow semantic parsing, the automatic identification and labeling of sentential constituents, has recently received much attention. Our work examines whether semantic role information is beneficial to question answering. We introduce a general framework for answer extraction which exploits semantic role annotations in the FrameNet paradigm. We view semantic role assignment as an optimization problem in a bipartite graph and answer extraction as an instance of graph matching. Experimental results on the TREC datasets demonstrate improvements over state-of-the-art models.

...read moreread less

429 citations

Book Chapter•DOI•

[...]

Donald Metzler¹, Susan T. Dumais², Christopher Meek²•Institutions (2)

University of Massachusetts Amherst¹, Microsoft²

02 Apr 2007

TL;DR: This work formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log, and provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency.

...read moreread less

Abstract: Measuring the similarity between documents and queries has been extensively studied in information retrieval However, there are a growing number of tasks that require computing the similarity between two very short segments of text These tasks include query reformulation, sponsored search, and image retrieval Standard text similarity measures perform poorly on such tasks because of data sparseness and the lack of context In this work, we study this problem from an information retrieval perspective, focusing on text representations and similarity measures We examine a range of similarity measures, including purely lexical measures, stemming, and language modeling-based measures We formally evaluate and analyze the methods on a query-query similarity task using 363,822 queries from a web search log Our analysis provides insights into the strengths and weaknesses of each method, including important tradeoffs between effectiveness and efficiency

...read moreread less

Proceedings Article•DOI•

Local Probabilistic Models for Link Prediction

[...]

Chao Wang¹, Venu Satuluri¹, Srinivasan Parthasarathy¹•Institutions (1)

Ohio State University¹

28 Oct 2007

TL;DR: A novel local probabilistic graphical model method that can scale to large graphs to estimate the joint co-occurrence probability of two nodes and is demonstrated to be effective both in isolation and in combination with other topological and semantic features for predicting co-authorship collaborations on real datasets.

...read moreread less

Abstract: One of the core tasks in social network analysis is to predict the formation of links (i.e. various types of relationships) over time. Previous research has generally represented the social network in the form of a graph and has leveraged topological and semantic measures of similarity between two nodes to evaluate the probability of link formation. Here we introduce a novel local probabilistic graphical model method that can scale to large graphs to estimate the joint co-occurrence probability of two nodes. Such a probability measure captures information that is not captured by either topological measures or measures of semantic similarity, which are the dominant measures used for link prediction. We demonstrate the effectiveness of the co-occurrence probability feature by using it both in isolation and in combination with other topological and semantic features for predicting co-authorship collaborations on real datasets.

...read moreread less

Patent•

Method and system for automatically extracting relations between concepts included in text

[...]

Marco Varone

04 May 2007

TL;DR: In this article, a semantic network comprising a plurality of lemmas that are grouped into synsets representing concepts, each of the synset having a corresponding sense, and links connected between the synnsets that represent semantic relations between the Synsets is described.

...read moreread less

Abstract: A method and system for automatically extracting relations between concepts included in electronic text is described. Aspects the exemplary embodiment include a semantic network comprising a plurality of lemmas that are grouped into synsets representing concepts, each of the synsets having a corresponding sense, and a plurality of links connected between the synsets that represent semantic relations between the synsets. The semantic network further includes semantic information comprising at least one of: 1) an expanded set of semantic relation links representing: hierarchical semantic relations, synset/corpus semantic relations verb/subject semantic relations, verb/direct object semantic relations, and fine grain/coarse grain semantic relationship; 2) a hierarchical category tree having a plurality of categories, wherein each of the categories contains a group of one or more synsets and a set of attributes, wherein the set of attributes of each of the categories are associated with each of the synsets in the respective category; and 3) a plurality of domains, wherein one or more of the domains is associated with at least a portion of the synsets, wherein each domain adds information regarding a linguistic context in which the corresponding synset is used in a language. A linguistic engine uses the semantic network to performing semantic disambiguation on the electronic text using one or more of the expanded set of semantic relation links, the hierarchical category tree, and the plurality of domains to assign a respective one of the senses to elements in the electronic text independently from contextual reference.

...read moreread less

Book Chapter•DOI•

Semantic matching: algorithms and implementation

[...]

Fausto Giunchiglia¹, Mikalai Yatskevich¹, Pavel Shvaiko¹•Institutions (1)

University of Trento¹

01 Jan 2007-Journal on Data Semantics

TL;DR: Semantic matching as discussed by the authors is an operator that takes two graph-like structures (e.g., classifications, XML schemas) and produces a mapping between the nodes of these graphs that correspond semantically to each other.

...read moreread less

Abstract: We view match as an operator that takes two graph-like structures (e.g., classifications, XML schemas) and produces a mapping between the nodes of these graphs that correspond semantically to each other. Semantic matching is based on two ideas: (i) we discover mappings by computing semantic relations (e.g., equivalence, more general); (ii) we determine semantic relations by analyzing the meaning (concepts, not labels) which is codified in the elements and the structures of schemas. In this paper we present basic and optimized algorithms for semantic matching, and we discuss their implementation within the S-Match system. We evaluate S-Match against three state of the art matching systems, thereby justifying empirically the strength of our approach.

...read moreread less

Proceedings Article•DOI•

Unsupervised Graph-basedWord Sense Disambiguation Using Measures of Word Semantic Similarity

[...]

R. Sinha¹, Rada Mihalcea¹•Institutions (1)

University of North Texas¹

17 Sep 2007

TL;DR: The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.

...read moreread less

Abstract: This paper describes an unsupervised graph-based method for word sense disambiguation, and presents comparative evaluations using several measures of word semantic similarity and several algorithms for graph centrality. The results indicate that the right combination of similarity metrics and graph centrality algorithms can lead to a performance competing with the state-of-the-art in unsupervised word sense disambiguation, as measured on standard data sets.

...read moreread less

Journal Article•DOI•

Knowledge derived from wikipedia for computing semantic relatedness

[...]

Simone Paolo Ponzetto, Michael Strube

01 Sep 2007-Journal of Artificial Intelligence Research

TL;DR: Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and it is shown that Wikipedia outperforms WordNet on some datasets.

...read moreread less

Abstract: Wikipedia provides a semantic network for computing semantic relatedness in a more structured fashion than a search engine and with more coverage than WordNet. We present experiments on using Wikipedia for computing semantic relatedness and compare it to WordNet on various benchmarking datasets. Existing relatedness measures perform better using Wikipedia than a baseline given by Google counts, and we show that Wikipedia outperforms WordNet on some datasets. We also address the question whether and how Wikipedia can be integrated into NLP applications as a knowledge base. Including Wikipedia improves the performance of a machine learning based coreference resolution system, indicating that it represents a valuable resource for NLP applications. Finally, we show that our method can be easily used for languages other than English by computing semantic relatedness for a German dataset.

...read moreread less

Journal Article•DOI•

Measuring semantic similarity between Gene Ontology terms

[...]

Francisco M. Couto¹, Mário J. Silva¹, Pedro M. Coutinho²•Institutions (2)

University of Lisbon¹, University of Provence²

01 Apr 2007

TL;DR: GraSM, a novel method that uses all the information in the graph structure of the Gene Ontology, instead of considering it as a hierarchical tree, gives a consistently higher family similarity correlation on all aspects of GO than the original semantic similarity measures.

...read moreread less

Abstract: Many bioinformatics applications would benefit from comparing proteins based on their biological role rather than their sequence. This paper adds two new contributions. First, a study of the correlation between Gene Ontology (GO) terms and family similarity demonstrates that protein families constitute an appropriate baseline for validating GO similarity. Secondly, we introduce GraSM, a novel method that uses all the information in the graph structure of the Gene Ontology, instead of considering it as a hierarchical tree. GraSM gives a consistently higher family similarity correlation on all aspects of GO than the original semantic similarity measures.

...read moreread less

Journal Article•DOI•

AquaLog: An ontology-driven question answering system for organizational semantic intranets

[...]

Vanessa Lopez¹, Victoria Uren¹, Enrico Motta¹, Michele Pasin¹•Institutions (1)

Open University¹

01 Jun 2007-Journal of Web Semantics

TL;DR: AquaLog is a portable question-answering system which takes queries expressed in natural language and an ontology as input, and returns answers drawn from one or more knowledge bases (KBs) because the configuration time required to customize the system for a particular ontology is negligible.

...read moreread less

Proceedings Article•

Lexical Semantic Relatedness with Random Graph Walks

[...]

Thad Hughes, Daniel Ramage¹•Institutions (1)

Stanford University¹

01 Jun 2007

TL;DR: A new model of lexical semantic relatedness is proposed that incorporates information from every explicit or implicit path connecting the two words in the entire graph and is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions.

...read moreread less

Abstract: Many systems for tasks such as question answering, multi-document summarization, and information retrieval need robust numerical measures of lexical relatedness. Standard thesaurus-based measures of word pair similarity are based on only a single path between those words in the thesaurus graph. By contrast, we propose a new model of lexical semantic relatedness that incorporates information from every explicit or implicit path connecting the two words in the entire graph. Our model uses a random walk over nodes and edges derived from WordNet links and corpus statistics. We treat the graph as a Markov chain and compute a word-specific stationary distribution via a generalized PageRank algorithm. Semantic relatedness of a word pair is scored by a novel divergence measure, ZKL, that outperforms existing measures on certain classes of distributions. In our experiments, the resulting relatedness measure is the WordNet-based measure most highly correlated with human similarity judgments by rank ordering at = .90.

...read moreread less

Proceedings Article•DOI•

SemEval-2007 Task 04: Classification of Semantic Relations between Nominals

[...]

Roxana Girju¹, Preslav Nakov², Vivi Nastase, Stan Szpakowicz³, Peter D. Turney⁴, Deniz Yuret⁵ - Show less +2 more•Institutions (5)

University of Illinois at Urbana–Champaign¹, University of California, Berkeley², University of Ottawa³, National Research Council⁴, Koç University⁵

23 Jun 2007

TL;DR: An evaluation task designed to provide a framework for comparing different approaches to classifying semantic relations between nominals in a sentence as part of SemEval, the 4th edition of the semantic evaluation event previously known as SensEval.

...read moreread less

Abstract: The NLP community has shown a renewed interest in deeper semantic analyses, among them automatic recognition of relations between pairs of words in a text. We present an evaluation task designed to provide a framework for comparing different approaches to classifying semantic relations between nominals in a sentence. This is part of SemEval, the 4th edition of the semantic evaluation event previously known as SensEval. We define the task, describe the training/test data and their creation, list the participating systems and discuss their results. There were 14 teams who submitted 15 systems.

...read moreread less

Book•

Latent semantic mapping

[...]

Jerome R. Bellegarda

01 Jan 2007

Journal Article•DOI•

Information theory applied to the sparse gene ontology annotation network to predict novel gene function

[...]

Ying Tao¹, Lee Sam², Jianrong Li², Carol Friedman², Yves A. Lussier² - Show less +1 more•Institutions (2)

Columbia University¹, University of Chicago²

01 Jul 2007

TL;DR: A novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations is proposed, able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed.

...read moreread less

Abstract: Motivation: Despite advances in the gene annotation process, the functions of a large portion of gene products remain insufficiently characterized. In addition, the in silico prediction of novel Gene Ontology (GO) annotations for partially characterized gene functions or processes is highly dependent on reverse genetic or functional genomic approaches. To our knowledge, no prediction method has been demonstrated to be highly accurate for sparsely annotated GO terms (those associated to fewer than 10 genes). Results: We propose a novel approach, information theory-based semantic similarity (ITSS), to automatically predict molecular functions of genes based on existing GO annotations. Using a 10-fold cross-validation, we demonstrate that the ITSS algorithm obtains prediction accuracies (precision 97%, recall 77%) comparable to other machine learning algorithms when compared in similar conditions over densely annotated portions of the GO datasets. This method is able to generate highly accurate predictions in sparsely annotated portions of GO, where previous algorithms have failed. As a result, our technique generates an order of magnitude more functional predictions than previous methods. A 10-fold cross validation demonstrated a precision of 90% at a recall of 36% for the algorithm over sparsely annotated networks of the recent GO annotations (about 1400 GO terms and 11 000 genes in Homo sapiens). To our knowledge, this article presents the first historical rollback validation for the predicted GO annotations, which may represent more realistic conditions than more widely used cross-validation approaches. By manually assessing a random sample of 100 predictions conducted in a historical rollback evaluation, we estimate that a minimum precision of 51% (95% confidence interval: 43–58%) can be achieved for the human GO Annotation file dated 2003. Availability: The program is available on request. The 97 732 positive predictions of novel gene annotations from the 2005 GO Annotation dataset and other supplementary information is available at http://phenos.bsd.uchicago.edu/ITSS/ Contact: Lussier@uchicago.edu Supplementary information: Supplementary data are available atBioinformatics online.

...read moreread less

Journal Article•DOI•

Semantic integration to identify overlapping functional modules in protein interaction networks

[...]

Young-Rae Cho¹, Woochang Hwang¹, Murali Ramanathan¹, Aidong Zhang¹•Institutions (1)

State University of New York System¹

24 Jul 2007-BMC Bioinformatics

TL;DR: A flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks and shows that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence.

...read moreread less

Abstract: The systematic analysis of protein-protein interactions can enable a better understanding of cellular organization, processes and functions. Functional modules can be identified from the protein interaction networks derived from experimental data sets. However, these analyses are challenging because of the presence of unreliable interactions and the complex connectivity of the network. The integration of protein-protein interactions with the data from other sources can be leveraged for improving the effectiveness of functional module detection algorithms. We have developed novel metrics, called semantic similarity and semantic interactivity, which use Gene Ontology (GO) annotations to measure the reliability of protein-protein interactions. The protein interaction networks can be converted into a weighted graph representation by assigning the reliability values to each interaction as a weight. We presented a flow-based modularization algorithm to efficiently identify overlapping modules in the weighted interaction networks. The experimental results show that the semantic similarity and semantic interactivity of interacting pairs were positively correlated with functional co-occurrence. The effectiveness of the algorithm for identifying modules was evaluated using functional categories from the MIPS database. We demonstrated that our algorithm had higher accuracy compared to other competing approaches. The integration of protein interaction networks with GO annotation data and the capability of detecting overlapping modules substantially improve the accuracy of module identification.

...read moreread less

Proceedings Article•

[...]

Katrin Erk¹•Institutions (1)

University of Texas at Austin¹

01 Dec 2007

TL;DR: A new, simple model for the automatic induction of selectional preferences, using corpus-based semantic similarity metrics, focuses on the task of semantic role labeling and shows lower error rates than both Resnik's WordNet-based model and the EM-based clustering model.

...read moreread less

Abstract: We propose a new, simple model for the automatic induction of selectional preferences, using corpus-based semantic similarity metrics. Focusing on the task of semantic role labeling, we compute selectional preferences for semantic roles. In evaluations the similarity-based model shows lower error rates than both Resnik’s WordNet-based model and the EM-based clustering model, but has coverage problems.

...read moreread less

Proceedings Article•

Analysis of the Wikipedia Category Graph for NLP Applications

[...]

Torsten Zesch, Iryna Gurevych

18 Mar 2007

TL;DR: A graphtheoretic analysis of the category graph is performed, and it is shown that it is a scale-free, small world graph like other well-known lexical semantic networks.

...read moreread less

Abstract: In this paper, we discuss two graphs in Wikipedia (i) the article graph, and (ii) the category graph. We perform a graphtheoretic analysis of the category graph, and show that it is a scale-free, small world graph like other well-known lexical semantic networks. We substantiate our findings by transferring semantic relatedness algorithms defined on WordNet to the Wikipedia category graph. To assess the usefulness of the category graph as an NLP resource, we analyze its coverage and the performance of the transferred semantic relatedness algorithms.

...read moreread less

Journal Article•DOI•

Refractory effects in stroke aphasia: a consequence of poor semantic control.

[...]

Elizabeth Jefferies¹, Stephen S. Baker¹, Mark Doran², Matthew A. Lambon Ralph¹•Institutions (2)

University of Manchester¹, Walton Centre²

01 Jan 2007-Neuropsychologia

TL;DR: Unlike patients with classical "semantic access impairment", the authors' semantically impaired stroke patients showed significant test-retest consistency, indicating that their difficulties did not result from an unpredictable failure of semantic access--instead, their deficits were interpreted as arising from failures of semantic control.

...read moreread less

Journal Article•DOI•

To Predict or Not to Predict: Influences of Task and Strategy on the Processing of Semantic Relations

[...]

Dietmar Roehm¹, Ina Bornkessel-Schlesewsky¹, Frank Rösler², Matthias Schlesewsky²•Institutions (2)

Max Planck Society¹, University of Marburg²

01 Aug 2007-Journal of Cognitive Neuroscience

TL;DR: Three ERP studies contrasting the processing of antonym relations with that of related and unrelated word pairs revealed that the P300 effect is not only a function of stimulus constraints and experimental task, but that it is also crucially influenced by individual processing strategies used to achieve successful task performance.

...read moreread less

Abstract: We report a series of event-related potential experiments designed to dissociate the functionally distinct processes involved in the comprehension of highly restricted lexical-semantic relations (antonyms). We sought to differentiate between influences of semantic relatedness (which are independent of the experimental setting) and processes related to predictability (which differ as a function of the experimental environment). To this end, we conducted three ERP studies contrasting the processing of antonym relations (black-white) with that of related (black-yellow) and unrelated (black-nice) word pairs. Whereas the lexical-semantic manipulation was kept constant across experiments, the experimental environment and the task demands varied: Experiment 1 presented the word pairs in a sentence context of the form The opposite of X is Y and used a sensicality judgment. Experiment 2 used a word pair presentation mode and a lexical decision task. Experiment 3 also examined word pairs, but with an antonymy judgment task. All three experiments revealed a graded N400 response (unrelated > related > antonyms), thus supporting the assumption that semantic associations are processed automatically. In addition, the experiments revealed that, in highly constrained task environments, the N400 gradation occurs simultaneously with a P300 effect for the antonym condition, thus leading to the superficial impression of an extremely “reduced” N400 for antonym pairs. Comparisons across experiments and participant groups revealed that the P300 effect is not only a function of stimulus constraints (i.e., sentence context) and experimental task, but that it is also crucially influenced by individual processing strategies used to achieve successful task performance.

...read moreread less

Proceedings Article•DOI•

Keyword generation for search engine advertising using semantic similarity between terms

[...]

Vibhanshu Abhishek¹, Kartik Hosanagar¹•Institutions (1)

University of Pennsylvania¹

19 Aug 2007

TL;DR: This paper seeks to establish a mathematical formulation of this problem and suggests a method for generation of several terms from a seed keyword by using a web based kernel function to establish semantic similarity between terms.

...read moreread less

Abstract: An important problem in search engine advertising is key-word1 generation. In the past, advertisers have preferred to bid for keywords that tend to have high search volumes and hence are more expensive. An alternate strategy involves bidding for several related but low volume, inexpensive terms that generate the same amount of traffic cumulatively but are much cheaper. This paper seeks to establish a mathematical formulation of this problem and suggests a method for generation of several terms from a seed keyword. This approach uses a web based kernel function to establish semantic similarity between terms. The similarity graph is then traversed to generate keywords that are related but cheaper.

...read moreread less

Proceedings Article•

A semantic space for music derived from social tags

[...]

Mark Levy, Mark Sandler

01 Jan 2007

TL;DR: This paper shows that, despite the ad hoc and informal language of tagging, tags define a low-dimensional semantic space that is extremely well-behaved at the track level, in particular being highly organised by artist and musical genre.

...read moreread less

Abstract: In this paper we investigate social tags as a novel highvolume source of semantic metadata for music, using techniques from the fields of information retrieval and multivariate data analysis. We show that, despite the ad hoc and informal language of tagging, tags define a low-dimensional semantic space that is extremely well-behaved at the track level, in particular being highly organised by artist and musical genre. We introduce the use of Correspondence Analysis to visualise this semantic space, and show how it can be applied to create a browse-by-mood interface for a psychologically-motivated two-dimensional subspace representing musical emotion.

...read moreread less

Proceedings Article•

[...]

Wen-tau Yih¹, Christopher Meek¹•Institutions (1)

Microsoft¹

22 Jul 2007

TL;DR: A Web-relevance similarity measure is introduced and it is shown that one can further improve the accuracy of similarity measures by using a machine learning approach.

...read moreread less

Abstract: In this paper we improve previous work on measuring the similarity of short segments of text in two ways. First, we introduce a Web-relevance similarity measure and demonstrate its effectiveness. This measure extends the Web-kernel similarity function introduced by Sahami and Heilman (2006) by using relevance weighted inner-product of term occurrences rather than TF×IDF. Second, we show that one can further improve the accuracy of similarity measures by using a machine learning approach. Our methods outperform other state-of-the-art methods in a general query suggestion task for multiple evaluation metrics.

...read moreread less

Collapse