
Showing papers on "Semantic similarity published in 2014"


Journal ArticleDOI
TL;DR: This work proposes a flexible architecture to integrate text- and image-based distributional information, and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
Abstract: Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human semantic knowledge. We address the lack of perceptual grounding of distributional models by exploiting computer vision techniques that automatically identify discrete "visual words" in images, so that the distributional representation of a word can be extended to also encompass its co-occurrence with the visual words of images it is associated with. We propose a flexible architecture to integrate text- and image-based distributional information, and we show in a set of empirical tests that our integrated model is superior to the purely text-based approach, and it provides somewhat complementary semantic information with respect to the latter.
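A rough sketch of this kind of text/image fusion (an illustrative reduction, not the paper's exact architecture; `alpha` and the function names are ours): the two channels are L2-normalized and concatenated with a mixing weight, and relatedness is then plain cosine over the fused vectors.

```python
import numpy as np

def fuse(text_vec, visual_vec, alpha=0.5):
    """Weighted concatenation of the normalized text-based and
    image-based (visual word) distributional channels."""
    t = text_vec / (np.linalg.norm(text_vec) + 1e-12)
    v = visual_vec / (np.linalg.norm(visual_vec) + 1e-12)
    return np.concatenate([alpha * t, (1.0 - alpha) * v])

def relatedness(a, b):
    """Cosine similarity between two fused word representations."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
```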

900 citations


Proceedings ArticleDOI
03 Nov 2014
TL;DR: A new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents is proposed.
Abstract: In this paper, we propose a new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents. In order to capture the rich contextual structures in a query or a document, we start with each word within a temporal context window in a word sequence to directly capture contextual features at the word n-gram level. Next, the salient word n-gram features in the word sequence are discovered by the model and are then aggregated to form a sentence-level feature vector. Finally, a non-linear transformation is applied to extract high-level semantic information to generate a continuous vector representation for the full text string. The proposed convolutional latent semantic model (CLSM) is trained on clickthrough data and is evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that the proposed model effectively captures salient semantic information in queries and documents for the task while significantly outperforming previous state-of-the-art semantic models.
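A toy rendering of the convolutional-pooling pipeline the abstract describes (the weight matrices `W_conv` and `W_sem` are assumed given; the real model learns them from clickthrough data and represents words via letter-trigram hashing):

```python
import numpy as np

def clsm_vector(word_vecs, W_conv, W_sem, window=3):
    """Slide a context window over the word sequence, extract a local
    n-gram feature per position, max-pool the salient features, then
    apply a non-linear semantic projection. Assumes at least `window`
    words; W_conv: (k, window*d), W_sem: (m, k)."""
    seq = np.asarray(word_vecs)                   # (n_words, d)
    feats = [np.tanh(W_conv @ seq[i:i + window].reshape(-1))
             for i in range(len(seq) - window + 1)]
    pooled = np.max(np.stack(feats), axis=0)      # sentence-level feature vector
    return np.tanh(W_sem @ pooled)                # high-level semantic vector
```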

723 citations


Proceedings ArticleDOI
07 Apr 2014
TL;DR: This paper presents a series of new latent semantic models based on a convolutional neural network that learn low-dimensional semantic vectors for search queries and Web documents and significantly outperform other semantic models in retrieval performance.
Abstract: This paper presents a series of new latent semantic models based on a convolutional neural network (CNN) to learn low-dimensional semantic vectors for search queries and Web documents. By using the convolution-max pooling operation, local contextual information at the word n-gram level is modeled first. Then, salient local features in a word sequence are combined to form a global feature vector. Finally, the high-level semantic information of the word sequence is extracted to form a global vector representation. The proposed models are trained on clickthrough data by maximizing the conditional likelihood of clicked documents given a query, using stochastic gradient ascent. The new models are evaluated on a Web document ranking task using a large-scale, real-world data set. Results show that our model significantly outperforms other semantic models, which were state-of-the-art in retrieval performance prior to this work.
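The training criterion sketched above can be written as a softmax over smoothed cosine similarities; a minimal version (assuming the query/document vectors already come out of the CNN; `gamma` is an assumed smoothing constant):

```python
import numpy as np

def click_log_likelihood(q, d_clicked, d_others, gamma=10.0):
    """log P(clicked doc | query) as a softmax over gamma-scaled cosine
    similarities; training maximizes this by gradient ascent."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    logits = gamma * np.array([cos(q, d_clicked)] + [cos(q, d) for d in d_others])
    m = logits.max()                               # numerical stability
    return logits[0] - (m + np.log(np.exp(logits - m).sum()))
```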

706 citations


Proceedings ArticleDOI
01 Aug 2014
TL;DR: This year, the participants were challenged with new data sets for English as well as the introduction of Spanish as a new language in which to assess semantic similarity; the annotations for both tasks leveraged crowdsourcing.
Abstract: In Semantic Textual Similarity, systems rate the degree of semantic equivalence between two text snippets. This year, the participants were challenged with new data sets for English, as well as the introduction of Spanish as a new language in which to assess semantic similarity. For the English subtask, we exposed the systems to a diversity of testing scenarios, by preparing additional OntoNotes-WordNet sense mappings and news headlines, as well as introducing new genres, including image descriptions, DEFT discussion forums, DEFT newswire, and tweet-newswire headline mappings. For Spanish, since, to our knowledge, this is the first time official evaluations have been conducted, we used well-formed text, featuring sentences extracted from encyclopedic content and newswire. The annotations for both tasks leveraged crowdsourcing. The Spanish subtask engaged 9 teams participating with 22 system runs, and the English subtask attracted 15 teams with 38 system runs.

509 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: A semantic parsing framework based on semantic similarity for open domain question answering (QA) that achieves higher precision across different recall points compared to the previous approach, and can improve F1 by 7 points.
Abstract: We develop a semantic parsing framework based on semantic similarity for open domain question answering (QA). We focus on single-relation questions and decompose each question into an entity mention and a relation pattern. Using convolutional neural network models, we measure the similarity of entity mentions with entities in the knowledge base (KB) and the similarity of relation patterns and relations in the KB. We score relational triples in the KB using these measures and select the top scoring relational triple to answer the question. When evaluated on an open-domain QA task, our method achieves higher precision across different recall points compared to the previous approach, and can improve F1 by 7 points.
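One plausible reading of the scoring step in code (the product combination and the dictionary layout are our assumptions, not the paper's exact formulation):

```python
import numpy as np

def cos(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def score_triple(mention_vec, pattern_vec, triple):
    """Combine mention/entity similarity with pattern/relation
    similarity; both vectors come from the CNN similarity models."""
    return (cos(mention_vec, triple["entity_vec"]) *
            cos(pattern_vec, triple["relation_vec"]))

def answer(mention_vec, pattern_vec, kb_triples):
    """Return the object of the top-scoring KB triple as the answer."""
    best = max(kb_triples, key=lambda t: score_triple(mention_vec, pattern_vec, t))
    return best["object"]
```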

424 citations


Proceedings ArticleDOI
01 Aug 2014
TL;DR: This paper presents the task on the evaluation of Compositional Distributional Semantics Models on full sentences, organized for the first time within SemEval-2014; the task attracted 21 teams, most of which participated in both subtasks.
Abstract: This paper presents the task on the evaluation of Compositional Distributional Semantics Models on full sentences organized for the first time within SemEval-2014. Participation was open to systems based on any approach. Systems were presented with pairs of sentences and were evaluated on their ability to predict human judgments on (i) semantic relatedness and (ii) entailment. The task attracted 21 teams, most of which participated in both subtasks. We received 17 submissions in the relatedness subtask (for a total of 66 runs) and 18 in the entailment subtask (65 runs).

414 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: This work proposes a new learning objective that incorporates both a neural language model objective (Mikolov et al., 2013) and prior knowledge from semantic resources to learn improved lexical semantic embeddings.
Abstract: Word embeddings learned on unlabeled data are a popular tool in semantics, but may not capture the desired semantics. We propose a new learning objective that incorporates both a neural language model objective (Mikolov et al., 2013) and prior knowledge from semantic resources to learn improved lexical semantic embeddings. We demonstrate that our embeddings improve over those learned solely on raw text in three settings: language modeling, measuring semantic similarity, and predicting human judgements.
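One way to write such a joint objective (a hedged sketch; the paper's exact prior-knowledge term may differ): the language-model log-likelihood plus a term rewarding similarity between pairs that a semantic resource marks as related.

```python
import numpy as np

def joint_objective(lm_log_likelihood, emb, related_pairs, weight=1.0):
    """Neural LM objective plus a semantic-resource prior. `emb` maps
    words to vectors; `related_pairs` come from, e.g., a thesaurus."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    prior = sum(cos(emb[w1], emb[w2]) for w1, w2 in related_pairs)
    return lm_log_likelihood + weight * prior
```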

335 citations


Journal ArticleDOI
TL;DR: The proposed soft similarity measure generalizes the well-known cosine similarity measure in the VSM by introducing what is called the "soft cosine measure", and various formulas for exact or approximate calculation of the soft cosine measure are proposed.
Abstract: We show how to consider similarity between features for calculation of similarity of objects in the Vector Space Model (VSM) for machine learning algorithms and other classes of methods that involve similarity between objects. Unlike LSA, we assume that similarity between features is known (say, from a synonym dictionary) and does not need to be learned from the data. We call the proposed similarity measure soft similarity. Similarity between features is common, for example, in natural language processing: words, n-grams, or syntactic n-grams can be somewhat different (which makes them different features) but still have much in common: for example, the words "play" and "game" are different but related. When there is no similarity between features, our soft similarity measure is equal to the standard similarity. For this, we generalize the well-known cosine similarity measure in the VSM by introducing what we call the "soft cosine measure". We propose various formulas for exact or approximate calculation of the soft cosine measure. For example, in one of them we consider for the VSM a new feature space consisting of pairs of the original features weighted by their similarity. Again, for features that bear no similarity to each other, our formulas reduce to the standard cosine measure. Our experiments show that our soft cosine measure provides better performance in our case study: the entrance exams question answering task at CLEF. In these experiments, we use syntactic n-grams as features and Levenshtein distance as the similarity between n-grams, measured either in characters or in elements of n-grams.
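The soft cosine itself has a compact closed form: with a feature-similarity matrix S, soft_cos(a, b) = (a^T S b) / (sqrt(a^T S a) * sqrt(b^T S b)), which reduces to the standard cosine when S is the identity. A minimal sketch:

```python
import numpy as np

def soft_cosine(a, b, S):
    """Soft cosine measure: cosine generalized with feature-feature
    similarities S[i, j] (e.g. from Levenshtein distance between
    n-grams). S = identity recovers the ordinary cosine."""
    num = a @ S @ b
    den = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return num / (den + 1e-12)

a, b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
S = np.array([[1.0, 0.8], [0.8, 1.0]])   # features 0 and 1 are related
print(soft_cosine(a, b, np.eye(2)))      # 0.0 under plain cosine
print(soft_cosine(a, b, S))              # 0.8: credit for related features
```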

297 citations


Proceedings ArticleDOI
01 Jun 2014
TL;DR: This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words.
Abstract: Semantic hierarchy construction aims to build structures of concepts linked by hypernym‐hyponym ("is-a") relations. A major challenge for this task is the automatic discovery of such relations. This paper proposes a novel and effective method for the construction of semantic hierarchies based on word embeddings, which can be used to measure the semantic relationship between words. We identify whether a candidate word pair has a hypernym‐hyponym relation by using word-embedding-based semantic projections between words and their hypernyms. Our result, an F-score of 73.74%, outperforms the state-of-the-art methods on a manually labeled test dataset. Moreover, combining our method with a previous manually-built hierarchy extension method can further improve the F-score to 80.29%.
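A simplified version of the projection idea (the paper learns a cluster of piecewise projections; a single ridge-regression matrix is shown here, and `threshold` is illustrative):

```python
import numpy as np

def learn_projection(hypo_vecs, hyper_vecs, reg=1e-3):
    """Fit Phi so that Phi @ hyponym_vec approximates hypernym_vec over
    training pairs, via regularized least squares. X, Y: (n_pairs, d)."""
    X, Y = np.asarray(hypo_vecs), np.asarray(hyper_vecs)
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + reg * np.eye(d), X.T @ Y).T

def looks_like_hypernym(Phi, hypo_vec, cand_vec, threshold=0.85):
    """Accept the pair if the projected hyponym lands near the candidate."""
    p = Phi @ hypo_vec
    c = p @ cand_vec / (np.linalg.norm(p) * np.linalg.norm(cand_vec) + 1e-12)
    return c >= threshold
```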

287 citations


Proceedings ArticleDOI
01 Oct 2014
TL;DR: This work constructs multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network trained on a large labeled object recognition dataset.
Abstract: We construct multi-modal concept representations by concatenating a skip-gram linguistic representation vector with a visual concept representation vector computed using the feature extraction layers of a deep convolutional neural network (CNN) trained on a large labeled object recognition dataset. This transfer learning approach brings a clear performance gain over features based on the traditional bag-of-visual-word approach. Experimental results are reported on the WordSim353 and MEN semantic relatedness evaluation tasks. We use visual features computed using either ImageNet or ESP Game images.
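A present-day stand-in for that feature-extraction step (hypothetical: the paper predates these torchvision weights and used a different CNN; the preprocessing and names below are illustrative):

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

cnn = models.resnet18(weights="IMAGENET1K_V1")   # pretrained object recognizer
cnn.fc = torch.nn.Identity()                     # keep the penultimate features
cnn.eval()

preprocess = T.Compose([
    T.Resize(256), T.CenterCrop(224), T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def visual_vector(image_path):
    """Visual concept vector from the CNN's feature-extraction layers."""
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        return cnn(x).squeeze(0)                 # 512-d for resnet18

def multimodal(skipgram_vec, image_paths):
    """Concatenate the linguistic vector with the mean visual vector."""
    vis = torch.stack([visual_vector(p) for p in image_paths]).mean(0)
    return torch.cat([torch.as_tensor(skipgram_vec, dtype=torch.float32), vis])
```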

239 citations


Journal ArticleDOI
TL;DR: This paper introduces a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs; it converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase, guided by denotations as a form of weak supervision.
Abstract: In this paper we introduce a novel semantic parsing approach to query Freebase in natural language without requiring manual annotations or question-answer pairs. Our key insight is to represent natural language via semantic graphs whose topology shares many commonalities with Freebase. Given this representation, we conceptualize semantic parsing as a graph matching problem. Our model converts sentences to semantic graphs using CCG and subsequently grounds them to Freebase guided by denotations as a form of weak supervision. Evaluation experiments on a subset of the Free917 and WebQuestions benchmark datasets show our semantic parser improves over the state of the art.

Proceedings ArticleDOI
20 Aug 2014
TL;DR: This work proposes blanket execution, a novel dynamic equivalence testing primitive that achieves complete coverage by overriding the intended program logic under a controlled randomized environment, and builds a binary search engine that identifies similar functions across optimization boundaries.
Abstract: Matching function binaries--the process of identifying similar functions among binary executables--is a challenge that underlies many security applications such as malware analysis and patch-based exploit generation. Recent work tries to establish semantic similarity based on static analysis methods. Unfortunately, these methods do not perform well if the compared binaries are produced by different compiler toolchains or optimization levels. In this work, we propose blanket execution, a novel dynamic equivalence testing primitive that achieves complete coverage by overriding the intended program logic. Blanket execution collects the side effects of functions during execution under a controlled randomized environment. Two functions are deemed similar if their corresponding side effects, as observed under the same environment, are similar too. We implement our blanket execution technique in a system called BLEX. We evaluate BLEX rigorously against the state-of-the-art binary comparison tool BinDiff. When comparing optimized and unoptimized executables from the popular GNU coreutils package, BLEX outperforms BinDiff by up to 3.5 times in correctly identifying similar functions. BLEX also outperforms BinDiff if the binaries have been compiled by different compilers. Using the functionality in BLEX, we have also built a binary search engine that identifies similar functions across optimization boundaries. Averaged over all indexed functions, our search engine ranks the correct matches among the top ten results 77% of the time.
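At its core, the comparison reduces to measuring overlap between the side-effect profiles two functions produce under the same randomized environment; a bare-bones stand-in (the real BLEX feature set and metric are richer):

```python
def side_effect_similarity(effects_a, effects_b):
    """Jaccard overlap of two functions' observed side-effect sets
    (e.g. memory writes, syscalls) collected during blanket execution."""
    a, b = set(effects_a), set(effects_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```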

Proceedings ArticleDOI
23 Jun 2014
TL;DR: It is shown that accounting for structure in the visual attribute space is key to learning attribute models that preserve semantics, yielding improved generalizability that helps in the recognition and discovery of unseen object categories.
Abstract: Existing methods to learn visual attributes are prone to learning the wrong thing -- namely, properties that are correlated with the attribute of interest among training samples. Yet, many proposed applications of attributes rely on being able to learn the correct semantic concept corresponding to each attribute. We propose to resolve such confusions by jointly learning decorrelated, discriminative attribute models. Leveraging side information about semantic relatedness, we develop a multi-task learning approach that uses structured sparsity to encourage feature competition among unrelated attributes and feature sharing among related attributes. On three challenging datasets, we show that accounting for structure in the visual attribute space is key to learning attribute models that preserve semantics, yielding improved generalizability that helps in the recognition and discovery of unseen object categories.

Journal ArticleDOI
TL;DR: A complete model for generating association relations between multimedia resources using the semantic link network model is proposed; experiments show the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.
Abstract: Recent research shows that multimedia resources in the wild are growing at a staggering rate. The rapidly increasing number of multimedia resources has created an urgent need for intelligent methods to organize and process them. In this paper, the semantic link network model is used for organizing multimedia resources. A complete model for generating association relations between multimedia resources using the semantic link network model is proposed. The definitions, modules, and mechanisms of the semantic link network are used in the proposed method. The integration between the semantic link network and multimedia resources provides a new prospect for organizing them with their semantics. The tags and surrounding texts of multimedia resources are used to measure their semantic association. The hierarchical semantics of multimedia resources are defined by their annotated tags and surrounding texts. The semantics of tags and surrounding texts are treated differently in the proposed framework. The modules of the semantic link network model are implemented to measure association relations. A real data set including 100 thousand images with social tags from Flickr is used in our experiments. Two evaluation methods, clustering and retrieval, are performed, showing that the proposed method can measure the semantic relatedness between Flickr images accurately and robustly.

Proceedings ArticleDOI
01 Apr 2014
TL;DR: A systematic study of parameters used in the construction of semantic vector space models yields recommendations for optimal parameters and some novel findings, including a similarity metric that outperforms the alternatives on all tasks considered.
Abstract: We present a systematic study of parameters used in the construction of semantic vector space models. Evaluation is carried out on a variety of similarity tasks, including a compositionality dataset, using several source corpora. In addition to recommendations for optimal parameters, we present some novel findings, including a similarity metric that outperforms the alternatives on all tasks considered.

Journal ArticleDOI
TL;DR: This paper presents a unifying framework that aims to improve the understanding of semantic measures, highlight their equivalences, and propose bridges between their theoretical bases; it unifies a large number of state-of-the-art semantic similarity measures through common expressions.

Proceedings Article
01 Jan 2014
TL;DR: The authors propose a method for learning distributed representations in a multilingual setup, which learns to assign similar embeddings to aligned sentences and dissimilar ones to sentences that are not aligned, while not requiring word alignments.
Abstract: Distributed representations of meaning are a natural way to encode covariance relationships between words and phrases in NLP. By overcoming data sparsity problems, as well as providing information about semantic relatedness which is not available in discrete representations, distributed representations have proven useful in many NLP tasks. Recent work has shown how compositional semantic representations can successfully be applied to a number of monolingual applications such as sentiment analysis. At the same time, there has been some initial success in work on learning shared word-level representations across languages. We combine these two approaches by proposing a method for learning distributed representations in a multilingual setup. Our model learns to assign similar embeddings to aligned sentences and dissimilar ones to sentences which are not aligned, while not requiring word alignments. We show that our representations are semantically informative and apply them to a cross-lingual document classification task where we outperform the previous state of the art. Further, by employing parallel corpora of multiple language pairs we find that our model learns representations that capture semantic relationships across languages for which no parallel data was used.
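The alignment signal can be captured with a noise-contrastive hinge loss over composed sentence vectors, in the spirit of the abstract (a hedged sketch; composition here is a plain sum of word embeddings):

```python
import numpy as np

def multilingual_hinge(src_sent, tgt_sent, noise_sent, margin=1.0):
    """Aligned sentence embeddings should be closer (squared distance)
    to each other than to a randomly sampled unaligned sentence by at
    least `margin`; no word alignments are required."""
    def dist2(a, b):
        return float(np.sum((a - b) ** 2))
    return max(0.0, margin + dist2(src_sent, tgt_sent) - dist2(src_sent, noise_sent))

def sentence_vec(word_vecs):
    """Minimal additive composition of word embeddings."""
    return np.sum(np.asarray(word_vecs), axis=0)
```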

Proceedings ArticleDOI
01 Jun 2014
TL;DR: A robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information is proposed, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory.
Abstract: We propose a robust answer reranking model for non-factoid questions that integrates lexical semantics with discourse information, driven by two representations of discourse: a shallow representation centered around discourse markers, and a deep one based on Rhetorical Structure Theory. We evaluate the proposed model on two corpora from different genres and domains: one from Yahoo! Answers and one from the biology domain, and two types of non-factoid questions: manner and reason. We experimentally demonstrate that the discourse structure of non-factoid answers provides information that is complementary to lexical semantic similarity between question and answer, improving performance up to 24% (relative) over a state-of-the-art model that exploits lexical semantic similarity alone. We further demonstrate excellent domain transfer of discourse information, suggesting these discourse features have general utility to non-factoid question answering.

Proceedings ArticleDOI
24 Feb 2014
TL;DR: This work proposes a graph-based semantic model for representing document content that combines DBpedia's structure with an information-theoretic measure of concept association, based on its explicit semantic relations, and achieves a performance close to that of highly specialized methods that have been tuned to these specific tasks.
Abstract: We propose a graph-based semantic model for representing document content. Our method relies on the use of a semantic network, namely the DBpedia knowledge base, for acquiring fine-grained information about entities and their semantic relations, thus resulting in a knowledge-rich document model. We demonstrate the benefits of these semantic representations in two tasks: entity ranking and computing document semantic similarity. To this end, we couple DBpedia's structure with an information-theoretic measure of concept association, based on its explicit semantic relations, and compute semantic similarity using a Graph Edit Distance based measure, which finds the optimal matching between the documents' entities using the Hungarian method. Experimental results show that our general model outperforms baselines built on top of traditional methods, and achieves a performance close to that of highly specialized methods that have been tuned to these specific tasks.
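The entity-matching step maps neatly onto an assignment problem; a simplified sketch with SciPy's Hungarian-method solver (edit costs for unmatched entities are omitted here):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def document_similarity(entities_a, entities_b, relatedness):
    """Find the optimal matching between the documents' entities and
    score the pair by the matched concepts' average relatedness."""
    cost = np.array([[1.0 - relatedness(a, b) for b in entities_b]
                     for a in entities_a])
    rows, cols = linear_sum_assignment(cost)       # Hungarian method
    matched = sum(1.0 - cost[r, c] for r, c in zip(rows, cols))
    return matched / max(len(entities_a), len(entities_b))
```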

Proceedings ArticleDOI
03 Nov 2014
TL;DR: This paper explores a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text, for the task of measuring semantic similarity between medical concepts.
Abstract: Advances in neural network language models have demonstrated that these models can effectively learn representations of word meaning. In this paper, we explore a variation of neural language models that can learn on concepts taken from structured ontologies and extracted from free-text, rather than directly from terms in free-text. This model is employed for the task of measuring semantic similarity between medical concepts, a task that is central to a number of techniques in medical informatics and information retrieval. The model is built with two medical corpora (journal abstracts and patient records) and empirically validated on two ground-truth datasets of human-judged concept pairs assessed by medical professionals. Empirically, our approach correlates closely with expert human assessors (≈0.9) and outperforms a number of state-of-the-art benchmarks for medical semantic similarity. The demonstrated superiority of this model for providing an effective semantic similarity measure is promising in that this may translate into effectiveness gains for techniques in medical information retrieval and medical informatics (e.g., query expansion and literature-based discovery).
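The key move is to train the language model over streams of ontology concept IDs instead of raw terms; a hedged sketch with gensim's word2vec (toy concept IDs, not the paper's corpora or exact model):

```python
from gensim.models import Word2Vec

# Each "sentence" is a document reduced to its extracted concept IDs
# (e.g. UMLS CUIs); the IDs below are placeholders.
concept_streams = [
    ["C0011849", "C0020538", "C0027051"],
    ["C0011849", "C0027051", "C0018799"],
]
model = Word2Vec(concept_streams, vector_size=100, window=5,
                 min_count=1, sg=1, epochs=50)

# Semantic similarity between two medical concepts:
print(model.wv.similarity("C0011849", "C0027051"))
```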

Proceedings Article
27 Jul 2014
TL;DR: Results show that bootstrapped dataless classification is competitive with supervised classification trained on thousands of labeled examples; several semantic representations are studied, along with how to improve the classification using bootstrapping.
Abstract: In this paper, we systematically study the problem of dataless hierarchical text classification. Unlike standard text classification schemes that rely on supervised training, dataless classification depends on understanding the labels of the sought-after categories and requires no labeled data. Given a collection of text documents and a set of labels, we show that understanding the labels can be used to accurately categorize the documents. This is done by embedding both labels and documents in a semantic space that allows one to compute meaningful semantic similarity between a document and a potential label. We show that this scheme can be used to support accurate multiclass classification without any supervision. We study several semantic representations and show how to improve the classification using bootstrapping. Our results show that bootstrapped dataless classification is competitive with supervised classification with thousands of labeled examples.
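The classification step itself is tiny once labels and documents share a semantic space; a minimal sketch (the embedding function, e.g. an ESA-style representation, is assumed given):

```python
import numpy as np

def dataless_classify(doc_vec, label_vecs):
    """Assign the label whose semantic-space embedding is most similar
    (cosine) to the document's; no labeled training data is used."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
    return max(label_vecs, key=lambda lab: cos(doc_vec, label_vecs[lab]))
```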

Journal ArticleDOI
TL;DR: The semantic measures library and toolkit are robust, open-source, easy-to-use software solutions dedicated to semantic measures; they can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies.
Abstract: Summary: The semantic measures library and toolkit are robust, open-source, easy-to-use software solutions dedicated to semantic measures. They can be used for large-scale computations and analyses of semantic similarities between terms/concepts defined in terminologies and ontologies. The comparison of entities (e.g. genes) annotated by concepts is also supported. A large collection of measures is available. Not limited to a specific application context, the library and the toolkit can be used with various controlled vocabularies and ontology specifications (e.g. Open Biomedical Ontology, Resource Description Framework). The project targets both designers and practitioners of semantic measures, providing a Java library as well as a command-line tool that can be used on personal computers or computer clusters. Availability and implementation: Downloads, documentation, tutorials, evaluation and support are available at http://www.semantic-measures-library.org.

Journal ArticleDOI
TL;DR: A semantic trajectory conceptual data model named CONSTAnT is presented, which defines the most important aspects of semantic trajectories; the authors believe this model will be the foundation for the design of semantic trajectory databases, where several aspects that make a trajectory "semantic" are taken into account.
Abstract: Several works have been proposed in the last few years for raw trajectory data analysis, and some attempts have been made to define trajectories from a more semantic point of view. Semantic trajectory data analysis has received significant attention recently, but the formal definition of semantic trajectory, the set of aspects that should be considered to semantically enrich trajectories, and a conceptual data model integrating these aspects in a broad sense are still missing. This article presents a semantic trajectory conceptual data model named CONSTAnT, which defines the most important aspects of semantic trajectories. We believe that this model will be the foundation for the design of semantic trajectory databases, where several aspects that make a trajectory "semantic" are taken into account. The proposed model includes the concepts of semantic subtrajectory, semantic points, geographical places, events, goals, environment and behavior, to create a general concept of semantic trajectory. The proposed model is the result of several years of work by the authors in an effort to add more semantics to raw trajectory data for real applications. Two application examples and different queries show the flexibility of the model for different domains.
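As a purely illustrative encoding (field names follow the abstract's vocabulary, not a published schema), the model's concepts can be read as record types:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SemanticPoint:
    x: float
    y: float
    t: float
    place: str = ""        # geographical place, e.g. "home"
    event: str = ""        # event the point takes part in

@dataclass
class SemanticSubtrajectory:
    points: List[SemanticPoint]
    goal: str = ""         # goal pursued in this segment
    behavior: str = ""     # e.g. "commuting"

@dataclass
class SemanticTrajectory:
    moving_object: str
    environment: str
    segments: List[SemanticSubtrajectory] = field(default_factory=list)
```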

Proceedings ArticleDOI
Jiajun Zhang, Shujie Liu, Mu Li, Ming Zhou, Chengqing Zong
01 Jun 2014
TL;DR: This work proposes Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish the phrases with different semantic meanings.
Abstract: We propose Bilingually-constrained Recursive Auto-encoders (BRAE) to learn semantic phrase embeddings (compact vector representations for phrases), which can distinguish phrases with different semantic meanings. The BRAE is trained in a way that minimizes the semantic distance of translation equivalents and maximizes the semantic distance of non-translation pairs simultaneously. After training, the model learns how to embed each phrase semantically in two languages and also learns how to transform the semantic embedding space of one language into the other. We evaluate our proposed method on two end-to-end SMT tasks (phrase table pruning and decoding with phrasal semantic similarities) which need to measure semantic similarity between a source phrase and its translation candidates. Extensive experiments show that the BRAE is remarkably effective in these two tasks.
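The bilingual constraint can be expressed as a max-margin objective over phrase embeddings; a simplified sketch (the full BRAE also includes the autoencoders' reconstruction error):

```python
import numpy as np

def brae_semantic_loss(src, tgt, neg_src, neg_tgt, margin=1.0):
    """Pull translation-equivalent phrase embeddings together and push
    non-translation pairs at least `margin` farther apart."""
    def dist2(a, b):
        return float(np.sum((a - b) ** 2))
    pos = dist2(src, tgt)
    return (pos
            + max(0.0, margin + pos - dist2(src, neg_tgt))    # wrong target
            + max(0.0, margin + pos - dist2(neg_src, tgt)))   # wrong source
```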

Journal ArticleDOI
16 Jun 2014 - PLOS ONE
TL;DR: The high average AUC (area under the receiver operating characteristic curve) shows that SemFunSim is an effective method for drug repositioning; when the method is used on diseases without annotated compounds in CTD, many of the predicted candidate compounds can be confirmed from the literature.
Abstract: Background Measuring similarity between diseases plays an important role in disease-related molecular function research. Functional associations between disease-related genes and semantic associations between diseases are often used to identify pairs of similar diseases from different perspectives. Currently, it is still a challenge to exploit both of them to calculate disease similarity. Therefore, a new method (SemFunSim) that integrates semantic and functional association is proposed to address the issue. Methods SemFunSim is designed as follows. First of all, FunSim (Functional Similarity) is proposed to calculate disease similarity using disease-related gene sets in a weighted network of human gene function. Next, SemSim (Semantic Similarity) is devised to calculate disease similarity using the relationship between two diseases from Disease Ontology. Finally, FunSim and SemSim are integrated to measure disease similarity. Results The high average AUC (area under the receiver operating characteristic curve) (96.37%) shows that SemFunSim achieves a high true positive rate and a low false positive rate. 79 of the top 100 pairs of similar diseases identified by SemFunSim are annotated in the Comparative Toxicogenomics Database (CTD) as being targeted by the same therapeutic compounds, while the other methods we compared could identify 35 or fewer such pairs among the top 100. Moreover, when using our method on diseases without annotated compounds in CTD, we could confirm many of our predicted candidate compounds from the literature. This indicates that SemFunSim is an effective method for drug repositioning.

Proceedings ArticleDOI
01 Aug 2014
TL;DR: Seven types of features are extracted, including text difference measures proposed for the entailment judgement subtask as well as common text similarity measures used in both subtasks; the same feature set is then used to solve both subtasks by treating them as a regression and a classification task, respectively.
Abstract: This paper presents our approach to the semantic relatedness and textual entailment subtasks organized as task 1 in SemEval 2014. Specifically, we address two questions: (1) Can we solve these two subtasks together? (2) Are features proposed for the textual entailment task still effective for the semantic relatedness task? To address them, we extracted seven types of features, including text difference measures proposed for the entailment judgement subtask, as well as common text similarity measures used in both subtasks. Then we exploited the same feature set to solve both subtasks by considering them as a regression and a classification task respectively, and performed a study of the influence of different features. We achieved the first and the second rank for the relatedness and entailment tasks respectively.
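The shared-feature setup is straightforward to reproduce in outline (a hedged sketch with scikit-learn; the model choices below are ours, not necessarily the authors'):

```python
from sklearn.linear_model import Ridge, LogisticRegression

def train_both_subtasks(X, y_relatedness, y_entailment):
    """Fit the same feature matrix X twice: a regressor for the
    relatedness score and a classifier for the entailment label."""
    reg = Ridge(alpha=1.0).fit(X, y_relatedness)
    clf = LogisticRegression(max_iter=1000).fit(X, y_entailment)
    return reg, clf
```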

Journal ArticleDOI
TL;DR: A Semantic Preserving Distance Metric Learning (SP-DML) algorithm is developed that explores the complementary characteristics of visual features and pairwise constraints in a unified feature space, integrating semantic content into distance metric learning.

Proceedings ArticleDOI
01 Jun 2014
TL;DR: In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, the joint model outperforms comparable independent models that learn meaning in isolation.
Abstract: We introduce a model for incorporating contextual information (such as geography) in learning vector-space representations of situated language. In contrast to approaches to multimodal representation learning that have used properties of the object being described (such as its color), our model includes information about the subject (i.e., the speaker), allowing us to learn the contours of a word’s meaning that are shaped by the context in which it is uttered. In a quantitative evaluation on the task of judging geographically informed semantic similarity between representations learned from 1.1 billion words of geo-located tweets, our joint model outperforms comparable independent models that learn meaning in isolation.
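The abstract's joint model can be caricatured as additive embeddings: a shared global vector plus a context-specific deviation (an illustrative reduction; the dictionary layout is assumed):

```python
import numpy as np

def situated_vector(word, region, global_emb, region_emb):
    """A word's situated representation: its global embedding plus an
    additive deviation keyed by the speaker's context (here, region).
    Unseen (word, region) pairs fall back to the global vector."""
    base = global_emb[word]
    delta = region_emb.get((word, region), np.zeros_like(base))
    return base + delta
```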

Journal ArticleDOI
TL;DR: This paper explores the mechanisms underlying the N400 by examining how a connectionist model's performance measures covary with N400 amplitudes, finding that network error was consistently in the same direction as the N400: larger for low-frequency words, larger for words with many features, and smaller for semantically related target words as well as repeated words.