
Showing papers on "Semantic similarity published in 1995"


Posted Content
TL;DR: In this article, a new measure of semantic similarity in an IS-A taxonomy based on the notion of information content is presented, and experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, against an upper bound of r = 0.90 for human subjects performing the same task).
Abstract: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).

3,533 citations
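The information-content measure can be sketched in a few lines. The toy IS-A taxonomy and corpus counts below are invented for illustration, not taken from the paper; the idea is that two concepts are as similar as their most informative common subsumer.

```python
import math

# Toy IS-A taxonomy, child -> parent (invented for illustration).
PARENT = {
    "dime": "coin", "nickel": "coin", "credit": "money",
    "coin": "money", "money": "asset", "asset": "entity",
}

# Corpus frequency of each concept, counting all of its descendants
# (also invented); the root covers the whole corpus.
COUNTS = {
    "dime": 5, "nickel": 5, "coin": 20, "credit": 10,
    "money": 40, "asset": 60, "entity": 100,
}
TOTAL = COUNTS["entity"]

def ancestors(c):
    """Return the set containing c and all of its IS-A ancestors."""
    out = {c}
    while c in PARENT:
        c = PARENT[c]
        out.add(c)
    return out

def information_content(c):
    """IC(c) = -log p(c), where p(c) is the concept's corpus probability."""
    return -math.log(COUNTS[c] / TOTAL)

def resnik_similarity(c1, c2):
    """Similarity = IC of the most informative common subsumer."""
    common = ancestors(c1) & ancestors(c2)
    return max(information_content(c) for c in common)
```

On this toy data, `resnik_similarity("dime", "nickel")` resolves to the IC of `coin`, which exceeds `resnik_similarity("dime", "credit")` (whose best subsumer is the more general `money`), matching the intuition that coins are more alike than a coin and a credit.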


Proceedings Article
20 Aug 1995
TL;DR: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content, which performs encouragingly well and is significantly better than the traditional edge counting approach.
Abstract: This paper presents a new measure of semantic similarity in an IS-A taxonomy, based on the notion of information content. Experimental evaluation suggests that the measure performs encouragingly well (a correlation of r = 0.79 with a benchmark set of human similarity judgments, with an upper bound of r = 0.90 for human subjects performing the same task), and significantly better than the traditional edge counting approach (r = 0.66).

2,253 citations


Journal ArticleDOI
TL;DR: Using data from corpora of up to 120 million words, it is shown that the lemma CAUSE occurs in predominantly "unpleasant" collocations, such as cause of the trouble and cause of death.
Abstract: Current work on lexical collocations uses two ideas: (i) words have distinctive semantic profiles or "prosodies"; and (ii) the strength of association between words can be measured in quantitative terms. These ideas can be combined to provide comparative semantic profiles of words, which show the frequent and characteristic collocates of node words, and make explicit the semantic relations between the collocates. Using data from corpora of up to 120 million words, it is shown that the lemma CAUSE occurs in predominantly "unpleasant" collocations, such as cause of the trouble and cause of death. A case study of this lemma is used to illustrate quantitative methods for investigating collocations. Various methods proposed in the literature are of great practical value in establishing collocational sets, but their theoretical basis is less clear. Brief comparative semantic profiles are given for related lemmas, e.g. REASON and CONSEQUENCE. Implications for the relation between system and use are discussed.

472 citations
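Association strength of the kind used in such collocation studies is commonly quantified with pointwise mutual information or the t-score. A minimal sketch, with invented counts for a node word and a collocate in a hypothetical corpus:

```python
import math

def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise MI: how much more often x and y co-occur than
    expected if they were independent (log base 2, in bits)."""
    return math.log2((f_xy * n) / (f_x * f_y))

def t_score(f_xy, f_x, f_y, n):
    """t-score: co-occurrence count minus its expected value,
    scaled by the standard deviation (approximated as sqrt(f_xy))."""
    expected = (f_x * f_y) / n
    return (f_xy - expected) / math.sqrt(f_xy)

# Invented counts: 'cause' + 'trouble' co-occur 50 times in a
# 1,000,000-word corpus; the words occur 1,000 and 2,000 times alone.
mi = mutual_information(50, 1000, 2000, 10**6)
t = t_score(50, 1000, 2000, 10**6)
```

Both scores well above zero indicate a genuine collocation rather than chance co-occurrence; MI favours rare-but-exclusive pairs, while the t-score favours frequent ones.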


Journal ArticleDOI
TL;DR: It is proposed that semantic representations of words have the largest impact on translating orthography to phonology when this translation process is slow or noisy (i.e., for low-frequency exceptions) and that words with rich semantic representations are most likely to benefit from this interaction.
Abstract: Three experiments demonstrated that, for lower frequency words, reading aloud is affected not only by spelling-sound typicality but also by a semantic variable, imageability. Participants were slower and more error prone when naming exception words with abstract meanings (e.g., scarce) than when naming either abstract regular words (e.g., scribe) or imageable exception words (e.g., soot). It is proposed that semantic representations of words have the largest impact on translating orthography to phonology when this translation process is slow or noisy (i.e., for low-frequency exceptions) and that words with rich semantic representations (i.e., high-imageability words) are most likely to benefit from this interaction.

437 citations


Journal ArticleDOI
TL;DR: In this article, the types of semantic information that are automatically retrieved from the mental lexicon on hearing a word were investigated in three semantic priming experiments, and the authors found significant priming for category and functionally related targets, both with and without an additional associative relation.
Abstract: The types of semantic information that are automatically retrieved from the mental lexicon on hearing a word were investigated in 3 semantic priming experiments. The authors probed for activation of information about a word's category membership by using prime-target pairs that were members of a common semantic category (e.g., pig-horse) and 2 types of functional semantic properties: instrument relations (e.g., broom-floor) and script relations (e.g., restaurant-wine). The authors crossed type of semantic relation between prime and target with degree of normative association strength. In a paired and a single-word presentation version of an auditory lexicaldecision priming task, the authors found significant priming for category and functionally related targets, both with and without an additional associative relation. In all cases there was a significant associative boost. However, in a visual version of the single-word lexical-decision paradigm, a different pattern of results was found for each type of semantic relation. Category coordinates primed only when they were normatively associated, instrument relations primed both with and without association, and script relations primed in neither condition.

303 citations


Journal ArticleDOI
01 Sep 1995-Memory
TL;DR: A computational model of the processes involved in retrieving stored semantic and name information from objects, using a simple interactive activation and competition architecture, finds evidence showing a cross-over in normal reaction times to make semantic classification and identification responses to objects from categories with either structurally similar or structurally dissimilar exemplars.
Abstract: We present a computational model of the processes involved in retrieving stored semantic and name information from objects, using a simple interactive activation and competition architecture. We simulate evidence showing a cross-over in normal reaction times to make semantic classification and identification responses to objects from categories with either structurally similar or structurally dissimilar exemplars, and that identification times to objects from these two different classes correlate differentially with measures of the structural similarity of objects within the category and the frequency of the object's name. Structural similarity exerts a negative effect on object decision as well as naming, though this effect is larger on naming. Also, on naming, structural similarity interacts with the effects of name frequency, captured in the model by varying the weight on connections from semantic to name units; frequency effects are larger with structurally dissimilar items. In addition, (1) the range of potential errors for objects from these two classes, when responses are elicited before activation reached a stable state, differ--a wider range of errors occur to objects from categories with structurally similar exemplars; and (2) simulated lesions to different locations within the model produce selective impairments to identification but not to semantic classification responses to objects from categories with structurally similar exemplars. We discuss the results in relation to data on visual object processing in both normality and pathology.

122 citations
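The interactive activation and competition dynamics underlying such a model can be sketched with the standard IAC update rule; the parameter values below are illustrative defaults, not those used in the paper.

```python
def iac_step(a, net, rest=-0.1, a_min=-0.2, a_max=1.0, decay=0.1):
    """One interactive activation and competition (IAC) update step.

    Positive net input drives activation toward the ceiling a_max,
    negative net input toward the floor a_min, while decay pulls
    activation back toward its resting level.
    """
    if net > 0:
        delta = (a_max - a) * net - decay * (a - rest)
    else:
        delta = (a - a_min) * net - decay * (a - rest)
    return min(a_max, max(a_min, a + delta))
```

Iterating this rule over units for semantic and name information, with excitatory links within a pathway and inhibitory links between competitors, lets the network settle; structurally similar exemplars keep competitors active longer, slowing naming, which is the cross-over effect the model simulates.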


Proceedings Article
20 Aug 1995
TL;DR: An algorithm is presented for automatic word sense disambiguation based on lexical knowledge contained in WordNet and on the results of surface-syntactic analysis that results indicate that even on a relatively small text the proposed method produces correct noun meaning more than 72% of the time.
Abstract: We present an algorithm for automatic word sense disambiguation based on lexical knowledge contained in WordNet and on the results of surface-syntactic analysis. The algorithm is part of a system that analyzes texts in order to acquire knowledge in the presence of as little pre-coded semantic knowledge as possible. On the other hand, we want to make the best use of public-domain information sources such as WordNet. Rather than depend on large amounts of hand-crafted knowledge or statistical data from large corpora, we use syntactic information and information in WordNet and minimize the need for other knowledge sources in the word sense disambiguation process. We propose to guide disambiguation by semantic similarity between words and heuristic rules based on this similarity. The algorithm has been applied to the Canadian Income Tax Guide. Test results indicate that even on a relatively small text the proposed method produces the correct noun meaning more than 72% of the time.

117 citations
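The core idea, choosing the sense that minimizes conceptual distance to the context words, can be sketched with a toy hypernym hierarchy. The concepts and links below are invented; the actual algorithm walks WordNet and adds heuristic rules on top.

```python
# Toy hypernym links standing in for WordNet (invented for illustration).
HYPERNYM = {
    "bank#river": "geo_concept", "water": "geo_concept",
    "bank#finance": "finance_concept", "loan": "finance_concept",
    "geo_concept": "entity", "finance_concept": "entity",
}

def path_to_root(concept):
    """Chain of hypernyms from a concept up to the taxonomy root."""
    path = [concept]
    while concept in HYPERNYM:
        concept = HYPERNYM[concept]
        path.append(concept)
    return path

def distance(c1, c2):
    """Edge count from c1 to c2 through their closest common ancestor."""
    p1, p2 = path_to_root(c1), path_to_root(c2)
    lca = next(c for c in p1 if c in p2)
    return p1.index(lca) + p2.index(lca)

def disambiguate(senses, context):
    """Pick the sense with the smallest total distance to the context."""
    return min(senses, key=lambda s: sum(distance(s, c) for c in context))
```

With "loan" in the context, the financial sense of "bank" wins; with "water", the river sense does, because each sense sits closer to the matching context concept in the hierarchy.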



Journal ArticleDOI
01 Feb 1995
TL;DR: A classification of semantic conflicts is proposed that can be used as the basis for the incremental discovery and resolution of these conflicts, and that provides a systematic representation of alternative semantic interpretations of conflicts during the reconciliation process.
Abstract: Increasingly companies are doing business in an environment replete with heterogeneous information systems which must cooperate. Cooperation between these systems presupposes the resolution of the semantic conflicts that are bound to occur. In this article, we propose a classification of semantic conflicts which can be used as the basis for the incremental discovery and resolution of these conflicts. We classify conflicts along the two dimensions of naming and abstraction which, taken together, capture the semantic mapping of the conflict. We add a third dimension, level of heterogeneity to assist in the schematic mapping between two databases. The classification provides a systematic representation of alternative semantic interpretations of conflicts during the reconciliation process. As a result, the design of query‐directed dynamic reconciliation systems is possible. The classification is shown to be sound and minimal. Completeness is discussed.

91 citations


Journal ArticleDOI
TL;DR: It is demonstrated that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning, and some of the problems and tradeoffs for a method which has just one content ontology are explored.
Abstract: This paper defends the choice of a linguistically-based content ontology for natural language processing and demonstrates that a single common-sense ontology produces plausible interpretations at all levels from parsing through reasoning. The paper explores some of the problems and tradeoffs for a method which has just one content ontology. A linguistically-based content ontology represents the "world view" encoded in natural language. The content ontology (as opposed to the formal semantic ontology which distinguishes events from propositions, and so on) is best grounded in the culture, rather than in the world itself, or in the mind. By "world view" we mean naive assumptions about "what there is" in the world, and how it should be classified. These assumptions are time-worn and reflected in language at several levels: morphology, syntax and lexical semantics. The content ontology presented in the paper is part of a Naive Semantic lexicon. Naive Semantics is a lexical theory in which associated with each word sense is a naive theory (or set of beliefs) about the objects or events of reference. While naive semantic representations are not combinations of a closed set of primitives, they are also limited by a shallowness assumption. Included is just the information required to form a semantic interpretation incrementally, not all of the information known about objects. The Naive Semantic ontology is based upon a particular language, its syntax and its word senses. To the extent that other languages codify similar world views, we predict that their ontologies are similar. Applied in a computational natural language understanding system, this linguistically-motivated ontology (along with other naive semantic information) is sufficient to disambiguate words, disambiguate syntactic structure, disambiguate formal semantic representations, resolve anaphoric expressions and perform reasoning tasks with text.

73 citations


Journal ArticleDOI
01 Sep 1995-Memory
TL;DR: It is argued that the similarity of semantic errors in object naming, resulting from damage to different components of the naming process, reflects the compositional nature of lexical semantic representations, the processes by which they are activated by visual input, and the processes by which they activate output representations.
Abstract: We present evidence that semantic errors in object naming can arise not only from impairment to the semantic system but also from damage to input and output processes. Although each of these levels of disruption can result in similar types of semantic errors in object naming, they have different types of consequences for performance on other lexical tasks, such as comprehension and naming to definition. We show that the analysis of the co-occurrence of semantic errors in naming with different patterns of performance in other lexical processing tasks can be used to localise the source of semantic errors in the naming process. Finally, we argue that the similarity of semantic errors in object naming, resulting from damage to different components of the naming process, reflects the compositional nature of lexical semantic representations, and the processes by which they are activated by visual input, as well as the processes by which they activate output representations.

Journal ArticleDOI
01 Apr 1995-Brain
TL;DR: It is concluded that category-specific language organization can emerge from the inherent nature of semantic features themselves, and does not require special internal categorical organization of semantic memory.
Abstract: Category-specific language impairments have been postulated to require the existence of an explicit category organization within semantic memory. However, it may be possible to demonstrate analytically that this is not necessary. We hypothesize that category-specific organization can emerge from perceptual, functional, and associative feature information about objects that is maintained in order to process language. In this paper, we conduct several experiments to test the computational validity of this hypothesis. Physical objects were encoded in terms of semantic features, based on basic perceptual and motor modalities and higher level knowledge of function, for use in artificial neural networks. Mathematical methods were used to analyse the encodings and the neural networks. The results demonstrate the emergence of semantic categories in the networks, although such information was not preprogrammed. We conclude that category-specific language organization can emerge from the inherent nature of semantic features themselves, and does not require special internal categorical organization of semantic memory.

Journal ArticleDOI
TL;DR: The results show that both interference effects increase with the size of the target set, and the use of a relatively small number of target pictures may account for remarkably small, or even nonsignificant, picture-word interference effects in a number of previous studies.
Abstract: In the picture-word interference task the naming of a picture is hampered by the presence of a distractor word that is to be ignored. Two main components of this interference effect can be distinguished: an interference effect induced by an unrelated distractor word in comparison with a nonword control, and an additional interference effect that is due to a semantic similarity between target and distractor (called semantic interference). We examine whether the size of these two interference effects is affected by the number of different target pictures in an experiment. The results show that both interference effects increase with the size of the target set. This finding has two implications. First, at an empirical level, the use of a relatively small number of target pictures may account for remarkably small, or even nonsignificant, picture-word interference effects in a number of previous studies. Second, at a theoretical level, the present finding is in accordance with a name-retrieval account of picture-word interference.

Journal ArticleDOI
TL;DR: Given the significantly greater number of errors committed by PAD patients, it is concluded that their disruption in semantic processing occurs at some point between the elicitation of N400 and the generation of the reaction time response.

Proceedings ArticleDOI
27 Nov 1995
TL;DR: This paper compares the application of two prominent self-organizing neural networks to the same problem domain, namely the organization of software libraries, and shows that both models successfully arrange software components according to their semantic similarity.
Abstract: This paper is concerned with a case study in content-based classification of textual documents. In particular we compare the application of two prominent self-organizing neural networks to the same problem domain, namely the organization of software libraries. The two models are adaptive resonance theory and self-organizing maps. As a result we are able to show that both models successfully arrange software components according to their semantic similarity.
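A self-organizing map of the kind compared above can be sketched in plain Python. This is a minimal 1-D map, not the authors' implementation, and the training data are invented; it shows the key property the paper relies on: nearby units come to represent similar inputs.

```python
import math
import random

def best_unit(weights, x):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)),
               key=lambda i: sum((weights[i][d] - x[d]) ** 2
                                 for d in range(len(x))))

def train_som(data, n_units, dim, epochs=200, lr0=0.5, seed=0):
    """Train a minimal 1-D self-organizing map."""
    rng = random.Random(seed)
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for t in range(epochs):
        lr = lr0 * (1.0 - t / epochs)        # learning rate decays to zero
        for x in data:
            bmu = best_unit(weights, x)      # best-matching unit
            for i in range(n_units):
                # Gaussian neighbourhood: units near the BMU move most.
                h = math.exp(-((i - bmu) ** 2) / 2.0)
                for d in range(dim):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights
```

Training on two well-separated clusters of feature vectors ends with different units responding to each cluster, which is exactly the arrangement-by-similarity behaviour the paper reports for software components.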

01 Jan 1995
TL;DR: A fundamental framework for realizing semantic interoperability at the level of semantic relationships between data items in a multidatabase environment is proposed and a metadatabase system which extracts the significant information from different databases is presented.
Abstract: In multidatabase research, the realization of semantic interoperability is the most important issue for resolving semantic heterogeneity between different databases. In this paper, we propose a fundamental framework for realizing semantic interoperability at the level of semantic relationships between data items in a multidatabase environment. We present a metadatabase system which extracts the significant information from different databases. The metadatabase system uses a mathematical model of meaning to dynamically recognize the semantic equivalence, similarity, and difference between data items. The essential feature of this model is that the specific meaning of a data item is dynamically fixed and unambiguously recognized according to the context by semantic interpretation mechanisms. © 1995 John Wiley & Sons, Inc.

Journal ArticleDOI
01 Dec 1995
TL;DR: This paper proposes an approach to detecting discrepancies between specifications using meta-modeling and similarity analysis; the approach accommodates different models for representing specifications, and the analysis scales up to large, complex specifications because the complexity of similarity analysis is polynomial.
Abstract: Requirements analysis usually results in a set of different specifications for the same system, which must be integrated. Integration involves the detection and elimination of discrepancies between them. Discrepancies may be due to differences in representation models, modeling perspectives or practices. As instances of the semantic heterogeneity problem (Gangopadhyay and Barsalou, 1991), discrepancies are broader than logical inconsistencies, and therefore not always detectable using theorem proving. This paper proposes an approach to their detection using meta-modeling and similarity analysis. Specification components are classified under a meta-model of domain independent semantic modeling abstractions and thereby compared according to a newly developed model of similarity. Similarity analysis results in an isomorphic mapping between them, which can be used as a basis for reconciling and merging them. The approach is extensible in the sense that it accommodates different models for representing specifications, and analysis scales up to manage large, complex specifications because the complexity of similarity analysis is polynomial.

Book ChapterDOI
14 Aug 1995
TL;DR: This paper describes a technique for translating the semantic information encoded in a conceptual graph into an English language sentence and shows clearly how the semantic structure is declaratively related to linguistically motivated syntactic representation.
Abstract: This paper describes a technique for translating the semantic information encoded in a conceptual graph into an English language sentence. The use of a non-hierarchically structured semantic representation (conceptual graphs) allows us to investigate a more general version of the sentence generation problem where one is not pre-committed to a choice of the syntactically prominent elements in the initial semantics. We show clearly how the semantic structure is declaratively related to linguistically motivated syntactic representation. Our technique provides flexibility to address cases where the entire input cannot be precisely expressed in a single sentence.

Book ChapterDOI
01 Jan 1995
TL;DR: This paper describes an information retrieval system—GUIDANCE1—that is accessible and usable by people who are not experts in computing but are experts in their own domain and has a system of semantic sanctions to control the creation of implied concepts.
Abstract: This paper describes an information retrieval system—GUIDANCE1—that is accessible and usable by people who are not experts in computing but are experts in their own domain. This particular user group needs to be supported by a system that is easy to use and reflects their own knowledge of the world. The system presented is based on descriptions — the logical structure of the database is concealed in favour of an interface which supports the question ‘What can I say about People?’ regardless of how many objects, roles, or attributes represent ‘People’. A full and relevant description is implemented by two models, one containing conceptual knowledge, and the other database specific information. These models are represented in, and related by, a descriptive subsumption-based classification formalism GRAIL2, which has a system of semantic sanctions to control the creation of implied concepts and a mechanism for ensuring their uniqueness.

Journal ArticleDOI
TL;DR: The behavioral analyses of response time and accuracy were sensitive to changes in word length, but inconclusive for semantic relatedness, and repeated-measures ANOVAs of peak amplitude, latency, and mean area amplitude of the N400 ERP revealed significant effects of word length and semantic priming.

Proceedings Article
20 Aug 1995
TL;DR: A similarity-based learning method from databases is presented in the context of rough set theory; it uses rough set theory to analyse the attributes in the databases and identify those relevant to the task attributes.
Abstract: Many data mining algorithms developed recently are based on inductive learning methods. Very few are based on similarity-based learning. However, similarity-based learning accrues advantages, such as simple representations for concept descriptions, low incremental learning costs, small storage requirements, etc. We present a similarity-based learning method from databases in the context of rough set theory. Unlike previous similarity-based learning methods, which only consider the syntactic distance between instances and treat all attributes as equally important in the similarity measure, our method can analyse the attributes in the databases by using rough set theory and identify the attributes relevant to the task attributes. We also eliminate attributes superfluous for the task attributes and assign a weight to the relevant attributes according to their significance to the task attributes. Our similarity measure takes into account the semantic information embedded in the databases.
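An attribute-weighted similarity measure of this kind can be sketched directly; the attributes and weights below are invented stand-ins for the significance values that rough set analysis would produce, and a superfluous attribute is simply left out of the weight table.

```python
def weighted_similarity(x, y, weights):
    """Similarity between two instances: each matching attribute
    contributes its weight (its significance to the task attribute);
    attributes absent from the weight table are ignored as superfluous."""
    matched = sum(w for attr, w in weights.items() if x[attr] == y[attr])
    return matched / sum(weights.values())

# Invented example: 'shape' was judged more significant than 'colour',
# and 'serial' was eliminated as superfluous (no weight assigned).
weights = {"shape": 0.7, "colour": 0.3}
a = {"shape": "round", "colour": "red",  "serial": 1}
b = {"shape": "round", "colour": "blue", "serial": 2}
```

Here `a` and `b` score 0.7: they agree on the heavily weighted `shape`, differ on `colour`, and the irrelevant `serial` attribute has no influence, which is the semantic (rather than purely syntactic) behaviour the method aims for.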

Proceedings ArticleDOI
20 Feb 1995
TL;DR: The theoretical aspects of the architecture of the whole experimental AI system built in order to provide effective online assistance to users of new technological systems: the understanding of "how it works" and "how to complete tasks" from queries in natural languages are described.
Abstract: We describe the theoretical aspects of the architecture of the whole experimental AI system we built in order to provide effective online assistance to users of new technological systems: the understanding of "how it works" and "how to complete tasks" from queries in natural languages. In our model, procedural semantic networks are used to describe the knowledge of an "ideal" expert, while fuzzy sets are used to describe the approximative and uncertain knowledge of novice users in fuzzy semantic networks, which intervene to match fuzzy labels of a query with categories from our "ideal" expert.

Proceedings ArticleDOI
TL;DR: The notion of spatial similarity for 2D symbolic images is formalized and a framework for characterizing the robustness of spatial similarities algorithms with respect to their ability to deal with translation, scale, rotation (both perfect and multiple) variants as well as the variants obtained by an arbitrary composition of translation, Scale, and rotation is provided.
Abstract: Similarity based retrieval of images is an important task in multimedia applications. A major class of user queries requires retrieving those images in the database that are spatially similar to the query image. To process these queries, a spatial similarity function is desired. A spatial similarity function assesses the degree to which the spatial relationships in a database image conform to those specified in the query image. In this paper, we formalize the notion of spatial similarity for 2D symbolic images and provide a framework for characterizing the robustness of spatial similarity algorithms with respect to their ability to deal with translation, scale, rotation (both perfect and multiple) variants as well as the variants obtained by an arbitrary composition of translation, scale, and rotation. This characterization in turn is useful for comparing various algorithms for spatial similarity systematically. As an example, a few spatial similarity algorithms are characterized and then experimentally contrasted using a testbed of images. © 1995 SPIE--The International Society for Optical Engineering.

01 Sep 1995
TL;DR: This document discusses the techniques that are used to extract semantic features from a domain with limited human intervention, and the combination of distributional and taxonomic techniques to obtain a set of semantic classes for a given domain.
Abstract: Natural Language Processing (NLP) and message understanding systems often use semantic information in order to perform lexical and syntactic disambiguation and to assist them in "understanding" the text. Such information is domain-specific in nature and hence difficult to acquire in an automatic manner. This causes a problem whenever an NLP system is moved from one domain to another. Portability of an NLP system can be improved if these semantic features can be acquired with limited human intervention. The semantic information needed by an NLP system may take several different forms. This dissertation focuses on two such semantic features--semantic classes present in a given domain, and lexico-semantic patterns that exist between content words in the domain. This document discusses the techniques that are used to extract these semantic features from a domain with limited human intervention. Semantic classes are discovered by clustering different objects on the basis of the lexico-syntactic environments in which they appear in the corpus. The results of some experiments with augmenting the noun semantic classes with class information obtained from WordNet are presented. A methodology for formally evaluating the semantic classes extracted by the system against classes provided by experts is also presented. Once semantic classes have been obtained, they are then used to generate lexico-semantic patterns that are prevalent in the given domain. A noteworthy feature of this research is that the techniques used to acquire the semantic features require very limited human intervention. The combination of distributional and taxonomic techniques to obtain a set of semantic classes for a given domain has also been found to be useful.

Posted Content
Mark Dras1
TL;DR: This paper develops a computationally tractable definition of semantic weight, concentrating on what it means for a word to be semantically light; the definition involves looking at the frequency of a word in particular syntactic constructions which are indicative of lightness.
Abstract: Current definitions of notions of lexical density and semantic weight are based on the division of words into closed and open classes, and on intuition. This paper develops a computationally tractable definition of semantic weight, concentrating on what it means for a word to be semantically light; the definition involves looking at the frequency of a word in particular syntactic constructions which are indicative of lightness. Verbs such as "make" and "take", when they function as support verbs, are often considered to be semantically light. To test our definition, we carried out an experiment based on that of Grefenstette and Teufel (1995), where we automatically identify light instances of these words in a corpus; this was done by incorporating our frequency-related definition of semantic weight into a statistical approach similar to that of Grefenstette and Teufel. The results show that this is a plausible definition of semantic lightness for verbs, which can possibly be extended to defining semantic lightness for other classes of words.
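The frequency-based definition can be sketched directly: score a verb by how often it appears in light (support-verb) constructions relative to its total occurrences. The counts and threshold below are invented for illustration, not taken from the paper's corpus.

```python
# Invented corpus counts: total occurrences of each verb, and occurrences
# in support-verb constructions such as "take a walk" or "make a decision".
COUNTS = {
    "take":   {"total": 5000, "support": 1800},
    "make":   {"total": 6000, "support": 2100},
    "devour": {"total": 300,  "support": 3},
}

def lightness(verb):
    """Fraction of a verb's occurrences in light constructions."""
    c = COUNTS[verb]
    return c["support"] / c["total"]

def is_light(verb, threshold=0.2):
    """Classify a verb as semantically light if its ratio passes
    an (illustrative) threshold."""
    return lightness(verb) >= threshold
```

Under these invented counts, "take" and "make" come out light while the semantically heavy "devour" does not, which is the pattern the definition is meant to capture.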

Journal ArticleDOI
TL;DR: This paper reports two experiments that investigated the semantic distance model (SDM) of relevance assessment and examined whether the poor performance of the help‐menu terms was an experimental design artifact reflecting the comparison of terse help terms with verbose classification headings.
Abstract: This paper reports two experiments that investigated the semantic distance model (SDM) of relevance assessment. In the first experiment graduate students of mathematics and economics assessed the relevance relationships between bibliographic records and hierarchies of terms composed of classification headings or help‐menu terms. The relevance assessments of the classification headings, but not the help‐menu terms, exhibited both a semantic distance effect and a semantic direction effect as predicted by the SDM. Topical subject expertise enhanced both these effects. The second experiment investigated whether the poor performance of the help‐menu terms was an experimental design artifact reflecting the comparison of terse help terms with verbose classification headings. In the second experiment the help‐menu terms were compared to a hierarchy of single‐word terms where they exhibited both a semantic distance and semantic direction effect.

01 Jan 1995
TL;DR: A heuristic method based on conceptual distance is presented that uses information from an external wide-coverage semantic taxonomy (WordNet) to overcome the problem automatically or to provide the user with complementary information that makes his/her choice easier.
Abstract: TGE (Tlink Generator Environment) is a system for semi-automatically extracting translation links. The system was developed within the ACQUILEX II project as a tool for supporting the construction of a multi-lingual lexical knowledge base containing detailed syntactic and semantic information from MRD resources. A drawback of the original system was the need for human intervention to select the most appropriate translation link when more than one was extracted and proposed by the system. This paper deals with the task of overcoming this drawback. What is presented is a heuristic method based on conceptual distance that uses information from an external wide-coverage semantic taxonomy (WordNet). Our aim is to overcome the problem in an automatic way or to provide the user with complementary information in order to make his/her choice easier.

Book ChapterDOI
01 Jan 1995
TL;DR: A concise overview is given of methods for comparing fuzzy sets, and a new approach is illustrated based on the concept of a semantic distance that uses areas instead of extreme values or values at the point of intersection.
Abstract: In Chapter 5 it has been shown that an important problem in traditional discrete multicriteria methods in a fuzzy environment is the comparison of fuzzy sets. Here, first a concise overview of some other methods aiming at comparing fuzzy sets will be presented, and then a new approach based on the concept of a semantic distance, using areas instead of extreme values or values at the point of intersection, will be illustrated.
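The area-based semantic distance can be sketched numerically: the distance between two fuzzy sets is the area between their membership functions, rather than a comparison of extreme or intersection values. The triangular membership functions below are illustrative choices, not the chapter's own examples.

```python
def area_distance(mu_a, mu_b, lo, hi, n=2000):
    """d(A, B) = integral over [lo, hi] of |mu_A(x) - mu_B(x)| dx,
    approximated by the midpoint rule with n subintervals."""
    h = (hi - lo) / n
    xs = (lo + (i + 0.5) * h for i in range(n))
    return sum(abs(mu_a(x) - mu_b(x)) for x in xs) * h

def triangular(a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    def mu(x):
        if a < x <= b:
            return (x - a) / (b - a)
        if b < x < c:
            return (c - x) / (c - b)
        return 0.0
    return mu
```

Two identical fuzzy sets have distance zero, and two triangular sets with disjoint supports have a distance equal to the sum of their areas, so the measure grows smoothly as the sets drift apart, even where extreme-value or intersection-point methods would see no difference.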

Book ChapterDOI
Michael Wolverton1
23 Oct 1995
TL;DR: This paper reports on a comparison of retrieval by marker passing or spreading activation in a semantic network with Knowledge-Directed Spreading Activation, a method developed to be well-suited for retrieving semantically distant analogues from a large knowledge base.
Abstract: If analogy and case-based reasoning systems are to scale up to very large case bases, it is important to analyze the various methods used for retrieving analogues to identify the features of the problem for which they are appropriate. This paper reports on one such analysis, a comparison of retrieval by marker passing or spreading activation in a semantic network with Knowledge-Directed Spreading Activation, a method developed to be well-suited for retrieving semantically distant analogues from a large knowledge base. The analysis has two complementary components: (1) a theoretical model of the retrieval time based on a number of problem characteristics, and (2) experiments showing how the retrieval time of the approaches varies with the knowledge base size. These two components, taken together, suggest that KDSA is more likely than SA to be able to scale up to retrieval in large knowledge bases.