
Showing papers on "Semantic similarity published in 1996"


Journal ArticleDOI
TL;DR: A procedure that processes a corpus of text and produces, for each word, a numeric vector capturing information about its meaning; these vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
Abstract: A procedure that processes a corpus of text and produces, for each word, a numeric vector containing information about its meaning is presented. This procedure is applied to a large corpus of natural language text taken from Usenet, and the resulting vectors are examined to determine what information is contained within them. These vectors provide the coordinates in a high-dimensional space in which word relationships can be analyzed. Analyses of both vector similarity and multidimensional scaling demonstrate that there is significant semantic information carried in the vectors. A comparison of vector similarity with human reaction times in a single-word priming experiment is presented. These vectors provide the basis for a representational model of semantic memory, hyperspace analogue to language (HAL).
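A minimal sketch of the vector-construction step, assuming a simple sliding window with HAL-style ramped weights (closer neighbours count more); the window size and token sequence here are illustrative, not the paper's parameters:

```python
import math

def hal_vectors(tokens, window=4):
    """Build HAL-style co-occurrence vectors: each word accumulates
    weighted counts of the words preceding it within the window,
    with weight (window - distance + 1) so nearer words count more."""
    vocab = sorted(set(tokens))
    index = {w: i for i, w in enumerate(vocab)}
    vectors = {w: [0] * len(vocab) for w in vocab}
    for pos, word in enumerate(tokens):
        for dist in range(1, window + 1):
            if pos - dist < 0:
                break
            weight = window - dist + 1
            left = tokens[pos - dist]
            vectors[word][index[left]] += weight
    return vocab, vectors

def cosine(u, v):
    """Cosine similarity between two co-occurrence vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Word relationships can then be analyzed by comparing rows of the resulting matrix with `cosine`, as in the vector-similarity analyses the abstract describes.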

1,717 citations


Proceedings ArticleDOI
26 Feb 1996
TL;DR: This work describes the fundamental types of "similarity queries" that should be supported and proposes a new dynamic structure for similarity indexing called the similarity search tree or SS-tree, which performs better than the R*-tree in nearly every test.
Abstract: Efficient indexing of high dimensional feature vectors is important to allow visual information systems and a number of other applications to scale up to large databases. We define this problem as "similarity indexing" and describe the fundamental types of "similarity queries" that we believe should be supported. We also propose a new dynamic structure for similarity indexing called the similarity search tree or SS-tree. In nearly every test we performed on high dimensional data, we found that this structure performed better than the R*-tree. Our tests also show that the SS-tree is much better suited for approximate queries than the R*-tree.

697 citations


Journal ArticleDOI
12 Dec 1996
TL;DR: This work tries to reconcile the dual (schematic and semantic) perspectives by enumerating possible semantic similarities between objects having schema and data conflicts, and modeling schema correspondences as the projection of semantic proximity with respect to (wrt) context.
Abstract: In a multidatabase system, schematic conflicts between two objects are usually of interest only when the objects have some semantic similarity. We use the concept of semantic proximity, which is essentially an abstraction/mapping between the domains of the two objects associated with the context of comparison. An explicit though partial context representation is proposed and the specificity relationship between contexts is defined. The contexts are organized as a meet semi-lattice and associated operations like the greatest lower bound are defined. The context of comparison and the type of abstractions used to relate the two objects form the basis of a semantic taxonomy. At the semantic level, the intensional description of database objects provided by the context is expressed using description logics. The terms used to construct the contexts are obtained from domain-specific ontologies. Schema correspondences are used to store mappings from the semantic level to the data level and are associated with the respective contexts. Inferences about database content at the federation level are modeled as changes in the context and the associated schema correspondences. We try to reconcile the dual (schematic and semantic) perspectives by enumerating possible semantic similarities between objects having schema and data conflicts, and modeling schema correspondences as the projection of semantic proximity with respect to (wrt) context.

501 citations


Journal ArticleDOI
TL;DR: This paper summarizes three experiments that illustrate how LSA may be used in text-based research by describing methods for analyzing a subject’s essay for determining from what text a subject learned the information and for grading the quality of information cited in the essay.
Abstract: Latent semantic analysis (LSA) is a statistical model of word usage that permits comparisons of semantic similarity between pieces of textual information. This paper summarizes three experiments that illustrate how LSA may be used in text-based research. Two experiments describe methods for analyzing a subject’s essay for determining from what text a subject learned the information and for grading the quality of information cited in the essay. The third experiment describes using LSA to measure the coherence and comprehensibility of texts.
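The core LSA comparison can be sketched as a truncated SVD of a term-by-document count matrix, with texts compared by cosine in the reduced space; the tiny matrix and the choice of k below are illustrative, not the paper's setup:

```python
import numpy as np

def lsa_doc_vectors(term_doc, k=2):
    """Reduce a term-by-document count matrix to k latent dimensions
    via truncated SVD; each document becomes a row of (S_k V_k^T)^T."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # one k-dim vector per document

def cosine(u, v):
    """Cosine similarity between two document vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))
```

Comparing an essay's vector against the vectors of candidate source texts in this space is one way to realise the "which text did the subject learn from" comparison the abstract describes.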

253 citations


Posted Content
TL;DR: A method for measuring semantic similarity between words as a new tool for text analysis on a semantic network constructed systematically from a subset of the English dictionary, LDOCE (Longman Dictionary of Contemporary English).
Abstract: This paper proposes a method for measuring semantic similarity between words as a new tool for text analysis. The similarity is measured on a semantic network constructed systematically from a subset of the English dictionary, LDOCE (Longman Dictionary of Contemporary English). Spreading activation on the network can directly compute the similarity between any two words in the Longman Defining Vocabulary, and indirectly the similarity of all the other words in LDOCE. The similarity represents the strength of lexical cohesion or semantic relation, and also provides valuable information about similarity and coherence of texts.
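A toy sketch of spreading activation over a word graph, assuming a fixed number of propagation steps and a uniform decay factor; the paper's actual LDOCE network and activation rules are more elaborate, and the graph below is invented for illustration:

```python
from collections import defaultdict

def spread_activation(graph, source, steps=3, decay=0.5):
    """Propagate activation from `source` for a fixed number of steps;
    at each step every node keeps its activation and passes a decayed,
    evenly divided share to its neighbours."""
    activation = defaultdict(float)
    activation[source] = 1.0
    for _ in range(steps):
        nxt = defaultdict(float)
        for node, act in activation.items():
            nxt[node] += act
            neighbours = graph.get(node, [])
            if neighbours:
                share = decay * act / len(neighbours)
                for nb in neighbours:
                    nxt[nb] += share
        activation = nxt
    return activation

def similarity(graph, w1, w2, **kw):
    """Similarity of w2 to w1 = activation reaching w2 from w1."""
    return spread_activation(graph, w1, **kw).get(w2, 0.0)
```

Words linked through shared defining vocabulary accumulate activation from each other, while unrelated words receive none.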

126 citations


Journal ArticleDOI
TL;DR: It is argued that the findings on kinship should generalize to all semantic domains--e.g., animals, emotions, etc.
Abstract: Culture consists of shared cognitive representations in the minds of individuals. This paper investigates the extent to which English speakers share the "same" semantic structure of English kinship terms. The semantic structure is defined as the arrangement of the terms relative to each other as represented in a metric space in which items judged more similar are placed closer to each other than items judged as less similar. The cognitive representation of the semantic structure, residing in the mind of an individual, is measured by judged similarity tasks involving comparisons among terms. Using six independent measurements, from each of 122 individuals, correspondence analysis represents the data in a common multidimensional spatial representation. Judged by a variety of statistical procedures, the individuals in our sample share virtually identical cognitive representations of the semantic structure of kinship terms. This model of culture accounts for 70-90% of the total variability in these data. We argue that our findings on kinship should generalize to all semantic domains--e.g., animals, emotions, etc. The investigation of semantic domains is important because they may reside in localized functional units in the brain, because they relate to a variety of cognitive processes, and because they have the potential to provide methods for diagnosing individual breakdowns in the structure of cognitive representations typical of such ailments as Alzheimer disease.

123 citations


Journal ArticleDOI
TL;DR: The authors found no difference in the recall of two-digit numbers when distractors were either numbers or words and non-words that were designed to be phonologically similar to the targets.
Abstract: Two experiments that tested whether semantic similarity between visually presented targets and auditorily presented distractors has an effect on serial recall of the visual targets are reported. In Experiment 1, we found no difference in the recall of two-digit numbers when distractors were either numbers or words and non-words that were designed to be phonologically similar to the targets. In Experiment 2 the “semantic distance” between targets and distractors had no effect on serial recall. Taken together, these experiments conceptually replicate and extend earlier results, and they establish constraints for models of the effect of unattended acoustic information on serial recall.

88 citations


Journal ArticleDOI
TL;DR: The authors argued that Starreveld and La Heij's rejection of serial access was based on an oversimplified conception of the seriality view and that interaction, rather than additivity, is predicted by existing conceptions of serial Access.
Abstract: P. A. Starreveld and W. La Heij (see record 1995-42762-001) tested the seriality view of lexical access in speech production, according to which lexical selection and the encoding of a word's form proceed in serial order without feedback. In 2 experiments, they looked at the combined effect of semantic and orthographic relatedness of written distractor words in tasks that required conceptually driven naming. They found an interaction between semantic relatedness and orthographic relatedness and argued that the observed interaction refutes the seriality view of lexical access. In this comment, the authors argue that Starreveld and La Heij's rejection of serial access was based on an oversimplified conception of the seriality view and that interaction, rather than additivity, is predicted by existing conceptions of serial access.

83 citations


Journal ArticleDOI
TL;DR: An investigation of the naming errors produced on the Boston Naming Test by patients with mild and moderate Alzheimer's disease and by elderly and young controls finds that some available evidence of semantic loss in AD may be an artifact of the methodology chosen for evaluating naming errors.

66 citations


Proceedings ArticleDOI
03 Oct 1996
TL;DR: An evaluation methodology is used that assesses performance at different semantic levels, including the database response comparison used in the ARPA ATIS paradigm, and replaces the system of rules for the semantic analysis with a relatively simple first-order hidden Markov model.
Abstract: A stochastically based approach to the semantic analysis component of a natural spoken language system for the ARPA Air Travel Information Services (ATIS) task has been developed. The semantic analyzer of the spoken language system already in use at LIMSI makes use of a rule-based case grammar. In this work, the system of rules for the semantic analysis is replaced with a relatively simple first-order hidden Markov model. The performances of the two approaches can be compared because they use identical semantic representations, despite their rather different methods for meaning extraction. We use an evaluation methodology that assesses performance at different semantic levels, including the database response comparison used in the ARPA ATIS paradigm.
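Replacing rules with a first-order HMM amounts to decoding a sequence of semantic labels from the word sequence. A generic Viterbi sketch follows; the states, probabilities, and smoothing constant are invented for illustration and are not LIMSI's actual model:

```python
def viterbi(words, states, start_p, trans_p, emit_p):
    """First-order HMM decoding: return the most likely sequence of
    semantic labels for a word sequence. Unknown words get a tiny
    floor probability (1e-9) instead of zero."""
    V = [{s: start_p[s] * emit_p[s].get(words[0], 1e-9) for s in states}]
    path = {s: [s] for s in states}
    for w in words[1:]:
        cur, newpath = {}, {}
        for s in states:
            prob, prev = max(
                (V[-1][p] * trans_p[p][s] * emit_p[s].get(w, 1e-9), p)
                for p in states)
            cur[s] = prob
            newpath[s] = path[prev] + [s]
        V.append(cur)
        path = newpath
    best = max(states, key=lambda s: V[-1][s])
    return path[best]
```

In an ATIS-like setting the states would be semantic concepts (city, date, and so on) and the emissions would be the words of the query.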

47 citations


Journal ArticleDOI
TL;DR: This paper proposes a solution to the problem of how to combine different similarity measures in a coherent and intuitive way, which is loosely based on ideas derived from fuzzy logic in that it uses the equivalent in the similarity domain of the and, or and not operations.
Abstract: Image databases will require a completely new organization due to the unstructured and ‘perceptual’ structure of the data they contain. We argue that similarity measures, rather than matching, will be the organizing principle of image databases. Similarity is a very elusive and complex judgment, and typical databases will have to rely on a number of different metrics to satisfy the different needs of their users. This poses the problem of how to combine different similarity measures in a coherent and intuitive way. In this paper we propose our solution, which is loosely based on ideas derived from fuzzy logic in that it uses the equivalent, in the similarity domain, of the and, or and not operations. The approach is much more general than that, however, and can be adapted to work with any operation that combines similarity judgments. With this approach, a query can be described as a directed acyclic graph with certain properties. We analyse briefly the properties of this graph, and we present the interface we are developing to specify these queries.
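One standard fuzzy-logic reading of these operations takes min for and, max for or, and complement for not; whether the paper uses exactly these operators is not stated, so treat this as an illustrative sketch:

```python
def sim_and(a, b):
    return min(a, b)   # both criteria must hold: limited by the weaker

def sim_or(a, b):
    return max(a, b)   # either criterion suffices: take the stronger

def sim_not(a):
    return 1.0 - a     # dissimilarity as complement

# A hypothetical query "similar in colour AND (texture OR NOT shape)",
# given per-metric similarity scores in [0, 1]:
def query_score(colour, texture, shape):
    return sim_and(colour, sim_or(texture, sim_not(shape)))
```

Nesting these operators is what gives the query the directed-acyclic-graph structure the abstract mentions: each operator node combines the scores of its children.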

Journal ArticleDOI
01 May 1996
TL;DR: This paper proposes a schema integration tool, which makes use of the semantic relations to integrate objects of different schemata and attempts to find semantic similarity between entities compared and compute their semantic similarity degrees (SSD).
Abstract: Schema integration plays an important part in research areas of heterogeneous information systems and semantic interoperability. How to acquire data semantic knowledge from the local schemas, and how to represent the knowledge acquired in support of the schema integration process are two major problems in schema integration research. Finding similarities between objects of different schemata at the semantic level is considered to be one of the crucial problems in schema integration. In order to identify such similarities it is necessary to form a set of semantic characteristics the objects may possess. In this paper, we present a set of such characteristics and a set of semantic similarity relations. The relations are classified into four groups: weak, compatible, equivalence, and mergeable semantic relations. In addition, based on the semantic similarity relations, we attempt to find semantic similarity between the entities compared and compute their semantic similarity degrees (SSD). We also propose a schema integration tool, which makes use of the semantic relations to integrate objects of different schemata.

Journal ArticleDOI
TL;DR: To explore the nature of semantic deficit in Alzheimer's disease patients, two tasks that are known to be very different with respect to the type of attentional demand and conscious effort they require are compared: lexical decision in a semantic priming paradigm and semantic relatedness judgements (intentional).
Abstract: To explore the nature of the semantic deficit in Alzheimer's disease patients (AD patients), we compared two tasks that are known to be very different with respect to the type of attentional demand and conscious effort they require: lexical decision (automatic) in a semantic priming paradigm and semantic relatedness judgements (intentional). In order to minimise post-lexical facilitation, we devised a semantic priming experiment that met an automatic condition as much as possible, and we selected patients without severe word recognition deficits. AD patients showed reduced accuracy in the semantic relatedness judgements as compared to controls. Some effect of priming was found, but this was weaker than in normals. AD patients also differed from controls on targets preceded by a nonlinguistic prime (neutral condition), where their reaction times were slower.

Book ChapterDOI
01 Jan 1996
TL;DR: Evaluation shows the proposed method of measuring semantic similarity between words using a knowledge-base constructed automatically from machine-readable dictionaries is superior to other currently available methods.
Abstract: A method of measuring semantic similarity between words using a knowledge-base constructed automatically from machine-readable dictionaries is proposed. The method takes into consideration the fact that similarity changes depending on situation or context, which we call ‘viewpoint’. Evaluation shows the proposed method, although based on a simply structured knowledge-base, is superior to other currently available methods.

Journal ArticleDOI
TL;DR: A new word, "presidentbush", was created every time the word "president" was used as a title, enabling discrimination between its use as a title and its use as a role in government ("President Bush" versus "I want to be president").
Abstract: Different uses of single words, therefore, must be differentiated in the text by creating unambiguous word concepts for the program to recognize. Every time the word "president" was used as a title, a new word was created: "presidentbush." This cleaned up part of the problem, but "Mr. President" was a reference used for addressing President Bush as well, so that was also changed to "presidentbush." This enabled discrimination between the use of "president" as a title versus its use as a role in government ("President Bush" versus "I want to be president").

Journal ArticleDOI
TL;DR: In this article, the authors defend the existence of a certain aura of meaning connected with individual lexical items which spreads over the senses of their neighbours by creating specific semantic expectations, which is referred to as semantic harmony or semantic prosody.

Journal ArticleDOI
TL;DR: Results from four experiments were compatible with the proposal that partial identification of the test stimulus as relevant is a necessary condition for generalization in the guilty knowledge technique.
Abstract: In the present study, we investigated orienting response generalization across various types of semantically related stimuli. Four experiments, based on a modified version of the guilty knowledge technique, were designed to examine whether semantic relations based on abstract features are reflected by electrodermal responsivity. No generalization across coordinates was obtained, but a moderate degree of generalization was demonstrated between a word and its superordinate category (e.g., table-furniture) and between a word and its synonym. Complete generalization occurred from a verbal label of an object to its pictorial representation, and vice versa. These results are compatible with our proposal that partial identification of the test stimulus as relevant is a necessary condition for generalization in the guilty knowledge technique.

Proceedings ArticleDOI
25 Aug 1996
TL;DR: It is shown that a certain class of similarity measures, which is based on set-theoretic concepts, and explains many of the characteristics of human similarity assessment, can be interpreted as a distance in a suitable psychological space.
Abstract: We show that a certain class of similarity measures, which is based on set-theoretic concepts, and explains many of the characteristics of human similarity assessment, can be interpreted as a distance in a suitable psychological space. This view unifies a number of different measures of similarity that psychological experiments have determined to be active in humans for different classes of stimuli. The study arises out of a consideration of similarity in retrieval from a multimedia database.

Proceedings ArticleDOI
07 May 1996
TL;DR: An approach to understanding speech that uses a new form of probabilistic model to represent the syntactic and semantic knowledge of a restricted domain, integrating semantic, syntactic, and acoustic-phonetic knowledge in a seamless, consistent way.
Abstract: The paper concerns an approach for understanding speech using a new form of probabilistic models to represent syntactic and semantic knowledge of a restricted domain. One important feature of our grammar is that the parse tree directly represents the semantic content of the utterance. Since we determine that semantic content by an integrated search, we avoid consistency problems at the interface between the recognizer and the language understanding part of the speech understanding system. We succeeded in designing such an incremental algorithm, which integrates semantic, syntactic, and acoustic-phonetic knowledge in a seamless, consistent way. High efficiency is achieved by using a chart-parsing technique with structure-sharing and a strict top-down strategy for opening new word hypotheses in the pronunciation layer.

Posted Content
TL;DR: The paper proposes a computationally feasible method for measuring context-sensitive semantic distance between words, computed by adaptive scaling of a semantic space, which successfully extracts the context of a text.
Abstract: The paper proposes a computationally feasible method for measuring context-sensitive semantic distance between words. The distance is computed by adaptive scaling of a semantic space. In the semantic space, each word in the vocabulary V is represented by a multi-dimensional vector which is obtained from an English dictionary through a principal component analysis. Given a word set C which specifies a context for measuring word distance, each dimension of the semantic space is scaled up or down according to the distribution of C in the semantic space. In the space thus transformed, distance between words in V becomes dependent on the context C. An evaluation through a word prediction task shows that the proposed measurement successfully extracts the context of a text.
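A sketch of context-dependent scaling, assuming (as one illustrative choice, not necessarily the paper's exact scaling) that dimensions on which the context words cluster tightly are amplified and high-variance dimensions are damped:

```python
import math

def context_scaled_distance(u, v, context_vectors, eps=1e-6):
    """Distance between word vectors u and v after per-dimension
    scaling by the context set C: each dimension is weighted by
    the inverse of the variance of the context vectors on it, so
    word distance becomes dependent on the context.
    The inverse-variance weighting is an illustrative assumption."""
    dims = len(u)
    weights = []
    for d in range(dims):
        vals = [c[d] for c in context_vectors]
        mean = sum(vals) / len(vals)
        var = sum((x - mean) ** 2 for x in vals) / len(vals)
        weights.append(1.0 / (var + eps))
    return math.sqrt(sum(w * (a - b) ** 2
                         for w, a, b in zip(weights, u, v)))
```

With this weighting, the same pair of words can be near or far depending on which context set C is supplied, which is the behaviour the abstract describes.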

Proceedings ArticleDOI
05 Aug 1996
TL;DR: A similarity calculation model called IFSM (Inherited Feature Similarity Measure) between objects (words/concepts) based on their common and distinctive features and an implementation method for obtaining features based on abstracted triples extracted from a large text corpus utilizing taxonomical knowledge is proposed.
Abstract: We describe a similarity calculation model called IFSM (Inherited Feature Similarity Measure) between objects (words/concepts) based on their common and distinctive features. We propose an implementation method for obtaining features based on abstracted triples extracted from a large text corpus utilizing taxonomical knowledge. This model represents an integration of traditional methods, i.e., relation based similarity measure and distribution based similarity measure. An experiment, using our new concept abstraction method which we call the flat probability grouping method, over 80,000 surface triples, shows that the abstraction level of 3000 is a good basis for feature description.
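A common/distinctive-feature measure in the spirit of IFSM can be sketched with Tversky's ratio model; the alpha/beta weights and the feature sets used below are illustrative assumptions, not the paper's feature-extraction output:

```python
def feature_similarity(feats_a, feats_b, alpha=0.5, beta=0.5):
    """Tversky-style ratio of common to distinctive features:
    |A ∩ B| / (|A ∩ B| + alpha·|A − B| + beta·|B − A|).
    alpha and beta weight each object's distinctive features."""
    a, b = set(feats_a), set(feats_b)
    common = len(a & b)
    denom = common + alpha * len(a - b) + beta * len(b - a)
    return common / denom if denom else 0.0
```

In the paper's setting, the feature sets would come from abstracted triples extracted from the corpus rather than being hand-written as here.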

Proceedings ArticleDOI
05 Aug 1996
TL;DR: It is argued that it is necessary to draw a line between generalizable semantic principles and domain-specific semantic information, and it is shown how this model may be applied to the interpretation of compounds in real texts, provided that complementary semantic information is retrieved.
Abstract: A domain independent model is proposed for the automated interpretation of nominal compounds in English. This model is meant to account for productive rules of interpretation which are inferred from the morpho-syntactic and semantic characteristics of the nominal constituents. In particular, we make extensive use of Pustejovsky's principles concerning the predicative information associated with nominals. We argue that it is necessary to draw a line between generalizable semantic principles and domain-specific semantic information. We explain this distinction and we show how this model may be applied to the interpretation of compounds in real texts, provided that complementary semantic information is retrieved.

Journal ArticleDOI
TL;DR: It is argued that TE-linkage indeed has its own inherent meanings, and it is demonstrated that these meanings cannot be stated in terms of traditional semantic relations but can only be understood in cognitive terms.

Patent
12 Jan 1996
TL;DR: In this paper, a method and apparatus for compiling source code that pre-evaluates certain semantic attributes during syntactical analysis is presented, instead of waiting to perform these checks in a separate pass through the parse tree during semantic analysis.
Abstract: A method and apparatus for compiling source code that pre-evaluates certain semantic attributes during syntactical analysis. The invention performs certain type of semantic analysis, such as checking semantic attributes, during the operation of the syntactical analyzer, while the parse tree is being built, instead of waiting to perform these checks in a separate pass through the parse tree during semantic analysis. The present invention modifies the format of nodes in the parse tree to include fields for semantic attributes and modifies the actions associated with grammar productions so that they create parse tree nodes of the correct format. In addition, the present invention includes semantic attribute routines that determine the attribute values to store in the parse tree for the various semantic attributes.

Patent
07 Jun 1996
TL;DR: In this article, the authors proposed a method for discriminating semantic similarity between words capable of obtaining high similarlity discriminating accuracy even when a similar attribute in terms of meaning in an attribute set.
Abstract: PURPOSE: To provide a method for discriminating semantic similarity between words that achieves high discrimination accuracy even when semantically similar attributes occur in the attribute set. CONSTITUTION: The method discriminates semantic similarity between words using a word-semantics database that expresses the meaning of each word (w) by a set of pairs of an attribute (v) expressing a feature of the word and an importance (a) expressing the strength of the relation between the word and the attribute. For two words wi and wj to be compared, attributes vk and vl are selected for wi and wj respectively, and the products aik × ajl × Lkl of the importances aik and ajl of the two selected attributes and a quantity Lkl expressing the semantic similarity between the attributes vk and vl are summed over the attribute pairs of the two words; the result is taken as the similarity.
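Reading the construction as a sum over attribute pairs, the similarity computation can be sketched as follows; the attribute-to-attribute similarity L is assumed given, and summing over all pairs (rather than some selected subset) is a simplifying assumption:

```python
def word_similarity(attrs_i, attrs_j, attr_sim):
    """Similarity between two words, each given as a dict mapping
    attribute -> importance: sum of a_ik * a_jl * L(v_k, v_l) over
    attribute pairs, where L is the attribute-to-attribute
    semantic similarity."""
    total = 0.0
    for vk, aik in attrs_i.items():
        for vl, ajl in attrs_j.items():
            total += aik * ajl * attr_sim(vk, vl)
    return total
```

Because the cross-term L(v_k, v_l) contributes even when the two words share no attribute exactly, semantically similar attributes still count toward the score, which is the property the purpose statement emphasises.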

Journal ArticleDOI
TL;DR: It is suggested that by using principal component analysis, one can design a user‐friendly semantic space which can be navigated and to learn the names of embedded magnitudes in semantic space, the idea of conceptual clustering is used in a broader context.
Abstract: The idea of conceptual mapping goes back to the semantic differential and conceptual clustering. Using multivariate statistical techniques, one can map a dispersion of texts onto another dispersion of their content indicators, such as keywords. The resulting configurations of texts/indicators differ from one another according to their meaning, expressed in terms of co‐ordinates of a semantic field. We suggest that by using principal component analysis, one can design a user‐friendly semantic space which can be navigated. Further, to learn the names of embedded magnitudes in semantic space, the idea of conceptual clustering is used in a broader context. This is a two‐mode statistical approach, grouping both documents and their index terms at the same time. By observing the agglomerations of narrower, related terms over a corpus, one arrives at broader, more general thesaurus entries which denote and conceptualise the major dimensions of semantic space.

Journal ArticleDOI
TL;DR: A more general model of visual language generation is achieved, based on the Gen-Inf scheme, where the end-user is allowed to choose the algorithm which best fits his/her requirements within the particular application environment.
Abstract: The paper presents a grammatical inference methodology for the generation of visual languages, that benefits from the availability of semantic information about the sample sentences. Several well-known syntactic inference algorithms are shown to obey a general inference scheme, which the authors call the Gen-Inf scheme. Then, all the algorithms of the Gen-Inf scheme are modified in agreement with the introduced semantics-based inference methodology. The use of grammatical inference techniques in the design of adaptive user interfaces was previously experimented with the VLG system for visual language generation. The system is a powerful tool for specifying, designing, and interpreting customized visual languages for different applications. The authors enhance the adaptivity of the VLG system to any visual environment by exploiting the proposed semantics-based inference methodology. As a matter of fact, a more general model of visual language generation is achieved, based on the Gen-Inf scheme, where the end-user is allowed to choose the algorithm which best fits his/her requirements within the particular application environment.

Patent
02 Apr 1996
TL;DR: In this article, a conception support device which can mechanically put on a riddle to play riddles on words being conventional in Japan such as 'a put on riddle ×× is solved as ○○, that meaning is...' so as to enable a computer to cope with even for unexpected formal reactions.
Abstract: PURPOSE: To provide a conception support device that can mechanically construct the kind of punning riddle traditional in Japan ('the riddle ×× is solved as ○○, and the meaning is ...'), so as to enable a computer to cope even with unexpected formal reactions. CONSTITUTION: First, a word set D is selected from the word set A stored in a storage 1, consisting of words whose semantic similarity, as defined by a means B, to a designated word (a) of the set A is higher than a prescribed reference level. Next, a word set F is selected from the set A when the phonetic similarity, defined by a means C, to each word (d) in the set D is higher than a prescribed reference level. Then, a word set H is selected from the set A when the semantic similarity defined by the means B to each word (f) of the set F is higher than a prescribed reference level. Finally, a word (h) is selected from the set H when its semantic similarity defined by the means B, or its phonetic similarity defined by the means C, to the word (a) is lower than a prescribed reference level.

Book ChapterDOI
20 May 1996
TL;DR: An integration process which allows similarities to be discovered between the schemas under study and works on object-oriented schemas, using a model of thesaurus drawn from the domain dealing with the meaning of words: linguistics.
Abstract: The complexity of databases is increasing continually, and the work of several designers becomes necessary. It is therefore interesting to improve the design process with a new phase devoted to information integration, in order to take into account the designers' viewpoints. In this paper, we present an integration process which allows similarities to be discovered between the schemas under study. It works on object-oriented schemas. Whenever possible, we propose several results for the integration of two given schemas. This then makes it possible to choose, among the result schemas, the one which is best adapted to the working context. When design schemas are being integrated, the structural, but also and above all, the semantic part of the schemas is studied. To represent the semantics of the words which are used in a schema, we have defined a model of thesaurus drawn from the domain dealing with the meaning of words: linguistics.

Journal ArticleDOI
TL;DR: This paper focused on metaphor comprehension in a poetic text in terms of the concepts of "semantic field, semantic restructuring, and perceptual restructuring" and found consistent relationships between these ratings and the degree of interaction of these semantic fields.
Abstract: This study focused on metaphor comprehension in a poetic text in terms of the concepts of “semantic field,” “semantic restructuring,” and “perceptual restructuring.” Three poetic texts, differentiated in terms of the number of semantic fields and in the degree of their interaction, were given to readers of English language and literature. They were asked to analyze a number of metaphors appearing within the context of each text and also to rate them along a number of scales (such as Concrete-Abstract). Specific semantic fields could be derived from the written responses. Furthermore, consistent relationships were found between these ratings and the degree of interaction of these semantic fields. The results lend further support for a Gestalt-Interaction theory of metaphor.