
Showing papers on "Probabilistic latent semantic analysis published in 2000"


Proceedings ArticleDOI
03 Oct 2000
TL;DR: This work presents a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame, using statistical classifiers trained on lexical and syntactic features derived from parse trees and hand-annotated training data.
Abstract: We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Various lexical and syntactic features are derived from parse trees and used to derive statistical classifiers from hand-annotated training data.

944 citations
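The abstract does not enumerate the features, but a classic syntactic feature in this line of work is the path through the parse tree from a constituent to the predicate. A minimal sketch with nltk, where the example parse, the tree positions, and the path encoding are illustrative assumptions rather than the paper's exact feature set:

```python
from nltk import Tree

# A tiny hand-built parse of "He ate the pasta".
t = Tree.fromstring("(S (NP (PRP He)) (VP (VBD ate) (NP (DT the) (NN pasta))))")

def common_prefix_len(p, q):
    n = 0
    while n < min(len(p), len(q)) and p[n] == q[n]:
        n += 1
    return n

def path_feature(tree, const, pred):
    """Category path from a constituent up to the lowest common ancestor and
    down to the predicate, e.g. NP^S_VP_VBD ('^' = up, '_' = down)."""
    n = common_prefix_len(const, pred)
    up = [tree[const[:k]].label() for k in range(len(const), n - 1, -1)]
    down = [tree[pred[:k]].label() for k in range(n + 1, len(pred) + 1)]
    return "^".join(up) + "_" + "_".join(down)

# Path from the subject NP (position (0,)) to the verb (position (1, 0)).
print(path_feature(t, (0,), (1, 0)))   # -> NP^S_VP_VBD
```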


Journal ArticleDOI
J.R. Bellegarda1
01 Aug 2000
TL;DR: This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus, and proposes an integrative formulation for harnessing this synergy.
Abstract: Statistical language models used in large-vocabulary speech recognition must properly encapsulate the various constraints, both local and global, present in the language. While local constraints are readily captured through n-gram modeling, global constraints, such as long-term semantic dependencies, have been more difficult to handle within a data-driven formalism. This paper focuses on the use of latent semantic analysis, a paradigm that automatically uncovers the salient semantic relationships between words and documents in a given corpus. In this approach, (discrete) words and documents are mapped onto a (continuous) semantic vector space, in which familiar clustering techniques can be applied. This leads to the specification of a powerful framework for automatic semantic classification, as well as the derivation of several language model families with various smoothing properties. Because of their large-span nature, these language models are well suited to complement conventional n-grams. An integrative formulation is proposed for harnessing this synergy, in which the latent semantic information is used to adjust the standard n-gram probability. Such hybrid language modeling compares favorably with the corresponding n-gram baseline: experiments conducted on the Wall Street Journal domain show a reduction in average word error rate of over 20%. This paper concludes with a discussion of intrinsic tradeoffs, such as the influence of training data selection on the resulting performance.

565 citations
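A rough sketch of the hybrid idea: use similarity in a latent semantic space to rescale n-gram probabilities, then renormalize. The stand-in word vectors, the averaged pseudo-document vector for the history, and the exponential combination rule below are simplifying assumptions, not the paper's exact integrative formulation:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["stock", "market", "banana", "index"]
W = rng.normal(size=(len(vocab), 2))            # stand-in LSA word vectors

def lsa_history_vector(history):
    """Pseudo-document vector: average of the history's word vectors."""
    idx = [vocab.index(w) for w in history if w in vocab]
    return W[idx].mean(axis=0)

def hybrid_probs(ngram_probs, history, gamma=1.0):
    """Rescale n-gram probabilities by exp(gamma * cosine(word, history))."""
    h = lsa_history_vector(history)
    cos = W @ h / (np.linalg.norm(W, axis=1) * np.linalg.norm(h) + 1e-12)
    p = ngram_probs * np.exp(gamma * cos)
    return p / p.sum()                          # renormalize over the vocabulary

ngram = np.array([0.3, 0.3, 0.2, 0.2])          # toy P(w | last word)
print(hybrid_probs(ngram, ["stock", "market"]))
```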


Journal ArticleDOI
01 Oct 2000
TL;DR: It is proved that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance, and the technique of random projection is proposed as a way of speeding up LSI.
Abstract: Latent semantic indexing (LSI) is an information retrieval technique based on the spectral analysis of the term-document matrix, whose empirical success had heretofore been without rigorous prediction and explanation. We prove that, under certain conditions, LSI does succeed in capturing the underlying semantics of the corpus and achieves improved retrieval performance. We propose the technique of random projection as a way of speeding up LSI. We complement our theorems with encouraging experimental results. We also argue that our results may be viewed in a more general framework, as a theoretical basis for the use of spectral methods in a wider class of applications such as collaborative filtering.

399 citations
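A minimal sketch of the proposed speed-up: randomly project the term dimension of the term-document matrix before computing the truncated SVD, so the expensive decomposition runs on a much smaller matrix. Matrix sizes, the projection dimension d, and the rank k below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(0.05, size=(5000, 300)).astype(float)  # terms x documents

d = 400                                   # random-projection dimension
R = rng.normal(size=(d, A.shape[0])) / np.sqrt(d)
B = R @ A                                 # d x documents; distances roughly preserved

k = 50                                    # LSI rank
U, s, Vt = np.linalg.svd(B, full_matrices=False)
doc_vectors = (s[:k, None] * Vt[:k]).T    # documents in the k-dim LSI space
print(doc_vectors.shape)                  # (300, 50)
```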


Journal ArticleDOI
TL;DR: Simulations and a psychiatric example are presented to demonstrate the effective use of procedures for assessing Markov chain Monte Carlo convergence and model diagnosis, and for selecting the number of categories for the latent variable based on evidence in the data, using Markov chain Monte Carlo techniques.
Abstract: In many areas of medical research, such as psychiatry and gerontology, latent class variables are used to classify individuals into disease categories, often with the intention of hierarchical modeling. Problems arise when it is not clear how many disease classes are appropriate, creating a need for model selection and diagnostic techniques. Previous work has shown that the Pearson χ² statistic and the log-likelihood ratio G² statistic are not valid test statistics for evaluating latent class models. Other methods, such as information criteria, provide decision rules without providing explicit information about where discrepancies occur between a model and the data. Identifiability issues further complicate these problems. This paper develops procedures for assessing Markov chain Monte Carlo convergence and model diagnosis and for selecting the number of categories for the latent variable based on evidence in the data using Markov chain Monte Carlo techniques. Simulations and a psychiatric example are presented to demonstrate the effective use of these methods.

254 citations
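The abstract does not name its convergence diagnostics; one standard choice for assessing Markov chain Monte Carlo convergence is the Gelman-Rubin potential scale reduction factor, sketched here on synthetic chains as a generic illustration:

```python
import numpy as np

def gelman_rubin(chains):
    """chains: (m, n) array of m chains, n draws each, for one scalar parameter."""
    m, n = chains.shape
    means = chains.mean(axis=1)
    B = n * means.var(ddof=1)                  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()      # within-chain variance
    var_hat = (n - 1) / n * W + B / n          # pooled posterior variance estimate
    return np.sqrt(var_hat / W)                # values near 1.0 suggest convergence

rng = np.random.default_rng(0)
chains = rng.normal(size=(4, 1000))            # four well-mixed toy chains
print(gelman_rubin(chains))                    # close to 1
```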


Journal ArticleDOI
TL;DR: A unified maximum likelihood method for estimating the parameters of the generalized latent trait model will be presented and in addition the scoring of individuals on the latent dimensions is discussed.
Abstract: In this paper we discuss a general model framework within which manifest variables with different distributions in the exponential family can be analyzed with a latent trait model. A unified maximum likelihood method for estimating the parameters of the generalized latent trait model is presented. In addition, we discuss the scoring of individuals on the latent dimensions. The general framework presented allows not only the analysis of manifest variables all of one type, but also the simultaneous analysis of a collection of variables with different distributions. The approach analyzes the data as they are, by making assumptions about the distribution of the manifest variables directly.

246 citations


Proceedings ArticleDOI
13 Sep 2000
TL;DR: A semantics-only algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically and it is shown that this approach provides morphology induction results that rival a current state-of-the-art system.
Abstract: Morphology induction is a subproblem of important tasks like automatic learning of machine-readable dictionaries and grammar induction. Previous morphology induction approaches have relied solely on statistics of hypothesized stems and affixes to choose which affixes to consider legitimate. Relying on stem-and-affix statistics rather than semantic knowledge leads to a number of problems, such as the inappropriate use of valid affixes ("ally" stemming to "all"). We introduce a semantic-based algorithm for learning morphology which only proposes affixes when the stem and stem-plus-affix are sufficiently similar semantically. We implement our approach using Latent Semantic Analysis and show that our semantics-only approach provides morphology induction results that rival a current state-of-the-art system.

233 citations
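A toy sketch of the semantic filter described in the abstract: a candidate affix is accepted only when the stem and the stem-plus-affix word are sufficiently similar semantically. The vectors and the threshold below are hypothetical stand-ins for LSA vectors trained on a corpus:

```python
import numpy as np

vectors = {                      # hypothetical LSA word vectors
    "walk":   np.array([0.9, 0.1, 0.0]),
    "walked": np.array([0.85, 0.15, 0.05]),
    "all":    np.array([0.1, 0.2, 0.9]),
    "ally":   np.array([0.7, 0.6, 0.1]),
}

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

def accept_affix(stem, word, threshold=0.8):
    """Propose the affix only if stem and stem-plus-affix are semantically close."""
    return cosine(vectors[stem], vectors[word]) >= threshold

print(accept_affix("walk", "walked"))  # True: "-ed" looks like a real affix
print(accept_affix("all", "ally"))     # False: "ally" should not stem to "all"
```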


Patent
18 Oct 2000
TL;DR: In this paper, state vectors representing the semantic content of a document are superpositioned to construct a single vector representing a semantic abstract for the document, which can be used to locate documents with similar semantic content.
Abstract: State vectors representing the semantic content of a document are created. The state vectors are superpositioned to construct a single vector representing a semantic abstract for the document. The single vector can be normalized. Once constructed, the single vector semantic abstract can be compared with semantic abstracts for other documents to measure a semantic distance between the documents, and can be used to locate documents with similar semantic content.

180 citations
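A minimal sketch of the patent's construction, assuming per-word state vectors are already available: superpose (sum) them into a single vector per document, normalize, and measure semantic distance as the angle between the resulting abstracts. The vocabulary and vectors are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {w: i for i, w in enumerate("court ruling appeal game score team".split())}
W = rng.normal(size=(len(vocab), 3))         # stand-in state vectors per word

def semantic_abstract(words):
    """Superposition of word state vectors, normalized to unit length."""
    v = sum(W[vocab[w]] for w in words if w in vocab)
    return v / np.linalg.norm(v)

def semantic_distance(a, b):
    """Angle between two semantic abstracts."""
    return float(np.arccos(np.clip(a @ b, -1.0, 1.0)))

d1 = semantic_abstract("court ruling appeal".split())
d2 = semantic_abstract("game score team".split())
print(semantic_distance(d1, d2))
```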


Proceedings ArticleDOI
13 Nov 2000
TL;DR: The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation to assist in the understanding of a nontrivial software system, namely a version of Mosaic.
Abstract: The paper describes the results of applying Latent Semantic Analysis (LSA), an advanced information retrieval method, to program source code and associated documentation. Latent semantic analysis is a corpus-based statistical method for inducing and representing aspects of the meanings of words and passages (of natural language) reflective of their usage. This methodology is assessed for application to the domain of software components (i.e., source code and its accompanying documentation). Here LSA is used as the basis to cluster software components. This clustering is used to assist in the understanding of a nontrivial software system, namely a version of Mosaic. Applying latent semantic analysis to the domain of source code and internal documentation for the support of program understanding is a new application of this method and a departure from the normal application domain of natural language.

110 citations
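A rough sketch of the pipeline on toy "source files": treat each component's identifiers and comment words as a document, project into the LSA subspace via SVD, and cluster. The file contents, the rank, and the use of k-means are illustrative assumptions:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans

components = {                                # toy stand-ins for real files
    "http_fetch.c": "open socket connect host send request receive response",
    "html_parse.c": "parse tag attribute token element tree render",
    "cache.c":      "cache entry lookup store evict disk file",
}
X = CountVectorizer().fit_transform(components.values()).astype(float)

U, s, Vt = np.linalg.svd(X.toarray(), full_matrices=False)
k = 2
docs = U[:, :k] * s[:k]                       # components in the LSA subspace
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(docs)
print(dict(zip(components, labels)))          # cluster assignment per component
```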


Proceedings Article
12 Apr 2000
TL;DR: Experiments with an on-line newspaper archive show that Latent Semantic Indexing can outperform both content based and context based approaches and that it is a promising approach for indexing visual and multi-modal data.
Abstract: In this paper, we introduce a new approach to image retrieval. This new approach takes the best from two worlds, combines image features (content) and words from collateral text (context) into one semantic space. Our approach uses Latent Semantic Indexing, a method that uses co-occurrence statistics to uncover hidden semantics. This paper shows how this method, that has proven successful in both monolingual and cross lingual text retrieval, can be used for multi-modal and cross-modal information retrieval. Experiments with an on-line newspaper archive show that Latent Semantic Indexing can outperform both content based and context based approaches and that it is a promising approach for indexing visual and multi-modal data.

105 citations
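A minimal sketch of the joint space: stack image features (content) and term counts from collateral text (context) into one matrix, compute the SVD, and answer a text-only query by zero-padding the missing modality. All data and dimensions below are synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, n_img, n_txt = 40, 8, 12
img = rng.normal(size=(n_items, n_img))            # content: image features
txt = rng.poisson(1.0, size=(n_items, n_txt))      # context: collateral text
X = np.hstack([img, txt]).astype(float)            # one joint semantic space

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 5
items = U[:, :k] * s[:k]                           # items in the latent space

query_txt = rng.poisson(1.0, size=n_txt)           # text-only query
q = np.concatenate([np.zeros(n_img), query_txt])   # missing modality zeroed
q_lat = q @ Vt[:k].T                               # fold the query into the space
scores = items @ q_lat
print(np.argsort(-scores)[:5])                     # top-5 cross-modal matches
```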


Proceedings ArticleDOI
01 Jul 2000
TL;DR: A novel algorithm that creates document vectors with reduced dimensionality by iteratively "scaling" vectors and computing eigenvectors is presented, which breaks the symmetry of documents and terms to capture information more evenly across documents.
Abstract: We present a novel algorithm that creates document vectors with reduced dimensionality. This work was motivated by an application characterizing relationships among documents in a collection. Our algorithm yielded inter-document similarities with an average precision up to 17.8% higher than that of singular value decomposition (SVD) used for Latent Semantic Indexing. The best performance was achieved with dimensional reduction rates that were 43% higher than SVD on average. Our algorithm creates basis vectors for a reduced space by iteratively “scaling” vectors and computing eigenvectors. Unlike SVD, it breaks the symmetry of documents and terms to capture information more evenly across documents. We also discuss correlation with a probabilistic model and evaluate a method for selecting the dimensionality using log-likelihood estimation.

85 citations


Proceedings ArticleDOI
05 Jun 2000
TL;DR: A new latent semantic indexing (LSI) method for spoken audio documents, in which smoothing by the closest document clusters is important because the documents are often short and have a high word error rate (WER).
Abstract: This paper describes a new latent semantic indexing (LSI) method for spoken audio documents. The framework is indexing broadcast news from radio and TV as a combination of large vocabulary continuous speech recognition (LVCSR), natural language processing (NLP) and information retrieval (IR). For indexing, the documents are presented as vectors of word counts, whose dimensionality is rapidly reduced by random mapping (RM). The obtained vectors are projected into the latent semantic subspace determined by SVD, where the vectors are then smoothed by a self-organizing map (SOM). The smoothing by the closest document clusters is important here, because the documents are often short and have a high word error rate (WER). As the clusters in the semantic subspace reflect the news topics, the SOMs provide an easy way to visualize the index and query results and to explore the database. Test results are reported for TREC's spoken document retrieval databases (www.idiap.ch/kurimo/thisl.html).
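A rough sketch of the indexing pipeline with synthetic data: word-count vectors are reduced by random mapping, projected into the SVD subspace, and smoothed toward nearby document clusters. Here k-means stands in for the paper's self-organizing map, and all sizes are illustrative:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
counts = rng.poisson(0.1, size=(200, 20000)).astype(float)  # docs x vocabulary

R = rng.normal(size=(20000, 300)) / np.sqrt(300)   # random mapping (RM) matrix
X = counts @ R                                     # fast dimensionality cut

U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 30
docs = U[:, :k] * s[:k]                            # latent semantic subspace

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(docs)
centers = km.cluster_centers_[km.labels_]          # each doc's cluster centroid
alpha = 0.5                                        # smoothing weight
smoothed = (1 - alpha) * docs + alpha * centers    # pull noisy docs toward topics
print(smoothed.shape)
```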

Book
06 Apr 2000
TL;DR: Topics covered include flexible discriminant and mixture models, neural networks for unsupervised learning based on information theory, radial basis function networks and statistics, robust prediction in many-parameter models, latent variable models, and data visualisation.

Abstract: Flexible discriminant and mixture models; neural networks for unsupervised learning based on information theory; radial basis function networks and statistics; robust prediction in many-parameter models; density networks; latent variable models and data visualisation; analysis of latent structure models with multidimensional latent variables; artificial neural networks and multivariate statistics.

Journal ArticleDOI
TL;DR: In this article, a probabilistic clustering model for mixed data is proposed, which allows analysis of variables of mixed type: the variables may be nominal, ordinal and/or quantitative.
Abstract: This paper develops a probabilistic clustering model for mixed data. The model allows analysis of variables of mixed type: the variables may be nominal, ordinal and/or quantitative. The model contains the well-known models of latent class analysis as submodels. As in latent class analysis, local independence of the variables is assumed. The parameters of the model are estimated by the EM algorithm. Test statistics and goodness-of-fit measures are proposed for model selection. Two artificial data sets show the usefulness of these tests. An empirical example completes the presentation.
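A minimal EM sketch for the simplest special case, a two-class latent class model over binary items with local independence; the paper's model is more general (nominal, ordinal, and quantitative variables), so the data and update rules below cover only this toy case:

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: two latent classes with different item-response probabilities.
true_p = np.array([[0.9, 0.8, 0.1], [0.2, 0.3, 0.85]])
z = rng.integers(0, 2, size=500)
X = (rng.random((500, 3)) < true_p[z]).astype(float)

pi = np.array([0.5, 0.5])                    # class weights
p = rng.uniform(0.3, 0.7, size=(2, 3))       # item probabilities per class
for _ in range(100):
    # E-step: posterior responsibility of each class for each observation.
    like = (p[None] ** X[:, None] * (1 - p[None]) ** (1 - X[:, None])).prod(axis=2)
    post = like * pi
    post /= post.sum(axis=1, keepdims=True)
    # M-step: re-estimate class weights and conditional response probabilities.
    pi = post.mean(axis=0)
    p = (post.T @ X) / post.sum(axis=0)[:, None]
print(np.round(p, 2))                        # recovered item probabilities
```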

Proceedings ArticleDOI
01 Jun 2000
TL;DR: An overview of the use of LSA for the analysis of textual data; the potential of LSA is demonstrated on a selected corpus of religious and sacred texts.
Abstract: The paper presents an overview of the usage of LSA for the analysis of textual data. The mathematical apparatus is explained in brief, and special attention is paid to the key parameters that influence the quality of the results obtained. The potential of LSA is demonstrated on a selected corpus of religious and sacred texts. The results of an experimental application of LSA for educational purposes are also presented.

Proceedings Article
30 Jun 2000
TL;DR: This paper introduces a methodology for criticizing models both globally (a BN in its entirety) and locally (observable nodes), and explores its value in identifying several kinds of misfit: node errors, edge errors, state errors, and prior probability errors in the latent structure.
Abstract: The application of Bayesian networks (BNs) to cognitive assessment and intelligent tutoring systems poses new challenges for model construction. When cognitive task analyses suggest constructing a BN with several latent variables, empirical model criticism of the latent structure becomes both critical and complex. This paper introduces a methodology for criticizing models both globally (a BN in its entirety) and locally (observable nodes), and explores its value in identifying several kinds of misfit: node errors, edge errors, state errors, and prior probability errors in the latent structure. The results suggest the indices have potential for detecting model misfit and assisting in locating problematic components of the model.

Book ChapterDOI
Eirik Hektoen1
01 Jan 2000
TL;DR: This paper presents a new technique, 'Semco', for selecting the correct parse of ambiguous sentences based on a probabilistic analysis of lexical co-occurrences in semantic forms; it uses Bayesian estimation of the co-occurrence probabilities to achieve higher accuracy on sparse data than the more common maximum likelihood estimation would.
Abstract: This chapter presents a new technique for selecting the correct parse of ambiguous sentences based on a probabilistic analysis of lexical co-occurrences in semantic forms. The method is called 'Semco' (for semantic co-occurrence analysis) and is specifically targeted at the differential distribution of such co-occurrences in correct and incorrect parses. It uses Bayesian estimation for the co-occurrence probabilities to achieve higher accuracy for sparse data than the more common maximum likelihood estimation would. It has been tested on the Wall Street Journal corpus (in the Penn Treebank) and shown to find the correct parse of 60.9% of parseable sentences of 6–20 words.
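A toy illustration of the Bayesian-versus-MLE point: under a symmetric Dirichlet prior, the posterior mean shrinks sparse co-occurrence counts toward uniform instead of assigning unseen pairs probability zero. The counts and the prior strength alpha are assumptions for illustration:

```python
import numpy as np

counts = np.array([3, 1, 0, 0])        # co-occurrence counts for one head word
V = len(counts)
alpha = 0.5                            # Dirichlet prior strength (an assumption)

mle = counts / counts.sum()                            # zero for unseen pairs
bayes = (counts + alpha) / (counts.sum() + alpha * V)  # posterior mean

print(mle)    # [0.75 0.25 0.   0.  ]
print(bayes)  # unseen pairs keep small nonzero probability
```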

Journal ArticleDOI
TL;DR: A Latent Semantic Index was constructed from arguments made by navy officers concerning events in an anti-air warfare scenario and a model based on LSI factor values predicted level of domain expertise with 89% accuracy.
Abstract: A Latent Semantic Index (LSI) was constructed from arguments made by navy officers concerning events in an anti-air warfare scenario. A model based on LSI factor values predicted level of domain expertise with 89% accuracy. The LSI factor space was reduced using MDS to five dimensions: aircraft route, aircraft response, kinematics, localization, and an unclassifiable element. Arguments in the localization category were reliably more common among officers with the greatest expertise. Automated classification of arguments into these elements achieved 84% accuracy. LSI may be a useful tool for automating aspects of modeling expertise and diagnosing knowledge deficiencies.

01 Jan 2000
TL;DR: A Bayesian treatment of mixtures of latent variable models is proposed to avoid having to choose a value for the dimension of the latent subspace by a computationally expensive search technique such as cross-validation.
Abstract: This paper deals with the problem of probability density estimation with the goal of finding a good probabilistic representation of the data. One of the most popular density estimation methods is the Gaussian mixture model (GMM). A promising alternative to GMMs are the recently proposed mixtures of latent variable models. Examples of the latter are principal component analysis and factor analysis. The advantage of these models is that they are capable of representing the covariance structure with less parameters by choosing the dimension of a subspace in a suitable way. An empirical evaluation on a large number of data sets shows that mixtures of latent variable models almost always outperform various GMMs both in density estimation and Bayes classifiers. To avoid having to choose a value for the dimension of the latent subspace by a computationally expensive search technique such as cross-validation, a Bayesian treatment of mixtures of latent variable models is proposed. This framework makes it possible to determine the appropriate dimension during training and experiments illustrate its viability.


Proceedings Article
01 May 2000
TL;DR: The utilization of semantic knowledge acquired from an MRD for language modelling tasks in relation to speech recognition applications is described, providing evidence that limited or incomplete knowledge from lexical resources such as MRDs can be useful for domain independent language modelling.
Abstract: Machine Readable Dictionaries (MRDs) have been used in a variety of language processing tasks including word sense disambiguation, text segmentation, information retrieval and information extraction. In this paper we describe the utilization of semantic knowledge acquired from an MRD for language modelling tasks in relation to speech recognition applications. A semantic model of language has been derived using the dictionary definitions in order to compute the semantic association between the words. The model is capable of capturing phenomena of latent semantic dependencies between the words in texts and reducing the language ambiguity by a considerable factor. The results of experiments suggest that the semantic model can improve the word recognition rates in “noisy-channel” applications. This research provides evidence that limited or incomplete knowledge from lexical resources such as MRDs can be useful for domain independent language modelling.

Proceedings ArticleDOI
01 Jul 2000
TL;DR: A new model named Boolean Latent Semantic Indexing model based on the Singular Value Decomposition and Boolean query formulation is introduced, which can help users to make precise representation of their information search needs.
Abstract: A new model named Boolean Latent Semantic Indexing model based on the Singular Value Decomposition and Boolean query formulation is introduced. While the Singular Value Decomposition alleviates the problems of lexical matching in the traditional information retrieval model, Boolean query formulation can help users to make precise representation of their information search needs. Retrieval experiments on a number of test collections seem to show that the proposed model achieves substantial performance gains over the Latent Semantic Indexing model.
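A minimal sketch of the combination on synthetic data: the Boolean query filters the candidate set exactly, and LSI similarity ranks the survivors. The term indices, query, and rank k below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.poisson(0.5, size=(100, 12)).astype(float)   # docs x terms

U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 4
docs = U[:, :k] * s[:k]                              # docs in the LSI space

def boolean_mask(A, must_have, must_not=()):
    """Docs matching (AND over must_have) AND NOT (OR over must_not)."""
    ok = (A[:, list(must_have)] > 0).all(axis=1)
    if must_not:
        ok &= ~(A[:, list(must_not)] > 0).any(axis=1)
    return ok

q = np.zeros(12)
q[[1, 3]] = 1.0                                      # query: term 1 AND term 3
scores = docs @ (q @ Vt[:k].T)                       # LSI similarity ranking
mask = boolean_mask(A, must_have=[1, 3], must_not=[7])
ranked = [d for d in np.argsort(-scores) if mask[d]]
print(ranked[:5])                                    # Boolean-filtered LSI rank
```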


01 Jan 2000
TL;DR: This paper discusses two different forms of (exploratory) LC analysis which are implemented in a new computer program called Latent GOLD.
Abstract: Statistical Innovations Inc., P.O. Box 1, Belmont, MA 02478, USAKeywords. latent class analysis, factor analysis, cluster analysis, mixture models,categorical data, graphical displays, bi-plot, tri-plot, statistical softwareLatent class (LC) analysis is becoming one of the standard data analysis toolsin social, biomedical, and marketing research. This paper discusses two differentforms of (exploratory) LC analysis which are implemented in a new computerprogram called Latent GOLD

Proceedings Article
20 Aug 2000
TL;DR: This paper presents an approach that builds on user feedback across multiple queries in order to improve the retrieval quality of novel queries and demonstrates that REGRESSOR automatically improves on the performance of Latent Semantic Indexing by utilizing the feedback information from past queries.
Abstract: In several information retrieval (IR) systems there is a possibility for user feedback. Many machine learning methods have been proposed that learn from the feedback information in a long-term fashion. In this paper, we present an approach that builds on user feedback across multiple queries in order to improve the retrieval quality of novel queries. This allows users of an IR system to retrieve relevant documents at a reduced effort. Two algorithms for long-term learning across multiple queries in the scope of the retrieval system Latent Semantic Indexing have been implemented in a system, REGRESSOR, in order to test these ideas. The algorithms are based on k-nearest-neighbor searching and back propagation neural networks. Training examples are query vectors, and by using Latent Semantic Indexing, the examples are reduced to a fixed and manageable size. In order to evaluate the methods, we performed a set of experiments where we compared the performance of Latent Semantic Indexing and REGRESSOR. The results demonstrate that REGRESSOR automatically improves on the performance of Latent Semantic Indexing by utilizing the feedback information from past queries.
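A rough sketch of the k-nearest-neighbour variant on synthetic data: represent past queries as fixed-size vectors in the LSI space and, for a novel query, average the document feedback of its most similar predecessors. The dimensions and the blending scheme are assumptions, not REGRESSOR's exact design:

```python
import numpy as np

rng = np.random.default_rng(0)
past_queries = rng.normal(size=(30, 10))       # past queries in the LSI space
feedback = rng.random((30, 200))               # relevance feedback they earned

def knn_scores(new_q, k=3):
    """Average the document feedback of the k most similar past queries."""
    sims = past_queries @ new_q / (
        np.linalg.norm(past_queries, axis=1) * np.linalg.norm(new_q))
    nearest = np.argsort(-sims)[:k]
    return feedback[nearest].mean(axis=0)

new_query = rng.normal(size=10)
boost = knn_scores(new_query)                  # to be combined with LSI ranking
print(np.argsort(-boost)[:5])                  # docs favored by past feedback
```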

Journal ArticleDOI
TL;DR: The proposed method uses latent semantic analysis to retrieve semantic dependencies between words and classifies documents based on these dependencies.
Abstract: In this paper, the problem of automatic document classification by a set of given topics is considered. The proposed method is based on the use of latent semantic analysis to retrieve semantic dependencies between words; the classification of documents is based on these dependencies. The results of experiments performed on the standard test data set TREC (Text REtrieval Conference) confirm the attractiveness of this approach. The relatively low computational complexity of the method at the classification stage makes it applicable to the classification of document streams.
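A minimal sketch of classification in LSA space on a toy corpus: fold training documents into the latent space, form one centroid per topic, and assign a new document to the nearest centroid. The corpus, topics, and rank are stand-ins:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

train = ["stocks fell sharply", "market rally gains", "team won the match",
         "player scored twice"]
labels = np.array([0, 0, 1, 1])                 # 0 = finance, 1 = sports

vec = TfidfVectorizer()
X = vec.fit_transform(train).toarray()
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
docs = X @ Vt[:k].T                             # training docs in LSA space
centroids = np.array([docs[labels == c].mean(axis=0) for c in (0, 1)])

test = vec.transform(["the team match result"]).toarray() @ Vt[:k].T
sims = (centroids @ test.T).ravel()             # similarity to each topic
print(int(np.argmax(sims)))                     # -> 1 (sports)
```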



Journal Article
TL;DR: Text browsing based on Latent Semantic Indexing (LSI) is presented in this paper, and it combines LSI with concept tagging to improve the efficiency of users reading.
Abstract: Text browsing is an assistant reading mechanism that helps users browse online texts. Text browsing based on Latent Semantic Indexing (LSI) is presented in this paper; it combines LSI with concept tagging to improve the efficiency of users' reading. It applies LSI to reduce the skew intersections and calculates the similarity between terms and texts in the semantic space; it also divides the terms into several semantic classes and determines the meanings of the classes. In addition, it implements information navigation based on a conceptual tree.

01 Jan 2000
TL;DR: The meta-analysis carried out in this research is an attempt to alleviate inconsistent results in previous studies.
Abstract: Semantic data modeling, such as entity-relationship (ER) modeling and extended/enhanced entity-relationship (EER) modeling, has emerged as an alternative to relational data modeling. The majority of research in data modeling suggests that the use of semantic data models leads to better performance. However, the findings are not conclusive and are sometimes inconsistent. In this research, we investigate modeling relationship correctness in relational and semantic models. The meta-analysis carried out in this research is an attempt to alleviate inconsistent results in previous studies.

Posted Content
TL;DR: A semantic parsing approach for unrestricted texts that obtains a case-role analysis, in which the semantic roles of the verb are identified, and correctly identifies more than 73% of possible semantic case-roles.
Abstract: This paper presents a semantic parsing approach for unrestricted texts. Semantic parsing is one of the major bottlenecks of Natural Language Understanding (NLU) systems and usually requires building expensive resources not easily portable to other domains. Our approach obtains a case-role analysis, in which the semantic roles of the verb are identified. In order to cover all the possible syntactic realisations of a verb, our system combines their argument structure with a set of general semantically labelled diathesis models. Combining them, the system builds a set of syntactic-semantic patterns with their own case-role representation. Once the patterns are built, we use an approximate tree pattern-matching algorithm to identify the most reliable pattern for a sentence. The pattern matching is performed between the syntactic-semantic patterns and the feature-structure tree representing the morphological, syntactic, and semantic information of the analysed sentence. For sentences assigned to the correct model, the semantic parsing system we present correctly identifies more than 73% of possible semantic case-roles.