
Showing papers on "Latent semantic analysis published in 2001"


Journal ArticleDOI
Thomas Hofmann1
TL;DR: This paper proposes a temperature-controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice and results in a more principled approach with a solid foundation in statistical inference.
Abstract: This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter method which stems from linear algebra and performs a Singular Value Decomposition of co-occurrence tables, the proposed technique uses a generative latent class model to perform a probabilistic mixture decomposition. This results in a more principled approach with a solid foundation in statistical inference. More precisely, we propose to make use of a temperature controlled version of the Expectation Maximization algorithm for model fitting, which has shown excellent performance in practice. Probabilistic Latent Semantic Analysis has many applications, most prominently in information retrieval, natural language processing, machine learning from text, and in related areas. The paper presents perplexity results for different types of text and linguistic data collections and discusses an application in automated document indexing. The experiments indicate substantial and consistent improvements of the probabilistic method over standard Latent Semantic Analysis.
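To make the fitting procedure concrete, here is a minimal sketch of tempered EM for an aspect model of this kind, written in Python with NumPy. The toy count matrix, the number of latent classes, and the fixed inverse temperature beta are illustrative assumptions; the paper's actual annealing schedule, held-out stopping criteria, and data are not reproduced here.

```python
import numpy as np

def plsa_tempered_em(counts, n_topics=2, beta=0.9, n_iter=50, seed=0):
    """Tempered EM for a probabilistic latent class (aspect) model.

    counts : (n_docs, n_words) array of term counts n(d, w).
    beta   : inverse temperature; beta = 1 gives plain EM, beta < 1
             smooths the E-step posteriors (the "tempering").
    Returns P(w|z) and P(z|d).
    """
    rng = np.random.default_rng(seed)
    n_docs, n_words = counts.shape
    p_w_z = rng.random((n_topics, n_words))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    p_z_d = rng.random((n_docs, n_topics))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # E-step: tempered posterior P(z|d,w) proportional to (P(z|d) P(w|z))**beta
        post = (p_z_d[:, :, None] * p_w_z[None, :, :]) ** beta   # shape (d, z, w)
        post /= post.sum(axis=1, keepdims=True) + 1e-12
        # M-step: re-estimate both factors from the expected counts n(d,w) P(z|d,w)
        expected = counts[:, None, :] * post
        p_w_z = expected.sum(axis=0)
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + 1e-12
        p_z_d = expected.sum(axis=2)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + 1e-12
    return p_w_z, p_z_d

# Toy term-count matrix: 4 documents x 6 words.
X = np.array([[4, 3, 0, 0, 1, 0],
              [3, 5, 1, 0, 0, 0],
              [0, 0, 4, 3, 0, 1],
              [0, 1, 3, 5, 0, 2]], dtype=float)
p_w_z, p_z_d = plsa_tempered_em(X, n_topics=2)
print(np.round(p_z_d, 2))   # mixing proportions P(z|d) for each document
```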

2,574 citations


Book ChapterDOI
05 Sep 2001
TL;DR: This paper presents a simple unsupervised learning algorithm for recognizing synonyms based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words.
Abstract: This paper presents a simple unsupervised learning algorithm for recognizing synonyms, based on statistical data acquired by querying a Web search engine. The algorithm, called PMI-IR, uses Pointwise Mutual Information (PMI) and Information Retrieval (IR) to measure the similarity of pairs of words. PMI-IR is empirically evaluated using 80 synonym test questions from the Test of English as a Foreign Language (TOEFL) and 50 synonym test questions from a collection of tests for students of English as a Second Language (ESL). On both tests, the algorithm obtains a score of 74%. PMI-IR is contrasted with Latent Semantic Analysis (LSA), which achieves a score of 64% on the same 80 TOEFL questions. The paper discusses potential applications of the new unsupervised learning algorithm and some implications of the results for LSA and LSI (Latent Semantic Indexing).
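As a rough illustration of the scoring idea, the sketch below replaces the Web search engine with a tiny in-memory document collection (the `docs` list and the `hits` helper are stand-ins invented for this example, not part of the paper). It scores each candidate synonym by a simple pointwise mutual information ratio of hit counts and picks the highest-scoring choice.

```python
from math import log

# Toy stand-in for web hit counts: each string plays the role of one "document".
docs = [
    "the car was fast and the automobile market grew",
    "he bought a new car an automobile with a large engine",
    "the fruit was ripe and sweet",
    "she ate an apple a sweet fruit from the orchard",
    "the car needed fuel",
    "fuel prices rose at the station",
]

def hits(*words):
    """Number of documents containing all given words (stand-in for search-engine hit counts)."""
    return sum(all(w in d.split() for w in words) for d in docs)

def pmi_ir_score(problem, choice, n_docs=len(docs)):
    """Simplest PMI-style score: log p(problem, choice) / (p(problem) p(choice)),
    estimated from hit counts."""
    joint = hits(problem, choice)
    if joint == 0:
        return float("-inf")
    return log(joint * n_docs / (hits(problem) * hits(choice)))

problem = "car"
choices = ["automobile", "fruit", "fuel"]
best = max(choices, key=lambda c: pmi_ir_score(problem, c))
print(best)   # expected: "automobile"
```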

1,232 citations


Proceedings ArticleDOI
Yihong Gong1, Xin Liu1
01 Sep 2001
TL;DR: This paper proposes two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents; the second method uses the latent semantic analysis technique to identify semantically important sentences for summary creation.
Abstract: In this paper, we propose two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents. The first method uses standard IR methods to rank sentence relevance, while the second uses the latent semantic analysis technique to identify semantically important sentences for summary creation. Both methods strive to select sentences that are highly ranked and different from each other. This is an attempt to create a summary with a wider coverage of the document's main content and less redundancy. Performance evaluations of the two summarization methods are conducted by comparing their summarization outputs with the manual summaries generated by three independent human evaluators. The evaluations also study the influence of different VSM weighting schemes on text summarization performance. Finally, the causes of the large disparities in the evaluators' manual summarization results are investigated, and discussions on human text summarization patterns are presented.
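A minimal sketch of the second (latent semantic analysis) method, assuming a toy set of sentences and simple binary term weights rather than the VSM weighting schemes the paper actually evaluates: build a terms-by-sentences matrix, take its SVD, and for each leading right singular vector select the sentence with the largest index value.

```python
import numpy as np

sentences = [
    "latent semantic analysis maps sentences into a singular vector space",
    "the singular value decomposition reveals the main topics of a document",
    "summaries should cover the main topics with little redundancy",
    "sports scores are unrelated to the rest of this document",
]

# Terms-by-sentences matrix with binary weights (illustrative choice only).
vocab = sorted({w for s in sentences for w in s.split()})
A = np.array([[1.0 if t in s.split() else 0.0 for s in sentences] for t in vocab])

# Row k of Vt describes how strongly each sentence expresses the k-th latent topic.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

def lsa_summary(Vt, n_sentences=2):
    chosen = []
    for k in range(n_sentences):
        # pick the sentence with the largest index value along the k-th singular vector
        idx = int(np.argmax(np.abs(Vt[k])))
        if idx not in chosen:       # if one sentence tops two topics, the summary is shorter
            chosen.append(idx)
    return chosen

for i in lsa_summary(Vt):
    print(sentences[i])
```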

863 citations



Patent
08 May 2001
TL;DR: A computer-based information search and retrieval system and method for retrieving textual digital objects that makes full use of the projections of the documents onto both the reduced document space characterized by the singular value decomposition-based latent semantic structure and its orthogonal space is presented in this paper.
Abstract: A computer-based information search and retrieval system and method for retrieving textual digital objects that makes full use of the projections of the documents onto both the reduced document space characterized by the singular value decomposition-based latent semantic structure and its orthogonal space. The resulting system and method has increased robustness, mitigating the instability of traditional keyword search engines caused by the synonymy and/or polysemy of natural language, and is therefore particularly suitable for web document searching over a distributed computer network such as the Internet.

218 citations


01 Jan 2001
TL;DR: A theoretical framework for semantic space models is developed by synthesizing theoretical analyses from vector space information retrieval and categorical data analysis with new basic research.
Abstract: Towards a Theory of Semantic Space Will Lowe (wlowe02@tufts.edu) Center for Cognitive Studies Tufts University; MA 21015 USA Abstract This paper adds some theory to the growing literature of semantic space models. We motivate semantic space models from the perspective of distributional linguistics and show how an explicit mathematical formulation can provide a better understanding of existing models and suggest changes and improvements. In addition to providing a theoretical framework for current models, we consider the implications of statistical aspects of language data that have not been addressed in the psychological modeling literature. Statistical approaches to language must deal principally with count data, and this data will typically have a highly skewed frequency distribution due to Zipf's law. We consider the consequences of these facts for the construction of semantic space models, and present methods for removing frequency biases from semantic space models. Introduction There is a growing literature on the empirical adequacy of semantic space models across a wide range of subject domains (Burgess et al., 1998; Landauer et al., 1998; Foltz et al., 1998; McDonald and Lowe, 1998; Lowe and McDonald, 2000). However, semantic space models are typically structured and parameterized differently by each researcher. Levy and Bullinaria (2000) have explored the implications of parameter changes empirically by running multiple simulations, but there has up until now been no work that places semantic space models in an overarching theoretical framework; consequently there are few statements of how semantic spaces ought to be structured in the light of their intended purpose. In this paper we attempt to develop a theoretical framework for semantic space models by synthesizing theoretical analyses from vector space information retrieval and categorical data analysis with new basic research. The structure of the paper is as follows. The next section briefly motivates semantic space models using ideas from distributional linguistics. We then review Zipf's law and its consequences for the distributional character of linguistic data. The final section presents a formal definition of semantic space models and considers what effects different choices of component have on the resulting models. Motivating Semantic Space Firth (1968) observed that "you shall know a word by the company it keeps". If we interpret company as lexical company, the words that occur near to it in text or speech, then two related claims are possible. The first is unexceptional: we come to know about the syntactic character of a word by examining the other words that may and may not occur around it in text. Syntactic theory then postulates latent variables, e.g. parts of speech and branching structure, that control the distributional properties of words and restrictions on their contexts of occurrence. The second claim is that we come to know about the semantic character of a word by examining the other words that may and may not occur around it in text. The intuition for this distributional characterization of semantics is that whatever makes words similar or dissimilar in meaning, it must show up distributionally, in the lexical company of the word. Otherwise the supposedly semantic difference is not available to hearers and it is not easy to see how it may be learned.
If words are similar to the extent that they occur in similar contexts then we may define a statistical replacement test (Finch, 1993) which tests the meaningfulness of the result of switching one word for another in a sentence. When a corpus of meaningful sentences is available the test may be reversed (Lowe, 2000a), and under a suitable representation of lexical context, we may hold each word constant and estimate its typical surrounding context. A semantic space model is a way of representing similarity of typical context in a Euclidean space with axes determined by local word co-occurrence counts. Counting the co-occurrence of a target word with a fixed set of D other words makes it possible to position the target in a space of dimension D. A target's position with respect to other words then expresses similarity of lexical context. Since the basic notion from distributional linguistics is 'intersubstitutability in context', a semantic space model is effective to the extent it realizes this idea accurately. Zipf's Law The frequency of a word is (approximately) proportional to the reciprocal of its rank in a frequency list (Zipf, 1949; Mandelbrot, 1954). This is Zipf's Law. Zipf's law ensures dramatically skewed distributions for almost
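A bare-bones version of the counting step described above, with an illustrative toy corpus and a hand-picked set of dimension-labelling words; the frequency-bias corrections the paper develops are only hinted at in the closing comment.

```python
from collections import Counter

corpus = ("the cat sat on the mat the dog sat on the rug "
          "the cat chased the dog across the garden").split()

window = 2                                         # context window size on each side
context_words = ["cat", "dog", "sat", "garden"]    # the D dimension-labelling words

def cooccurrence_vector(target, tokens, context_words, window):
    """Count how often each context word appears within +/- window of the target."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != target:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in context_words:
                counts[tokens[j]] += 1
    return [counts[w] for w in context_words]

print(cooccurrence_vector("cat", corpus, context_words, window))
print(cooccurrence_vector("dog", corpus, context_words, window))

# Zipf's law means raw counts are dominated by very frequent words such as "the";
# one common correction (not the only one discussed in the paper) is to replace raw
# counts with the ratio of observed to expected co-occurrence under independence.
```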

146 citations


01 Jan 2001
TL;DR: Experimental results on the use of LSA for the analysis of English literature texts are presented; several preliminary transformations of the frequency text-document matrix with different weight functions are tested on the basis of control subsets.
Abstract: This paper presents experimental results on the use of LSA for the analysis of English literature texts. Several preliminary transformations of the frequency text-document matrix with different weight functions are tested on the basis of control subsets. Additional clustering based on the correlation matrix is applied in order to reveal the latent structure. The algorithm creates a shaded-form matrix via singular values and vectors. The results are interpreted as a measure of the quality of the transformations and compared to the control-set tests.
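The weighting step might look roughly like the following sketch, which applies one common choice (a log-entropy transformation) to a toy frequency matrix before taking the SVD; the specific weight functions, clustering step, and control subsets used in the paper are not reproduced here.

```python
import numpy as np

def log_entropy_weight(F):
    """Apply a log-entropy transformation to a term-document frequency matrix F
    (one of several weighting schemes of the kind the paper compares)."""
    F = np.asarray(F, dtype=float)
    n_docs = F.shape[1]
    p = F / (F.sum(axis=1, keepdims=True) + 1e-12)        # P(document | term)
    with np.errstate(divide="ignore", invalid="ignore"):
        plogp = np.where(p > 0, p * np.log(p), 0.0)
    global_w = 1.0 + plogp.sum(axis=1) / np.log(n_docs)    # entropy-based global weight
    return np.log1p(F) * global_w[:, None]                 # local log weight x global weight

F = np.array([[3, 0, 1],
              [0, 2, 0],
              [1, 1, 1]])
W = log_entropy_weight(F)
U, S, Vt = np.linalg.svd(W, full_matrices=False)   # singular values/vectors for further analysis
print(np.round(S, 3))
```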

129 citations


PatentDOI
Jerome R. Bellegarda1
TL;DR: A method and apparatus for speech recognition using latent semantic adaptation to generate an LSA space for a collection of documents and to continually adapt the LSA space with new documents as they become available.
Abstract: A method and apparatus for speech recognition using latent semantic adaptation is described herein. According to one aspect of the present invention, a method for recognizing speech comprises using latent semantic analysis (LSA) to generate an LSA space for a collection of documents and to continually adapt the LSA space with new documents as they become available. Adaptation of the LSA space is optimally two-sided, taking into account the new words in the new documents. Alternatively, adaptation is one-sided, taking into account the new documents but discarding any new words appearing in those documents.
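For intuition, the one-sided case corresponds closely to the standard LSA "folding-in" operation sketched below; the matrix, rank, and document vector are toy assumptions, and the two-sided update of the word vectors described in the patent is not shown.

```python
import numpy as np

# Existing LSA space: rank-k SVD of the original term-document matrix.
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 2., 1.],
              [0., 1., 2.]])
k = 2
U, S, Vt = np.linalg.svd(A, full_matrices=False)
Uk, Sk = U[:, :k], np.diag(S[:k])

def fold_in(doc_term_counts, Uk, Sk):
    """One-sided adaptation: project a new document onto the existing LSA space,
    ignoring any words that are not already in the vocabulary."""
    d = np.asarray(doc_term_counts, dtype=float)
    return d @ Uk @ np.linalg.inv(Sk)

new_doc = np.array([1, 0, 1, 2])        # counts over the existing 4-word vocabulary
print(fold_in(new_doc, Uk, Sk))         # coordinates of the new document in LSA space
```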

127 citations


Patent
Yihong Gong1, Xin Liu
26 Mar 2001
TL;DR: In this paper, a text summarizer using relevance measurement technologies and latent semantic analysis techniques provides accurate and useful summarization of the contents of text documents, and generic text summaries may be produced by ranking and extracting sentences from original documents.
Abstract: Text summarizers using relevance measurement technologies and latent semantic analysis techniques provide accurate and useful summarization of the contents of text documents. Generic text summaries may be produced by ranking and extracting sentences from original documents; broad coverage of document content and decreased redundancy may simultaneously be achieved by constructing summaries from sentences that are highly ranked and different from each other. In one embodiment, conventional Information Retrieval (IR) technologies may be applied in a unique way to perform the summarization; relevance measurement, sentence selection, and term elimination may be repeated in successive iterations. In another embodiment, a singular value decomposition technique may be applied to a terms-by-sentences matrix such that all the sentences from the document may be projected into the singular vector space; a text summarizer may then select sentences having the largest index values with the most important singular vectors as part of the text summary.
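A loose sketch of the first, IR-style embodiment under simplifying assumptions (raw term counts and an inner-product relevance score invented for this example, not the patent's exact relevance measure): rank sentences against the document vector, select the best one, eliminate its terms, and repeat.

```python
import numpy as np

def summarize_by_relevance(sentences, n_sentences=2):
    """Iterative relevance-based selection: rank sentences against the whole document,
    pick the best one, then zero out its terms so later picks cover different content."""
    vocab = sorted({w for s in sentences for w in s.split()})
    index = {w: i for i, w in enumerate(vocab)}

    def vec(text):
        v = np.zeros(len(vocab))
        for w in text.split():
            v[index[w]] += 1.0
        return v

    doc = vec(" ".join(sentences))
    sent_vecs = [vec(s) for s in sentences]
    chosen = []
    for _ in range(n_sentences):
        scores = [float(v @ doc) if i not in chosen else -1.0
                  for i, v in enumerate(sent_vecs)]
        best = int(np.argmax(scores))
        chosen.append(best)
        doc[sent_vecs[best] > 0] = 0.0      # term elimination: drop terms already covered
    return [sentences[i] for i in chosen]

sentences = [
    "latent semantic analysis supports generic summarization",
    "summarization selects sentences that cover the main content",
    "redundant sentences repeat the main content of the document",
]
print(summarize_by_relevance(sentences))
```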

110 citations


01 Jan 2001
TL;DR: Based on the convergence of recent studies pointing to a cognitive role for distributional information in explaining language ability, the general principle under exploration is called the Distributional Hypothesis.
Abstract: Testing the Distributional Hypothesis: The Influence of Context on Judgements of Semantic Similarity Scott McDonald (scottm@cogsci.ed.ac.uk) Michael Ramscar (michael@cogsci.ed.ac.uk) Institute for Communicating and Collaborative Systems, University of Edinburgh 2 Buccleuch Place, Edinburgh EH8 9LW Scotland Abstract Distributional information has recently been implicated as playing an important role in several aspects of language ability. Learning the meaning of a word is thought to be dependent, at least in part, on exposure to the word in its linguistic contexts of use. In two experiments, we manipulated subjects' contextual experience with marginally familiar and nonce words. Results showed that similarity judgements involving these words were affected by the distributional properties of the contexts in which they were read. The accrual of contextual experience was simulated in a semantic space model, by successively adding larger amounts of experience in the form of item-in-context exemplars sampled from the British National Corpus. The experiments and the simulation provide support for the role of distributional information in developing representations of word meaning. The Distributional Hypothesis The basic human ability of language understanding – making sense of another person's utterances – does not develop in isolation from the environment. There is a growing body of research suggesting that distributional information plays a more powerful role than previously thought in a number of aspects of language processing. The exploitation of statistical regularities in the linguistic environment has been put forward to explain how language learners accomplish tasks from segmenting speech to bootstrapping word meaning. For example, Saffran, Aslin and Newport (1996) have demonstrated that infants are highly sensitive to simple conditional probability statistics, indicating how the ability to segment the speech stream into words may be realised. Adults, when faced with the task of identifying the word boundaries in an artificial language, also appear able to readily exploit such statistics (Saffran, Newport & Aslin, 1996). Redington, Chater and Finch (1998) have proposed that distributional information may contribute to the acquisition of syntactic knowledge by children. Useful information about the similarities and differences in the meaning of words has also been shown to be present in simple distributional statistics (e.g., Landauer & Dumais, 1997; McDonald, 2000). Based on the convergence of these recent studies into a cognitive role for distributional information in explaining language ability, we call the general principle under exploration the Distributional Hypothesis. The purpose of the present paper is to further test the distributional hypothesis, by examining the influence of context on similarity judgements involving marginally familiar and novel words. Our investigations are framed under the 'semantic space' approach to representing word meaning, to which we turn next. Distributional Models of Word Meaning The distributional hypothesis has provided the motivation for a class of objective statistical methods for representing meaning.
Although the surge of interest in the approach arose in the fields of computational linguistics and information retrieval (e.g., Schutze, 1998; Grefenstette, 1994), where large-scale models of lexical semantics are crucial for tasks such as word sense disambiguation, high-dimensional 'semantic space' models are also useful tools for investigating how the brain represents the meaning of words. Word meaning can be considered to vary along many dimensions; semantic space models attempt to capture this variation in a coherent way, by positioning words in a geometric space. How to determine what the crucial dimensions are has been a long-standing problem; a recent and fruitful approach to this issue has been to label the dimensions of semantic space with words. A word is located in the space according to the degree to which it co-occurs with each of the words labelling the dimensions of the space. Co-occurrence frequency information is extracted from a record of language experience – a large corpus of natural language. Using this approach, two words that tend to occur in similar linguistic contexts – that is, they are distributionally similar – will be positioned closer together in semantic space than two words which are not as distributionally similar. Such simple distributional knowledge has been implicated in a variety of language processing behaviours, such as lexical priming (e.g., Lowe & McDonald, 2000; Lund, Burgess & Atchley, 1995; McDonald & Lowe, 1998), synonym selection (Landauer & Dumais, 1997), retrieval in analogical reasoning (Ramscar & Yarlett, 2000) and judgements of semantic similarity (McDonald, 2000). Contextual co-occurrence, the fundamental relationship underlying the success of the semantic space approach to representing word meaning, can be defined in a number of ways. Perhaps the simplest (and the approach taken in the majority of the studies cited above) is to define co-occurrence in terms of a 'context window': the co-occur-

103 citations


Journal ArticleDOI
TL;DR: A novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis is presented.
Abstract: This paper presents a novel statistical method for factor analysis of binary and count data which is closely related to a technique known as Latent Semantic Analysis. In contrast to the latter meth...

Journal ArticleDOI
TL;DR: A system that can automatically assess a student essay based on its content using Latent Semantic Analysis, a tool which is used to represent the meaning of words as vectors in a high-dimensional space.
Abstract: This paper presents Apex, a system that can automatically assess a student essay based on its content. It relies on Latent Semantic Analysis, a tool which is used to represent the meaning of words as vectors in a high-dimensional space. By comparing an essay and the text of a given course on a semantic basis, our system can measure how well the essay matches the text. Various assessments are presented to the student regarding the topic, the outline and the coherence of the essay. Our experiments yield promising results.
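A minimal sketch of the kind of comparison Apex performs, under toy assumptions: course sections and an essay are mapped into a small LSA space built from the course text, and the essay is scored by cosine similarity against each section. The example texts, rank, and scoring are illustrative only and do not reproduce the system's actual assessments of topic, outline, and coherence.

```python
import numpy as np

course_sections = [
    "supply and demand determine the market price of a good",
    "when demand rises and supply is fixed the price increases",
    "inflation is a general increase in prices over time",
]
essay = "prices go up when more people demand a good than the supply can cover"

vocab = sorted({w for t in course_sections + [essay] for w in t.split()})
index = {w: i for i, w in enumerate(vocab)}

def bow(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        v[index[w]] += 1.0
    return v

A = np.column_stack([bow(t) for t in course_sections])   # terms x course sections
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
project = lambda v: U[:, :k].T @ v                        # map a text into the LSA space

e = project(bow(essay))
for name, section in zip(("supply/demand", "price rise", "inflation"), course_sections):
    s = project(bow(section))
    sim = float(e @ s) / (np.linalg.norm(e) * np.linalg.norm(s) + 1e-12)
    print(name, round(sim, 2))
```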

01 Jan 2001
TL;DR: A model of language understanding that combines information from rule-based syntactic processing with a vector-based semantic representation learned from a corpus is evaluated as a cognitive model and as a potential technique for natural language understanding.
Abstract: Rules for Syntax, Vectors for Semantics Peter Wiemer-Hastings (Peter.Wiemer-Hastings@ed.ac.uk) Iraide Zipitria (iraidez@cogsci.ed.ac.uk) University of Edinburgh Division of Informatics 2 Buccleuch Place Edinburgh EH8 9LW Scotland Abstract Latent Semantic Analysis (LSA) has been shown to perform many linguistic tasks as well as humans do, and has been put forward as a model of human linguistic competence. But LSA pays no attention to word order, much less sentence structure. Researchers in Natural Language Processing have made significant progress in quickly and accurately deriving the syntactic structure of texts. But there is little agreement on how best to represent meaning, and the representations are brittle and difficult to build. This paper evaluates a model of language understanding that combines information from rule-based syntactic processing with a vector-based semantic representation which is learned from a corpus. The model is evaluated as a cognitive model, and as a potential technique for natural language understanding. Motivations Latent Semantic Analysis (LSA) was originally developed for the task of information retrieval, selecting a text which matches a query from a large database (Deerwester, Dumais, Furnas, Landauer, & Harshman, 1990). (We do not describe the functioning of the LSA mechanism here; for a complete description, see Deerwester et al., 1990, and Landauer & Dumais, 1997.) More recently, LSA has been evaluated by psychologists as a model for human lexical acquisition (Landauer & Dumais, 1997). It has been applied to other textual tasks and found to generally perform at levels matching human performance. All this despite the fact that LSA pays no attention to word order, let alone syntax. This led Landauer to claim that syntax apparently has no contribution to the meaning of a sentence, and may only serve as a working memory crutch for sentence processing, or in a stylistic role (Landauer, Laham, Rehder, & Schreiner, 1997). The tasks that LSA has been shown to perform well on can be separated into two groups: those that deal with single words and those that deal with longer texts. For example, on the synonym selection part of the TOEFL (Test of English as a Foreign Language), LSA was as accurate at choosing the correct synonym (out of 4 choices) as were successful foreign applicants to US universities (Landauer et al., 1997). For longer texts, Rehder et al. (1998) showed that for evaluating author knowledge, LSA does steadily worse for texts shorter than 200 words. More specifically, for 200-word essay segments, LSA accounted for 60% of the variance in human scores. For 60-word essay segments, LSA scores accounted for only 10% of the variance. In work on judging the quality of single-sentence student answers in an intelligent tutoring context, we have shown in previous work that although LSA nears the performance of intermediate-knowledge human raters, it lags far behind expert performance (Wiemer-Hastings, Wiemer-Hastings, & Graesser, 1999b). Furthermore, when we compared LSA to a keyword-based approach, LSA performed only marginally better (Wiemer-Hastings, Wiemer-Hastings, & Graesser, 1999a). This accords with unpublished results on short answer sentences from Walter Kintsch, personal communication, January 1999.
In the field of Natural Language Processing, the eras of excessive optimism and ensuing disappointment have been followed by steady increases in the systems' ability to process the syntactic structure of texts with rule-based mechanisms. The biggest recent developments have been due to the augmentation of the rules with corpus-derived probabilities for when they should be applied (Charniak, 1997; Collins, 1996, 1998, for example). Unfortunately, progress in the area of computing the semantic content of texts has not been so successful. Two basic variants of semantic theories have been developed. One is based on some form of logic. The other is represented by connections within semantic networks. In fact, the latter can be simply converted into a logic-based representation. Such theories are brittle in two ways. First, they require every concept and every connection between concepts to be defined by a human knowledge engineer. Multi-purpose representations are not feasible because of the many technical senses of words in every different domain. Second, such representations cannot naturally make the graded judgements that humans do. Humans can compare any two things (even apples and oranges!), but aside from counting feature overlap, logic-based representations have difficulty with relationships other than subsumption and "has-as-part". Due to these various motivations, we are pursuing a two-pronged research project. First, we want to evaluate the combination of a syntactic processing mechanism with an LSA-based semantic representation as a cognitive model of human sentence similarity judgements. Second, we are

Proceedings ArticleDOI
07 May 2001
TL;DR: A new method, which does not require an explicit document segmentation of the training corpus, is presented; it resulted in a perplexity reduction of 16% on a database of biology lecture transcriptions.
Abstract: Introduces non-negative matrix factorization for language model adaptation. This approach is an alternative, with several benefits, to latent semantic analysis based language modeling using singular value decomposition. A new method, which does not require an explicit document segmentation of the training corpus, is presented as well. This method resulted in a perplexity reduction of 16% on a database of biology lecture transcriptions.
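For reference, a generic non-negative matrix factorization with Lee-Seung multiplicative updates is sketched below; in the language-model setting the factored matrix would be a large word-by-context count matrix rather than this toy example, and the paper's adaptation and segmentation-free training details are not shown.

```python
import numpy as np

def nmf(V, rank=2, n_iter=200, seed=0):
    """Non-negative matrix factorization V ~= W H with Lee-Seung multiplicative
    updates for the Euclidean loss."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)   # update H with W fixed
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)   # update W with H fixed
    return W, H

V = np.array([[5., 4., 0., 0.],
              [4., 5., 1., 0.],
              [0., 1., 4., 5.],
              [0., 0., 5., 4.]])
W, H = nmf(V)
print(np.round(W @ H, 1))    # reconstruction should be close to V
```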


Book
01 Jan 2001
TL;DR: The purpose of this paper is to investigate open questions about LSA in its original context of information retrieval, including whether its retrieval improvements depend on its ability to handle synonyms, and to pose new directions for future work.
Abstract: Latent Semantic Analysis (LSA) [4] is a mathematical approach to the discovery of similarity relationships among documents, fragments of documents, and the words that occur within collections of documents. Although LSA was originally applied in the context of information retrieval [4], it has since been successfully applied to a wide variety of text-based tasks [16]. LSA is a variant of the vector space model for information retrieval that uses a reduced-rank approximation to the term-document matrix. In the information retrieval domain, rank reduction is applied in an effort to remove the "noise" that obscures the semantic content of the data [4]. In this context, two claims are typically made for LSA: that it provides a substantial improvement in retrieval performance over the standard vector space model and that this improvement results from LSA's ability to solve what is known as the synonymy problem. Despite the many successful applications of LSA, there are a large number of unanswered questions that bear on where, and in what manner, LSA should be applied. The purpose of this paper is to begin to investigate these issues in LSA's original context of information retrieval and to pose new directions for future work. Among the more critical questions that we address in this paper are the following: Does LSA reliably improve retrieval performance as compared to the vector space model? Does LSA improve retrieval performance by addressing the synonymy problem? How can the optimal rank be chosen? How can relevant and irrelevant documents be distinguished? And are there alternative matrix techniques that can be used to discover reduced representations? This paper is organized as follows. In Sections 0.2-0.3, we review the details of the vector space model and LSA. In Section 0.4, we outline our empirical methods. In Section 0.5, we compare the retrieval performances of LSA and the full-rank vector space model. In Section 0.6, we evaluate how the performance of LSA depends on its ability to handle synonyms. In Sections 0.7-0.8, we consider the choice of rank and how best to identify relevant documents. In Section 0.9, we examine the use of other orthogonal decompositions for rank reduction. Finally, in Section 0.10, we summarize our results.
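The rank-reduction idea being questioned here can be illustrated with a few lines of linear algebra: compare full-rank vector space retrieval with retrieval in a truncated-SVD space on a toy term-document matrix. The matrix, query, and rank k are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

# Tiny term-document matrix (rows = terms, columns = documents).
A = np.array([[1., 1., 0., 0.],     # "car"
              [0., 1., 1., 0.],     # "automobile"
              [0., 0., 1., 1.],     # "engine"
              [1., 0., 0., 1.]])    # "fuel"
U, S, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_k = np.diag(S[:k]) @ Vt[:k]            # documents in the rank-k LSA space

query = np.array([1., 0., 0., 0.])          # query containing only "car"
q_full = query                               # full-rank vector space model
q_lsa = U[:, :k].T @ query                   # query projected into the LSA space

# Documents that share no terms with the query typically still receive a
# nonzero similarity in the reduced space, which is the synonymy argument.
for j in range(A.shape[1]):
    print(j, round(cosine(q_full, A[:, j]), 2), round(cosine(q_lsa, docs_k[:, j]), 2))
```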

Book ChapterDOI
01 Jan 2001
TL;DR: A simple and highly efficient system for computing useful word co-occurrence statistics is described, along with a number of criteria for optimizing and validating the resulting representations; the consequences of the different methodologies for work within cognitive or neural computation are discussed.
Abstract: Several recent papers have described how lexical properties of words can be captured by simple measurements of which other words tend to occur close to them. At a practical level, word co-occurrence statistics are used to generate high dimensional vector space representations and appropriate distance metrics are defined on those spaces. The resulting co-occurrence vectors have been used to account for phenomena ranging from semantic priming to vocabulary acquisition. We have developed a simple and highly efficient system for computing useful word co-occurrence statistics, along with a number of criteria for optimizing and validating the resulting representations. Other workers have advocated various methods for reducing the number of dimensions in the co-occurrence vectors. Lund & Burgess, Landauer & Dumais, and Lowe & McDonald [8] have used a statistical reliability criterion. We have used a simpler framework that orders and truncates the dimensions according to their word frequency. Here we compare how the different methods perform for two evaluation criteria and briefly discuss the consequences of the different methodologies for work within cognitive or neural computation.
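The "order and truncate by word frequency" framework reduces, in the simplest case, to something like the sketch below, where the D most frequent tokens of a toy corpus are kept as the dimensions of the co-occurrence space.

```python
from collections import Counter

tokens = ("the cat sat on the mat the dog sat on the rug "
          "a cat and a dog ran in the garden").split()

# Keep the D most frequent words as the dimensions of the co-occurrence space,
# i.e. order the candidate dimensions by word frequency and truncate, rather than
# applying a statistical reliability criterion.
D = 5
dimension_words = [w for w, _ in Counter(tokens).most_common(D)]
print(dimension_words)
```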

11 Oct 2001
TL;DR: A Bayesian mixture model for probabilistic latent semantic analysis of documents with images and text is presented; it enables a priori knowledge, such as word and image preferences, to be encoded.
Abstract: We present a Bayesian mixture model for probabilistic latent semantic analysis of documents with images and text. The Bayesian perspective allows us to perform automatic regularisation to obtain sparser and more coherent clustering models. It also enables us to encode a priori knowledge, such as word and image preferences. The learnt model can be used for browsing digital databases, information retrieval with image and/or text queries, image annotation (adding words to an image) and text illustration (adding images to a text).


Proceedings ArticleDOI
06 Aug 2001
TL;DR: In an effort to bridge the gaps in knowledge about this research problem, an introduction to a novel automated text marker (ATM) prototype is given in this paper.
Abstract: A survey of major systems for the automated assessment of free-text answers is presented. This includes the Project Essay Grade (PEG), the Intelligent Essay Assessor (IEA), which employs latent semantic analysis (LSA), and the Electronic Essay Rater (E-Rater). All these systems have the same weakness in that they are unable to perform any assessment of text content. The word order is also not taken into account. In an effort to bridge the gaps in knowledge about this research problem, an introduction to a novel automated text marker (ATM) prototype is given in this paper.

Book ChapterDOI
03 Sep 2001
TL;DR: Random Indexing is applied on aligned bilingual corpora, producing French-English and Swedish-English thesauri that are used for cross-lingual query expansion in CLEF 2001.
Abstract: Random Indexing is a vector-based technique for extracting semantically similar words from the co-occurrence statistics of words in large text data. We have applied the technique on aligned bilingual corpora, producing French-English and Swedish-English thesauri that we have used for cross-lingual query expansion. In this paper, we report on our CLEF 2001 experiments on French-to-English and Swedish-to-English query expansion.
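A monolingual toy version of Random Indexing is sketched below to show the mechanics (sparse ternary index vectors accumulated over a context window); the bilingual alignment, corpus, dimensionality, and query-expansion step used in the CLEF experiments are not reproduced.

```python
import numpy as np

def index_vector(dim, n_nonzero, rng):
    """Sparse ternary random index vector: a few +1s and -1s, zeros elsewhere."""
    v = np.zeros(dim)
    idx = rng.choice(dim, size=n_nonzero, replace=False)
    v[idx] = rng.choice([-1.0, 1.0], size=n_nonzero)
    return v

def random_indexing(tokens, dim=100, n_nonzero=4, window=2, seed=0):
    """Accumulate each word's context vector as the sum of the index vectors
    of the words that co-occur with it inside the window."""
    rng = np.random.default_rng(seed)
    vocab = sorted(set(tokens))
    index_vecs = {w: index_vector(dim, n_nonzero, rng) for w in vocab}
    context_vecs = {w: np.zeros(dim) for w in vocab}
    for i, w in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                context_vecs[w] += index_vecs[tokens[j]]
    return context_vecs

tokens = ("the cat sat on the mat the dog sat on the mat "
          "the cat chased the dog").split()
vecs = random_indexing(tokens)
cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
print(round(cos(vecs["cat"], vecs["dog"]), 2))   # similarity of two distributionally similar words
```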

Proceedings Article
01 Sep 2001
TL;DR: Two approaches to vector-based call-routing are described, one based on matching queries to routes and the other on matching queries directly to stored queries; it is argued that there are some problems with the former approach.
Abstract: Two approaches to vector-based call-routing are described, one based on matching queries to routes and the other on matching queries directly to stored queries. We argue that there are some problems with the former approach, both when used directly and when latent semantic analysis (LSA) is used to reduce the dimensionality of the vectors. However, the second approach imposes a higher computational load than the first and we have experimented with reducing the number of reference vectors (using the multi-edit and condense algorithm) and the dimensionality of the vectors (using linear discriminant analysis (LDA)). Results are presented for the task of routing queries on banking and financial services to one of thirty-two destinations. Best results (5.1% routing error) were obtained by first using LSA to smooth the query vectors followed by LDA to increase discrimination and reduce vector dimensionality.

Proceedings ArticleDOI
Yihong Gong1, Xin Liu
10 Sep 2001
TL;DR: Two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents are proposed, which strive to select sentences that are highly ranked and different from each other.
Abstract: We propose two generic text summarization methods that create text summaries by ranking and extracting sentences from the original documents. The first method uses standard information retrieval methods to rank sentence relevances, while the second method uses the latent semantic analysis technique to identify semantically important sentences, for summary creations. Both methods strive to select sentences that are highly ranked and different from each other. This is an attempt to create a summary with a wider coverage of the document's main content and less redundancy. Performance evaluations on the two summarization methods are conducted by comparing their summarization outputs with the manual summaries generated by three independent human evaluators.

Proceedings ArticleDOI
09 Nov 2001
TL;DR: A framework for the automatic generation of links based on salient semantic structures extracted from homogeneous web repositories and an implementation of the framework are discussed, and results of the Latent Semantic Analysis linking service are presented.
Abstract: We present a framework for the automatic generation of links based on salient semantic structures extracted from homogeneous web repositories, and discuss an implementation of the framework. For this study, we consider as homogeneous the repositories of the eClass, an instrumented environment that automatically captures details of a lecture and provides effective multimedia-enhanced web-based interfaces for users to review the lecture, and the CoWeb, a web-based service for collaborative authoring of web-based material. We exploited Latent Semantic Analysis over data indexed by a general public license search engine. We experimented with our service on data from a graduate course supported by both eClass and CoWeb repositories. We present the results of the Latent Semantic Analysis linking service in the light of results obtained in our previous work.

04 Aug 2001
TL;DR: Experimental results show that accounting for semantic information in fact decreases performance compared to LSI alone; the main weaknesses of the current hybrid scheme are discussed and several tracks for improvement are sketched.
Abstract: A new approach for constructing pseudo-keywords, referred to as Sense Units, is proposed. Sense Units are obtained by a word clustering process, where the underlying similarity reflects both statistical and semantic properties, respectively detected through Latent Semantic Analysis and WordNet. Sense Units are used to recode documents and are evaluated through the performance increase they permit in classification tasks. Experimental results show that accounting for semantic information in fact decreases performance compared to LSI alone. The main weaknesses of the current hybrid scheme are discussed and several tracks for improvement are sketched.

Book ChapterDOI
Preslav Nakov1
01 Oct 2001
TL;DR: The paper presents the results of experiments on the use of LSA for the analysis of textual data and tests two hypotheses: 1) texts by the same author are alike and can be distinguished from those by a different author; 2) prose and poetry can be automatically distinguished.
Abstract: The paper presents the results of experiments on the use of LSA for the analysis of textual data. The method is explained in brief and special attention is paid to its potential for the comparison and investigation of German literature texts. Two hypotheses are tested: 1) texts by the same author are alike and can be distinguished from those by a different author; 2) prose and poetry can be automatically distinguished.


01 Jan 2001
TL;DR: Connell and Ramscar, as mentioned in this paper, used a co-occurrence model of language, Latent Semantic Analysis (LSA), to simulate typicality effects and found that LSA successfully simulates subject data relating to typicality and the effects of context on categories.
Abstract: Using Distributional Measures to Model Typicality in Categorization Louise Connell (louise.connell@ucd.ie) Department of Computer Science, University College Dublin, Belfield, Dublin 4, Ireland Michael Ramscar (michael@cogsci.ed.ac.uk) School of Cognitive Science, University of Edinburgh, 2 Buccleuch Place, Edinburgh, EH8 9LW, Scotland. Abstract Typicality effects are ordinarily tied to concepts and conceptualization. The underlying assumption in much of categorization research is that effects such as typicality are reflective of stored conceptual structure. This paper questions this assumption by simulating typicality effects by the use of a co-occurrence model of language, Latent Semantic Analysis (LSA). Despite being a statistical tool based on simple word co-occurrence, LSA successfully simulates subject data relating to typicality effects and the effects of context on categories. Moreover, it does so without any explicit coding of categories or semantic features. The model is then used to successfully predict participants' judgements of typicality in context. In the light of the findings reported here, we question the traditional interpretation of typicality data: are these data reflective of underlying structure in people's concepts, or are they reflective of the distributional properties of the linguistic environments in which they find themselves? Introduction The world contains a myriad of objects and events that an intelligent observer could seemingly infinitely partition and generalise from. So how is it that humans can adopt a particular partitioning in the mass of data that confronts them? How do they pick out regularities in the stuff of experience and index them using words? What are these regularities? And how do humans recognise, communicate, learn and reason with them? These questions are central to cognitive science, and traditionally, their close linkage has tempted researchers to seek a unified answer to them: categorization – the act of grouping things in the world – has been commonly linked to the representation of concepts, with many researchers assuming that a theory of one provides for the other (Armstrong, Gleitman & Gleitman, 1983; Keil, 1987; Lakoff, 1987). In the experiments reported, we follow the common assumption (Medin & Smith, 1984; Komatsu, 1992) that categories are classes, concepts are their mental representations and that an instance is a specific example of a category member. In much of this work, it is assumed that linguistic behavior (such as naming features associated with a concept, c.f. Rosch, 1973) is determined by, and reflective of, underlying concepts that are grounded in perceptual experience of objects and artifacts themselves. Here, we wish to consider the idea that language itself is part of the environment that determines conceptual behavior. A growing body of research indicates that distributional information may play a powerful role in many aspects of human cognition. In particular, it has been proposed that people can exploit statistical regularities in language to accomplish a range of conceptual and perceptual learning tasks. Saffran, Aslin & Newport (1996; see also Saffran, Newport, & Aslin, 1996) have demonstrated that infants and adults are sensitive to simple conditional probability statistics, suggesting one way in which the ability to segment the speech stream into words may be realized.
Redington, Chater & Finch (1998) suggest that distributional information may contribute to the acquisition of syntactic knowledge by children. MacDonald & Ramscar (this volume) have shown how information derived from a 100 million word corpus can be used to manipulate subjects’ contextual experience with marginally familiar and nonce words, demonstrating that similarity judgements involving these words are affected by the distributional properties of the contexts in which they were read. The objective of this paper is to examine the extent to which co-occurrence techniques can model human categorization data: What is the relationship between typicality judgements and distributional information? Indeed, are the responses people provide in typicality experiments more reflective of the distributional properties of the linguistic environments in which they find themselves than they are of the underlying structure of people's concepts? Typicality Effects The first empirical evidence of typicality effects was provided by Rosch (1973), who found participants judged some category members as more (proto)typical than others. Rosch (1973) gave subjects a category name such as fruit with a list of members such as apple,
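The prediction step can be illustrated schematically: typicality of an exemplar is modelled as its similarity to the category label in a semantic space. The three-dimensional vectors below are hand-specified placeholders, not LSA vectors; in the actual model they would come from a space trained on a large corpus.

```python
import numpy as np

# Hypothetical semantic-space vectors (placeholders for vectors that would, in
# practice, be derived from an LSA space built over a large corpus).
vectors = {
    "fruit":  np.array([0.9, 0.1, 0.2]),
    "apple":  np.array([0.8, 0.2, 0.1]),
    "olive":  np.array([0.4, 0.5, 0.6]),
    "tomato": np.array([0.5, 0.4, 0.5]),
}

cos = lambda a, b: float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Typicality of an exemplar is modelled as its similarity to the category label.
for member in ("apple", "olive", "tomato"):
    print(member, round(cos(vectors[member], vectors["fruit"]), 2))
```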

Proceedings ArticleDOI
09 Dec 2001
TL;DR: This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition, resulting in a 24% perplexity improvement overall when compared to a trigram language model.
Abstract: This paper describes the use of exponential models to improve non-negative matrix factorization (NMF) based topic language models for automatic speech recognition. This modeling technique borrows the basic idea from latent semantic analysis (LSA), which is typically used in information retrieval. An improvement was achieved when exponential models were used to estimate the a posteriori topic probabilities for an observed history. This method improved the perplexity of the NMF model, resulting in a 24% perplexity improvement overall when compared to a trigram language model.

Book ChapterDOI
10 Sep 2001
TL;DR: It is argued that the ability of LSI to reduce large dimensional spaces to a lower dimensional representation which is easier to understand can help in highlighting key relationships in the complexity of interactions between agent and environment.
Abstract: This paper describes the simulation of a foraging agent in an environment with a simple ecological structure, alternatively using one of three different control systems with varying degrees of memory. These controllers are evolved to produce a range of emergent behaviours, which are analysed and compared using Latent Semantic Indexing (LSI): the behaviours are compared between controllers and in their evolutionary trajectories. It is argued that the ability of LSI to reduce large dimensional spaces to a lower dimensional representation which is easier to understand can help in highlighting key relationships in the complexity of interactions between agent and environment.