
Showing papers on "Latent semantic analysis published in 2003"


Journal ArticleDOI
TL;DR: This article introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words, based on two different statistical measures of word association.
Abstract: The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This article introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.

1,651 citations
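A minimal sketch of the LSA-based variant of this recipe (the paper's SO-LSA idea), assuming nothing beyond a toy corpus: build a small latent term space, then score a word by its summed similarity to the positive paradigm words minus its summed similarity to the negative ones. The corpus, paradigm lists, and dimensionality below are illustrative stand-ins, not the paper's data.

```python
# Toy illustration of orientation-from-association in an LSA term space.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the honest clerk gave superb and correct advice",
    "a corrupt official gave hateful and wrong advice",
    "the superb report was correct and honest",
    "the hateful review was wrong and corrupt",
]
positive = ["honest", "correct", "superb"]
negative = ["corrupt", "wrong", "hateful"]

vec = CountVectorizer()
X = vec.fit_transform(corpus)              # documents x terms
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)
term_vectors = svd.components_.T           # terms x latent dimensions
vocab = vec.vocabulary_                    # word -> row index

def orientation(word):
    w = term_vectors[vocab[word]].reshape(1, -1)
    pos = sum(cosine_similarity(w, term_vectors[vocab[p]].reshape(1, -1))[0, 0]
              for p in positive)
    neg = sum(cosine_similarity(w, term_vectors[vocab[n]].reshape(1, -1))[0, 0]
              for n in negative)
    return pos - neg                        # > 0 suggests praise, < 0 criticism

print(orientation("advice"))
```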


Proceedings ArticleDOI
02 Nov 2003
TL;DR: This paper applies and compares two simple latent space models commonly used in text analysis, namely Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA), and finds that, on an 8000-image dataset, a classic LSA model defined on keywords and a very basic image representation performed as well as much more complex, state-of-the-art methods.
Abstract: Image auto-annotation, i.e., the association of words to whole images, has attracted considerable attention. In particular, unsupervised, probabilistic latent variable models of text and image features have shown encouraging results, but their performance with respect to other approaches remains unknown. In this paper, we apply and compare two simple latent space models commonly used in text analysis, namely Latent Semantic Analysis (LSA) and Probabilistic LSA (PLSA). Annotation strategies for each model are discussed. Remarkably, we found that, on an 8000-image dataset, a classic LSA model defined on keywords and a very basic image representation performed as well as much more complex, state-of-the-art methods. Furthermore, non-probabilistic methods (LSA and direct image matching) outperformed PLSA on the same dataset.

289 citations
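One way to read the LSA annotation strategy described above is as keyword propagation in a joint keyword/visual-term space: training "documents" mix annotation keywords with quantized visual tokens, and a test image (visual tokens only) inherits keywords from its nearest training images in the latent space. The sketch below is an assumption-laden illustration with made-up "visterm" tokens, not the paper's exact pipeline.

```python
# Toy keyword propagation through a latent space shared by keywords and visterms.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

train_docs = [
    "sky water boat v12 v37 v88",      # annotation keywords + visual "visterms"
    "sky mountain snow v12 v53 v90",
    "grass tiger tree v07 v41 v88",
]
keywords = {"sky", "water", "boat", "mountain", "snow", "grass", "tiger", "tree"}

vec = CountVectorizer()
X = vec.fit_transform(train_docs)
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)                      # training images in latent space

test_doc = "v12 v37 v90"                      # unannotated image: visterms only
z = svd.transform(vec.transform([test_doc]))  # fold the test image in
nearest = cosine_similarity(z, Z)[0].argmax()
predicted = [w for w in train_docs[nearest].split() if w in keywords]
print(predicted)
```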


Proceedings ArticleDOI
12 Jul 2003
TL;DR: A construction-inspecific model of multiword expression decomposability based on latent semantic analysis is presented, and evidence is furnished for the calculated similarities being correlated with the semantic relational content of WordNet.
Abstract: This paper presents a construction-inspecific model of multiword expression decomposability based on latent semantic analysis. We use latent semantic analysis to determine the similarity between a multiword expression and its constituent words, and claim that higher similarities indicate greater decomposability. We test the model over English noun-noun compounds and verb-particles, and evaluate its correlation with similarities and hyponymy values in WordNet. Based on mean hyponymy over partitions of data ranked on similarity, we furnish evidence for the calculated similarities being correlated with the semantic relational content of WordNet.

239 citations
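A minimal sketch of the decomposability test described above, under toy assumptions: the multiword expression is treated as a single token, an LSA term space is built, and the expression's cosine similarity to each constituent word is read as evidence that the constituent contributes its simplex meaning. The corpus and the compound are invented examples.

```python
# Compare a compound's latent vector with those of its constituent words.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "paid the bill with a credit_card at the shop",
    "the bank issued a new credit_card with a low limit",
    "credit from the bank helped pay the bill",
    "the card was declined at the shop",
]
vec = CountVectorizer(token_pattern=r"[a-z_]+")   # keep credit_card as one token
X = vec.fit_transform(corpus)
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)
T = svd.components_.T
idx = vec.vocabulary_

def sim(a, b):
    return cosine_similarity(T[idx[a]].reshape(1, -1),
                             T[idx[b]].reshape(1, -1))[0, 0]

# Higher constituent similarity -> the compound is judged more decomposable.
print("credit_card ~ credit:", sim("credit_card", "credit"))
print("credit_card ~ card:  ", sim("credit_card", "card"))
```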


Journal ArticleDOI
TL;DR: Three experiments were conducted to examine whether spatial iconicity affects semantic-relatedness judgments and showed that this effect did not occur when the words were presented horizontally, thus ruling out that the iconicity effect is due to the order in which the words are read.
Abstract: Three experiments were conducted to examine whether spatial iconicity affects semantic-relatedness judgments. Subjects made speeded decisions with regard to whether members of a simultaneously presented word pair were semantically related. In Experiment 1, the words were presented one above the other. In the experimental pair, the words denoted parts of larger objects (e.g., ATTIC-BASEMENT). The words were either in an iconic relation with their referents (e.g., ATTIC presented above BASEMENT) or in a reverse-iconic relation (BASEMENT above ATTIC). The reverse-iconic condition yielded significantly slower semantic-relatedness judgments than did the iconic condition. Experiments 2 and 3 showed that this effect did not occur when the words were presented horizontally, thus ruling out that the iconicity effect is due to the order in which the words are read. Two alternative explanations for this finding are discussed.

216 citations


Journal ArticleDOI
TL;DR: The authors used Latent Semantic Analysis (LSA) to estimate the semantic similarity between readers' think-aloud protocols to focal sentences and sentences in the stories that provided direct causal antecedents to the focal sentences.
Abstract: The viability of assessing reading strategies is studied based on think-aloud protocols combined with Latent Semantic Analysis (LSA). Readers in two studies thought aloud after reading specific focal sentences embedded in two stories. LSA was used to estimate the semantic similarity between readers' think-aloud protocols to the focal sentences and sentences in the stories that provided direct causal antecedents to the focal sentences. Study 1 demonstrated that according to human- and LSA-based assessments of the protocols, the responses of less-skilled readers semantically overlapped more with the focal sentences than with the causal antecedent sentences, whereas the responses of skilled readers overlapped with these sentences equally. In addition, the extent that the semantic overlap with causal antecedents was greater than the overlap with the focal sentences predicted performance on comprehension test questions and the Nelson-Denny test of reading skill. Study 2 replicated these findings and also demon...

174 citations
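The core LSA comparison above can be illustrated compactly: fold a reader's think-aloud protocol into a latent space built from the story, then compare its cosine to the focal sentence with its cosine to the causal-antecedent sentence. The story and protocol below are invented examples, not the studies' materials.

```python
# Cosine of a think-aloud protocol to the focal vs. causal-antecedent sentence.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

story_sentences = [
    "the storm knocked the power lines down during the night",   # causal antecedent
    "the family lit candles and waited in the dark",
    "in the morning the food in the fridge had spoiled",         # focal sentence
    "they drove to town to buy ice and fresh groceries",
]
focal, antecedent = story_sentences[2], story_sentences[0]
protocol = "the power was out all night so nothing stayed cold"

vec = TfidfVectorizer()
X = vec.fit_transform(story_sentences)
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)

def fold(text):
    return svd.transform(vec.transform([text]))

sim_focal = cosine_similarity(fold(protocol), fold(focal))[0, 0]
sim_antecedent = cosine_similarity(fold(protocol), fold(antecedent))[0, 0]
# Skilled readers are expected to overlap with the antecedent roughly as much
# as with the focal sentence; less-skilled readers mostly echo the focal sentence.
print(sim_focal, sim_antecedent)
```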


Proceedings ArticleDOI
27 May 2003
TL;DR: An unsupervised algorithm for placing unknown words into a taxonomy is described and its accuracy is evaluated on a large and varied sample of words; automatic filtering using the class-labelling algorithm is shown to give a fourfold improvement in accuracy.
Abstract: This paper describes an unsupervised algorithm for placing unknown words into a taxonomy and evaluates its accuracy on a large and varied sample of words. The algorithm works by first using a large corpus to find semantic neighbors of the unknown word, which we accomplish by combining latent semantic analysis with part-of-speech information. We then place the unknown word in the part of the taxonomy where these neighbors are most concentrated, using a class-labelling algorithm developed especially for this task. This method is used to reconstruct parts of the existing WordNet database, obtaining results for common nouns, proper nouns and verbs. We evaluate the contribution made by part-of-speech tagging and show that automatic filtering using the class-labelling algorithm gives a fourfold improvement in accuracy.

135 citations
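A hedged sketch of the two-step idea above: (1) find latent-space neighbours of the unknown word, (2) assign it the taxonomy class in which those neighbours are most concentrated. A tiny hand-made class map stands in for WordNet, and a simple majority vote stands in for the paper's class-labelling algorithm; the corpus is invented.

```python
# Place an unknown word by majority class of its nearest labelled neighbours.
from collections import Counter
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

corpus = [
    "the cat and the dog chased the hamster around the garden",
    "a hamster is a small pet kept in a cage",
    "the violin and the cello played with the oboe in the hall",
    "the oboe is a wind instrument used in the orchestra",
]
class_of = {"cat": "animal", "dog": "animal", "pet": "animal",
            "violin": "instrument", "cello": "instrument", "oboe": "instrument"}
unknown = "hamster"

vec = CountVectorizer()
X = vec.fit_transform(corpus)
svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)
T = svd.components_.T
idx = vec.vocabulary_

u = T[idx[unknown]].reshape(1, -1)
sims = {w: cosine_similarity(u, T[idx[w]].reshape(1, -1))[0, 0] for w in class_of}
top = sorted(sims, key=sims.get, reverse=True)[:3]       # most similar labelled words
print(Counter(class_of[w] for w in top).most_common(1))  # predicted class
```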


Patent
14 May 2003
TL;DR: In this paper, a method and apparatus for generating more natural-sounding speech is presented: word prominence and latent semantic analysis are used to determine whether information in the current sentence is new or previously given, and a word prominence is assigned to each word in the current sentence in accordance with that determination.
Abstract: A method and apparatus is provided for generating speech that sounds more natural. In one embodiment, word prominence and latent semantic analysis are used to generate more natural sounding speech. A method for generating speech that sounds more natural may comprise generating synthesized speech having certain word prominence characteristics and applying a semantically-driven word prominence assignment model to specify word prominence consistent with the way humans assign word prominence. A speech representative of a current sentence is generated. The determination is made whether information in the current sentence is new or previously given in accordance with a semantic relationship between the current sentence and a number of preceding sentences. A word prominence is assigned to a word in the current sentence in accordance with the information determination.

125 citations


Proceedings ArticleDOI
31 May 2003
TL;DR: Syntactically Enhanced LSA (SELSA) is presented, an approach which generalizes LSA by considering a word along with its syntactic neighborhood, given by the part-of-speech tag of its preceding word, as the unit of knowledge representation; it provides better discrimination of syntactic-semantic knowledge representation than LSA.
Abstract: Latent semantic analysis (LSA) has been used in several intelligent tutoring systems (ITSs) for assessing students' learning by evaluating their answers to questions in the tutoring domain. It is based on word-document co-occurrence statistics in the training corpus and a dimensionality reduction technique. However, it doesn't consider the word-order or syntactic information, which can improve the knowledge representation and therefore lead to better performance of an ITS. We present here an approach called Syntactically Enhanced LSA (SELSA) which generalizes LSA by considering a word along with its syntactic neighborhood, given by the part-of-speech tag of its preceding word, as a unit of knowledge representation. The experimental results on the AutoTutor task of evaluating students' answers to basic computer science questions by SELSA, and its comparison with LSA, are presented in terms of several cognitive measures. SELSA is able to correctly evaluate a few more answers than LSA, but has a lower correlation with human evaluators than LSA does. It also provides better discrimination of syntactic-semantic knowledge representation than LSA.

95 citations
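A rough sketch of the SELSA representation described above: the unit of analysis is a word prefixed with the part-of-speech tag of the word preceding it, and LSA is then run over these units instead of plain words. The sentences below are hand-tagged toy examples (no tagger is invoked) and the tag set is illustrative.

```python
# Build (previous-POS-tag, word) units and run a small SVD over them.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import TruncatedSVD

tagged_docs = [
    [("the", "DT"), ("cpu", "NN"), ("executes", "VBZ"), ("instructions", "NNS")],
    [("the", "DT"), ("ram", "NN"), ("stores", "VBZ"), ("data", "NNS")],
    [("programs", "NNS"), ("reside", "VBP"), ("in", "IN"), ("memory", "NN")],
]

def selsa_units(tagged):
    units, prev_tag = [], "BOS"            # sentence-initial pseudo-tag
    for word, tag in tagged:
        units.append(f"{prev_tag}_{word}")
        prev_tag = tag
    return " ".join(units)

docs = [selsa_units(d) for d in tagged_docs]
vec = CountVectorizer(token_pattern=r"\S+", lowercase=False)
X = vec.fit_transform(docs)                # documents x (prev-tag, word) units
Z = TruncatedSVD(n_components=2, random_state=0).fit_transform(X)
print(sorted(vec.vocabulary_)[:5])         # e.g. units like 'DT_cpu'
print(Z.shape)
```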


Patent
04 Jun 2003
TL;DR: In this paper, an automatic speech recognition and semantic categorization system is used to convert unstructured voice input into structured data that can then be used to access one or more databases to retrieve associated supplemental data.
Abstract: Unstructured voice information from an incoming caller is processed by automatic speech recognition and semantic categorization system to convert the information into structured data that may then be used to access one or more databases to retrieve associated supplemental data. The structured data and associated supplemental data are then made available through a presentation system that provides information to the call center agent and, optionally, to the incoming caller. The system thus allows a call center information processing system to handle unstructured voice input for use by the live agent in handling the incoming call and for storage and retrieval at a later time. The semantic analysis system may be implemented by a global parser or by an information retrieval technique, such as latent semantic analysis. Co-occurrence of keywords may be used to associate prior calls with an incoming call to assist in understanding the purpose of the incoming call.

95 citations


Journal ArticleDOI
TL;DR: This article examines the application of LSA to automated essay scoring, compares LSA methods to earlier statistical methods for assessing essay quality, and critically reviews contemporary essay-scoring systems built on LSA.
Abstract: Latent semantic analysis (LSA) is an automated, statistical technique for comparing the semantic similarity of words or documents. In this article, I examine the application of LSA to automated essay scoring. I compare LSA methods to earlier statistical methods for assessing essay quality, and critically review contemporary essay-scoring systems built on LSA, including the Intelligent Essay Assessor, Summary Street, State the Essence, Apex, and Select-a-Kibitzer. Finally, I discuss current avenues of research, including LSA's application to computer-measured readability assessment and to automatic summarization of student essays.

92 citations
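One common LSA essay-scoring recipe (not necessarily the exact method of any system named above) can be sketched briefly: project a bank of pre-scored essays and the new essay into a latent space, then assign the new essay a score derived from its most similar scored essays. The essays and scores below are toy placeholders.

```python
# Nearest-neighbour essay scoring in a small LSA space.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

scored_essays = [
    ("photosynthesis converts light energy into chemical energy in plants", 5),
    ("plants use sunlight water and carbon dioxide to make sugar and oxygen", 4),
    ("plants are green and grow in the ground", 2),
    ("i like plants because they look nice", 1),
]
new_essay = "using sunlight plants turn carbon dioxide and water into sugar"

texts = [t for t, _ in scored_essays]
vec = TfidfVectorizer()
X = vec.fit_transform(texts)
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(X)
z = svd.transform(vec.transform([new_essay]))

sims = cosine_similarity(z, Z)[0]
k = 2                                       # average the scores of the k nearest essays
nearest = np.argsort(-sims)[:k]
print(np.mean([scored_essays[i][1] for i in nearest]))
```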


Book ChapterDOI
22 Sep 2003
TL;DR: This paper introduces an approach for semantically enhanced collaborative filtering in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions.
Abstract: Item-based Collaborative Filtering (CF) algorithms have been designed to deal with the scalability problems associated with traditional user-based CF approaches without sacrificing recommendation or prediction accuracy. Item-based algorithms avoid the bottleneck in computing user-user correlations by first considering the relationships among items and performing similarity computations in a reduced space. Because the computation of item similarities is independent of the methods used for generating predictions, multiple knowledge sources, including structured semantic information about items, can be brought to bear in determining similarities among items. The integration of semantic similarities for items with rating- or usage-based similarities allows the system to make inferences based on the underlying reasons for which a user may or may not be interested in a particular item. Furthermore, in cases where little or no rating (or usage) information is available (such as in the case of newly added items, or in very sparse data sets), the system can still use the semantic similarities to provide reasonable recommendations for users. In this paper, we introduce an approach for semantically enhanced collaborative filtering in which structured semantic knowledge about items, extracted automatically from the Web based on domain-specific reference ontologies, is used in conjunction with user-item mappings to create a combined similarity measure and generate predictions. Our experimental results demonstrate that the integrated approach yields significant advantages both in terms of improving accuracy, as well as in dealing with very sparse data sets or new items.
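A hedged sketch of the combined-similarity idea above: item-item similarity is a linear blend of a rating-based similarity and a semantic similarity derived from item attributes, and predictions are the usual item-based weighted average. The ratings, attributes, and blend weight alpha are invented values, not the paper's data or exact formulation.

```python
# Item-based prediction with a blended semantic + rating-based similarity.
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# rows = users, cols = items; 0 means "not rated" (treated as missing at prediction time)
R = np.array([[5, 4, 0, 1],
              [4, 0, 5, 2],
              [1, 2, 1, 5]], dtype=float)
# rows = items, cols = binary semantic attributes (e.g. ontology-derived features)
A = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 1]], dtype=float)

rating_sim = cosine_similarity(R.T)        # item-item similarity from co-ratings
semantic_sim = cosine_similarity(A)        # item-item similarity from attributes
alpha = 0.5                                # blend weight (a tunable assumption)
combined = alpha * semantic_sim + (1 - alpha) * rating_sim

def predict(user, item):
    rated = np.nonzero(R[user])[0]
    rated = rated[rated != item]
    w = combined[item, rated]
    return float(np.dot(w, R[user, rated]) / (np.abs(w).sum() + 1e-9))

print(predict(user=0, item=2))             # predict user 0's rating for item 2
```

Because the semantic part of the blend does not depend on ratings, the same code still produces a prediction when an item has few or no ratings, which is the sparse-data/new-item case the abstract highlights.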

Journal Article
TL;DR: In this article, a comparison of the performance of a number of text categorization methods on two different data sets is presented, with results reported using the Mean Reciprocal Rank, a commonly used evaluation measure for question answering tasks, as the measure of overall performance.
Abstract: In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models. We report the results obtained using the Mean Reciprocal Rank as a measure of overall performance, a commonly used evaluation measure for question answering tasks. We argue that this evaluation measure is also very well suited for text categorization tasks. Our results show that overall, SVMs and k-NN LSA perform better than the other methods, in a statistically significant way.
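The Mean Reciprocal Rank measure used above has a simple form: for each test document, take the reciprocal of the rank at which the correct category first appears in the system's ranked output, then average over documents. The rankings below are made-up examples.

```python
# Mean Reciprocal Rank over a list of ranked category predictions.
def mean_reciprocal_rank(ranked_lists, gold_labels):
    rr = []
    for ranking, gold in zip(ranked_lists, gold_labels):
        rank = ranking.index(gold) + 1 if gold in ranking else None
        rr.append(1.0 / rank if rank else 0.0)
    return sum(rr) / len(rr)

rankings = [["sports", "politics", "tech"],     # gold at rank 1 -> RR 1.0
            ["tech", "sports", "politics"],     # gold at rank 2 -> RR 0.5
            ["politics", "tech", "sports"]]     # gold at rank 3 -> RR 1/3
print(mean_reciprocal_rank(rankings, ["sports", "sports", "sports"]))  # ~0.611
```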

Journal ArticleDOI
TL;DR: Two issues are discussed that researchers must attend to when evaluating the utility of LSA for predicting psychological phenomena, and LSA indices of similarity should be derived from theoretical analysis of the processes involved in understanding two conflicting accounts of a historical event.
Abstract: Latent semantic analysis (LSA) is a computational model of human knowledge representation that approximates semantic relatedness judgments. Two issues are discussed that researchers must attend to when evaluating the utility of LSA for predicting psychological phenomena. First, the role of semantic relatedness in the psychological process of interest must be understood. LSA indices of similarity should then be derived from this theoretical understanding. Second, the knowledge base (semantic space) from which similarity indices are generated must contain 'knowledge' that is appropriate to the task at hand. Proposed solutions are illustrated with data from an experiment in which LSA-based indices were generated from theoretical analysis of the processes involved in understanding two conflicting accounts of a historical event. These indices predict the complexity of subsequent student reasoning about the event, as well as hand-coded predictions generated from think-aloud protocols collected when students were reading the accounts of the event.

Book ChapterDOI
08 Oct 2003
TL;DR: A comprehensive comparison of the performance of a number of text categorization methods in two different data sets is presented, in particular, the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of theVector and LSA models.
Abstract: In this paper we present a comprehensive comparison of the performance of a number of text categorization methods in two different data sets. In particular, we evaluate the Vector and Latent Semantic Analysis (LSA) methods, a classifier based on Support Vector Machines (SVM) and the k-Nearest Neighbor variations of the Vector and LSA models.

Book ChapterDOI
Eric D. Brill1
16 Feb 2003
TL;DR: Recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, achieves state-of-the-art accuracy using very simple methods, suggesting that the field of NLP might benefit by concentrating less on technology development and more on data acquisition.
Abstract: We can still create computer programs displaying only the most rudimentary natural language processing capabilities. One of the greatest barriers to advanced natural language processing is our inability to overcome the linguistic knowledge acquisition bottleneck. In this paper, we describe recent work in a number of areas, including grammar checker development, automatic question answering, and language modeling, where state of the art accuracy is achieved using very simple methods whose power comes entirely from the plethora of text currently available to these systems, as opposed to deep linguistic analysis or the application of state of the art machine learning techniques. This suggests that the field of NLP might benefit by concentrating less on technology development and more on data acquisition.

Journal ArticleDOI
Susan T. Dumais1
TL;DR: This paper summarizes three lines of research that are motivated by the practical problem of helping users find information from external data sources, most notably computers, in the belief that solutions to practical information access problems can shed light on human knowledge representation and reasoning.

Book ChapterDOI
14 Sep 2003
TL;DR: A model called HALe is proposed which automatically derives dimensional representations of words in a high-dimensional context space from an email corpus; these representations are used to discover a network of people based on a seed contextual description.
Abstract: This paper is about finding explicit and implicit connections between people by mining semantic associations from their email communications. Following from a socio-cognitive stance, we propose a model called HALe which automatically derives dimensional representations of words in a high dimensional context space from an email corpus. These dimensional representations are used to discover a network of people based on a seed contextual description. Such a network represents useful connections between people not easily achievable by 'normal' retrieval means. Implicit connections are "lifted" by applying latent semantic analysis to the high dimensional context space. The discovery techniques are applied to a substantial corpus of real-life email utterance drawn from a small-to-medium size information technology organization. The techniques are computationally tractable, and evidence is presented that suggests appropriate explicit connections are being brought to light, as well as interesting, and perhaps serendipitous implicit connections. The ultimate goal of such techniques is to bring to light context-sensitive, ephemeral, and often hidden relationships between people, and between people and information, which pervade the enterprise.

Journal ArticleDOI
TL;DR: A new model (EMMA: the environmental model of analogy) which relies on co-occurrence information provided by LSA (Latent Semantic Analysis) to ground the relations between the symbolic elements aligned in analogy, and demonstrates that the environmental approach to semantics embodied in LSA can produce appropriate patterns of analogical retrieval, but that this semantic measure is not sufficient to model analogical mapping.

Proceedings ArticleDOI
28 Jul 2003
TL;DR: By examining the number of times term t is identified for a search on term t' (precision) using differing ranges of dimensions, it is found that lower ranked dimensions identify related terms and higher-ranked dimensions discriminate between the synonyms.
Abstract: We seek insight into Latent Semantic Indexing by establishing a method to identify the optimal number of factors in the reduced matrix for representing a keyword. This method is demonstrated empirically by duplicating all documents containing a term t, and inserting new documents in the database that replace t with t'. By examining the number of times term t is identified for a search on term t' (precision) using differing ranges of dimensions, we find that lower ranked dimensions identify related terms and higher-ranked dimensions discriminate between the synonyms.
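The probing idea above can be sketched as follows: documents containing a term t are duplicated with t replaced by an artificial synonym t2, and retrieval of the original t-documents for a query on t2 is inspected while restricting the latent representation to different bands of singular dimensions. The corpus and the band choices below are toy values, not the paper's experimental setup.

```python
# Query a tiny LSI model using only selected bands of singular dimensions.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "the engine of the car needs oil",
    "the car engine overheated on the road",
    "fresh bread and cheese at the market",
    "the market sells fruit and bread",
]
# duplicate the engine-documents with "engine" -> "motor" (the synthetic synonym)
docs += [d.replace("engine", "motor") for d in docs if "engine" in d]

vec = CountVectorizer()
X = vec.fit_transform(docs).toarray().T          # terms x documents
U, s, Vt = np.linalg.svd(X, full_matrices=False)

def doc_scores(query_term, dims):
    """Rank documents for a one-word query using only the given dimension band."""
    q = np.zeros(X.shape[0])
    q[vec.vocabulary_[query_term]] = 1.0
    q_hat = (U[:, dims].T @ q) / s[dims]          # fold the query into the band
    D = Vt[dims, :]                               # document coordinates in the band
    return D.T @ q_hat                            # one score per document

low, high = [0, 1], [2, 3]                        # two dimension bands to compare
print(np.argsort(-doc_scores("motor", low))[:3])  # ranking from the lowest dimensions
print(np.argsort(-doc_scores("motor", high))[:3]) # ranking from the next dimensions
```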

Book ChapterDOI
01 Jun 2003
TL;DR: Using a new type of memory compaction mechanism for data mining in vitro, DNA-based semantic retrieval compares favorably with statistically-based Latent Semantic Analysis (LSA), one of the best performers for semantic associative-based retrieval on text corpora.
Abstract: Associative memories based on DNA-affinity have been proposed. Here, the efficiency, reliability, and semantic capability for associative retrieval of three models of a DNA-based memory are quantified and compared to current conventional methods. In affinity-based memories [1], retrievals and deletions under stringent conditions occur reliably (98%) within very short times (100 milliseconds), regardless of the degree of stringency of the recall or the number of simultaneous queries in the input. In a more sophisticated type of DNA-based memory B, proposed and experimentally verified by Chen et al. [2] with three genomes, the sensitivity of the discrimination ability remains unchanged when used on a library of 18 plasmids in the range of 1-4kbps and does appear to grow exponentially with the number of library strands used, even under simultaneous multiple queries in the same input. Finally, using a new type of memory compaction mechanism for data mining in vitro, DNA-based semantic retrieval compares favorably with statistically-based Latent Semantic Analysis (LSA), one of the best performers for semantic associative-based retrieval on text corpora.

01 Jan 2003
TL;DR: The results show that while linguistic processing has a substantial influence on LSA performance, the traditional factors are even more important; the study therefore could not show that linguistic pre-processing substantially improves text categorisation.
Abstract: The paper presents on-going work towards deeper understanding of the factors influencing the performance of Latent Semantic Analysis (LSA). Unlike previous attempts that concentrate on problems such as matrix element weighting, space dimensionality selection, similarity measure etc., we primarily study the impact of another, often neglected, but fundamental element of LSA (and of any text processing technique): the definition of “word”. For this purpose, a balanced corpus of Bulgarian newspaper texts was carefully created to allow for in-depth observations of LSA performance, and a series of experiments was performed in order to understand and compare (with respect to the task of text categorisation) six possible inputs with different levels of linguistic quality, including: graphemic form as met in the text, stem, lemma, phrase, lemma&phrase and part-of-speech annotation. In addition to LSA, we made comparisons to the standard vector-space model, without any dimensionality reduction. The results show that while the linguistic processing has a substantial influence on the LSA performance, the traditional factors are even more important, and therefore we did not prove that the linguistic pre-processing substantially improves text categorisation.

Proceedings ArticleDOI
27 May 2003
TL;DR: These experiments in applying Latent Semantic Analysis (LSA) to dialogue act classification employ both LSA proper and LSA augmented in two ways, and report results on DIAG, the authors' own corpus of tutoring dialogues, and on the CallHome Spanish corpus.
Abstract: This paper presents our experiments in applying Latent Semantic Analysis (LSA) to dialogue act classification. We employ both LSA proper and LSA augmented in two ways. We report results on DIAG, our own corpus of tutoring dialogues, and on the CallHome Spanish corpus. Our work has the theoretical goal of assessing whether LSA, an approach based only on raw text, can be improved by using additional features of the text.
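A rough sketch of using plain LSA for dialogue act classification (the paper also evaluates augmented variants not shown here): project utterances into a latent space, represent each dialogue act by the centroid of its training utterances, and label a new utterance by the nearest centroid. The labels and utterances below are invented.

```python
# Nearest-centroid dialogue act classification in a small LSA space.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

train = [
    ("what does the indicator light mean", "question"),
    ("why is the battery not charging", "question"),
    ("the light turns red when the circuit is open", "explanation"),
    ("the battery charges only when the switch is closed", "explanation"),
    ("ok i see thanks", "acknowledgement"),
    ("right that makes sense", "acknowledgement"),
]
texts, labels = zip(*train)
vec = TfidfVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(vec.fit_transform(texts))

acts = sorted(set(labels))
centroids = np.vstack([Z[[i for i, l in enumerate(labels) if l == a]].mean(axis=0)
                       for a in acts])

utterance = "how do i know the circuit is closed"
z = svd.transform(vec.transform([utterance]))
print(acts[int(cosine_similarity(z, centroids).argmax())])
```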

01 Jan 2003
TL;DR: It is shown that, for both humans and the model, metaphors take longer to process than literal meanings, and that an inductive context can shorten the processing time.
Abstract: This paper presents a computational model of referential metaphor comprehension. This model is designed on top of Latent Semantic Analysis (LSA), a model of the representation of word and text meanings. Comprehending a referential metaphor consists in scanning the semantic neighbors of the metaphor in order to find words that are also semantically related to the context. The depth of that search is compared to the time it takes for humans to process a metaphor. In particular, we are interested in two independent variables: the nature of the reference (either a literal meaning or a figurative meaning) and the nature of the context (inductive or not inductive). We show that, for both humans and model, first, metaphors take longer to process than the literal meanings and second, an inductive context can shorten the processing time.


Posted Content
TL;DR: In this paper, the authors introduce a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words, based on pointwise mutual information (PMI) and latent semantic analysis (LSA).
Abstract: The evaluative character of a word is called its semantic orientation. Positive semantic orientation indicates praise (e.g., "honest", "intrepid") and negative semantic orientation indicates criticism (e.g., "disturbing", "superfluous"). Semantic orientation varies in both direction (positive or negative) and degree (mild to strong). An automated system for measuring semantic orientation would have application in text classification, text filtering, tracking opinions in online discussions, analysis of survey responses, and automated chat systems (chatbots). This paper introduces a method for inferring the semantic orientation of a word from its statistical association with a set of positive and negative paradigm words. Two instances of this approach are evaluated, based on two different statistical measures of word association: pointwise mutual information (PMI) and latent semantic analysis (LSA). The method is experimentally tested with 3,596 words (including adjectives, adverbs, nouns, and verbs) that have been manually labeled positive (1,614 words) and negative (1,982 words). The method attains an accuracy of 82.8% on the full test set, but the accuracy rises above 95% when the algorithm is allowed to abstain from classifying mild words.

Journal ArticleDOI
TL;DR: The concept of data-driven semantic inference is introduced, which in principle allows for any word constructs in command/query formulation, so that it is no longer necessary for users to memorize the exact syntax of every command.
Abstract: Spoken interaction tasks are typically approached using a formal grammar as language model. While ensuring good system performance, this imposes a rigid framework on users, by implicitly forcing them to conform to a pre-defined interaction structure. This paper introduces the concept of data-driven semantic inference, which in principle allows for any word constructs in command/query formulation. Each unconstrained word string is automatically mapped onto the intended action through a semantic classification against the set of supported actions. As a result, it is no longer necessary for users to memorize the exact syntax of every command. The underlying (latent semantic analysis) framework relies on co-occurrences between words and commands, as observed in a training corpus. A suitable extension can also handle commands that are ambiguous at the word level. The behavior of semantic inference is characterized using a desktop user interface control task involving 113 different actions. Under realistic usage conditions, this approach exhibits a 2 to 5% classification error rate. Various training scenarios of increasing scope are considered to assess the influence of coverage on performance. Sufficient semantic knowledge about the task domain is found to be captured at a level of coverage as low as 70%. This illustrates the good generalization properties of semantic inference.
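A hedged sketch of the mapping step described above: each supported action is represented by the training phrasings observed for it, a latent space captures word/command co-occurrences, and an unconstrained user request is classified to the closest action. The three actions and their phrasings below are invented stand-ins for the paper's 113-action desktop task.

```python
# Map an unconstrained request onto the nearest supported action in LSA space.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

training = {
    "empty_trash":     ["empty the trash", "throw away everything in the trash"],
    "new_folder":      ["make a new folder", "create an empty folder here"],
    "take_screenshot": ["take a screenshot", "capture the screen as an image"],
}
actions, docs = zip(*[(a, " ".join(ph)) for a, ph in training.items()])

vec = TfidfVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(vec.fit_transform(docs))    # one pseudo-document per action

request = "please get rid of what is in the trash"   # unconstrained wording
z = svd.transform(vec.transform([request]))
print(actions[int(cosine_similarity(z, Z).argmax())])
```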

Proceedings Article
09 Aug 2003
TL;DR: A new LSA algorithm significantly improves the precision of AutoTutor's natural language understanding and can be applied to other natural language understanding applications.
Abstract: The intelligent tutoring system AutoTutor uses latent semantic analysis to evaluate student answers to the tutor's questions. By comparing a student's answer to a set of expected answers, the system determines how much information is covered and how to continue the tutorial. Despite the success of LSA in tutoring conversations, the system sometimes has difficulties determining at an early stage whether or not an expectation is covered. A new LSA algorithm significantly improves the precision of AutoTutor's natural language understanding and can be applied to other natural language understanding applications.

Journal ArticleDOI
TL;DR: The effectiveness of a domain-specific latent semantic analysis (LSA) in assessing reading strategies was examined, and the science LSA space correlated highly with human judgments, and more highly than did the general reading space.
Abstract: The effectiveness of a domain-specific latent semantic analysis (LSA) in assessing reading strategies was examined. Students were given self-explanation reading training (SERT) and asked to think aloud after each sentence in a science text. Novice and expert human raters and two LSA spaces (general reading, science) rated the similarity of each think-aloud protocol to benchmarks representing three different reading strategies (minimal, local, and global). The science LSA space correlated highly with human judgments, and more highly than did the general reading space. Also, cosines from the science LSA spaces can distinguish between different levels of semantic similarity, but may have trouble in distinguishing local processing protocols. Thus, a domain-specific LSA space is advantageous regardless of the size of the space. The results are discussed in the context of applying the science LSA to a computer-based version of SERT that gives online feedback based on LSA cosines.

Proceedings ArticleDOI
30 Nov 2003
TL;DR: The underlying framework is latent semantic analysis, in which each e-mail is classified against two semantic anchors; experiments show that this approach is competitive with the state of the art in e-mail classification, and potentially advantageous in real-world applications with high junk-to-legitimate ratios.
Abstract: The explosion in unsolicited mass electronic mail (junk e-mail) over the past decade has sparked interest in automatic filtering solutions. Traditional techniques tend to rely on header analysis, keyword/keyphrase matching and analogous rule-based predicates, and/or some probabilistic model of text generation. This paper aims instead at deciding whether or not the latent subject matter is consistent with the user's interests. The underlying framework is latent semantic analysis: each e-mail is automatically classified against two semantic anchors, one for legitimate and one for junk messages. Experiments show that this approach is competitive with the state-of-the-art in e-mail classification, and potentially advantageous in real-world applications with high junk-to-legitimate ratios. The resulting technology was successfully released in August 2002 as part of the e-mail client bundled with the MacOS 10.2 operating system.
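An illustrative sketch of the two-anchor scheme described above, under toy assumptions: one semantic anchor is built from legitimate training messages and one from junk, and a new message is assigned to whichever anchor it is closer to in the latent space. The messages below are invented examples, not the deployed system's data.

```python
# Classify a message against a "legitimate" anchor and a "junk" anchor.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

legit = ["meeting moved to three pm tomorrow",
         "draft of the report attached for review",
         "lunch on friday with the project team"]
junk = ["win a free prize claim your reward now",
        "cheap loans approved instantly no credit check",
        "limited offer buy now and win big"]

vec = TfidfVectorizer()
svd = TruncatedSVD(n_components=2, random_state=0)
Z = svd.fit_transform(vec.fit_transform(legit + junk))
anchor_legit = Z[:len(legit)].mean(axis=0, keepdims=True)
anchor_junk = Z[len(legit):].mean(axis=0, keepdims=True)

def classify(message):
    z = svd.transform(vec.transform([message]))
    s_legit = cosine_similarity(z, anchor_legit)[0, 0]
    s_junk = cosine_similarity(z, anchor_junk)[0, 0]
    return "junk" if s_junk > s_legit else "legitimate"

print(classify("claim your free reward now"))
print(classify("the report for the meeting is attached"))
```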

Proceedings ArticleDOI
27 May 2003
TL;DR: An investigation into the use of LSA in language modeling for conversational speech recognition finds that previously proposed methods of combining an LSA-based unigram model with an N-gram model yield much smaller reductions in perplexity on speech transcriptions than has been reported on written text.
Abstract: Latent semantic analysis (LSA), first exploited in indexing documents for information retrieval, has since been used by several researchers to demonstrate impressive reductions in the perplexity of statistical language models on text corpora such as the Wall Street Journal. In this paper we present an investigation into the use of LSA in language modeling for conversational speech recognition. We find that previously proposed methods of combining an LSA-based unigram model with an N-gram model yield much smaller reductions in perplexity on speech transcriptions than has been reported on written text. We next present a family of exponential models in which LSA similarity is a feature of a word-history pair. The maximum entropy model in this family yields a greater reduction in perplexity, and statistically significant improvements in recognition accuracy over a trigram model on the Switchboard corpus. We conclude with a comparison of this LSA-featured model with a previously proposed topic-dependent maximum entropy model.