Showing papers on "Sentiment analysis" published in 2006

Proceedings Article•

SENTIWORDNET: A Publicly Available Lexical Resource for Opinion Mining

Andrea Esuli¹, Fabrizio Sebastiani²•Institutions (2)

National Research Council¹, University of Padua²

01 Jan 2006

TL;DR: SENTIWORDNET is a lexical resource in which each WORDNET synset is associated to three numerical scores Obj, Pos and Neg, describing how objective, positive, and negative the terms contained in the synset are.

Abstract: Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinion it expresses. OM has a rich set of applications, ranging from tracking users' opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the PN-polarity of subjective terms, i.e. identify whether a term that is a marker of opinionated content has a positive or a negative connotation. Research on determining whether a term is indeed a marker of opinionated content (a subjective term) or not (an objective term) has been instead much scarcer. In this work we describe SENTIWORDNET, a lexical resource in which each WORDNET synset s is associated to three numerical scores Obj(s), Pos(s) and Neg(s), describing how objective, positive, and negative the terms contained in the synset are. The method used to develop SENTIWORDNET is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The three scores are derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but different classification behaviour. SENTIWORDNET is freely available for research purposes, and is endowed with a Web-based graphical user interface.
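
The abstract says the three scores come from combining a committee of eight ternary classifiers but does not spell out the combination rule. The sketch below assumes one plausible rule, the fraction of committee members voting for each label, so that the three scores sum to 1. The function name `committee_scores` and the example votes are hypothetical and are not taken from the resource itself.

```python
from collections import Counter

LABELS = ("Positive", "Negative", "Objective")


def committee_scores(votes):
    """Turn the labels assigned by a classifier committee into a score triplet.

    Assumption: each score is the share of committee members that chose
    that label, so the three scores always sum to 1.
    """
    if not votes:
        raise ValueError("need at least one vote")
    counts = Counter(votes)
    unknown = set(counts) - set(LABELS)
    if unknown:
        raise ValueError(f"unexpected labels: {sorted(unknown)}")
    return {label: counts[label] / len(votes) for label in LABELS}


# Hypothetical votes from eight ternary classifiers for one synset.
votes = ["Positive"] * 5 + ["Objective"] * 2 + ["Negative"]
scores = committee_scores(votes)
print(scores)
```

With a committee of eight, every score under this assumption is a multiple of 1/8, which is one simple way to obtain graded rather than hard labels.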

2,625 citations

Proceedings Article•

Predicting Movie Sales from Blogger Sentiment

Gilad Mishne¹, Natalie S. Glance•Institutions (1)

University of Amsterdam¹

01 Jan 2006

TL;DR: The main finding is that positive sentiment is indeed a better predictor for movie success when applied to a limited context around references to the movie in weblogs, posted prior to its release.

Abstract: The volume of discussion about a product in weblogs has recently been shown to correlate with the product’s financial performance. In this paper, we study whether applying sentiment analysis methods to weblog data results in better correlation than volume only, in the domain of movies. Our main finding is that positive sentiment is indeed a better predictor for movie success when applied to a limited context around references to the movie in weblogs, posted prior to its release.

354 citations

Proceedings Article•DOI•

Seeing stars when there aren’t many stars: Graph-based semi-supervised learning for sentiment categorization

Andrew Goldberg¹, Xiaojin Zhu¹•Institutions (1)

University of Wisconsin-Madison¹

09 Jun 2006

TL;DR: A graph-based semi-supervised learning algorithm is presented to address the sentiment analysis task of rating inference and achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.

Abstract: We present a graph-based semi-supervised learning algorithm to address the sentiment analysis task of rating inference. Given a set of documents (e.g., movie reviews) and accompanying ratings (e.g., "4 stars"), the task calls for inferring numerical ratings for unlabeled documents based on the perceived sentiment expressed by their text. In particular, we are interested in the situation where labeled data is scarce. We place this task in the semi-supervised setting and demonstrate that considering unlabeled reviews in the learning process can improve rating-inference performance. We do so by creating a graph on both labeled and unlabeled data to encode certain assumptions for this task. We then solve an optimization problem to obtain a smooth rating function over the whole graph. When only limited labeled data is available, this method achieves significantly better predictive accuracy over other methods that ignore the unlabeled examples during training.
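
To illustrate the idea of a smooth rating function over a graph of labeled and unlabeled reviews, here is a minimal sketch. It is not the paper's optimization problem: it swaps in a simple harmonic-style propagation in which labeled nodes keep their ratings and each unlabeled node is repeatedly set to the weighted average of its neighbours. The function name `propagate_ratings`, the chain graph, and the edge weights are all assumptions made for the example.

```python
def propagate_ratings(weights, labeled, n_iter=200):
    """Spread known ratings over a similarity graph.

    weights: symmetric n x n list of non-negative similarities.
    labeled: dict mapping node index -> known rating (kept fixed).
    Returns a list with one rating per node.
    """
    n = len(weights)
    mean = sum(labeled.values()) / len(labeled)
    # Unlabeled nodes start at the mean of the known ratings.
    f = [labeled.get(i, mean) for i in range(n)]
    for _ in range(n_iter):
        new_f = list(f)
        for i in range(n):
            if i in labeled:
                continue  # labeled ratings are clamped
            total = sum(weights[i][j] for j in range(n) if j != i)
            if total > 0:
                new_f[i] = sum(
                    weights[i][j] * f[j] for j in range(n) if j != i
                ) / total
        f = new_f
    return f


# Hypothetical chain of four reviews: 0 - 1 - 2 - 3, with unit similarities.
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
labeled = {0: 1.0, 3: 4.0}  # only the two end reviews carry star ratings
ratings = propagate_ratings(weights, labeled)
print(ratings)
```

On this chain the unlabeled reviews settle between the two known ratings, which is the kind of smoothness the abstract describes; a real system would build the graph from text similarity between reviews.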

348 citations

Proceedings Article•DOI•

Examining the Role of Linguistic Knowledge Sources in the Automatic Identification and Classification of Reviews

Vincent Ng¹, Sajib Dasgupta¹, S. M. Niaz Arifin¹•Institutions (1)

University of Texas at Dallas¹

17 Jul 2006

TL;DR: It is demonstrated that review identification can be performed with high accuracy using only unigrams as features, and the role of four types of simple linguistic knowledge sources in a polarity classification system is examined.

Abstract: This paper examines two problems in document-level sentiment analysis: (1) determining whether a given document is a review or not, and (2) classifying the polarity of a review as positive or negative. We first demonstrate that review identification can be performed with high accuracy using only unigrams as features. We then examine the role of four types of simple linguistic knowledge sources in a polarity classification system.

262 citations

Proceedings Article•DOI•

Feature Subsumption for Opinion Analysis

Ellen Riloff¹, Siddharth Patwardhan¹, Janyce Wiebe²•Institutions (2)

University of Utah¹, University of Pittsburgh²

22 Jul 2006

TL;DR: It is shown that reducing the feature set improves performance on three opinion classification tasks, especially when combined with traditional feature selection.

Abstract: Lexical features are key to many approaches to sentiment analysis and opinion detection. A variety of representations have been used, including single words, multi-word Ngrams, phrases, and lexico-syntactic patterns. In this paper, we use a subsumption hierarchy to formally define different types of lexical features and their relationship to one another, both in terms of representational coverage and performance. We use the subsumption hierarchy in two ways: (1) as an analytic tool to automatically identify complex features that outperform simpler features, and (2) to reduce a feature set by removing unnecessary features. We show that reducing the feature set improves performance on three opinion classification tasks, especially when combined with traditional feature selection.

255 citations

Proceedings Article•DOI•

Utility scoring of product reviews

Zhu Zhang¹, Balaji Varadarajan¹•Institutions (1)

University of Arizona¹

06 Nov 2006

TL;DR: A new task in the ongoing research in text sentiment analysis: predicting utility of product reviews, which is orthogonal to polarity classification and opinion extraction is identified, and regression models are built by incorporating a diverse set of features.

Abstract: We identify a new task in the ongoing research in text sentiment analysis: predicting utility of product reviews, which is orthogonal to polarity classification and opinion extraction. We build regression models by incorporating a diverse set of features, and achieve highly competitive performance for utility scoring on three real-world data sets.

235 citations

Journal Article•DOI•

The importance of neutral examples for learning sentiment

Moshe Koppel¹, Jonathan Schler¹•Institutions (1)

Bar-Ilan University¹

01 May 2006

TL;DR: It is shown that it is crucial to use neutral examples in learning polarity for a variety of reasons, and the use of neutral training examples in learning facilitates better distinction between positive and negative examples.

Abstract: Most research on learning to identify sentiment ignores “neutral” examples, learning only from examples of significant (positive or negative) polarity. We show that it is crucial to use neutral examples in learning polarity for a variety of reasons. Learning from negative and positive examples alone will not permit accurate classification of neutral examples. Moreover, the use of neutral training examples in learning facilitates better distinction between positive and negative examples.

192 citations

SentiWordNet: A High-Coverage Lexical Resource for Opinion Mining

Andrea Esuli¹, Fabrizio Sebastiani¹•Institutions (1)

National Research Council¹

01 Jan 2006

TL;DR: SentiWordNet is described, a lexical resource produced by asking an automated classifier Φ to associate to each synset s of WordNet (version 2.0) a triplet of scores Φ(s, p) describing how strongly the terms contained in s enjoy each of the three properties.

Abstract: Opinion mining (OM) is a recent subdiscipline at the crossroads of information retrieval and computational linguistics which is concerned not with the topic a document is about, but with the opinions it expresses. OM has a rich set of applications, ranging from tracking users’ opinions about products or about political candidates as expressed in online forums, to customer relationship management. In order to aid the extraction of opinions from text, recent research has tried to automatically determine the “PN-polarity” of subjective terms, i.e. identify whether a term that indicates the presence of an opinion has a positive or a negative connotation. Research on determining the “SO-polarity” of terms, i.e. whether a term indeed indicates the presence of an opinion (a subjective term) or not (an objective, or neutral term) has been instead much scarcer. In this paper we describe SentiWordNet, a lexical resource produced by asking an automated classifier Φ to associate to each synset s of WordNet (version 2.0) a triplet of scores Φ(s, p) (for p ∈ P ={Positive, Negative, Objective}) describing how strongly the terms contained in s enjoy each of the three properties. The method used to develop SentiWordNet is based on the quantitative analysis of the glosses associated to synsets, and on the use of the resulting vectorial term representations for semi-supervised synset classification. The score triplet is derived by combining the results produced by a committee of eight ternary classifiers, all characterized by similar accuracy levels but extremely different classification behaviour. We present the results of evaluating the accuracy of the automatically assigned triplets on a publicly available benchmark. SentiWordNet is freely available for research purposes, and is endowed with a Web-based graphical user interface.

180 citations

Journal Article•DOI•

Recognizing strong and weak opinion clauses

Theresa Wilson¹, Janyce Wiebe¹, Rebecca Hwa¹•Institutions (1)

University of Pittsburgh¹

01 May 2006

TL;DR: This paper presents the first experimental results classifying the intensity of opinions and other types of subjectivity and classifies the subjectivity of deeply nested clauses using a wide range of features, including new syntactic features developed for opinion recognition.

Abstract: There has been a recent swell of interest in the automatic identification and extraction of opinions and emotions in text. In this paper, we present the first experimental results classifying the intensity of opinions and other types of subjectivity and classifying the subjectivity of deeply nested clauses. We use a wide range of features, including new syntactic features developed for opinion recognition. We vary the learning algorithm and the feature organization to explore the effect this has on the classification task. In 10-fold cross-validation experiments using support vector regression, we achieve improvements in mean-squared error over baseline ranging from 49% to 51%. Using boosting, we achieve improvements in accuracy ranging from 23% to 96%. In the past few years, interest in the automatic identification and extraction of attitudes, opinions, and sentiments in text has been growing rapidly. This task is motivated by the desire to provide tools and support for information analysts in government, commercial, and political domains, who want to automatically track attitudes and feelings in the news and online forums. How do people feel about recent events in the Middle East? Is the rhetoric from a particular opposition group intensifying? Is there a change in the attitudes being expressed toward the war in Iraq? A system that could automatically identify and extract opinions and emotions from text would be an enormous help to someone trying to answer these kinds of questions. To date, the majority of work on subjectivity and sentiment analysis has focused on classification at the document or sentence level. Document classification tasks include, for example, distinguishing editorials from news articles and classifying reviews as positive or negative. A common sentence-level task is to classify sentences as subjective or objective. However, for many applications, simply recognizing which documents or sentences are opinionated may not be sufficient. 
Opinions vary in their intensity, and many applications would benefit from being able to determine not only if an opinion is being presented, but how strong is the opinion. Flame detection systems, for example, seek to identify strong rants and emotional tirades, while letting milder opinions pass through. Information analysts need tools that will help them to recognize changes over time in the virulence expressed by persons or groups of interest, and to detect when rhetoric is heating up, or cooling down. A further challenge with automatic opinion identification is that it is not uncommon to find two or more opinions in a single sentence, or to find a sentence containing opinions as well as factual information. Information extraction (IE) systems are natural language processing (NLP) systems that extract from text any information relevant to a prespecified topic. An IE system trying to distinguish between factual information (which should be extracted) and non-factual information (which should be discarded or labeled uncertain) would benefit from the ability to pinpoint the particular clauses that contain opinions. This ability would also be important for multi-perspective question answering systems, which aim to present multiple answers to non-factual questions based on opinions derived from different sources, and for

173 citations

Journal Article•DOI•

Learning to laugh (automatically): computational models for humor recognition

Rada Mihalcea¹, Carlo Strapparava•Institutions (1)

University of North Texas¹

01 May 2006

TL;DR: Through experiments performed on very large data sets, it is shown that automatic classification techniques can be effectively used to distinguish between humorous and non‐humorous texts, with significant improvements observed over a priori known baselines.

Abstract: Humor is one of the most interesting and puzzling aspects of human behavior. Despite the attention it has received in fields such as philosophy, linguistics, and psychology, there have been only few attempts to create computational models for humor recognition or generation. In this article, we bring empirical evidence that computational approaches can be successfully applied to the task of humor recognition. Through experiments performed on very large data sets, we show that automatic classification techniques can be effectively used to distinguish between humorous and non-humorous texts, with significant improvements observed over a priori known baselines.

150 citations

Proceedings Article•

A Preliminary Investigation into Sentiment Analysis of Informal Political Discourse.

Tony Mullen, Robert Malouf¹•Institutions (1)

San Diego State University¹

01 Jan 2006

TL;DR: Preliminary statistical tests on a new dataset of political discussion group postings indicate that posts made in direct response to other posts in a thread have a strong tendency to represent an opposing political viewpoint to the original post.

Abstract: With the rise of weblogs and the increasing tendency of online publications to turn to message-board style reader feedback venues, informal political discourse is becoming an important feature of the intellectual landscape of the Internet, creating a challenging and worthwhile area for experimentation in techniques for sentiment analysis. We describe preliminary statistical tests on a new dataset of political discussion group postings which indicate that posts made in direct response to other posts in a thread have a strong tendency to represent an opposing political viewpoint to the original post. We conclude that traditional text classification methods will be inadequate to the task of sentiment analysis in this domain, and that progress is to be made by exploiting information about how posters interact with each

Proceedings Article•DOI•

Sentiment Retrieval using Generative Models

Koji Eguchi¹, Victor Lavrenko²•Institutions (2)

National Institute of Informatics¹, University of Massachusetts Amherst²

22 Jul 2006

TL;DR: This paper proposes several sentiment information retrieval models in the framework of probabilistic language models, assuming that a user both inputs query terms expressing a certain topic and also specifies a sentiment polarity of interest in some manner.

Abstract: Ranking documents or sentences according to both topic and sentiment relevance should serve a critical function in helping users when topics and sentiment polarities of the targeted text are not explicitly given, as is often the case on the web. In this paper, we propose several sentiment information retrieval models in the framework of probabilistic language models, assuming that a user both inputs query terms expressing a certain topic and also specifies a sentiment polarity of interest in some manner. We combine sentiment relevance models and topic relevance models with model parameters estimated from training data, considering the topic dependence of the sentiment. Our experiments prove that our models are effective.

Proceedings Article•DOI•

User-directed Sentiment Analysis: Visualizing the Affective Content of Documents

Michelle L. Gregory¹, Nancy Chinchor, Paul D. Whitney¹, Richard Carter¹, Elizabeth G. Hetzler¹, Alan E. Turner¹•Institutions (1)

Pacific Northwest National Laboratory¹

22 Jul 2006

TL;DR: Approaches for visualizing the affective content of documents and an interactive capability for exploring emotion in a large document collection are discussed.

Abstract: Recent advances in text analysis have led to finer-grained semantic analysis, including automatic sentiment analysis---the task of measuring documents, or chunks of text, based on emotive categories, such as positive or negative. However, considerably less progress has been made on efficient ways of exploring these measurements. This paper discusses approaches for visualizing the affective content of documents and describes an interactive capability for exploring emotion in a large document collection.

Proceedings Article•DOI•

Sentiment Classification for Movie Reviews in Chinese by Improved Semantic Oriented Approach

Qiang Ye¹, Wen Shi², Yijun Li¹•Institutions (2)

Harbin Institute of Technology¹, Northeast Agricultural University²

04 Jan 2006

TL;DR: An improved semantic approach to sentiment classification of movie reviews written in Chinese is proposed, and experiments demonstrate its capability.

Abstract: Sentiment classification aims at mining customer reviews of a product by automatically classifying the reviews into positive or negative opinions. With the fast development of World Wide Web applications, sentiment classification has a huge opportunity to help people automatically analyze customers' opinions from web information. Automatic opinion mining will benefit both consumers and sellers. Up to now, it remains a complicated and challenging task. There are mainly two types of approaches to sentiment classification: machine learning methods and semantic orientation methods. Though some pioneering research has explored approaches for English movie review classification, little work has been done on sentiment classification for Chinese reviews. This paper proposes an improved semantic approach for sentiment classification of movie reviews written in Chinese. Experiments on data show the capability of this approach.

Proceedings Article•DOI•

Partially Supervised Coreference Resolution for Opinion Summarization through Structured Rule Learning

Veselin Stoyanov¹, Claire Cardie¹•Institutions (1)

Cornell University¹

22 Jul 2006

TL;DR: This work proposes and evaluates a new algorithm for the task of source coreference resolution that outperforms competitive baselines and approaches the problem as the novel problem of partially supervised clustering.

Abstract: Combining fine-grained opinion information to produce opinion summaries is important for sentiment analysis applications. Toward that end, we tackle the problem of source coreference resolution -- linking together source mentions that refer to the same entity. The partially supervised nature of the problem leads us to define and approach it as the novel problem of partially supervised clustering. We propose and evaluate a new algorithm for the task of source coreference resolution that outperforms competitive baselines.

Proceedings Article•

Blog Mining Through Opinionated Words

Giuseppe Attardi¹, Maria Simi¹•Institutions (1)

University of Pisa¹

01 Jan 2006

TL;DR: A single stage approach to opinion mining is explored, retrieving opinionated documents ranked with a special ranking function which exploits an index enriched with opinion tags, showing a significant improvement in precision for both topic relevance and opinion relevance.

Abstract: Intent mining is a special kind of document analysis whose goal is to assess the attitude of the document author with respect to a given subject. Opinion mining is a kind of intent mining where the attitude is a positive or negative opinion. Most systems tackle the problem with a two step approach, an information retrieval followed by a postprocess or filter phase to identify opinionated blogs. We explored a single stage approach to opinion mining, retrieving opinionated documents ranked with a special ranking function which exploits an index enriched with opinion tags. A set of subjective words are used as tags for identifying opinionated sentences. Subjective words are marked as “opinionated” and are used in the retrieval phase to boost the rank of documents containing them. In indexing the collection, we recovered the relevant content from the blog permalink pages, exploiting HTML metadata about the generator and heuristics to remove irrelevant parts from the body. The index also contains information about the occurrence of opinionated words, extracted from an analysis of WordNet glosses. The experiments compared the precision of normal queries with respect to queries which included as constraint the proximity to an opinionated word. The results show a significant improvement in precision for both topic relevance and opinion relevance.

Proceedings Article•

Domain Specific Affective Classification of Documents.

Sara Owsley¹, Sanjay Sood¹, Kristian J. Hammond•Institutions (1)

Northwestern University¹

21 Aug 2006

TL;DR: A set of techniques that can be used to classify weblogs (blogs) by emotional content are described, which aims to generate domain specific sentiment classifiers to determine the emotional state of weblogs in that domain.

Abstract: In this paper, we describe a set of techniques that can be used to classify weblogs (blogs) by emotional content. Instead of using a general purpose emotional classification strategy, our technique aims to generate domain specific sentiment classifiers that can be used to determine the emotional state of weblogs in that domain.

Proceedings Article•

Multiple Ranking Strategies for Opinion Retrieval in Blogs

Gilad Mishne¹•Institutions (1)

University of Amsterdam¹

01 Jan 2006

TL;DR: This article used a combination of shallow sentiment analysis, spam detection, and link-based authority estimation to identify opinions in blog posts, which yielded a significant improvement over a content-only baseline.

Abstract: We describe our participation in the Opinion Retrieval task at TREC 2006. Our approach to identifying opinions in blog post consisted of scoring the posts separately on various aspects associated with an expression of opinion about a topic, including shallow sentiment analysis, spam detection, and link-based authority estimation. The separate approaches were combined into a single ranking, yielding significant improvement over a content-only baseline.
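
The abstract describes scoring posts separately on several aspects and then merging the results into one ranking, without stating how the merge is done. As a rough sketch only, the example below assumes a weighted linear combination of per-aspect scores. The aspect names, weights, post identifiers, and the function `combine_scores` are all hypothetical.

```python
def combine_scores(component_scores, weights):
    """Rank posts by a weighted sum of their per-aspect scores.

    component_scores: dict post_id -> dict aspect -> score in [0, 1].
    weights: dict aspect -> weight (assumed, not from the paper).
    Returns post ids ordered from best to worst combined score.
    """
    combined = {
        post_id: sum(weights[name] * scores[name] for name in weights)
        for post_id, scores in component_scores.items()
    }
    return sorted(combined, key=combined.get, reverse=True)


# Hypothetical per-aspect scores for three blog posts.
component_scores = {
    "post-a": {"topic": 0.9, "sentiment": 0.2, "not_spam": 0.9, "authority": 0.3},
    "post-b": {"topic": 0.7, "sentiment": 0.8, "not_spam": 0.95, "authority": 0.6},
    "post-c": {"topic": 0.8, "sentiment": 0.7, "not_spam": 0.1, "authority": 0.2},
}
weights = {"topic": 0.4, "sentiment": 0.3, "not_spam": 0.2, "authority": 0.1}

ranking = combine_scores(component_scores, weights)
print(ranking)
```

Here a post that is topically strong but unopinionated, or opinionated but likely spam, drops below a post that does reasonably well on every aspect.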

Journal Article•DOI•

The semantic web as a Linguistic resource: opportunities for natural language generation

Chris Mellish¹, Xiantang Sun¹•Institutions (1)

King's College, Aberdeen¹

01 Sep 2006-Knowledge Based Systems

TL;DR: It is argued that, because the documents of the semantic web are created by human beings, they are actually much more like natural language documents than theory would have us believe.

Abstract: This paper argues that, because the documents of the semantic web are created by human beings, they are actually much more like natural language documents than theory would have us believe. We present evidence that natural language words are used extensively and in complex ways in current ontologies. This leads to a number of dangers for the semantic web, but also opens up interesting new challenges for natural language processing. This is illustrated by our own work using natural language generation to present parts of ontologies.

Journal Article•DOI•

A Survey of Sentiment Analysis

Takashi Inui, Manabu Okumura

01 Jan 2006

TL;DR: This article surveys techniques for discovering, extracting, organizing, and aggregating evaluative information (opinions) from text, from foundational work through recent research.

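Abstract: As the Internet has spread, an environment has emerged in which ordinary individuals can easily publish information. The information these individuals publish often contains personal opinions, such as their evaluations of some target. Extracting, organizing, and presenting this evaluative information benefits both the companies that provide the target and the general public who use it. For this reason, research dealing with evaluative information has rapidly become active in the field of natural language processing in recent years. Against this background, this paper surveys techniques for discovering, extracting, organizing, and aggregating evaluative information from text, from the foundational studies through to recent work.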

Automatic Dream Sentiment Analysis

David Nadeau, Catherine Sabourin, Joseph De Koninck, Stan Matwin, Peter D. Turney

01 Jan 2006

TL;DR: It is shown that machine learning allows automating the human judgment with accuracy superior to majority class choice, as a first step toward automatic analysis of sentiments in dreams.

Abstract: In this position paper, we propose a first step toward automatic analysis of sentiments in dreams. 100 dreams were sampled from a dream bank created for a normative study of dreams. Two human judges assigned a score to describe dream sentiments. We ran four baseline algorithms in an attempt to automate the rating of sentiments in dreams. Particularly, we compared the General Inquirer (GI) tool, the Linguistic Inquiry and Word Count (LIWC), a weighted version of the GI lexicon and of the HM lexicon and a standard bag-of-words. We show that machine learning allows automating the human judgment with accuracy superior to majority class choice.

Proceedings Article•DOI•

Using Bilingual Lexicon to Judge Sentiment Orientation of Chinese Words

Jianxin Yao¹, Gengfeng Wu¹, Jian Liu¹, Yu Zheng¹•Institutions (1)

Shanghai University¹

20 Sep 2006

TL;DR: A new method is proposed for determining the sentiment orientation of Chinese words by using bilingual lexicons, and the performance of SVM and C4.5 classifiers is studied.

Abstract: It is a challenging task to identify sentiments (the affective parts of opinions) of reviews. One of the most important problems is to predict the sentiment orientation of the words. This paper proposes a new method for determining the sentiment orientation of Chinese words by using bilingual lexicons. Given a Chinese word, we observe the occurrences of English sentiment words in its interpretations, to predict the sentiment orientation of the Chinese word. The whole process can be illustrated logically as follows: (1) translate a Chinese word into Chinese-English interpretation; (2) generate an English word sequence by parsing the interpretation; (3) calculate the sentiment vector from the English word sequence; (4) use a classifier to predict the sentiment orientation for the Chinese word. The performance of two kinds of classifiers (SVM and C4.5) is studied. The experiments show that the proposed method performed well and achieved high accuracy.
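
The four steps listed in the abstract can be sketched roughly as follows. Everything concrete here is an assumption for illustration: the toy bilingual lexicon, the two small English sentiment word lists, the two-dimensional sentiment vector, and a simple count comparison standing in for the SVM and C4.5 classifiers studied in the paper.

```python
import re

# Hypothetical toy resources; a real system would use a full
# Chinese-English dictionary and an English sentiment lexicon.
BILINGUAL_LEXICON = {
    "优秀": "excellent; outstanding; very good",
    "糟糕": "terrible; too bad; in a mess",
}
POSITIVE_WORDS = {"excellent", "outstanding", "good"}
NEGATIVE_WORDS = {"terrible", "bad", "mess"}


def english_words(interpretation):
    """Step 2: split an English interpretation into a word sequence."""
    return re.findall(r"[a-z]+", interpretation.lower())


def sentiment_vector(words):
    """Step 3: count positive and negative sentiment words."""
    pos = sum(w in POSITIVE_WORDS for w in words)
    neg = sum(w in NEGATIVE_WORDS for w in words)
    return pos, neg


def predict_orientation(chinese_word):
    """Steps 1 and 4: look up the interpretation, then classify.

    The comparison rule below is a stand-in for a trained classifier.
    """
    interpretation = BILINGUAL_LEXICON[chinese_word]  # step 1
    pos, neg = sentiment_vector(english_words(interpretation))
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


print(predict_orientation("优秀"))
print(predict_orientation("糟糕"))
```

In the paper's setting, the sentiment vector would be the feature input to a trained classifier rather than to a fixed rule.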

Sentiment Classification Techniques for Tracking Literary Reputation

Maite Taboada, Mary Ann Gillies, Paul McFetridge

01 Jan 2006

TL;DR: The authors used the semantic orientation of adjectives and their rough position in the text to calculate the overall orientation of the text and suggest ways in which this calculation can be improved, including further development of adjective lists, expansion of these lists and the consequent algorithms for calculating orientation to include other parts of speech, and the use of Rhetorical Structure Theory to differentiate units that make a direct contribution to the intended orientation from those that are contrastive or otherwise make an indirect contribution.

Abstract: The initial stages of a project tracking the literary reputation of authors are described. The critical reviews of six authors who either rose to fame or fell to obscurity between 1900 and 1950 will be examined and we hope to demonstrate the contribution of each text to the evolving reputations of the authors. We provide an initial report on the use of the semantic orientation of adjectives and their rough position in the text to calculate the overall orientation of the text and suggest ways in which this calculation can be improved. Improvements include further development of adjective lists, expansion of these lists and the consequent algorithms for calculating orientation to include other parts of speech, and the use of Rhetorical Structure Theory to differentiate units that make a direct contribution to the intended orientation from those that are contrastive or otherwise make an indirect contribution. ∗ In Proceedings of LREC 2006 Workshop “Towards Computational Models of Literary Analysis”, pp. 36-43.

Proceedings Article•

BlogHarvest: Blog Mining and Search Framework.

Mukul Joshi, Nikhil Belsare

01 Jan 2006

TL;DR: BlogHarvest is demonstrated which is a blog mining and search framework that extracts the interests of the blogger, finds and recommends blogs with similar topics and provides blog oriented search functionality.

Abstract: Beyond serving as online diaries, weblogs have evolved into complex social structures. Blogging software allows users to publish opinions on any topic without the constraints of a predefined schema. Analysis of linkage between blogs has indicated that community formation in the blogosphere is not a random process but the result of shared interests binding bloggers together. Learning, analyzing, and using bloggers' interests and social linkage is therefore necessary to provide a useful search facility on the blogosphere to bloggers, and revenue-generation opportunities such as advertising to blog service providers. In this paper, we demonstrate BlogHarvest, a blog mining and search framework that extracts the interests of the blogger, finds and recommends blogs with similar topics, and provides blog-oriented search functionality. BlogHarvest uses classification, linkage- and topic-similarity-based clustering, and POS-tagging-based opinion mining to provide these features. A novel search interface is built to provide related blogs for queries along with the usual result ranking. Association rules found from POS tags are used to get the context of a search, providing query expansion for targeted results. By crawling the blogosphere and extracting and indexing blog posts and linkage metadata, we have analyzed around 50,000 blogs to tune our algorithms.

Patent•

Lexicon generation methods, computer implemented lexicon editing methods, lexicon generation devices, lexicon editors, and articles of manufacture

Alan E. Turner¹, Elizabeth G. Hetzler¹, Christian Posse¹•Institutions (1)

Battelle Memorial Institute¹

30 Jun 2006

TL;DR: A lexicon generation method is described that includes providing a seed vector indicative of occurrences of a plurality of seed terms within a plurality of text items, providing content vectors indicative of occurrences of respective content terms within the text items, comparing individual content vectors with the seed vector, and, responsive to the comparing, selecting at least one of the content terms as a lexicon term usable in sentiment analysis of text.

Abstract: Lexicon generation methods, computer implemented lexicon editing methods, lexicon generation devices, lexicon editors, and articles of manufacture are described according to some aspects. In one aspect, a lexicon generation method includes providing a seed vector indicative of occurrences of a plurality of seed terms within a plurality of text items, providing a plurality of content vectors indicative of occurrences of respective ones of a plurality of content terms within the text items, comparing individual ones of the content vectors with respect to the seed vector, and responsive to the comparing, selecting at least one of the content terms as a term of a lexicon usable in sentiment analysis of text.
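
The seed-vector comparison described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how vectors are compared, so cosine similarity, the `threshold` value, whitespace tokenization, and all function names are hypothetical choices.

```python
import math


def occurrence_vector(terms, text_items):
    """For each text item, count the tokens that belong to `terms`."""
    terms = set(terms)
    return [sum(1 for token in item.lower().split() if token in terms)
            for item in text_items]


def cosine(u, v):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_lexicon(seed_terms, content_terms, text_items, threshold=0.5):
    """Select content terms whose occurrence pattern resembles the seed's.

    The similarity measure and threshold are assumptions for illustration.
    """
    seed_vector = occurrence_vector(seed_terms, text_items)
    selected = []
    for term in content_terms:
        content_vector = occurrence_vector([term], text_items)
        if cosine(content_vector, seed_vector) >= threshold:
            selected.append(term)
    return selected


texts = ["great superb plot", "great acting superb",
         "boring plot", "superb great cast"]
print(expand_lexicon(["great"], ["superb", "plot", "boring"], texts))
```

Here a content term is added to the lexicon when it tends to occur in the same text items as the seed terms. Other comparison measures would fit the same outline.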

Dataset•

datasets for Using Verbs and Adjectives to Automatically Classify Blog Sentiment

Paula Chesley, Bruce Vincent, Li Xu, Srihari Rohini

01 Jan 2006

TL;DR: Training and test datasets for the paper "Using Verbs and Adjectives to Automatically Classify Blog Sentiment", consisting of texts from blogs that are manually classified as having positive, negative, or neutral sentiment.

Abstract: Training and test datasets for the paper "Using Verbs and Adjectives to Automatically Classify Blog Sentiment", available online at http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.179.3144&rep=rep1&type=pdf . Zip files of texts from blogs that are manually classified as having positive, negative, or neutral sentiment.

Journal Article•

Semantic polarity analysis and opinion mining on Chinese review sentences

Yao Tianfang¹•Institutions (1)

Shanghai Jiao Tong University¹

01 Jan 2006-Journal of Computer Applications

TL;DR: A new algorithm for computing the contextual polarity of polar words was proposed, and a framework was introduced for topic identification and feature extraction that provided an innovative solution for the association of extracted opinion to its specific topic.

Abstract: Using natural language processing technology, Chinese Web review sentences were semantically analyzed and mined for opinions. A new algorithm for computing the contextual polarity of polar words was proposed, and a framework was introduced for topic identification and feature extraction. The framework provided an innovative solution for associating each extracted opinion with its specific topic. The experimental results show that the algorithm is both reasonable and effective when compared with the results of manual annotation.

Sequential Models for Sentiment Prediction

Yi Mao¹, Guy Lebanon•Institutions (1)

Purdue University¹

01 Jan 2006

TL;DR: A variant of conditional random fields that is better suited to handle the problem of predicting local sentiment flow in documents is developed, showing the possibility of incorporating sentiment concept into a range of new applications.

Abstract: We examine the problem of predicting local sentiment flow in documents, and its application to several areas of text analysis. Formally, the problem is stated as predicting an ordinal sequence based on a sequence of word sets. In the spirit of isotonic regression, we develop a variant of conditional random fields that is better suited to handle this problem. Experiments are reported for both sentiment prediction and text summarization, showing the possibility of incorporating sentiment concept into a range of new applications.
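
As background for the reference to isotonic regression, the sketch below shows the classical pool-adjacent-violators procedure for fitting a non-decreasing sequence by least squares. It is not the authors' conditional random field variant, only the underlying monotonicity idea. The function name `isotonic_fit` is a hypothetical choice.

```python
def isotonic_fit(values):
    """Least-squares non-decreasing fit via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in values:
        blocks.append([float(value), 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            block_sum, block_count = blocks.pop()
            blocks[-1][0] += block_sum
            blocks[-1][1] += block_count
    fitted = []
    for block_sum, block_count in blocks:
        fitted.extend([block_sum / block_count] * block_count)
    return fitted


print(isotonic_fit([1, 3, 2, 4]))
```

Adjacent values that violate the ordering are pooled and replaced by their mean, which yields the closest monotone sequence in the least-squares sense.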

Using WordNet for Opinion Mining

Pavel Smrž

01 Jan 2006

TL;DR: An automatic system that was designed to crawl various information sources available on the Web to collect and identify different opinions on a given topic and to report diversity of opinions across languages and countries is introduced.

Abstract: This paper deals with lexical resources applied to opinion mining: the identification and extraction of opinions from free texts. Opinion mining comprises the segmentation of documents, passages, sentences, or phrases into objective (factual) and subjective parts, and the evaluation of the subjective attitude toward a given fact. We briefly introduce an automatic system that was designed to crawl various information sources available on the Web (newspapers, Internet blogs and forums) to collect and identify different opinions on a given topic and to report the diversity of opinions across languages and countries. Special attention is paid to the linguistic resources used, especially to wordnet extensions that play a crucial role in the identification of subjective expressions.

Proceedings Article•

Opinion mining in a telephone survey corpus.

Nathalie Camelin, Géraldine Damnati¹, Frédéric Béchet, Renato De Mori•Institutions (1)

Orange S.A.¹

01 Sep 2006

TL;DR: This paper addresses the automatic analysis of audio messages where customers are asked to give their opinion over several dimensions about a Customer Service, and interpretation methods that integrate automatically and manually acquired knowledge are proposed.

Abstract: Telephone surveys are often used by Customer Services to evaluate their clients' satisfaction and to improve their services. Large amounts of data are collected to observe the evolution of customers' opinions. Within this context, automating the processing of these databases becomes a crucial issue. This paper addresses the automatic analysis of audio messages where customers are asked to give their opinion over several dimensions about a Customer Service. Interpretation methods that integrate automatically and manually acquired knowledge are proposed. Experimental results are given, obtained on a database collected from a deployed Customer Service in real conditions with real customers.
