Showing papers on "Sentiment analysis" published in 2008


Book
08 Jul 2008
TL;DR: This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems and focuses on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis.
Abstract: An important part of our information-gathering behavior has always been to find out what other people think. With the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, new opportunities and challenges arise as people now can, and do, actively use information technologies to seek out and understand the opinions of others. The sudden eruption of activity in the area of opinion mining and sentiment analysis, which deals with the computational treatment of opinion, sentiment, and subjectivity in text, has thus occurred at least in part as a direct response to the surge of interest in new systems that deal directly with opinions as a first-class object. This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Our focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. We include material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

7,452 citations


Proceedings ArticleDOI
11 Feb 2008
TL;DR: This paper proposes a holistic lexicon-based approach to solving the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews by exploiting external evidences and linguistic conventions of natural language expressions.
Abstract: One of the important types of information on the Web is the opinions expressed in the user generated content, e.g., customer reviews of products, forum posts, and blogs. In this paper, we focus on customer reviews of products. In particular, we study the problem of determining the semantic orientations (positive, negative or neutral) of opinions expressed on product features in reviews. This problem has many applications, e.g., opinion mining, summarization and search. Most existing techniques utilize a list of opinion (bearing) words (also called opinion lexicon) for the purpose. Opinion words are words that express desirable (e.g., great, amazing, etc.) or undesirable (e.g., bad, poor, etc.) states. These approaches, however, all have some major shortcomings. In this paper, we propose a holistic lexicon-based approach to solving the problem by exploiting external evidences and linguistic conventions of natural language expressions. This approach allows the system to handle opinion words that are context dependent, which cause major difficulties for existing algorithms. It also deals with many special words, phrases and language constructs which have impacts on opinions based on their linguistic patterns. It also has an effective function for aggregating multiple conflicting opinion words in a sentence. A system, called Opinion Observer, based on the proposed technique has been implemented. Experimental results using a benchmark product review data set and some additional reviews show that the proposed technique is highly effective. It outperforms existing methods significantly.

1,404 citations
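
The abstract above mentions aggregating multiple conflicting opinion words in a sentence but does not spell out the function. A minimal sketch of one plausible aggregation follows, assuming a distance-weighted sum in which opinion words closer to the feature count more. The lexicon `LEXICON`, the function name `feature_orientation`, and the weighting itself are illustrative assumptions, not the paper's confirmed method.

```python
# Hypothetical toy opinion lexicon: +1 = desirable, -1 = undesirable.
LEXICON = {"great": 1, "amazing": 1, "bad": -1, "poor": -1}


def feature_orientation(tokens, feature):
    """Return 'positive', 'negative' or 'neutral' for `feature` in one sentence.

    Assumption: each opinion word contributes its polarity divided by its
    token distance to the feature, so nearby opinion words dominate.
    """
    if feature not in tokens:
        return "neutral"
    feature_pos = tokens.index(feature)
    score = 0.0
    for i, token in enumerate(tokens):
        polarity = LEXICON.get(token)
        if polarity is not None and i != feature_pos:
            score += polarity / abs(i - feature_pos)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


sentence = "the battery is great but the screen is poor".split()
print(feature_orientation(sentence, "battery"))  # positive
print(feature_orientation(sentence, "screen"))   # negative
```

In this toy sentence both opinion words reach both features, and the distance weighting is what lets "great" win for "battery" and "poor" win for "screen".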


Journal ArticleDOI
TL;DR: Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.
Abstract: The Internet is frequently used as a medium for exchange of information and opinions, as well as propaganda dissemination. In this study the use of sentiment analysis methodologies is proposed for classification of Web forum opinions in multiple languages. The utility of stylistic and syntactic features is evaluated for sentiment classification of English and Arabic content. Specific feature extraction components are integrated to account for the linguistic characteristics of Arabic. The entropy weighted genetic algorithm (EWGA) is also developed, which is a hybridized genetic algorithm that incorporates the information-gain heuristic for feature selection. EWGA is designed to improve performance and get a better assessment of key features. The proposed features and techniques are evaluated on a benchmark movie review dataset and U.S. and Middle Eastern Web forum postings. The experimental results using EWGA with SVM indicate high performance levels, with accuracies of over 91% on the benchmark dataset as well as the U.S. and Middle Eastern forums. Stylistic features significantly enhanced performance across all testbeds while EWGA also outperformed other feature selection methods, indicating the utility of these features and techniques for document-level classification of sentiments.

949 citations


Proceedings ArticleDOI
21 Apr 2008
TL;DR: This article proposed a multi-grain topic model for extracting the ratable aspects of objects from online user reviews, which is more appropriate for our task since standard models tend to produce topics that correspond to global properties of objects rather than the aspects of an object that tend to be rated by a user.
Abstract: In this paper we present a novel framework for extracting the ratable aspects of objects from online user reviews. Extracting such aspects is an important challenge in automatically mining product opinions from the web and in generating opinion-based summaries of user reviews [18, 19, 7, 12, 27, 36, 21]. Our models are based on extensions to standard topic modeling methods such as LDA and PLSA to induce multi-grain topics. We argue that multi-grain models are more appropriate for our task since standard models tend to produce topics that correspond to global properties of objects (e.g., the brand of a product type) rather than the aspects of an object that tend to be rated by a user. The models we present not only extract ratable aspects, but also cluster them into coherent topics, e.g., 'waitress' and 'bartender' are part of the same topic 'staff' for restaurants. This differentiates it from much of the previous work which extracts aspects through term frequency analysis with minimal clustering. We evaluate the multi-grain models both qualitatively and quantitatively to show that they improve significantly upon standard topic models.

650 citations


Proceedings ArticleDOI
16 Mar 2008
TL;DR: The construction of a large data set annotated for six basic emotions, ANGER, DISGUST, FEAR, JOY, SADNESS and SURPRISE, and several knowledge-based and corpus-based methods for the automatic identification of these emotions in text are proposed.
Abstract: This paper describes experiments concerned with the automatic analysis of emotions in text. We describe the construction of a large data set annotated for six basic emotions: ANGER, DISGUST, FEAR, JOY, SADNESS and SURPRISE, and we propose and evaluate several knowledge-based and corpus-based methods for the automatic identification of these emotions in text.

648 citations


Journal ArticleDOI
TL;DR: The experimental results indicate that IG performs the best for sentimental terms selection and SVM exhibits the best performance for sentiment classification, and it is found that sentiment classifiers are severely dependent on domains or topics.
Abstract: Up to now, there are very few researches conducted on sentiment classification for Chinese documents. In order to remedy this deficiency, this paper presents an empirical study of sentiment categorization on Chinese documents. Four feature selection methods (MI, IG, CHI and DF) and five learning methods (centroid classifier, K-nearest neighbor, winnow classifier, Naive Bayes and SVM) are investigated on a Chinese sentiment corpus with a size of 1021 documents. The experimental results indicate that IG performs the best for sentimental terms selection and SVM exhibits the best performance for sentiment classification. Furthermore, we found that sentiment classifiers are severely dependent on domains or topics.

430 citations
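
Information gain (IG), the feature selection method the abstract above reports as performing best, can be illustrated with a short sketch. It computes the standard IG of a binary "term present / term absent" feature with respect to the class labels. The function names and the four-document toy corpus are hypothetical and are not taken from the paper's 1021-document corpus.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG of the binary feature 'term occurs in the document'.

    `docs` is a list of token sets, `labels` the matching class labels.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Conditional entropy H(class | term), skipping empty partitions.
    conditional = sum(len(part) / n * entropy(part)
                      for part in (with_term, without_term) if part)
    return entropy(labels) - conditional


docs = [{"good", "film"}, {"good", "plot"}, {"bad", "film"}, {"bad", "plot"}]
labels = ["pos", "pos", "neg", "neg"]
ranked = sorted({t for d in docs for t in d},
                key=lambda t: information_gain(docs, labels, t), reverse=True)
print(ranked[:2])  # the two sentiment-bearing terms, "good" and "bad"
```

Terms that split the corpus cleanly by class ("good", "bad") get the maximum gain of 1 bit here, while topical terms ("film", "plot") get 0 and would be discarded first.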


Proceedings ArticleDOI
07 Apr 2008
TL;DR: The results show that working with standard technology and existing sentiment analysis approaches is a viable approach to sentiment analysis within a multilingual framework.
Abstract: This paper introduces a methodology for determining polarity of text within a multilingual framework. The method leverages on lexical resources for sentiment analysis available in English (SentiWordNet). First, a document in a different language than English is translated into English using standard translation software. Then, the translated document is classified according to its sentiment into one of the classes "positive" and "negative". For sentiment classification, a document is searched for sentiment bearing words like adjectives. By means of SentiWordNet, scores for positivity and negativity are determined for these words. An interpretation of the scores then leads to the document polarity. The method is tested for German movie reviews selected from Amazon and is compared to a statistical polarity classifier based on n-grams. The results show that working with standard technology and existing sentiment analysis approaches is a viable approach to sentiment analysis within a multilingual framework.

367 citations
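
The scoring step described in the abstract above (look up positivity and negativity scores for sentiment-bearing words, then interpret them as a document polarity) might look roughly like the sketch below. The `TOY_SCORES` table is a hypothetical stand-in for SentiWordNet with made-up values. The translation step is assumed to have happened already, and comparing summed scores is only one possible "interpretation", since the abstract does not specify it.

```python
# Hypothetical (positivity, negativity) scores; NOT real SentiWordNet values.
TOY_SCORES = {
    "good": (0.75, 0.0),
    "brilliant": (0.875, 0.0),
    "boring": (0.0, 0.625),
    "weak": (0.125, 0.5),
}


def document_polarity(translated_text):
    """Classify an already-translated English document as positive/negative.

    Assumption: polarity is decided by comparing the summed positivity and
    negativity scores; ties are resolved as 'positive'.
    """
    pos_total = neg_total = 0.0
    for raw in translated_text.lower().split():
        word = raw.strip(".,!?;:")
        pos, neg = TOY_SCORES.get(word, (0.0, 0.0))
        pos_total += pos
        neg_total += neg
    return "positive" if pos_total >= neg_total else "negative"


print(document_polarity("A brilliant film, good cast."))  # positive
print(document_polarity("Boring and weak."))              # negative
```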


Proceedings ArticleDOI
18 Aug 2008
TL;DR: In this paper, the authors focus on mining opinions from comparative sentences, i.e., to determine which entities in a comparison are preferred by its author, and propose a technique to solve the problem.
Abstract: This paper studies sentiment analysis from the user-generated content on the Web. In particular, it focuses on mining opinions from comparative sentences, i.e., to determine which entities in a comparison are preferred by its author. A typical comparative sentence compares two or more entities. For example, the sentence, "the picture quality of Camera X is better than that of Camera Y", compares two entities "Camera X" and "Camera Y" with regard to their picture quality. Clearly, "Camera X" is the preferred entity. Existing research has studied the problem of extracting some key elements in a comparative sentence. However, there is still no study of mining opinions from comparative sentences, i.e., identifying preferred entities of the author. This paper studies this problem, and proposes a technique to solve the problem. Our experiments using comparative sentences from product reviews and forum posts show that the approach is effective.

275 citations


Proceedings ArticleDOI
Qi Su1, Xinying Xu1, Honglei Guo2, Zhili Guo2, Xian Wu2, Xiaoxun Zhang2, Bin Swen1, Zhong Su2 
21 Apr 2008
TL;DR: A novel mutual reinforcement approach to deal with the feature-level opinion mining problem, which can largely predict opinions relating to different product features, even for the case without the explicit appearance of product feature words in reviews.
Abstract: The boom of product review websites, blogs and forums on the web has attracted many research efforts on opinion mining. Recently, there was a growing interest in the finer-grained opinion mining, which detects opinions on different review features as opposed to the whole review level. The researches on feature-level opinion mining mainly rely on identifying the explicit relatedness between product feature words and opinion words in reviews. However, the sentiment relatedness between the two objects is usually complicated. For many cases, product feature words are implied by the opinion words in reviews. The detection of such hidden sentiment association is still a big challenge in opinion mining. Especially, it is an even harder task of feature-level opinion mining on Chinese reviews due to the nature of Chinese language. In this paper, we propose a novel mutual reinforcement approach to deal with the feature-level opinion mining problem. More specially, 1) the approach clusters product features and opinion words simultaneously and iteratively by fusing both their content information and sentiment link information. 2) under the same framework, based on the product feature categories and opinion word groups, we construct the sentiment association set between the two groups of data objects by identifying their strongest n sentiment links. Moreover, knowledge from multi-source is incorporated to enhance clustering in the procedure. Based on the pre-constructed association set, our approach can largely predict opinions relating to different product features, even for the case without the explicit appearance of product feature words in reviews. Thus it provides a more accurate opinion evaluation. The experimental results demonstrate that our method outperforms the state-of-art algorithms.

240 citations


Proceedings ArticleDOI
Xiaojun Wan1
25 Oct 2008
TL;DR: This study proposes a novel approach that leverages reliable English resources to improve Chinese sentiment analysis: it first translates Chinese reviews into English reviews by machine translation services, and then identifies the sentiment polarity of the English reviews by directly leveraging English resources.
Abstract: It is a challenging task to identify sentiment polarity of Chinese reviews because the resources for Chinese sentiment analysis are limited. Instead of leveraging only monolingual Chinese knowledge, this study proposes a novel approach to leverage reliable English resources to improve Chinese sentiment analysis. Rather than simply projecting English resources onto Chinese resources, our approach first translates Chinese reviews into English reviews by machine translation services, and then identifies the sentiment polarity of English reviews by directly leveraging English resources. Furthermore, our approach performs sentiment analysis for both Chinese reviews and English reviews, and then uses ensemble methods to combine the individual analysis results. Experimental results on a dataset of 886 Chinese product reviews demonstrate the effectiveness of the proposed approach. The individual analysis of the translated English reviews outperforms the individual analysis of the original Chinese reviews, and the combination of the individual analysis results further improves the performance.

224 citations
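
The abstract above says the Chinese-side and English-side analysis results are combined with ensemble methods, without naming a specific one. As a rough illustration, the sketch below assumes each side yields a polarity score in [-1, 1] and combines them by weighted averaging. The function name `combine`, the `weight_english` parameter, and the averaging rule are assumptions made for this example.

```python
def combine(chinese_score, english_score, weight_english=0.5):
    """Combine two polarity scores in [-1, 1] by weighted averaging.

    Assumption: a simple weighted average stands in for the unspecified
    ensemble method; a positive combined score means a positive review.
    """
    if not 0.0 <= weight_english <= 1.0:
        raise ValueError("weight_english must lie in [0, 1]")
    combined = (1.0 - weight_english) * chinese_score \
        + weight_english * english_score
    return "positive" if combined > 0 else "negative"


# The two analyses disagree; the weight decides which one prevails.
print(combine(-0.2, 0.6))                      # positive
print(combine(-0.2, 0.6, weight_english=0.1))  # negative
```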


Proceedings Article
01 Jan 2008
TL;DR: This work explores an approach utilizing state-of-the-art machine translation technology and performs sentiment analysis on the English translation of a foreign language text and indicates that entity sentiment scores obtained by the method are statistically significantly correlated across nine languages of news sources and five languages of a parallel corpus.
Abstract: There is a growing interest in mining opinions using sentiment analysis methods from sources such as news, blogs and product reviews. Most of these methods have been developed for English and are difficult to generalize to other languages. We explore an approach utilizing state-of-the-art machine translation technology and perform sentiment analysis on the English translation of a foreign language text. Our experiments indicate that (a) entity sentiment scores obtained by our method are statistically significantly correlated across nine languages of news sources and five languages of a parallel corpus; (b) the quality of our sentiment analysis method is largely translator independent; (c) after applying certain normalization techniques, our entity sentiment scores can be used to perform meaningful cross-cultural comparisons.

Proceedings ArticleDOI
Vikas Sindhwani1, Prem Melville1
15 Dec 2008
TL;DR: This paper proposes a novel semi-supervised sentiment prediction algorithm that utilizes lexical prior knowledge in conjunction with unlabeled examples based on joint sentiment analysis of documents and words based on a bipartite graph representation of the data.
Abstract: The goal of sentiment prediction is to automatically identify whether a given piece of text expresses positive or negative opinion towards a topic of interest. One can pose sentiment prediction as a standard text categorization problem, but gathering labeled data turns out to be a bottleneck. Fortunately, background knowledge is often available in the form of prior information about the sentiment polarity of words in a lexicon. Moreover, in many applications abundant unlabeled data is also available. In this paper, we propose a novel semi-supervised sentiment prediction algorithm that utilizes lexical prior knowledge in conjunction with unlabeled examples. Our method is based on joint sentiment analysis of documents and words based on a bipartite graph representation of the data. We present an empirical study on a diverse collection of sentiment prediction problems which confirms that our semi-supervised lexical models significantly outperform purely supervised and competing semi-supervised techniques.

01 Jan 2008
TL;DR: This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press with results in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems.
Abstract: Private states (mental and emotional states) are part of the information that is conveyed in many forms of discourse. News articles often report emotional responses to news stories; editorials, reviews, and weblogs convey opinions and beliefs. This dissertation investigates the manual and automatic identification of linguistic expressions of private states in a corpus of news documents from the world press. A term for the linguistic expression of private states is subjectivity. The conceptual representation of private states used in this dissertation is that of (Wiebe, Wilson, and Cardie, 2005). As part of this research, annotators are trained to identify expressions of private states and their properties, such as the source and the intensity of the private state. This dissertation then extends the conceptual representation of private states to better model the attitudes and targets of private states. The inter-annotator agreement studies conducted for this dissertation show that the various concepts in the original and extended representation of private states can be reliably annotated. Exploring the automatic recognition of various types of private states is also a large part of this dissertation. Experiments are conducted that focus on three types of fine-grained subjectivity analysis: recognizing the intensity of clauses and sentences, recognizing the contextual polarity of words and phrases, and recognizing the attribution levels where sentiment and arguing attitudes are expressed. Various supervised machine learning algorithms are used to train automatic systems to perform each of these tasks. These experiments result in automatic systems for performing fine-grained subjectivity analysis that significantly outperform baseline systems. Keywords: Subjectivity, Private States, Opinions, Sentiment, Attitudes.

Book ChapterDOI
28 May 2008
TL;DR: A novel approach based on Support Vector Machines is proposed to classify a subset of documents using polarity metrics by applying it to a publicly available set of movie reviews.
Abstract: With the ever-growing popularity of online media such as blogs and social networking sites, the Internet is a valuable source of information for product and service reviews. Attempting to classify a subset of these documents using polarity metrics can be a daunting task. After a survey of previous research on sentiment polarity, we propose a novel approach based on Support Vector Machines. We compare our method to previously proposed lexical-based and machine learning (ML) approaches by applying it to a publicly available set of movie reviews. Our algorithm will be integrated within a blog visualization tool.

Journal ArticleDOI
TL;DR: The authors propose the AVA (adjective verb adverb) framework, a single framework that takes adjectives, verbs, and adverbs into account together when identifying opinions on any given topic, scoring a document on a –1 (maximally negative) to +1 (maximally positive) scale.
Abstract: Most research on determining the strength of subjective expressions in a sentence or document uses single, specific parts of speech such as adjectives, adverbs, or verbs. To date, almost no research covers the development of a single comprehensive framework in which we can analyze sentiment that takes all three into account. The authors propose the AVA (adjective verb adverb) framework for identifying opinions on any given topic. In AVA, a user can select any topic t of interest and any document d. AVA will return a score that d expresses topic t. The score is expressed on a –1 (maximally negative) to +1 (maximally positive) scale.

Journal ArticleDOI
TL;DR: Purely text-based methods performed poorly but could be improved using techniques that took into account the users' position in the online community, suggesting that social network analysis is an important tool for performing natural language processing tasks with informal web texts.
Abstract: Purpose – To evaluate and extend existing natural language processing techniques into the domain of informal online political discussions. Design/methodology/approach – A database of postings from a US political discussion site was collected, along with self‐reported political orientation data for the users. A variety of sentiment analysis, text classification, and social network analysis methods were applied to the postings and evaluated against the users' self‐descriptions. Findings – Purely text‐based methods performed poorly, but could be improved using techniques which took into account the users' position in the online community. Research limitations/implications – The techniques we applied here are fairly simple, and more sophisticated learning algorithms may yield better results for text‐based classification. Practical implications – This work suggests that social network analysis is an important tool for performing natural language processing tasks with informal web texts. Originality/value – This re...

Proceedings ArticleDOI
20 Jul 2008
TL;DR: A novel generation model that unifies topic-relevance and opinion generation by a quadratic combination is proposed and demonstrates that in the opinion retrieval task, a Bayesian approach to combining multiple ranking functions is superior to using a linear combination.
Abstract: Opinion retrieval is a task of growing interest in social life and academic research, which is to find relevant and opinionate documents according to a user's query. One of the key issues is how to combine a document's opinionate score (the ranking score of to what extent it is subjective or objective) and topic relevance score. Current solutions to document ranking in opinion retrieval are generally ad-hoc linear combination, which is short of theoretical foundation and careful analysis. In this paper, we focus on lexicon-based opinion retrieval. A novel generation model that unifies topic-relevance and opinion generation by a quadratic combination is proposed in this paper. With this model, the relevance-based ranking serves as the weighting factor of the lexicon-based sentiment ranking function, which is essentially different from the popular heuristic linear combination approaches. The effect of different sentiment dictionaries is also discussed. Experimental results on TREC blog datasets show the significant effectiveness of the proposed unified model. Improvements of 28.1% and 40.3% have been obtained in terms of MAP and p@10 respectively. The conclusion is not limited to blog environment. Besides the unified generation model, another contribution is that our work demonstrates that in the opinion retrieval task, a Bayesian approach to combining multiple ranking functions is superior to using a linear combination. It is also applicable to other result re-ranking applications in similar scenario.
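
To make the contrast in the abstract above concrete, the sketch below compares a heuristic linear combination with a multiplicative one in which the relevance score weights the opinion score. The plain product is a deliberate simplification of the paper's generation model, whose exact formula is not given in the abstract. The document names, scores, and the `lam` parameter are hypothetical.

```python
def linear_score(relevance, opinion, lam=0.5):
    """Heuristic linear combination of relevance and opinion scores."""
    return lam * relevance + (1.0 - lam) * opinion


def product_score(relevance, opinion):
    """Relevance acts as the weighting factor of the opinion score."""
    return relevance * opinion


def rank(docs, scorer):
    """Return document names ordered by descending combined score."""
    return sorted(docs, key=lambda name: scorer(*docs[name]), reverse=True)


# Hypothetical (relevance, opinion) scores for two documents.
DOCS = {
    "off_topic_rant": (0.0, 1.0),    # very opinionated, not about the query
    "on_topic_review": (0.6, 0.3),   # relevant, mildly opinionated
}

print(rank(DOCS, linear_score))   # ['off_topic_rant', 'on_topic_review']
print(rank(DOCS, product_score))  # ['on_topic_review', 'off_topic_rant']
```

With the linear form, a strongly opinionated but irrelevant document can outrank a relevant one. With the product form, zero relevance forces a zero score, which is the qualitative behavior the abstract attributes to using relevance as a weighting factor.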

Proceedings ArticleDOI
16 Jun 2008
TL;DR: This paper addresses a new task in sentiment classification that aims to improve performance through fusing training data from multiple domains simultaneously, and proposes two approaches of fusion, feature-level and classifier-level, to use training data for multi-domain sentiment classification.
Abstract: This paper addresses a new task in sentiment classification, called multi-domain sentiment classification, that aims to improve performance through fusing training data from multiple domains. To achieve this, we propose two approaches of fusion, feature-level and classifier-level, to use training data from multiple domains simultaneously. Experimental studies show that multi-domain sentiment classification using the classifier-level approach performs much better than single domain classification (using the training data individually).

Proceedings ArticleDOI
28 Oct 2008
TL;DR: This paper proposes a new approach focusing on two steps that automatically extract a learning dataset for a specific domain from the Internet and extracts the set of positive and negative adjectives relevant to the domain.
Abstract: The growing popularity of Web 2.0 provides with increasing numbers of documents expressing opinions on different topics. Recently, new research approaches have been defined in order to automatically extract such opinions from the Internet. They usually consider opinions to be expressed through adjectives, and make extensive use of either general dictionaries or experts to provide the relevant adjectives. Unfortunately, these approaches suffer from the following drawback: in a specific domain, a given adjective may either not exist or have a different meaning from another domain. In this paper, we propose a new approach focusing on two steps. First, we automatically extract a learning dataset for a specific domain from the Internet. Secondly, from this learning set we extract the set of positive and negative adjectives relevant to the domain. The usefulness of our approach was demonstrated by experiments performed on real data.

Proceedings ArticleDOI
24 Jul 2008
TL;DR: This paper has developed a system, which provides the user with a platform to analyze opinion expressions extracted from a repository, aimed at extracting and consolidating opinions of customers from blogs and feedbacks, at multiple levels of granularity.
Abstract: The proliferation of Internet has not only generated huge volumes of unstructured information in the form of web documents, but a large amount of text is also generated in the form of emails, blogs, and feedbacks etc. The data generated from online communication acts as potential gold mines for discovering knowledge. Text analytics has matured and is being successfully employed to mine important information from unstructured text documents. Most of these techniques use Natural Language Processing techniques which assume that the underlying text is clean and correct. Statistical techniques, though not as accurate as linguistic mechanisms, are also employed for the purpose to overcome the dependence on clean text. The chief bottleneck for designing statistical mechanisms is however its dependence on appropriately annotated training data. None of these methodologies are suitable for mining information from online communication text data due to the fact that they are often noisy. These texts are informally written. They suffer from spelling mistakes, grammatical errors, improper punctuation and irrational capitalization. This paper focuses on opinion extraction from noisy text data. It is aimed at extracting and consolidating opinions of customers from blogs and feedbacks, at multiple levels of granularity. Ours is a hybrid approach, in which we initially employ a semi-supervised method to learn domain knowledge from a training repository which contains both noisy and clean text. Thereafter we employ localized linguistic techniques to extract opinion expressions from noisy text. We have developed a system based on this approach, which provides the user with a platform to analyze opinion expressions extracted from a repository.

Proceedings ArticleDOI
31 Jan 2008
TL;DR: This paper surveys and analyzes various techniques that have been developed for the key tasks of opinion mining and provides an overall picture of what is involved in developing a software system for opinion mining.
Abstract: As people leave on the Web their opinions on products and services they have used, it has become important to develop methods of (semi-)automatically classifying and gauging them. The task of analyzing such data, collectively called customer feedback data, is known as opinion mining. Opinion mining consists of several steps, and multiple techniques have been proposed for each step. In this paper, we survey and analyze various techniques that have been developed for the key tasks of opinion mining. On the basis of our survey and analysis of the techniques, we provide an overall picture of what is involved in developing a software system for opinion mining.

Proceedings ArticleDOI
20 Jul 2008
TL;DR: A novel scheme for sentiment classification (without labeled examples) which combines the strengths of both "learn-based" and "lexicon-based" approaches as follows: first use a lexicon-based technique to label a portion of informative examples from the given task (or domain); then learn a new supervised classifier based on these labeled ones; finally apply this classifier to the task.
Abstract: In this work, we propose a novel scheme for sentiment classification (without labeled examples) which combines the strengths of both "learn-based" and "lexicon-based" approaches as follows: we first use a lexicon-based technique to label a portion of informative examples from given task (or domain); then learn a new supervised classifier based on these labeled ones; finally apply this classifier to the task. The experimental results indicate that proposed scheme could dramatically outperform "learn-based" and "lexicon-based" techniques.
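
The three-step scheme in the abstract above (lexicon labeling, supervised training on the resulting labels, then applying the classifier) can be sketched as follows. The tiny word lists, the choice of a multinomial Naive Bayes learner with add-one smoothing, and all function names are assumptions made for illustration. The abstract does not say which lexicon or learner the authors used.

```python
import math
from collections import Counter

# Hypothetical seed lexicon.
POSITIVE = {"great", "excellent"}
NEGATIVE = {"terrible", "awful"}


def lexicon_label(tokens):
    """Step 1: label a document only when the lexicon gives a clear signal."""
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "pos"
    if score < 0:
        return "neg"
    return None  # uninformative example, left unlabeled


def train_naive_bayes(labeled):
    """Step 2: fit word counts per class on the lexicon-labeled examples."""
    counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for tokens, label in labeled:
        counts[label].update(tokens)
        doc_counts[label] += 1
    vocab = set(counts["pos"]) | set(counts["neg"])
    return counts, doc_counts, vocab


def classify(model, tokens):
    """Step 3: apply the learned classifier (add-one smoothing)."""
    counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_logp = None, -math.inf
    for label in ("pos", "neg"):
        logp = math.log((doc_counts[label] + 1) / (total_docs + 2))
        denom = sum(counts[label].values()) + len(vocab)
        for t in tokens:
            if t in vocab:  # words never seen in training are ignored
                logp += math.log((counts[label][t] + 1) / denom)
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


unlabeled = [
    ["great", "battery", "life"], ["excellent", "battery"],
    ["terrible", "screen", "glare"], ["awful", "screen"],
    ["battery", "lasts"],
]
seed = [(d, lexicon_label(d)) for d in unlabeled]
model = train_naive_bayes([(d, y) for d, y in seed if y is not None])
print(classify(model, ["battery", "lasts"]))   # pos
print(classify(model, ["screen", "cracked"]))  # neg
```

The last two documents contain no lexicon words at all, so a pure lexicon method has nothing to go on. The trained classifier still labels them because it has picked up domain words ("battery", "screen") that co-occurred with the seed words.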

Proceedings Article
01 Jan 2008
TL;DR: Novel unsupervised techniques are used, including a one-word 'seed' vocabulary and iterative retraining for sentiment processing, and a criterion of 'sentiment density' for determining the extent to which a document is opinionated.
Abstract: We address the problem of sentiment and objectivity classification of product reviews in Chinese. Our approach is distinctive in that it treats both positive / negative sentiment and subjectivity / objectivity not as distinct classes but rather as a continuum; we argue that this is desirable from the perspective of would-be customers who read the reviews. We use novel unsupervised techniques, including a one-word 'seed' vocabulary and iterative retraining for sentiment processing, and a criterion of 'sentiment density' for determining the extent to which a document is opinionated. The classifier achieves up to 87% F-measure for sentiment polarity detection.

Proceedings Article
01 Jan 2008
TL;DR: Discussion patterns on IMDb are found to predict Academy Awards nominations and box office success, and weighting the forum posts of contributors according to their network position makes it possible to predict trends and real-world events in the movie business.
Abstract: This paper introduces a new Web mining approach that combines social network analysis and automatic sentiment analysis. We show how weighting the forum posts of the contributors according to their network position allow us to predict trends and real world events in the movie business. To test our approach we conducted two experiments analyzing online forum discussions on the Internet movie database (IMDb) by examining the correlation of the social network structure with external metrics such as box office revenue and Oscar Awards. We find that discussion patterns on IMDb predict Academy Awards nominations and box office success. Two months before the Oscars were given we were able to correctly predict nine Oscar nominations. We also found that forum contributions correlated with box office success of 20 top grossing movies of 2006.

Journal ArticleDOI
Zhu Zhang
TL;DR: A new task in text-sentiment analysis adds usefulness scoring to polarity/opinion extraction to improve product-review ranking services, helping shoppers and vendors leverage information from multiple sources.
Abstract: A new task in text-sentiment analysis adds usefulness scoring to polarity/opinion extraction to improve product-review ranking services, helping shoppers and vendors leverage information from multiple sources. Human language is a medium not only for exchanging information but also for conveying subjective opinions and emotion. Recently, interest in text-subjectivity and sentiment analysis has increased as part of the larger research effort in affective computing, which aims to make computers understand and generate human-like emotions through language and other expressive activities such as gesture.

Proceedings ArticleDOI
09 Dec 2008
TL;DR: This paper proposes a new UGC-oriented language technology application that aims at automatically collecting instances of personal experiences as well as opinions from a rapidly growing volume of user-generated content (UGC) and storing them in an experience database with semantically rich indices.
Abstract: This paper proposes a new UGC-oriented language technology application, which we call experience mining. Experience mining aims at automatically collecting instances of personal experiences as well as opinions from a rapidly growing volume of user-generated content (UGC), such as weblog and forum posts, and storing them in an experience database with semantically rich indices. After discussing the technical issues of this new task, we focus on its central problem, factuality analysis, and propose a machine learning-based solution as well as the task definition itself. Our empirical evaluation indicates that our factuality analysis task is sufficiently well-defined to achieve a high inter-annotator agreement and that our Factorial CRF-based model considerably outperforms the baseline. We also present an application system, which currently stores over 30M experience instances extracted from 150M Japanese blog posts with semantic indices and is scheduled to start serving as an experience search engine for unrestricted users in October.

Journal ArticleDOI
TL;DR: A method to recognize the relationships between subjective expressions and references to features of a product, such as service quality and location of a hotel is proposed and investigated.
Abstract: Automated discovery and analysis of customer opinions on the web holds a lot of promise for present-day practices of market research and customer relationship management. Opinion mining attempts to come up with ways to automatically analyse subjectivity expressed in natural language text. Previous research on the topic has shown that the overall subjectivity expressed in a document, such as a customer review, can be assessed with accuracy that is feasible in real-world applications. In this paper, we address the challenge of identification of customer opinions expressed towards specific features of a product, such as service quality and location of a hotel. The paper proposes and investigates a method to recognize the relationships between subjective expressions and references to features of a product. While the method has been evaluated on customer hotel reviews, it can potentially find application also in many tasks where concrete statements need to be extracted from documents on heterogeneous topics su...

Proceedings ArticleDOI
Keke Cai, Scott Spangler, Ying Chen, Li Zhang
09 Dec 2008
TL;DR: An overall sentiment analysis system is described, comprising techniques that detect the topics most highly correlated with positive and negative opinions, together with a novel topic detection method using point-wise mutual information and term frequency distribution.
Abstract: The emergence of new social media such as blogs, message boards, news, and Web content in general has dramatically changed the ecosystems of corporations. Consumers, non-profit organizations, and other forms of communities are extremely vocal about their opinions and perceptions on companies and their brands on the Web. The ability to leverage such "voice of the Web" to gain consumer, brand, and market insights can be truly differentiating and valuable to today's corporations. In particular, one important form of insight can be derived from sentiment analysis on Web content. Sentiment analysis traditionally emphasizes classification of Web comments into positive, neutral, and negative categories. This paper goes beyond sentiment classification by focusing on techniques that could detect the topics that are highly correlated with the positive and negative opinions. Such techniques, when coupled with sentiment classification, can help business analysts to understand both the overall sentiment scope as well as the drivers behind the sentiment. In this paper, we describe our overall sentiment analysis system that consists of such sentiment analysis techniques. We then detail a novel topic detection method using point-wise mutual information and term frequency distribution. We demonstrate the effectiveness of our overall approaches via several case studies on different social media data sets.
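
For readers unfamiliar with point-wise mutual information, the following is a minimal sketch of how PMI can rank terms by their association with a sentiment class. It is not the authors' method: the abstract does not specify how PMI is combined with term frequency distribution, so the function name `pmi_by_class`, the document-level counting, and the toy data are all illustrative assumptions.

```python
import math
from collections import Counter

def pmi_by_class(docs):
    """Hypothetical helper: PMI between each term and each sentiment label.

    docs is a list of (tokens, label) pairs. Probabilities are estimated
    from document-level counts: PMI(t, c) = log2(P(t, c) / (P(t) * P(c))).
    Only (term, label) pairs that co-occur at least once are returned.
    """
    n = len(docs)
    term_count = Counter()   # number of documents containing the term
    label_count = Counter()  # number of documents with the label
    joint_count = Counter()  # number of documents with both
    for tokens, label in docs:
        label_count[label] += 1
        for term in set(tokens):
            term_count[term] += 1
            joint_count[(term, label)] += 1
    return {
        (term, label): math.log2(
            (k / n) / ((term_count[term] / n) * (label_count[label] / n))
        )
        for (term, label), k in joint_count.items()
    }

# Toy, made-up comments (assumption: labels come from a prior sentiment classifier).
docs = [
    (["battery", "great"], "pos"),
    (["battery", "dead"], "neg"),
    (["screen", "great"], "pos"),
    (["battery", "dead"], "neg"),
]
scores = pmi_by_class(docs)
```

In this toy data, "dead" appears only in negative comments and so scores higher against the negative class than "battery", which appears in both classes.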

Journal ArticleDOI
TL;DR: A system for automatically determining the polarity (positivity/negativity) of these relations by using techniques from sentiment analysis is presented, using a machine learning model trained on the manually annotated news coverage of the Dutch 2006 elections.
Abstract: Many research questions in political communication can be answered by representing text as a network of positive or negative relations between actors and issues, as is done in semantic network analysis. This article presents a system for automatically determining the polarity (positivity/negativity) of these relations by using techniques from sentiment analysis. We used a machine learning model trained on the manually annotated news coverage of the Dutch 2006 elections, collecting lexical, syntactic, and word-similarity based features, and using the syntactic analysis to focus on the relevant part of the sentence. The performance of the full system is significantly better than the baseline with an F1 score of .63. Additionally, we replicate four studies from an earlier analysis of these elections, attaining correlations of greater than .8 in three out of four cases. This shows that the presented system can be immediately used for a number of analyses.

Proceedings Article
08 Dec 2008
TL;DR: This work presents a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights, encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them.
Abstract: For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, and when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should be expected to have similar weights in an accurate model. Here we present a framework for regularized learning when one has prior knowledge about which features are expected to have similar and dissimilar weights. The prior knowledge is encoded as a network whose vertices are features and whose edges represent similarities and dissimilarities between them. During learning, each feature's weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using networks of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature networks constructed from declarative human knowledge significantly improve prediction accuracy.
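
The penalty described in this abstract, where each feature's weight is penalized by how far it differs from the average weight of its neighbors, can be sketched as follows. This is an illustration under stated assumptions rather than the paper's formulation: the squared difference, the handling of features without neighbors, the restriction to similarity edges, and the name `network_penalty` are all hypothetical choices.

```python
def network_penalty(weights, neighbors):
    """Hypothetical sketch of a feature-network regularizer.

    weights maps feature name -> weight; neighbors maps feature name ->
    list of similar features. Each feature contributes the squared gap
    between its weight and the mean weight of its neighbors (the squared
    form is an assumption). Features with no neighbors contribute nothing.
    """
    total = 0.0
    for feature, nbrs in neighbors.items():
        if not nbrs:
            continue
        avg = sum(weights[k] for k in nbrs) / len(nbrs)
        total += (weights[feature] - avg) ** 2
    return total

# Toy feature network: "good" and "great" are declared similar.
neighbors = {"good": ["great"], "great": ["good"], "bad": []}

agree = network_penalty({"good": 1.0, "great": 1.0, "bad": -1.0}, neighbors)
differ = network_penalty({"good": 1.0, "great": 0.0, "bad": -1.0}, neighbors)
```

In a training loop, a term like this would typically be scaled by a regularization strength and added to the loss, so that similar features are pulled toward similar weights.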