Journal Article

Multi-class sentiment classification

Expert Systems With Applications (Pergamon), Vol. 80, pp. 323-339, 01 Sep 2017
TL;DR: A framework for multi-class sentiment classification is proposed, and the results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms.
Abstract:
  • A framework for multi-class sentiment classification is proposed.
  • A total of 3600 comparative experiments are conducted.
  • The performances of different feature selection and machine learning algorithms are compared.
  • The results are valuable for further studies on multi-class sentiment classification.

Multi-class sentiment classification has extensive application backgrounds, whereas studies on this issue are still relatively scarce. In this paper, a framework for multi-class sentiment classification is proposed, which includes two parts: 1) selecting important features of texts using a feature selection algorithm, and 2) training a multi-class sentiment classifier using a machine learning algorithm. Experiments are then conducted to compare the performances of four popular feature selection algorithms (document frequency, CHI statistics, information gain and gain ratio) and five popular machine learning algorithms (decision tree, naïve Bayes, support vector machine, radial basis function neural network and K-nearest neighbor) in multi-class sentiment classification. The experiments are conducted on three public datasets comprising twelve data subsets, and 10-fold cross-validation is used to obtain the classification accuracy for each combination of feature selection algorithm, machine learning algorithm, feature set size and data subset. Based on the resulting 3600 classification accuracies (4 feature selection algorithms × 5 machine learning algorithms × 15 feature set sizes × 12 data subsets), the average classification accuracy of each algorithm is calculated, and the Wilcoxon test is used to check whether the differences between algorithms are statistically significant. The results show that, in terms of classification accuracy, gain ratio performs best among the four feature selection algorithms and support vector machine performs best among the five machine learning algorithms.
Similar comparisons are also conducted in terms of execution time. The obtained results should be valuable for improving existing multi-class sentiment classifiers and for developing new ones.
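The first stage of the framework ranks candidate terms with a feature selection score and keeps the top-k as the feature set. The following is a minimal pure-Python sketch of the four scores compared in the paper, not the authors' implementation. It assumes binary term-presence features and base-2 logarithms, defines gain ratio as information gain divided by the entropy of the term's presence/absence split, and (an assumption, since the aggregation is not stated here) takes the multi-class CHI score as the maximum of the per-class 2×2 chi-square values. `feature_scores` and `select_top_k` are hypothetical names.

```python
import math
from collections import Counter


def entropy(counts):
    """Shannon entropy in bits of a distribution given as raw counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def feature_scores(docs, labels):
    """Score each term by DF, CHI, IG and GR using binary term presence.

    docs: list of sets of terms; labels: one class label per document.
    """
    n = len(docs)
    class_totals = Counter(labels)
    classes = list(class_totals)
    h_c = entropy([class_totals[c] for c in classes])  # class entropy H(C)
    scores = {}
    for t in set().union(*docs):
        # Documents containing t, broken down by class.
        with_t = Counter(lab for d, lab in zip(docs, labels) if t in d)
        df = sum(with_t.values())
        without_t = [class_totals[c] - with_t[c] for c in classes]
        p_t = df / n
        # Information gain: H(C) minus expected entropy given presence/absence of t.
        h_given_t = entropy([with_t[c] for c in classes])
        ig = h_c - (p_t * h_given_t + (1 - p_t) * entropy(without_t))
        # Gain ratio: IG normalised by the entropy of the presence/absence split.
        split_info = entropy([df, n - df])
        gr = ig / split_info if split_info > 0 else 0.0
        # CHI: per-class 2x2 chi-square, aggregated by max (an assumption).
        chi = 0.0
        for c in classes:
            a = with_t[c]               # in class c, contains t
            b = df - a                  # not in c, contains t
            c_ = class_totals[c] - a    # in c, lacks t
            d = n - a - b - c_          # not in c, lacks t
            denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
            if denom:
                chi = max(chi, n * (a * d - c_ * b) ** 2 / denom)
        scores[t] = {"DF": df, "CHI": chi, "IG": ig, "GR": gr}
    return scores


def select_top_k(scores, method, k):
    """Return the k highest-scoring terms under `method` (ties broken alphabetically)."""
    return sorted(scores, key=lambda t: (-scores[t][method], t))[:k]


if __name__ == "__main__":
    # Hypothetical toy corpus with three sentiment classes.
    docs = [{"great", "love"}, {"great", "fine"}, {"fine", "ok"},
            {"bad", "awful"}, {"bad", "hate"}]
    labels = ["pos", "pos", "neu", "neg", "neg"]
    scores = feature_scores(docs, labels)
    for method in ("DF", "CHI", "IG", "GR"):
        print(method, select_top_k(scores, method, 3))
```

In the framework, the selected terms would then become the input features of the classifier (decision tree, naïve Bayes, SVM, RBF network or K-NN), with k varied over the 15 feature set sizes.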
Citations
Journal Article
TL;DR: Six baseline elements of text classification including data collection, data analysis for labelling, feature construction and weighting, feature selection and projection, training of a classification model, and solution evaluation are described.
Abstract: The aim of this study is to provide an overview of the state-of-the-art elements of text classification. For this purpose, we first select and investigate the primary and recent studies and objectives in this field. Next, we examine the state-of-the-art elements of text classification. In the following steps, we qualitatively and quantitatively analyse the related works. Herein, we describe six baseline elements of text classification including data collection, data analysis for labelling, feature construction and weighting, feature selection and projection, training of a classification model, and solution evaluation. This study will help readers acquire the necessary information about these elements and their associated techniques. Thus, we believe that this study will assist other researchers and professionals in proposing new studies in the field of text classification.

224 citations

Journal Article
TL;DR: In this paper, a back-propagation artificial neural network (BP ANN) was used to select the variables that contribute most to the model's estimation of AGB, and the model was then built from those variables.

144 citations

Journal Article
TL;DR: Inspired by the methodology of three-way decisions, a three-way enhanced convolutional neural network model named 3W-CNN is proposed, which can be seen as an ensemble method that uses an enhancement model to optimize convolutional neural networks (CNN).

120 citations

Journal Article
TL;DR: A method for modelling customer satisfaction from online reviews using an ensemble neural network based model (ENNM) and an effect-based Kano model (EKM) is proposed to measure the effects of customer sentiments toward different CSDs on customer satisfaction.
Abstract: With the rapid advances in information technology, an increasing number of online reviews are posted daily on the Internet. Such reviews can serve as a promising data source to understand customer ...

116 citations


Cites background or methods from "Multi-class sentiment classificatio..."

  • ...In the era of Web 2.0, customers increasingly post online reviews of products or services on the Internet (Chong et al. 2017; Grewal, Cote, and Baumgartner 2004; Liu, Bi, and Fan 2017a)....

    [...]

  • ...To determine the sentiment orientation of each review in Ri, SVM is used, which is a state of the art machine learning algorithm for sentiment classification (Liu, Bi, and Fan 2017b; Pang and Lee 2008; Pournarakis, Sotiropoulos, and Giaglis 2017)....

    [...]

  • ...For a detail process of SVM for sentiment classification, we refer the reader to Liu, Bi, and Fan (2017b)....

    [...]

Journal Article
TL;DR: The experimental findings of this paper suggest that the proposed method can be used to predict PD accurately and can easily be incorporated into healthcare for diagnostic purposes.
Abstract: Parkinson's disease (PD) is a critical neurological disorder. Efficient and early prediction of people with PD is key to improving patients' quality of life. Diagnosing PD, especially in its initial stages, is extremely complex and time-consuming, so accurate and efficient diagnosis of PD has been a significant challenge for medical experts and practitioners. To tackle this issue and diagnose PD patients accurately, we proposed a machine-learning-based prediction system. In the proposed system, a support vector machine (SVM) was used as the predictive model for PD. L1-norm SVM feature selection was used to select appropriate, highly related features for accurately classifying PD patients and healthy people. The L1-norm SVM produced a new subset of features from the PD dataset based on feature weight values. The proposed system was validated with the K-fold cross-validation method. In addition, performance metrics such as accuracy, sensitivity, specificity, precision, F1 score, and execution time were computed to evaluate the model. The PD dataset was used in this paper. The optimal accuracy was achieved with the best subset of selected features, which might be due to the varying contributions of the PD features. The experimental findings of this paper suggest that the proposed method can be used to predict PD accurately and can easily be incorporated into healthcare for diagnostic purposes. Currently, computer-based assisted predictive systems play an important role in PD recognition. In addition, the proposed approach fills a gap in feature selection and classification using voice recording data by properly matching the experimental design.

112 citations


Cites methods from "Multi-class sentiment classificatio..."

  • ...[47] proposed a framework for multi-class sentiment classification....

    [...]
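Both this paper and the main paper above estimate accuracy with K-fold cross-validation (K = 10 in the main paper). A minimal standard-library sketch of plain, shuffled (not stratified) K-fold accuracy estimation follows; whether either paper shuffled or stratified its folds is not stated here, and `k_fold_indices`, `cross_val_accuracy` and the `fit`/`predict` callables are hypothetical names.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Yield (train, test) index lists for plain shuffled k-fold cross-validation."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # fold sizes differ by at most one
    for i in range(k):
        test = sorted(folds[i])
        train = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train, test


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean test accuracy over k folds; `fit` and `predict` are caller-supplied."""
    accuracies = []
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        predictions = predict(model, [X[i] for i in test])
        correct = sum(p == y[i] for p, i in zip(predictions, test))
        accuracies.append(correct / len(test))
    return sum(accuracies) / len(accuracies)


# Hypothetical baseline used only to exercise the loop: predict the majority class.
def fit_majority(X_train, y_train):
    return Counter(y_train).most_common(1)[0][0]


def predict_majority(model, X_test):
    return [model] * len(X_test)


if __name__ == "__main__":
    X = list(range(30))
    y = ["pos"] * 20 + ["neg"] * 10
    print(cross_val_accuracy(X, y, fit_majority, predict_majority, k=10))
```

Every example lands in exactly one test fold, so each accuracy in the average is measured on data the model did not see during fitting.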

References
Book
Vladimir Vapnik
01 Jan 1995
Abstract: Setting of the learning problem; consistency of learning processes; bounds on the rate of convergence of learning processes; controlling the generalization ability of learning processes; constructing learning algorithms; what is important in learning theory?

40,147 citations

Book
15 Oct 1992
TL;DR: A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.
Abstract: From the Publisher: Classifier systems play a major role in machine learning and knowledge-based systems, and Ross Quinlan's work on ID3 and C4.5 is widely acknowledged to have made some of the most significant contributions to their development. This book is a complete guide to the C4.5 system as implemented in C for the UNIX environment. It contains a comprehensive guide to the system's use, the source code (about 8,800 lines), and implementation notes. The source code and sample datasets are also available on a 3.5-inch floppy diskette for a Sun workstation. C4.5 starts with large sets of cases belonging to known classes. The cases, described by any mixture of nominal and numeric properties, are scrutinized for patterns that allow the classes to be reliably discriminated. These patterns are then expressed as models, in the form of decision trees or sets of if-then rules, that can be used to classify new cases, with emphasis on making the models understandable as well as accurate. The system has been applied successfully to tasks involving tens of thousands of cases described by hundreds of properties. The book starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting. Advantages and disadvantages of the C4.5 approach are discussed and illustrated with several case studies. This book and software should be of interest to developers of classification-based intelligent systems and to students in machine learning and expert systems courses.

21,674 citations
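C4.5 handles numeric properties by testing binary splits of the form `value <= threshold`. A simplified sketch of choosing such a threshold for one numeric attribute follows: it tries a cut between each pair of consecutive distinct sorted values and keeps the one with the highest information gain. This is an illustration under stated assumptions, not the C4.5 source: placing the threshold at the midpoint is an assumption, refinements the real system applies to continuous attributes are omitted, and `best_threshold` is a hypothetical name.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Best binary split `value <= threshold` of one numeric attribute by information gain.

    Returns (threshold, gain); threshold is None when no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([lab for _, lab in pairs])
    best_t, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # only cut between distinct values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        gain = base - (i / n) * entropy(left) - ((n - i) / n) * entropy(right)
        if gain > best_gain:
            best_t, best_gain = (lo + hi) / 2, gain  # midpoint: an assumption
    return best_t, best_gain


if __name__ == "__main__":
    # Hypothetical numeric attribute and class labels.
    temps = [64, 65, 68, 69, 70, 71, 72, 75, 80, 85]
    play = ["yes", "no", "yes", "yes", "yes", "no", "no", "yes", "no", "no"]
    print(best_threshold(temps, play))
```

Sorting once makes every candidate cut a prefix/suffix split of the same ordered list, which is why only n - 1 positions need to be examined.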

Book
25 Oct 1999
TL;DR: This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining.
Abstract: Data Mining: Practical Machine Learning Tools and Techniques offers a thorough grounding in machine learning concepts as well as practical advice on applying machine learning tools and techniques in real-world data mining situations. This highly anticipated third edition of the most acclaimed work on data mining and machine learning will teach you everything you need to know about preparing inputs, interpreting outputs, evaluating results, and the algorithmic methods at the heart of successful data mining. Thorough updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including new material on Data Transformations, Ensemble Learning, Massive Data Sets, Multi-instance Learning, plus a new version of the popular Weka machine learning software developed by the authors. Witten, Frank, and Hall include both tried-and-true techniques of today and methods at the leading edge of contemporary research.
  • Provides a thorough grounding in machine learning concepts as well as practical advice on applying the tools and techniques to your data mining projects
  • Offers concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods
  • Includes the downloadable Weka software toolkit, a collection of machine learning algorithms for data mining tasks, in an updated, interactive interface. Algorithms in the toolkit cover data pre-processing, classification, regression, clustering, association rules, and visualization.

20,196 citations

Journal Article
TL;DR: This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, describes one such system, ID3, in detail, and discusses a reported shortcoming of the basic algorithm.
Abstract: The technology for building knowledge-based systems by inductive inference from examples has been demonstrated successfully in several practical applications. This paper summarizes an approach to synthesizing decision trees that has been used in a variety of systems, and it describes one such system, ID3, in detail. Results from recent studies show ways in which the methodology can be modified to deal with information that is noisy and/or incomplete. A reported shortcoming of the basic algorithm is discussed and two means of overcoming it are compared. The paper concludes with illustrations of current research directions.

17,177 citations
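ID3 grows a tree by splitting on the attribute with the highest information gain. A commonly cited weakness of that criterion, and the motivation for Quinlan's gain ratio (later adopted in C4.5), is its bias toward attributes with many distinct values. The sketch below uses hypothetical toy rows: a unique `id` attribute wins on information gain but loses on gain ratio to a genuinely predictive `windy` attribute. All function and attribute names are illustrative, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy in bits of the empirical distribution of `values`."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(rows, attr, target):
    """Entropy of `target` minus its expected entropy after splitting on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder


def gain_ratio(rows, attr, target):
    """Information gain normalised by the split information of `attr`."""
    split_info = entropy([row[attr] for row in rows])
    return info_gain(rows, attr, target) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: `id` is unique per row, `windy` is a real predictor.
ROWS = [
    {"id": i, "windy": w, "label": lab}
    for i, (w, lab) in enumerate([
        (False, "yes"), (False, "yes"), (False, "yes"), (False, "yes"),
        (True, "no"), (True, "no"), (True, "no"), (True, "yes"),
    ])
]

if __name__ == "__main__":
    for attr in ("id", "windy"):
        print(attr, round(info_gain(ROWS, attr, "label"), 3),
              round(gain_ratio(ROWS, attr, "label"), 3))
```

The `id` attribute splits the data into eight pure singletons, so its information gain equals the full class entropy; dividing by its large split information (log2 of 8) is what pushes it below `windy` under gain ratio.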