Author

Mohd Ridzwan Yaakub

Bio: Mohd Ridzwan Yaakub is an academic researcher at the National University of Malaysia. The author has contributed to research on the topics of sentiment analysis and feature selection. The author has an h-index of 9 and has co-authored 37 publications receiving 287 citations. Previous affiliations of Mohd Ridzwan Yaakub include Queensland University of Technology.

Papers
Proceedings ArticleDOI
28 Jul 2015
TL;DR: It can be concluded that metaheuristic based algorithms have the potential to be implemented in sentiment analysis research and can produce an optimal subset of features by eliminating features that are irrelevant and redundant.
Abstract: Sentiment analysis functions by analyzing and extracting opinions from documents, websites, blogs, discussion forums and others to identify sentiment patterns on opinions expressed by consumers. It analyzes people's sentiment and identifies types of sentiment in comments expressed by consumers on certain matters. This paper highlights comparative studies on the types of feature selection in sentiment analysis based on natural language processing and modern methods such as Genetic Algorithm and Rough Set Theory. This study compares feature selection in text classification based on traditional and sentiment analysis methods. Feature selection is an important step in sentiment analysis because a suitable feature selection can identify the actual product features criticized or discussed by consumers. It can be concluded that metaheuristic based algorithms have the potential to be implemented in sentiment analysis research and can produce an optimal subset of features by eliminating features that are irrelevant and redundant.

56 citations

Journal ArticleDOI
TL;DR: This study hybridizes two n-gram models, unigram and n-gram, and applies Laplace smoothing to a Naïve Bayesian classifier and Katz back-off to the model in order to smooth it and address the limited accuracy, in terms of precision and recall, of n-gram models caused by the ‘zero count problem.’
Abstract: Twitter, an online micro-blogging and social networking service, lets registered users write anything they wish in 140 characters, giving them the opportunity to express their opinions and sentiments on events taking place. Politically sentimental tweets are top-trending tweets; whenever an election is near, users tweet about their favorite candidates or political parties and at times give their reasons. In this study, we hybridize two n-gram models, unigram and n-gram, where ‘unigram’ refers to the lowest-order n-gram and ‘n-gram’ refers to the highest-order (full sentence or tweet level) n-gram. We applied Laplace smoothing to a Naive Bayesian classifier and Katz back-off to the model. This was done in order to smooth the model and address the limited accuracy, in terms of precision and recall, of n-gram models caused by the ‘zero count problem.’ Results from our baseline model show an increase of 6.05% in average F-Harmonic accuracy in comparison with the n-gram model and a 1.75% increase in comparison with the semantic-topic model proposed in a previous study on the same dataset, i.e., the Obama–McCain dataset.

35 citations
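
The ‘zero count problem’ mentioned in this abstract can be illustrated with a minimal sketch of add-one (Laplace) smoothing on a unigram Naive Bayes classifier. This is not the authors' implementation: the function name `train_unigram_nb`, the toy tweets, and the labels are hypothetical, and the sketch covers neither the hybrid n-gram model nor Katz back-off.

```python
import math
from collections import Counter

# Hypothetical toy data: (tokens, label) pairs standing in for labelled tweets.
DOCS = [
    ("great debate win".split(), "pos"),
    ("strong great speech".split(), "pos"),
    ("bad weak answer".split(), "neg"),
    ("terrible bad debate".split(), "neg"),
]


def train_unigram_nb(docs, alpha=1.0):
    """Fit a unigram Naive Bayes model with add-alpha (Laplace) smoothing."""
    class_counts = Counter(label for _, label in docs)
    word_counts = {label: Counter() for label in class_counts}
    for tokens, label in docs:
        word_counts[label].update(tokens)
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = len(docs)

    def log_posterior(tokens, label):
        total_words = sum(word_counts[label].values())
        score = math.log(class_counts[label] / total_docs)
        for w in tokens:
            # Without "+ alpha", a word never seen with this label would give
            # probability 0 and log(0) would fail: the zero count problem.
            smoothed = (word_counts[label][w] + alpha) / (
                total_words + alpha * len(vocab)
            )
            score += math.log(smoothed)
        return score

    def predict(tokens):
        return max(class_counts, key=lambda label: log_posterior(tokens, label))

    return predict, log_posterior


predict, log_posterior = train_unigram_nb(DOCS)
print(predict("great speech".split()))
print(predict(["bad", "unseenword"]))
```

With `alpha=1.0` every word, including one absent from the training data, receives a small nonzero probability, so the score stays finite for any input.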

Journal ArticleDOI
TL;DR: This paper builds a model for movie revenue prediction prior to the movie's release using YouTube trailer reviews, shows the superiority of this approach over three baseline approaches, and achieves a relative absolute error of 29.65%.
Abstract: The increase in acceptability and popularity of social media has made extracting information from the data generated on social media an emerging field of research. An important branch of this field is predicting future events using social media data. This paper is focused on predicting box-office revenue of a movie by mining people's intention to purchase a movie ticket, termed purchase intention, from trailer reviews. Movie revenue prediction is important because movie production involves both high cost and high risk. Previous studies in this domain focus on the use of Twitter data and IMDb reviews for the prediction of movies that have already been released. In this paper, we build a model for movie revenue prediction prior to the movie's release using YouTube trailer reviews. Our model consists of novel methods of calculating purchase intention, positive-to-negative sentiment ratio, and like-to-dislike ratio for movie revenue prediction. Our experimental results show the superiority of our approach over three baseline approaches, with a relative absolute error of 29.65%.

29 citations
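
The relative absolute error reported in this abstract is commonly defined as the total absolute prediction error divided by the total absolute error of always predicting the mean of the actual values. The sketch below assumes that common definition; the abstract does not state the exact formula used, and the revenue figures are hypothetical.

```python
def relative_absolute_error(actual, predicted):
    """RAE = sum(|predicted - actual|) / sum(|actual - mean(actual)|)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("actual and predicted must be non-empty and equal length")
    mean_actual = sum(actual) / len(actual)
    model_error = sum(abs(p - a) for a, p in zip(actual, predicted))
    # Error of a naive baseline that always predicts the mean revenue.
    baseline_error = sum(abs(a - mean_actual) for a in actual)
    if baseline_error == 0:
        raise ValueError("RAE is undefined when all actual values are equal")
    return model_error / baseline_error


# Hypothetical box-office revenues (in millions) and model predictions.
actual_revenue = [10.0, 20.0, 30.0]
predicted_revenue = [12.0, 18.0, 33.0]
rae = relative_absolute_error(actual_revenue, predicted_revenue)
print(f"{rae:.2%}")  # (2 + 2 + 3) / (10 + 0 + 10) = 35.00%
```

Under this definition a value below 100% means the model beats the mean-predicting baseline, and lower is better.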


Cited by
Journal ArticleDOI

08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which at first seemed an odd beast: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

15 May 2015
TL;DR: This article presents a universally applicable attitude and skill set that everyone, not just computer scientists, would be eager to learn and use.
Abstract: It represents a universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use.

430 citations

Journal ArticleDOI
TL;DR: A comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature is provided and a taxonomy of these methods is presented.
Abstract: In recent years, unsupervised feature selection methods have raised considerable interest in many research areas; this is mainly due to their ability to identify and select relevant features without needing class label information. In this paper, we provide a comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature. We present a taxonomy of these methods and describe their main characteristics and the fundamental ideas they are based on. Additionally, we summarize the advantages and disadvantages of the general lines into which we have categorized the methods analyzed in this review. Moreover, an experimental comparison among the most representative methods of each approach is also presented. Finally, we discuss some important open challenges in this research area.

325 citations

Journal ArticleDOI
TL;DR: It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract: Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers. We compare the accuracy of 11 classification algorithms pairwise and groupwise. We examine separately the training, parameter-tuning, and testing time. GBDT and Random Forests yield highest accuracy, outperforming SVM. GBDT is the fastest in testing, Naive Bayes the fastest in training. Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5, alongside GBDT, RF, SVM, and C4.5, but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.

307 citations

Journal Article
TL;DR: A survey of the techniques used to design software that mines opinion features in reviews, including how Natural Language Processing tools such as NLTK for Python can be applied to raw customer reviews to extract keywords.
Abstract: Nowadays, e-commerce systems have become extremely important. Large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. Client-generated information, and especially item reviews, are significant sources of data for consumers to make informed buying choices and for makers to keep track of customers' opinions. It is difficult for customers to make purchasing decisions based on only pictures and short product descriptions. On the other hand, mining product reviews has become a hot research topic, and prior research is mostly based on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) tools such as NLTK for Python can be applied to raw customer reviews to extract keywords. This paper presents a survey of the techniques used for designing software to mine opinion features in reviews. Eleven IEEE papers are selected and a comparison is made between them. These papers are representative of the significant improvements in opinion mining in the past decade.

229 citations