Author

Mohd Ridzwan Yaakub

Bio: Mohd Ridzwan Yaakub is an academic researcher from the National University of Malaysia. The author has contributed to research on the topics of sentiment analysis and feature selection. The author has an h-index of 9 and has co-authored 37 publications receiving 287 citations. Previous affiliations of Mohd Ridzwan Yaakub include Queensland University of Technology.

Papers
Journal ArticleDOI
TL;DR: This study focuses on using product reviews to identify whether a review signifies an intention to purchase, and proposes a novel lexicon-based approach for classifying product reviews into those that signify the intention of purchase and those that do not.
Abstract: Extracting people’s opinions from social media has attracted a large number of studies over the years, a result of the growing popularity of social media. People share their sentiments and opinions via these platforms, so extracting and analyzing these sentiments is beneficial in many ways, for example, for business intelligence. However, despite the large number of studies on extracting and analyzing social media data, only a fraction of them focus on practical applications. In this study, we focus on using product reviews to identify whether a review signifies an intention to purchase. We propose a novel lexicon-based approach for classifying product reviews into those that signify the intention of purchase and those that do not. We evaluated the proposed approach on a benchmark dataset using accuracy, precision, and recall. The experimental results demonstrate the efficiency of the proposed approach to purchase intention identification.

4 citations
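
The lexicon-based classification described in the entry above can be illustrated with a minimal sketch. The abstract does not give the lexicon, the scoring rule, or the threshold, so the word list, weights, negation handling, and the function name `signals_purchase_intent` below are all assumptions made for illustration, not the authors' method.

```python
# Hypothetical mini-lexicon: the paper's actual lexicon is not given in the abstract.
INTENT_LEXICON = {"buy": 1.0, "purchase": 1.0, "order": 0.8, "recommend": 0.5}
# Hypothetical negation cues that flip the weight of the following lexicon word.
NEGATORS = {"not", "never", "won't", "wouldn't"}


def signals_purchase_intent(review: str, threshold: float = 0.5) -> bool:
    """Return True when the summed lexicon score reaches the (assumed) threshold."""
    tokens = [tok.strip(".,!?") for tok in review.lower().split()]
    score = 0.0
    for i, word in enumerate(tokens):
        weight = INTENT_LEXICON.get(word, 0.0)
        # Flip the weight when the preceding token is a negation cue.
        if weight and i > 0 and tokens[i - 1] in NEGATORS:
            weight = -weight
        score += weight
    return score >= threshold
```

Under these assumptions, a review such as "I will buy this phone again!" is labeled as signifying purchase intention, while "I would not buy this." is not.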

Journal Article
TL;DR: In this paper, a new architecture for opinion mining is proposed, which uses a multidimensional model to integrate customers' characteristics and their comments about products (or services) and transfers comments (opinions) to a fact table that includes several dimensions, such as customers, products, time, and locations.
Abstract: As e-commerce becomes more and more popular, the number of customer reviews that a product receives grows rapidly. To enhance customer satisfaction and the shopping experience, it has become important to analyze customer reviews to extract opinions on the products that customers buy. Thus, Opinion Mining is becoming more important than before, especially for analyzing and forecasting customers’ behavior for business purposes. The right decision in producing new products or services, based on data about customers’ characteristics, means profit for the organization or company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers’ characteristics and their comments about products (or services). The key step in achieving this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as customers, products, time, and locations. This research presents a comprehensive way to calculate customers’ orientation for all possible product attributes.

4 citations

Proceedings ArticleDOI
01 Dec 2019
TL;DR: This paper summarizes the findings from sentiment analysis and compares them with the quantitative data obtained from the survey; most teachers agreed on the benefits of ICT use, and the overall sentiment polarity was more positive.
Abstract: Sentiment analysis is gaining more attention as it is increasingly used in multiple domains, including the interpretation of educational data. This article uses a sentiment analysis technique to understand early childhood educators' reported beliefs (perceptions) about young children’s ICT use. The dataset was obtained from a comparative study of early childhood educators from two countries, Australia and Malaysia. The results show a similar outcome in both: most teachers agreed on the benefits of ICT use, and the overall sentiment polarity was more positive. This paper summarizes the findings from sentiment analysis and compares them with the quantitative data obtained from the survey.

4 citations

Journal Article
TL;DR: A hybrid categorization method for Arabic text mining is proposed that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and attempts to overcome their limitations.
Abstract: Text categorization is one of the key technologies for organizing digital datasets. Naive Bayes (NB) is a popular categorization method due to its efficiency and low time complexity, and the Associative Classification (AC) approach can produce classifiers that rival those learned by traditional categorization techniques. However, the independence assumption for text features and the omission of feature frequencies in the NB method degrade its performance when the selected features are not highly correlated with text categories. Likewise, the lack of useful discovery and usage of categorization rules is the major problem of AC, and its performance declines with a large set of rules. This paper proposes a hybrid categorization method for Arabic text mining that combines the merits of a statistical classifier (NB) and a rule-based classifier (AC) in one framework and attempts to overcome their limitations. In the first stage, useful categorization rules are discovered using the AC approach, ensuring that the associated features are highly correlated with their categories. In the second stage, NB is applied at the back end of the discovery process: it takes the discovered rules, concatenates the associated features for each category, and classifies texts based on the statistical information of the associated features. The proposed method was evaluated on three Arabic text datasets with multiple categories, with and without feature selection methods. The experimental results showed that the hybrid method outperforms AC alone, with or without feature selection methods, and that it is better than NB only in a few cases, with some feature selection methods, when the selected feature subset was small.

3 citations

Journal Article
TL;DR: The experimental results indicate that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer reviews.
Abstract: This research paper proposes a hybrid of the ant colony optimization (ACO) and k-nearest neighbour (KNN) algorithms as a feature selection method for choosing relevant features from customer review datasets. Information gain (IG), genetic algorithm (GA), and rough set attribute reduction (RSAR) were used as baseline algorithms in a performance comparison with the proposed algorithm. This paper also discusses the significance test, which was used to evaluate the performance differences between the ACO-KNN, IG-GA, and IG-RSAR algorithms. The dependency relation algorithm was used to identify the actual features commented on by customers by linking the dependency relation between product features and sentiment words in customers' sentences. This study evaluated the performance of the ACO-KNN algorithm using precision, recall, and F-score, and validated it using parametric statistical significance tests. The evaluation showed that the ACO-KNN algorithm performs significantly better than the baseline algorithms. In addition, the experimental results indicate that ACO-KNN can be used as a feature selection technique in sentiment analysis to obtain a high-quality, optimal feature subset that represents the actual data in customer reviews.

3 citations
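
Several entries above report precision, recall, and F-score. As a reminder of how these metrics are computed from predicted and true labels, here is a minimal sketch for the binary case. The function name `precision_recall_f1` and the convention of returning 0.0 when a denominator is zero are assumptions for illustration; the papers do not specify their evaluation code.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```

For example, with true labels `[1, 1, 0, 0]` and predictions `[1, 0, 1, 0]` there is one true positive, one false positive, and one false negative, so all three metrics equal 0.5.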


Cited by
Journal ArticleDOI
08 Dec 2001-BMJ
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at that time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …

33,785 citations

15 May 2015
TL;DR: This article presents a universally applicable attitude and skill set that everyone, not just computer scientists, would be eager to learn and use.
Abstract: It represents a universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use.

430 citations

Journal ArticleDOI
TL;DR: A comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature is provided and a taxonomy of these methods is presented.
Abstract: In recent years, unsupervised feature selection methods have attracted considerable interest in many research areas, mainly due to their ability to identify and select relevant features without needing class label information. In this paper, we provide a comprehensive and structured review of the most relevant and recent unsupervised feature selection methods reported in the literature. We present a taxonomy of these methods and describe their main characteristics and the fundamental ideas they are based on. Additionally, we summarize the advantages and disadvantages of the general lines into which we have categorized the methods analyzed in this review. Moreover, an experimental comparison among the most representative methods of each approach is also presented. Finally, we discuss some important open challenges in this research area.

325 citations

Journal ArticleDOI
TL;DR: It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract: Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers. We compare the accuracy of 11 classification algorithms pairwise and groupwise. We examine separately the training, parameter-tuning, and testing time. GBDT and Random Forests yield highest accuracy, outperforming SVM. GBDT is the fastest in testing, Naive Bayes the fastest in training. Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5, alongside GBDT, RF, SVM, and C4.5 but this performance varies widely across all data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but it is the slowest classifier in both training and testing.

307 citations

Journal Article
TL;DR: A survey of the techniques used for designing software to mine opinion features in reviews, including how Natural Language Processing tools such as NLTK for Python can be applied to raw customer reviews to extract keywords.
Abstract: Nowadays, e-commerce systems have become extremely important. Large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. Client-generated information, especially item reviews, is a significant source of data for consumers making informed buying choices and for makers keeping track of customers’ opinions. It is difficult for customers to make purchasing decisions based only on pictures and short product descriptions. Meanwhile, mining product reviews has become a hot research topic, and prior research is mostly based on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) tools such as NLTK for Python can be applied to raw customer reviews to extract keywords. This paper presents a survey of the techniques used for designing software to mine opinion features in reviews. Eleven IEEE papers are selected and a comparison is made between them. These papers are representative of the significant improvements in opinion mining in the past decade.

229 citations