Mohd Ridzwan Yaakub
Other affiliations: Queensland University of Technology
Bio: Mohd Ridzwan Yaakub is an academic researcher at the National University of Malaysia. The author has contributed to research in topics including Sentiment analysis and Feature selection, has an h-index of 9, and has co-authored 37 publications receiving 287 citations. Previous affiliations of Mohd Ridzwan Yaakub include Queensland University of Technology.
01 Jan 2019
28 Jul 2015
TL;DR: It can be concluded that metaheuristic-based algorithms have the potential to be applied in sentiment analysis research and can produce an optimal feature subset by eliminating irrelevant and redundant features.
Abstract: Sentiment analysis functions by analyzing and extracting opinions from documents, websites, blogs, discussion forums and others to identify sentiment patterns in opinions expressed by consumers. It analyzes people's sentiment and identifies the types of sentiment in comments expressed by consumers on certain matters. This paper highlights comparative studies on the types of feature selection in sentiment analysis based on natural language processing and modern methods such as Genetic Algorithm and Rough Set Theory. This study compares feature selection in text classification based on traditional and sentiment analysis methods. Feature selection is an important step in sentiment analysis because a suitable feature selection can identify the actual product features criticized or discussed by consumers. It can be concluded that metaheuristic-based algorithms have the potential to be applied in sentiment analysis research and can produce an optimal feature subset by eliminating irrelevant and redundant features.
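As a rough illustration of the metaheuristic idea above, a genetic algorithm can search the space of feature subsets. The sketch below is a toy example: the feature pool, the fitness function, and all parameters are illustrative assumptions, not taken from the paper.

```python
import random

random.seed(0)

# Toy setup: features 0-2 are "relevant"; the rest are noise/redundant.
N_FEATURES = 10
RELEVANT = {0, 1, 2}

def fitness(mask):
    """Reward selecting relevant features, penalize irrelevant ones (redundancy)."""
    selected = {i for i, bit in enumerate(mask) if bit}
    return len(selected & RELEVANT) - 0.1 * len(selected - RELEVANT)

def evolve(pop_size=20, generations=40, mutation_rate=0.05):
    pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]            # selection: keep the fitter half
        children = []
        while len(children) < pop_size - len(survivors):
            a, b = random.sample(survivors, 2)
            cut = random.randrange(1, N_FEATURES)   # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - bit if random.random() < mutation_rate else bit
                     for bit in child]              # bit-flip mutation
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

best = evolve()
print([i for i, bit in enumerate(best) if bit])  # indices of the selected features
```

In a real sentiment analysis pipeline the fitness function would be a classifier's cross-validated accuracy on the candidate subset rather than this synthetic score.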
01 Jan 2013
TL;DR: In this paper, a multi-dimensional model for opinion mining is proposed, which integrates customers' characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them in dimensions named customers, products, time and location.
Abstract: Online business or Electronic Commerce (EC) is getting popular among customers today; as a result, a large number of product reviews have been posted online by customers. This information is very valuable not only for prospective customers to make decisions on buying products but also for companies to gather information on customers' satisfaction with their products. Opinion mining is used to capture customer reviews and separate these reviews into subjective expressions (sentiment words) and objective expressions (non-sentiment words). This paper proposes a novel multi-dimensional model for opinion mining, which integrates customers' characteristics and their opinions about products. The model captures subjective expressions from product reviews and transfers them to a fact table before representing them in dimensions named customers, products, time and location. Data warehouse techniques such as OLAP and data cubes were used to analyze opinionated sentences. A comprehensive way to calculate customers' orientation on products' features and attributes is presented in this paper.
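The fact-table idea can be illustrated with a small roll-up over hypothetical review data. The rows, dimension names, and the mean-sentiment "orientation" measure below are illustrative assumptions in the spirit of the model, not the paper's actual schema.

```python
from collections import defaultdict

# Hypothetical fact rows extracted from subjective expressions in reviews:
# (customer, product, feature, time, location, sentiment), sentiment in {+1, -1}.
fact_table = [
    ("alice", "camera", "battery", "2013-01", "MY", +1),
    ("bob",   "camera", "battery", "2013-01", "MY", -1),
    ("carol", "camera", "lens",    "2013-02", "SG", +1),
    ("dave",  "camera", "lens",    "2013-02", "MY", +1),
]

def orientation(facts, dims):
    """Roll up sentiment along the chosen dimensions (a tiny data-cube slice)."""
    cube = defaultdict(list)
    for customer, product, feature, time, location, s in facts:
        row = dict(customer=customer, product=product, feature=feature,
                   time=time, location=location)
        key = tuple(row[d] for d in dims)
        cube[key].append(s)
    # Orientation = mean sentiment per cell, in [-1, +1].
    return {k: sum(v) / len(v) for k, v in cube.items()}

print(orientation(fact_table, ["product", "feature"]))
```

Grouping by different dimension lists (e.g. `["location"]` or `["product", "time"]`) gives the slice-and-dice views that OLAP tools provide over the same fact table.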
TL;DR: This study hybridizes two n-gram models, unigram and n-gram, applies Laplace smoothing to the Naïve Bayesian classifier, and applies Katz back-off to the model in order to smooth the estimates and address the limited accuracy, in terms of precision and recall, of n-gram models caused by the 'zero count problem.'
Abstract: Twitter, an online micro-blogging and social networking service, provides registered users the ability to write anything they wish in 140 characters, and hence the opportunity to express their opinions and sentiments on events taking place. Politically sentimental tweets are top-trending tweets: whenever an election is near, users tweet about their favorite candidates or political parties and at times give their reasons. In this study, we hybridize two n-gram models [the two n-gram models used in this study are unigram and n-gram; therefore, where unigram is mentioned it refers to the lowest-order n-gram (unigram) and where n-gram is mentioned it refers to the highest-order (full sentence or tweet level) n-gram] and apply Laplace smoothing to the Naïve Bayesian classifier and Katz back-off to the model. This was done in order to smooth the estimates and address the limited accuracy, in terms of precision and recall, of n-gram models caused by the 'zero count problem.' Results from our baseline model show an increase of 6.05% in average F-Harmonic accuracy in comparison with the n-gram model and a 1.75% increase in comparison with the semantic-topic model proposed in a previous study on the same dataset, i.e., the Obama–McCain dataset.
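Laplace smoothing, the key device in the abstract, fits in a few lines: it keeps unseen words from zeroing out a Naïve Bayes score. The tiny corpus and the equal class priors below are illustrative assumptions; the Katz back-off between n-gram orders that the paper combines with it is omitted here.

```python
import math
from collections import Counter

# Hypothetical labeled corpus standing in for political tweets.
train = [
    ("obama great speech great", "pos"),
    ("mccain strong debate win", "pos"),
    ("terrible speech bad plan", "neg"),
    ("bad debate weak answers", "neg"),
]

counts = {c: Counter() for c in ("pos", "neg")}
for text, label in train:
    counts[label].update(text.split())
vocab = set().union(*counts.values())

def log_prob(word, label, alpha=1.0):
    """Laplace (add-alpha) smoothing: no word gets zero probability."""
    c = counts[label]
    return math.log((c[word] + alpha) / (sum(c.values()) + alpha * len(vocab)))

def classify(text):
    # Equal class priors assumed, so they cancel and are omitted.
    scores = {label: sum(log_prob(w, label) for w in text.split())
              for label in counts}
    return max(scores, key=scores.get)

print(classify("great debate"))  # 'great' is unseen in the neg class, yet scored
```

Without smoothing, any word absent from a class's training data would drive that class's probability to zero regardless of the rest of the tweet, which is exactly the 'zero count problem' the paper targets.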
19 Oct 2011
TL;DR: This research presents a comprehensive way to calculate customers' orientation for all possible products' attributes and uses a multidimensional model to integrate customers' characteristics and their comments about products (or services) to achieve this objective.
Abstract: Nowadays, Opinion Mining is getting more important than before, especially for doing analysis and forecasting about customers' behavior for business purposes. The right decision in producing new products or services based on data about customers' characteristics means profit for the organization/company. This paper proposes a new architecture for Opinion Mining, which uses a multidimensional model to integrate customers' characteristics and their comments about products (or services). The key step to achieve this objective is to transfer comments (opinions) to a fact table that includes several dimensions, such as customers, products, time and locations. This research presents a comprehensive way to calculate customers' orientation for all possible product attributes. A use case study is also presented in this paper to show the advantages of using OLAP and data cubes to analyze customers' opinions.
TL;DR: There is, I think, something ethereal about i, the square root of minus one, which seemed an odd beast at the time: an intruder hovering on the edge of reality.
Abstract: There is, I think, something ethereal about i —the square root of minus one. I remember first hearing about it at school. It seemed an odd beast at that time—an intruder hovering on the edge of reality. Usually familiarity dulls this sense of the bizarre, but in the case of i it was the reverse: over the years the sense of its surreal nature intensified. It seemed that it was impossible to write mathematics that described the real world in …
15 May 2015
TL;DR: In this article, a universally applicable attitude and skill set for computer science is presented: one that everyone, not just computer scientists, would be eager to learn and use.
Abstract: Computational thinking represents a universally applicable attitude and skill set everyone, not just computer scientists, would be eager to learn and use.
TL;DR: A survey of the techniques used for designing software to mine opinion features in reviews, and of how Natural Language Processing techniques such as NLTK for Python can be applied to raw customer reviews to extract keywords.
Abstract: Nowadays, e-commerce systems have become extremely important. Large numbers of customers are choosing online shopping because of its convenience, reliability, and cost. Client-generated information, and especially item reviews, are significant sources of data for consumers to make informed purchase choices and for makers to keep track of customers' opinions. It is difficult for customers to make purchasing decisions based on only pictures and short product descriptions. On the other hand, mining product reviews has become a hot research topic, and prior research is mostly based on pre-specified product features to analyse the opinions. Natural Language Processing (NLP) techniques such as NLTK for Python can be applied to raw customer reviews and keywords can be extracted. This paper presents a survey of the techniques used for designing software to mine opinion features in reviews. Eleven IEEE papers are selected and a comparison is made between them. These papers are representative of the significant improvements in opinion mining in the past decade.
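As a minimal stand-in for the NLTK-based keyword extraction the survey discusses, a stopword-filtered frequency count already surfaces candidate product features. The reviews and stopword list below are made up; a real pipeline would add NLTK tokenization and POS tagging to keep only nouns.

```python
from collections import Counter

# Illustrative stopword list; NLTK ships a much larger one.
STOPWORDS = {"the", "is", "a", "and", "but", "this", "its", "very", "i", "of"}

# Hypothetical raw customer reviews.
reviews = [
    "the battery life is great and the screen is sharp",
    "battery drains fast but the screen is very bright",
    "great camera and great battery",
]

def extract_keywords(texts, top_n=3):
    """Return the most frequent non-stopword tokens as candidate features."""
    counts = Counter()
    for text in texts:
        counts.update(w for w in text.lower().split() if w not in STOPWORDS)
    return [w for w, _ in counts.most_common(top_n)]

print(extract_keywords(reviews))
```

Frequency alone conflates opinion words ("great") with feature words ("battery"); the POS-tagging step in the surveyed systems exists precisely to separate the two.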
01 Jan 2005
TL;DR: The goal is to help developers find the most suitable language for their representation needs in the Semantic Web, which requires languages to represent its semantic information.
Abstract: Ontologies are being used in many other applications to explicitly declare the knowledge embedded in them. However, not only are ontologies useful for applications in which knowledge plays a key role, but they can also trigger a major change in current Web contents. This change is leading to the third generation of the Web—known as the Semantic Web—which has been defined as “the conceptual structuring of the Web in an explicit machine-readable way.”1 This definition does not differ too much from the one used for defining an ontology: “An ontology is an explicit, machine-readable specification of a shared conceptualization.”2 In fact, new ontology-based applications and knowledge architectures are being developed for this new Web. A common claim across all of these approaches is the need for languages to represent the semantic information that this Web requires—solving heterogeneous data exchange in this heterogeneous environment. Here, we don't decide which language is best for the Semantic Web. Rather, our goal is to help developers find the most suitable language for their representation needs.
TL;DR: It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines and Random Forests, while being the fastest algorithm in terms of prediction efficiency.
Abstract: Up-to-date report on the accuracy and efficiency of state-of-the-art classifiers. We compare the accuracy of 11 classification algorithms pairwise and groupwise. We examine separately the training, parameter-tuning, and testing time. GBDT and Random Forests yield the highest accuracy, outperforming SVM. GBDT is the fastest in testing, Naive Bayes the fastest in training. Current benchmark reports of classification algorithms generally concern common classifiers and their variants but do not include many algorithms that have been introduced in recent years. Moreover, important properties such as the dependency on number of classes and features and CPU running time are typically not examined. In this paper, we carry out a comparative empirical study on both established classifiers and more recently proposed ones on 71 data sets originating from different domains, publicly available at the UCI and KEEL repositories. The list of 11 algorithms studied includes Extreme Learning Machine (ELM), Sparse Representation based Classification (SRC), and Deep Learning (DL), which have not been thoroughly investigated in existing comparative studies. It is found that Stochastic Gradient Boosting Trees (GBDT) matches or exceeds the prediction performance of Support Vector Machines (SVM) and Random Forests (RF), while being the fastest algorithm in terms of prediction efficiency. ELM also yields good accuracy results, ranking in the top-5 alongside GBDT, RF, SVM, and C4.5, but this performance varies widely across data sets. Unsurprisingly, top accuracy performers have average or slow training time efficiency. DL is the worst performer in terms of accuracy but second fastest in prediction efficiency. SRC shows good accuracy performance but is the slowest classifier in both training and testing.
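The gradient-boosting idea behind GBDT can be sketched in plain Python: each stage fits a depth-1 tree (a stump) to the residuals of the ensemble built so far. This is a toy regression sketch under squared loss on made-up data; real GBDT implementations add subsampling, regularization, and deeper trees.

```python
def fit_stump(xs, rs):
    """Best single-split constant regressor on 1-D inputs (squared error)."""
    best = None
    for t in sorted(set(xs)):
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        if not left or not right:
            continue
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - (lmean if x <= t else rmean)) ** 2
                  for x, r in zip(xs, rs))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x, t=t, lmean=lmean, rmean=rmean: lmean if x <= t else rmean

def boost(xs, ys, n_stages=50, lr=0.1):
    """Gradient boosting for squared loss: each stage fits the residuals."""
    base = sum(ys) / len(ys)
    stumps, pred = [], [base] * len(xs)
    for _ in range(n_stages):
        residuals = [y - p for y, p in zip(ys, pred)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        pred = [p + lr * stump(x) for p, x in zip(pred, xs)]
    return lambda x: base + lr * sum(s(x) for s in stumps)

xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 0, 1, 1, 1]   # a step function
model = boost(xs, ys)
print(round(model(1), 2), round(model(4), 2))
```

The learning rate of 0.1 means each stage corrects only a fraction of the remaining error, which is the shrinkage that makes boosted ensembles resistant to overfitting, at the cost of needing many stages.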