scispace - formally typeset
Author

Moch Arif Bijaksana

Bio: Moch Arif Bijaksana is an academic researcher from Telkom University. The author has contributed to research on the topics of WordNet and named-entity recognition. The author has an h-index of 7 and has co-authored 87 publications receiving 257 citations. Previous affiliations of Moch Arif Bijaksana include Queensland University of Technology and Telkom Institute of Technology.


Papers
Journal Article
TL;DR: This paper uses the Word2Vec model to represent words as vectors, with 320,000 articles from the English Wikipedia as the corpus, and uses cosine similarity to determine the similarity value.

96 citations
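The cosine similarity calculation mentioned in the entry above can be sketched as follows. This is a minimal illustration, not the paper's code: the three-dimensional vectors are hypothetical toy values, whereas trained Word2Vec embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_u * norm_v)

# Hypothetical toy "embeddings"; NOT real Word2Vec output.
vectors = {
    "king": [0.8, 0.3, 0.1],
    "queen": [0.7, 0.4, 0.1],
    "apple": [0.1, 0.2, 0.9],
}

print(cosine_similarity(vectors["king"], vectors["queen"]))  # related words: close to 1
print(cosine_similarity(vectors["king"], vectors["apple"]))  # unrelated words: much lower
```

With a trained gensim Word2Vec model, `model.wv.similarity(w1, w2)` should return this same cosine value for two in-vocabulary words.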

Journal Article
TL;DR: Substantial experiments show that the proposed model significantly outperforms both state-of-the-art term-based methods and pattern-based methods in text mining.
Abstract: It is a big challenge to guarantee the quality of discovered relevance features in text documents for describing user preferences because of the large scale of terms and data patterns. Most existing popular text mining and classification methods have adopted term-based approaches. However, they all suffer from the problems of polysemy and synonymy. It has long been hypothesized that pattern-based methods should perform better than term-based ones in describing user preferences; yet how to use large-scale patterns effectively remains a hard problem in text mining. To make a breakthrough on this challenging issue, this paper presents an innovative model for relevance feature discovery. It discovers both positive and negative patterns in text documents as higher-level features and deploys them over low-level features (terms). It also classifies terms into categories and updates term weights based on their specificity and their distributions in patterns. Substantial experiments using this model on RCV1, TREC topics, and Reuters-21578 show that the proposed model significantly outperforms both state-of-the-art term-based methods and pattern-based methods.

67 citations
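The idea of deploying discovered patterns over terms, described in the entry above, can be illustrated with a simplified sketch. This is an assumption-laden illustration and not the paper's formula: each pattern's support is spread evenly over its terms, and terms in negative patterns are down-weighted by a hypothetical `penalty` factor. The paper's actual term categories and weight-revision rules are more involved.

```python
from collections import defaultdict

def deploy_patterns(positive, negative, penalty=0.5):
    """Spread each pattern's support evenly over the terms it contains.

    positive / negative: lists of (frozenset_of_terms, support) pairs.
    penalty: hypothetical down-weighting factor for terms in negative patterns.
    """
    weights = defaultdict(float)
    for terms, support in positive:
        for term in terms:
            weights[term] += support / len(terms)
    for terms, support in negative:
        for term in terms:
            weights[term] -= penalty * support / len(terms)
    return dict(weights)

# Hypothetical patterns and supports, for illustration only.
positive = [(frozenset({"oil", "price"}), 0.6), (frozenset({"oil", "export", "opec"}), 0.3)]
negative = [(frozenset({"price", "stock"}), 0.4)]

w = deploy_patterns(positive, negative)
for term, weight in sorted(w.items(), key=lambda kv: -kv[1]):
    print(f"{term:8s} {weight:+.3f}")
```

Terms shared by several positive patterns ("oil") accumulate the most weight, while a term that also appears in a negative pattern ("price") is pulled down.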

Proceedings Article
01 Sep 2016
TL;DR: In this article, the authors propose to improve SMS spam filtering performance by combining two data mining tasks, association and classification: FP-Growth (association) is used to mine frequent patterns in SMS messages, and a Naive Bayes classifier (classification) is used to classify each SMS as spam or ham.
Abstract: SMS (Short Message Service) is still the primary choice of communication medium, even though mobile phones now offer a variety of messenger applications. However, the reduction in SMS tariffs has led to an increase in SMS spam, which some people use for advertising and fraud. SMS spam has therefore become an important issue, as it can annoy and harm users, and one solution is automatic SMS spam filtering. One of the most challenging aspects of SMS spam filtering is its accuracy. In this research, we propose to improve SMS spam filtering performance by combining two data mining tasks: association and classification. FP-Growth is used for association, to mine frequent patterns in SMS messages, and a Naive Bayes classifier is used to classify each SMS as spam or ham. The training data was the SMS Spam Collection from previous research. The combination of Naive Bayes and FP-Growth achieves the highest average accuracy, 98.506%, which is 0.025% better than without FP-Growth on the SMS Spam Collection v.1 dataset, and it also improves the precision score; thus, the classification result is more accurate.

33 citations
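The association-plus-classification pipeline described in the entry above can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: frequent word pairs are found by brute-force counting instead of FP-Growth (FP-Growth finds the same frequent itemsets without candidate generation, which matters at scale), each frequent pair present in a message is appended as an extra synthetic token, and the four training messages are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations

def frequent_pairs(docs, min_support):
    """Word pairs co-occurring in at least `min_support` documents (brute force, not FP-Growth)."""
    counts = Counter()
    for doc in docs:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return {pair for pair, c in counts.items() if c >= min_support}

def add_pattern_features(doc, patterns):
    """Append one synthetic token (e.g. 'free+prize') per frequent pair present in the message."""
    words = set(doc)
    return doc + ["+".join(p) for p in sorted(patterns) if set(p) <= words]

class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-one) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for doc, label in zip(docs, labels):
            self.counts[label].update(doc)
        self.vocab = set().union(*self.counts.values())
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc):
        def log_score(c):
            denom = self.totals[c] + len(self.vocab)
            return self.prior[c] + sum(
                math.log((self.counts[c][w] + 1) / denom) for w in doc if w in self.vocab
            )
        return max(self.classes, key=log_score)

# Hypothetical toy training data (NOT the SMS Spam Collection).
train = [
    ("win free prize now".split(), "spam"),
    ("free prize claim now".split(), "spam"),
    ("call me when you are home".split(), "ham"),
    ("are you home for dinner".split(), "ham"),
]
docs = [doc for doc, _ in train]
labels = [label for _, label in train]

patterns = frequent_pairs(docs, min_support=2)
model = NaiveBayes().fit([add_pattern_features(d, patterns) for d in docs], labels)
print(model.predict(add_pattern_features("claim your free prize".split(), patterns)))
```

Appending patterns as tokens is one simple way to let a bag-of-words classifier see word co-occurrence; words unseen in training (such as "your") are simply ignored at prediction time.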

Journal Article
TL;DR: The model combines a deep learning approach, Bidirectional Long Short-Term Memory (BLSTM), with a machine learning approach, Conditional Random Field (CRF), and identifies entities of the types Person, Location, and Organization.

22 citations
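In a BLSTM-CRF tagger like the one described in the entry above, the BLSTM typically produces per-token tag scores and the CRF layer picks the best-scoring tag sequence with Viterbi decoding. The sketch below shows only that decoding step; the emission and transition scores are hypothetical hand-set values (in a real model they are learned), and the tag set is trimmed to Person tags in the BIO scheme for brevity.

```python
def viterbi(emissions, transitions, tags):
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (e.g. from a BLSTM).
    transitions[i][j]: score of moving from tag i to tag j.
    Scores are additive log-space values, so decoding needs no normalization.
    """
    n_tags = len(tags)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final tag.
    best = max(range(n_tags), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    return [tags[j] for j in reversed(path)]

tags = ["O", "B-PER", "I-PER"]
NEG = -100.0  # effectively forbids a transition
transitions = [
    [0.0, 0.0, NEG],  # from O: O -> I-PER is invalid in BIO
    [0.0, 0.0, 0.5],  # from B-PER
    [0.0, 0.0, 0.5],  # from I-PER
]
# Hypothetical emission scores for the tokens "Joko", "Widodo", "spoke".
emissions = [
    [0.2, 1.5, 0.1],
    [0.3, 0.4, 1.2],
    [1.8, 0.1, 0.2],
]
print(viterbi(emissions, transitions, tags))
```

The transition scores are what a plain per-token classifier lacks: for emissions `[[1.0, 0.2, 0.1], [0.1, 0.3, 0.9]]`, greedy decoding would output the invalid sequence O, I-PER, while Viterbi returns B-PER, I-PER.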

Proceedings Article
01 Nov 2017
TL;DR: This paper investigates whether measurements of collective sentiment (mood) states about a product, extracted from Twitter feeds, are correlated with the churn rate of the observed product, and tests the hypothesis that mood states from Twitter are predictive of changes in churn rate values.
Abstract: In the hypercompetitive internet service provider market, customers can easily move from one company or operator to another if they do not receive good service. Such customer movement, known as churn, is a major issue for companies. Churn management is an important program for retaining valuable customers, so predicting customer churn is crucial. Churn analysis has been widely studied in the literature with various techniques. These techniques mostly use customer complaint data, customer tenure, customer usage, customer payment behaviour, etc. This paper uses customer opinions collected from Twitter with specific keywords. We investigate whether measurements of collective sentiment (mood) states about a product, extracted from Twitter feeds, are correlated with the churn rate of the observed product. This study classifies the text content of daily Twitter feeds as positive, negative, or neutral sentiment using a Convolutional Neural Network. The resulting mood time series was then cross-validated with Granger causality analysis. A Recurrent Neural Network was then used to test the hypothesis that mood states from Twitter are predictive of changes in churn rate values. The results indicate that the accuracy of churn rate predictions can be improved by including a specific mood dimension, namely negative sentiment, but not the others. The Mean Absolute Percentage Error (MAPE) is about 1.47%.

15 citations
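Two of the quantitative steps in the entry above, the Granger causality check and the MAPE metric, can be sketched with NumPy. This is a minimal sketch on synthetic data, not the study's pipeline: the `mood` and `churn` series are generated artificially so that churn depends on yesterday's mood, and the Granger test is the standard F-test comparing a restricted autoregression of y with an unrestricted one that adds lagged x.

```python
import numpy as np

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent. Assumes no actual value is zero."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return 100.0 * np.mean(np.abs((actual - predicted) / actual))

def lagged(series, lags):
    """Matrix whose column k holds series[t - k - 1] for t = lags .. len(series) - 1."""
    n = len(series)
    return np.column_stack([series[lags - k - 1 : n - k - 1] for k in range(lags)])

def granger_f(y, x, lags):
    """F-statistic for 'x Granger-causes y' from two OLS fits.

    Restricted:   y_t ~ const + y_{t-1..t-lags}
    Unrestricted: y_t ~ const + y_{t-1..t-lags} + x_{t-1..t-lags}
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    target = y[lags:]
    ones = np.ones((len(target), 1))
    restricted = np.hstack([ones, lagged(y, lags)])
    unrestricted = np.hstack([restricted, lagged(x, lags)])

    def rss(design):
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        return float(resid @ resid)

    rss_r, rss_u = rss(restricted), rss(unrestricted)
    dof = len(target) - unrestricted.shape[1]
    return ((rss_r - rss_u) / lags) / (rss_u / dof)

# Synthetic series (hypothetical): churn depends on the previous day's mood.
rng = np.random.default_rng(0)
n = 200
mood = rng.normal(size=n)
unrelated = rng.normal(size=n)
noise = rng.normal(scale=0.1, size=n)
churn = np.zeros(n)
for t in range(1, n):
    churn[t] = 0.5 * churn[t - 1] + 0.8 * mood[t - 1] + noise[t]

print(granger_f(churn, mood, lags=2))       # large F: lagged mood helps predict churn
print(granger_f(churn, unrelated, lags=2))  # small F: no predictive value
print(mape([100, 200], [110, 180]))         # 10% average absolute error
```

For p-values, `statsmodels.tsa.stattools.grangercausalitytests` runs this family of tests; the hand-written version above only shows where the F-statistic comes from.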


Cited by
Journal Article
TL;DR: This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis and presents a taxonomy of sentiment analysis, which highlights the power of deep learning architectures for solving sentiment analysis problems.
Abstract: Social media is a powerful means of communication through which people share their sentiments, in the form of opinions and views, about any topic or article, resulting in an enormous amount of unstructured information. Business organizations need to process and study these sentiments to investigate data and gain business insights. Hence, various machine learning and natural language processing approaches have been used in the past to analyze these sentiments. However, deep learning-based methods have recently become very popular due to their high performance. This paper provides a detailed survey of popular deep learning models that are increasingly applied in sentiment analysis. We present a taxonomy of sentiment analysis and discuss the implications of popular deep learning architectures. The key contributions of various researchers are highlighted, with the prime focus on deep learning approaches. The crucial sentiment analysis tasks are presented, and the languages in which sentiment analysis has been performed are identified. The survey also summarizes the popular datasets, their key features, the deep learning models applied to them, the accuracy obtained, and a comparison of various deep learning models. The primary purpose of this survey is to highlight the power of deep learning architectures for solving sentiment analysis problems.

385 citations

01 Jan 2008
TL;DR: In this special issue, the focus will be on the technical side, although other issues related to knowledge and data engineering for e-learning may also be considered.
Abstract: With the advent of the Internet, we are seeing more sophisticated techniques being developed to support e-learning. The rapid development of Web-based learning and new concepts such as virtual classrooms, virtual laboratories, and virtual universities introduces many new issues to be addressed. On the technical side, we need to develop effective e-technologies for supporting distance education. On the learning and management side, we need to consider issues such as new styles of learning and different system set-up requirements. Finally, the standardization of e-learning systems should also be considered. In this special issue, our focus will be on the technical side, although other issues related to knowledge and data engineering for e-learning may also be considered. Topics: In this special issue, we call for original papers describing novel knowledge and data engineering techniques that support e-learning. Preference will be given to papers that include an evaluation of users' experience in using the proposed methods. Areas of interest include, but are not limited to:
• Semantic Web technology for e-learning
• Data modeling (e.g., XML) for efficient management of course materials
• Searching and indexing techniques to support effective course notes retrieval
• User-centric e-learning systems and user interaction management
• Profiling techniques to support grading and learning recommendation
• Data and knowledge base support for pervasive e-learning
• Course material analysis and understanding
• Automatic generation of questions and answers
• Collaborative communities for e-learning

310 citations

Journal Article
TL;DR: It is shown that with particular choices of kernel functions, nonredundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion and the globally optimal solution can be efficiently computed.
Abstract: The goal of supervised feature selection is to find a subset of input features that are responsible for predicting output values. The least absolute shrinkage and selection operator (Lasso) allows computationally efficient feature selection based on linear dependency between input features and output values. In this letter, we consider a feature-wise kernelized Lasso for capturing nonlinear input-output dependency. We first show that with particular choices of kernel functions, nonredundant features with strong statistical dependence on output values can be found in terms of kernel-based independence measures such as the Hilbert-Schmidt independence criterion. We then show that the globally optimal solution can be efficiently computed; this makes the approach scalable to high-dimensional problems. The effectiveness of the proposed method is demonstrated through feature selection experiments for classification and regression with thousands of features.

240 citations
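The kernel-based independence measure named in the entry above, the Hilbert-Schmidt independence criterion (HSIC), can be computed per feature with a few lines of NumPy. This sketch only scores each feature's dependence on the output using the biased empirical estimator trace(KHLH)/(n-1)^2 with Gaussian kernels; it is not the full feature-wise kernelized Lasso, which additionally fits a sparse non-negative combination of per-feature kernel matrices so that redundant features are suppressed. The data and the median-free bandwidth choice (sample standard deviation) are assumptions for illustration.

```python
import numpy as np

def gaussian_gram(v, sigma=None):
    """Gaussian-kernel Gram matrix of a 1-D sample; sigma defaults to the sample std."""
    v = np.asarray(v, dtype=float).reshape(-1, 1)
    if sigma is None:
        sigma = v.std() or 1.0
    sq_dists = (v - v.T) ** 2
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def hsic(x, y):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = len(x)
    K, L = gaussian_gram(x), gaussian_gram(y)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return float(np.trace(K @ H @ L @ H)) / (n - 1) ** 2

# Synthetic data (hypothetical): y depends nonlinearly on feature 0 only.
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 3))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=n)

scores = [hsic(X[:, k], y) for k in range(X.shape[1])]
print(np.round(scores, 4))
```

Because y = x0² is almost uncorrelated with x0 in the linear sense, a plain Lasso on the raw features would struggle here, while the kernel-based score still singles out feature 0.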

Journal Article
TL;DR: A survey of the techniques used for designing software to mine opinion features in reviews, and of how Natural Language Processing tools such as NLTK for Python can be applied to raw customer reviews to extract keywords.
Abstract: Nowadays, e-commerce systems have become extremely important. Large numbers of customers choose online shopping because of its convenience, reliability, and cost. Customer-generated content, especially product reviews, is a significant source of information for consumers making informed purchase decisions and for manufacturers keeping track of customers' opinions. It is difficult for customers to make purchasing decisions based only on pictures and short product descriptions. Meanwhile, mining product reviews has become a hot research topic, and prior research mostly relies on pre-specified product features to analyse opinions. Natural Language Processing (NLP) tools such as NLTK for Python can be applied to raw customer reviews to extract keywords. This paper presents a survey of the techniques used for designing software to mine opinion features in reviews. Eleven IEEE papers are selected and compared. These papers represent the significant improvements in opinion mining over the past decade.

229 citations
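The keyword-extraction step mentioned in the entry above can be sketched with the standard library alone. This is a frequency-based sketch under stated assumptions: the regex tokenizer and tiny stopword list stand in for NLTK's tokenizer and English stopword corpus (for which `nltk.corpus.stopwords` and `nltk.FreqDist` would play similar roles), and the reviews are hypothetical.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (assumption); NLTK ships a much larger one.
STOPWORDS = {"the", "a", "an", "is", "it", "and", "but", "of", "to", "for",
             "this", "was", "very", "i", "my", "with", "too"}

def extract_keywords(reviews, top_k=3):
    """Rank non-stopword tokens by how often they appear across all reviews."""
    counts = Counter()
    for review in reviews:
        tokens = re.findall(r"[a-z']+", review.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(top_k)

# Hypothetical customer reviews.
reviews = [
    "The battery life is great but the screen is too dim.",
    "Great camera, and the battery lasts all day.",
    "Screen cracked after a week. Battery is fine.",
]
print(extract_keywords(reviews))
```

Raw frequency surfaces candidate product features ("battery", "screen") but says nothing about polarity; the opinion-mining techniques the survey compares go further by linking such features to the sentiment expressed about them.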

Journal Article
TL;DR: This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by surveying the last decade's literature on ML techniques for cyber security, including intrusion detection, spam detection, and malware detection on computer networks and mobile networks.
Abstract: The pervasive growth and usage of the Internet and mobile applications have expanded cyberspace, which has become more vulnerable to automated and prolonged cyberattacks. Cyber security techniques provide enhanced security measures to detect and react to cyberattacks. Previously used security systems are no longer sufficient because cybercriminals are smart enough to evade them: conventional security systems are inefficient at detecting previously unseen and polymorphic attacks. Machine learning (ML) techniques play a vital role in numerous cyber security applications. However, despite their ongoing success, there are significant challenges in ensuring the trustworthiness of ML systems, because incentivized malicious adversaries in cyberspace are willing to game and exploit ML vulnerabilities. This paper aims to provide a comprehensive overview of the challenges that ML techniques face in protecting cyberspace against attacks, by surveying the last decade's literature on ML techniques for cyber security, including intrusion detection, spam detection, and malware detection on computer networks and mobile networks. It also provides brief descriptions of each ML method, frequently used security datasets, essential ML tools, and evaluation metrics for classification models. Finally, it discusses the challenges of using ML techniques in cyber security. This paper provides an extensive, up-to-date bibliography and the current trends of ML in cyber security.

135 citations
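The evaluation metrics for classification models mentioned in the entry above are usually derived from the confusion matrix. The sketch below computes accuracy, precision, recall, and F1 for a binary detector; the labels are hypothetical (1 = malicious, 0 = benign) and are not taken from any security dataset.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }

# Hypothetical detector output: 1 = attack/malicious, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]
print(classification_metrics(y_true, y_pred))
```

In security data, attacks are often rare, so accuracy alone can look high for a detector that misses most attacks; recall (missed attacks) and precision (false alarms) expose that trade-off.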