Author

Muhammad Zidny Naf’an

Other affiliations: Gadjah Mada University
Bio: Muhammad Zidny Naf’an is an academic researcher from Telkom Institute of Technology. The author has contributed to research in topics: Sentiment analysis & Question answering. The author has an h-index of 5 and has co-authored 14 publications receiving 53 citations. Previous affiliations of Muhammad Zidny Naf’an include Gadjah Mada University.

Papers
Journal Article
12 Apr 2019
TL;DR: This study aims to build a system that classifies Instagram comments as containing cyberbullying or not, so that the classification results can be used to detect cyberbullying comments.
Abstract: Instagram is a social media platform for sharing images, photos, and videos, and it has many active users from various backgrounds. In addition to posting, Instagram users can like and comment on other users' posts. However, the comment feature is often misused, for example for cyberbullying, which is an unlawful act. Instagram still does not provide a feature to detect cyberbullying. This study therefore aims to build a system that classifies comments as containing elements of cyberbullying or not; the classification results are used to detect cyberbullying comments. The classification algorithm is the Naive Bayes Classifier. Each comment passes through preprocessing and feature extraction with the TF-IDF method. Evaluation and testing use K-Fold Cross Validation. Two experiments were run, one with stemming and one without. The training set contains 455 records. The best result was an accuracy of 84%, obtained both with and without stemming.

27 citations
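
The classification and evaluation steps named in the abstract above (Naive Bayes with K-Fold Cross Validation) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it uses raw term counts with add-one smoothing instead of TF-IDF features, it skips stemming, and the function names and the interleaved fold split are assumptions made for the example. It also assumes k is no larger than the number of documents.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model on tokenized documents."""
    class_counts = Counter(labels)
    term_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in zip(docs, labels):
        term_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, term_counts, vocab

def predict_nb(model, tokens):
    """Return the label with the highest log posterior for one document."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_terms = sum(term_counts[label].values())
        score = math.log(n_docs / total_docs)
        for term in tokens:
            if term in vocab:  # terms unseen in training are ignored
                # Add-one (Laplace) smoothing, an assumption of this sketch.
                score += math.log((term_counts[label][term] + 1)
                                  / (total_terms + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def k_fold_accuracy(docs, labels, k):
    """Average accuracy over k folds; fold f holds items f, f+k, f+2k, ..."""
    accuracies = []
    for fold in range(k):
        test_idx = set(range(fold, len(docs), k))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [l for i, l in enumerate(labels) if i not in test_idx]
        model = train_nb(train_docs, train_labels)
        correct = sum(predict_nb(model, docs[i]) == labels[i] for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return sum(accuracies) / k
```

Running the same evaluation once on stemmed tokens and once on unstemmed tokens would mirror the two experiments described in the abstract.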

Journal Article
26 Mar 2019
TL;DR: This study aims to detect the similarity of text documents using the cosine similarity algorithm and TF-IDF weighting so that it can be used to determine a plagiarism score.
Abstract: Plagiarism is the act of taking part or all of someone's ideas, in the form of documents or text, without citing the source of the information. This study aims to detect the similarity of text documents using the cosine similarity algorithm and TF-IDF weighting so that it can be used to determine a plagiarism score. The documents compared are abstracts written in Indonesian. The results show that with stemming the similarity value is on average 10% higher than without stemming. The study yields similarity values above 50% for documents with a high degree of similarity, while documents with low similarity, or that are not plagiarized, yield similarity values below 40%. Preprocessing consists of case folding, tokenizing, stopword removal, and stemming. After preprocessing, TF-IDF weights are calculated and similarity is computed with cosine similarity to obtain a similarity percentage. Based on the experiments, the cosine similarity algorithm with TF-IDF weighting is able to produce a similarity value for each compared document.

14 citations
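
The TF-IDF weighting and cosine similarity steps described in the abstract above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: preprocessing is reduced to case folding and whitespace tokenizing (stopword removal and stemming are omitted), the smoothed IDF formula is one common variant chosen for the example, and the function names are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Case folding and whitespace tokenizing only.
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumption), so terms found in every document
    # still receive a non-zero weight.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in Counter(tokens).items()}
            for tokens in tokenized]

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Multiplying the result by 100 gives a similarity percentage that can be compared against thresholds such as the 50% and 40% levels reported in the abstract.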

Journal Article
27 Feb 2017
TL;DR: A signature is a marker or identity present on a document; as described in this paper, the digital signature image is converted from true color to binary.
Abstract: A signature is a marker or identity present on a document. Signatures play an important role in verifying and legalizing documents. The aim of this study is to apply image processing techniques to signatures and to identify signature image patterns based on the entropy value and the time taken to compute it. The research stages are: collecting respondent data in the form of analog signature images; acquiring digital signature images by scanning those signatures; and converting the digital signature images from true color to binary. The final stage computes the entropy value and records the computation time using MATLAB, then examines the distribution of entropy values across the signature images. The distribution of entropy values for genuine signatures has an error of 3.31% over all respondents (30 respondents). This error corresponds to entropy values that fall outside their group. For forged signatures, if the strokes or pixels in the image are greater than in the genuine signature image, the entropy computation takes longer than for the genuine signature image.

9 citations
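
The binarization and entropy steps described in the abstract above can be illustrated with a short sketch. The abstract does not state which entropy definition was used, so this example assumes Shannon entropy over the pixel-value histogram. It is written in Python rather than the MATLAB used in the study, it starts from a grayscale image rather than true color, and the threshold of 128 and the function names are hypothetical.

```python
import math
from collections import Counter

def to_binary(gray_image, threshold=128):
    """Map each grayscale pixel (0-255) to 1 if >= threshold, else 0."""
    return [[1 if px >= threshold else 0 for px in row] for row in gray_image]

def shannon_entropy(image):
    """Shannon entropy in bits of the pixel-value distribution."""
    counts = Counter(px for row in image for px in row)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

For a binary image the value lies between 0 (all pixels identical) and 1 bit (equal numbers of black and white pixels). Wrapping the call with `time.perf_counter()` would give the computation time that the study also records.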

Journal Article
01 Mar 2018
TL;DR: The test results show that the MOORA method with the Ratio System approach and without sentiment analysis has the best accuracy among the approaches tested.
Abstract: Besides specifications and price, smartphone reviews can affect consumers' buying interest. This study uses the sentiment value of smartphone reviews as one of the attributes (criteria), in addition to specifications and price, in a Decision Support System that applies the MOORA method to generate smartphone recommendations. The sentiment value is obtained from sentiment analysis using SentiWordNet. Two approaches of the MOORA method are used in this research: the Ratio System and the Reference Point Approach. Testing compared the smartphone recommendations produced by each MOORA approach, with and without sentiment analysis, against smartphone rankings based on the number of fans on the GSM Arena site. The test results show that MOORA with the Ratio System approach and without sentiment analysis has the best accuracy among the approaches tested.

6 citations
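
The Ratio System approach of MOORA mentioned in the abstract above can be sketched as follows. In the standard formulation, each criterion column is normalized by the square root of its sum of squares, and an alternative's score is the sum of its normalized benefit criteria minus the sum of its normalized cost criteria. The criterion weights and the function name are assumptions for illustration; the abstract does not say whether or how the study weighted its criteria.

```python
import math

def moora_ratio_system(matrix, weights, is_benefit):
    """Score alternatives (rows) over criteria (columns); higher is better.

    matrix     -- one row of raw criterion values per alternative
    weights    -- one weight per criterion (hypothetical in this sketch)
    is_benefit -- True for criteria to maximize, False for costs
    """
    n_criteria = len(weights)
    # Vector normalization: divide by the column's Euclidean norm.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix))
             for j in range(n_criteria)]
    scores = []
    for row in matrix:
        score = 0.0
        for j in range(n_criteria):
            if norms[j] == 0:
                continue  # an all-zero column carries no information
            value = weights[j] * row[j] / norms[j]
            score += value if is_benefit[j] else -value
        scores.append(score)
    return scores
```

Sorting the alternatives by descending score gives the recommendation ranking. A sentiment value could be added as one more benefit column, which corresponds to the "with sentiment analysis" variant described in the abstract.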

Proceedings Article
01 Nov 2016
TL;DR: This paper discusses attempts to improve the performance of a Question Answering System for Khulafaa Al-Rashidin (Caliphs) history (called QAKH), for which experiments in 2012 showed that only 61.67% of questions received a correct answer.
Abstract: This paper discusses attempts to improve the performance of a Question Answering System for Khulafaa Al-Rashidin (Caliphs) history (called QAKH). Experiments on QAKH in 2012 showed that only 61.67% of questions received a correct answer. One contributing factor was a weakness in the Indonesian stemming process, which was implemented with the Lucene library. Lucene was found to over-stem some Indonesian words in the indexing phase, which in turn affected passage retrieval and answer extraction. We implemented two approaches to solve this problem. First, we used a different library for stemming Indonesian words, developed by the Information Retrieval laboratory at the University of Indonesia. If no answer was returned, we fell back to the second approach, in which no stemming is applied. The evaluation showed an improvement, with 66.67% correct answers. An analysis of each fold of the experiments is also discussed.

5 citations


Cited by
Journal Article
TL;DR: This study transformed sentiment analysis into a multi-classification problem based on machine learning methods, and further designed a keyword semantic expansion method based on a knowledge graph to build an effective sentiment classification model for online travel review text.
Abstract: In recent years, the number of review texts on online travel review sites has increased dramatically, which has provided a novel source of data for travel research. Sentiment analysis is a process that can extract tourists’ sentiments regarding travel destinations from online travel review texts. The results of sentiment analysis form an important basis for tourism decision making. Thus far, little attention has been paid to how sentiment analysis methods can be applied effectively to improve their results. Moreover, online travel review texts are largely short texts characterized by uneven sentiment distribution, which makes it difficult to obtain accurate sentiment analysis results. Accordingly, in order to improve the sentiment classification accuracy of online travel review texts, this study transformed sentiment analysis into a multi-classification problem based on machine learning methods, and further designed a keyword semantic expansion method based on a knowledge graph. Our proposed method extracts keywords from online travel review texts and obtains the concept list of keywords through Microsoft Knowledge Graph. This list is then added to the review text to facilitate the construction of semantically expanded classification data. Our proposed method increases the number of classification features used for short text by employing the huge corpus of information associated with the knowledge graph. In addition, this article introduces online travel review text preprocessing, keyword extraction, text representation, sampling, the establishment of classification labels, and the selection and application of machine learning-based sentiment classification methods in order to build an effective sentiment classification model for online travel review text. Experiments were implemented and evaluated based on the English review texts of four famous attractions in four countries on the TripAdvisor website. Our experimental results demonstrate that the method proposed in this paper can be used to effectively improve the accuracy of the sentiment classification of online travel review texts. Our research attempts to emphasize and improve the methodological relevance and applicability of sentiment analysis for future travel research.

31 citations

Journal Article
TL;DR: The abusive message detection model proposed in this study can contribute to the development of Turkish comment filters on Instagram; several model combinations are compared to select the one with the best recognition accuracy.
Abstract: Instagram is a free photo-sharing platform where each user has a profile and can upload photos for followers to view, like, and comment on. Abusive comments on images can be humiliating and harmful to those who share photos. Developing a comment filter in languages other than English is difficult and time-consuming. This paper proposes a dataset called Abusive Turkish Comments (ATC) to detect abusive Instagram comments in Turkish. It is composed of a large number of Instagram comments posted to tabloid and sports accounts (i.e., 10,528 abusive and 19,826 not-abusive). It is the first public dataset dedicated to detecting abusive Turkish messages, as far as we know. The sentiment annotation was done at the sentence level by assigning a polarity to each comment. The performance of the abusive message detection models was evaluated using several performance metrics. A Convolutional Neural Network (CNN), five well-known classifiers (i.e., Naive Bayes, Support Vector Machine, Decision Tree, Random Forest, and Logistic Regression), and two reweighted classifiers (i.e., Adaptive Boosting (AdaBoost) and eXtreme Gradient Boosting (XGBoost)) were compared in terms of F1-score, precision, and recall. The results showed that the best performance (i.e., Micro-averaged F1-score: 0.974, Macro-averaged F1-score: 0.973, Kappa-value: 0.946) was yielded by the CNN model on the oversampled ATC dataset. The abusive message detection model proposed in this study can contribute to the development of Turkish comment filters on Instagram. Different model combinations are considered to select the model that gives the best recognition accuracy.

15 citations

Journal Article
30 Oct 2020
TL;DR: This study builds a model for categorizing pantun types and analyzes the accuracy of support vector machines (SVM); SVM classified pantun types with an accuracy of 81.91%.
Abstract: This study aims to create a model for categorizing pantun types and to analyze the accuracy of support vector machines (SVM). The first stage is collecting pantun that have been labeled with a pantun category. The categories are pantun for children, pantun for young people, and pantun for elders. After data collection, the next stage is pre-processing, which prepares the data for the feature extraction stage. Pre-processing consists of text segmentation, case folding, tokenization, stop word removal, and stemming. The feature extraction stage analyzes potential information and represents terms as a vector. The data are split into training and testing sets before classification. Classification is then performed with a multiclass SVM. The classification results are evaluated to obtain the accuracy and analyzed to determine whether the model is suitable for use. The results showed that SVM classified the types of pantun with an accuracy of 81.91%.

14 citations

Proceedings Article
01 Nov 2017
TL;DR: This review shows that most existing research on text mining for ITQ is concerned with complexity, ambiguity, and optimization for SQA system applications, and it may help researchers in this area take the work to the next level.
Abstract: There is an increasing trend of using computers to learn Islamic knowledge from the Indonesian Translation of Al-Quran (ITQ). As a result, substantial knowledge is stored as unstructured text in the chapters (surah) of ITQ. Text mining is an active research area that combines information extraction, natural language processing, information retrieval, and data mining. It tries to discover knowledge from unstructured text. Text mining on ITQ is the ability to process Indonesian text into sentences (ayat) or documents (surah), interpret the text meaningfully, and identify and extract relationships among concepts to directly answer the question of interest. This paper presents a review of concepts, searching and question answering (SQA) applications, and issues in text mining for ITQ. We reviewed research papers, highlighted some of the problems, gaps, and critical challenges in this area, and proposed some future research directions. The review method is composed of three phases: planning, conducting, and reporting the review. The results show that most existing research on text mining is concerned with complexity, ambiguity, and optimization for SQA system applications. Finally, this review can be beneficial to researchers in this area who want to carry the work to the next level.

13 citations

Journal Article
06 Oct 2019
TL;DR: A study that automatically classified 360 Indonesian news items into 12 classes showed that stemming without stopword removal, combined with MI feature selection and SVM classification, produced the best result of 94.24% compared to the other methods.
Abstract: News is a source of information disseminated in various types of media. To make it easier for readers to find the news they want, the news needs to be classified. The large volume of scattered news makes it difficult to classify news by topic. The author therefore conducted a study to automatically classify 360 Indonesian news items into 12 classes (culture, economy, entertainment, law, health, life, automotive, education, politics, sports, technology, and tourism). Several test scenarios were run to examine the effect of stopword removal and stemming during preprocessing, the effect of mutual information (MI) in feature selection, and the performance of the Support Vector Machine (SVM) in classifying news data. The test results showed that stemming without stopword removal, combined with MI feature selection and SVM classification, produced the best result of 94.24% compared to the other methods.

10 citations