scispace - formally typeset
Author

Aytuğ Onan

Bio: Aytuğ Onan is an academic researcher from Izmir Kâtip Çelebi University. The author has contributed to research in topics: Artificial intelligence & Computer science. The author has an h-index of 16, co-authored 52 publications receiving 1227 citations. Previous affiliations of Aytuğ Onan include Ege University & Celal Bayar University.

Papers published on a yearly basis

Papers
Journal ArticleDOI
TL;DR: The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
Abstract: Text classification is a domain with a high-dimensional feature space, and extracting keywords as features can be extremely useful in text classification. This study presents an empirical analysis of five statistical keyword extraction methods and a comprehensive analysis of classifier and keyword extraction ensembles; for the ACM collection, a classification accuracy of 93.80% is obtained with a Bagging ensemble of Random Forest. Automatic keyword extraction is an important research direction in text mining, natural language processing and information retrieval. Keyword extraction enables us to represent text documents in a condensed way. The compact representation of documents can be helpful in several applications, such as automatic indexing, automatic summarization, automatic classification, clustering and filtering. For instance, text classification is a domain that faces the challenge of a high-dimensional feature space. Hence, extracting the most important/relevant words about the content of the document and using these keywords as the features can be extremely useful. In this regard, this study examines the predictive performance of five statistical keyword extraction methods (most-frequent-measure-based keyword extraction, term frequency-inverse sentence frequency based keyword extraction, co-occurrence statistical information based keyword extraction, eccentricity-based keyword extraction and the TextRank algorithm) on classification algorithms and ensemble methods for scientific text document classification (categorization). In this study, a comprehensive comparison of base learning algorithms (Naive Bayes, support vector machines, logistic regression and Random Forest) with five widely utilized ensemble methods (AdaBoost, Bagging, Dagging, Random Subspace and Majority Voting) is conducted. To the best of our knowledge, this is the first empirical analysis that evaluates the effectiveness of statistical keyword extraction methods in conjunction with ensemble learning algorithms.
The classification schemes are compared in terms of classification accuracy, F-measure and area-under-curve values. To validate the empirical analysis, a two-way ANOVA test is employed. The experimental analysis indicates that the Bagging ensemble of Random Forest with the most-frequent-based keyword extraction method yields promising results for text classification. For the ACM document collection, the highest average predictive performance (93.80%) is obtained with the most-frequent-based keyword extraction method and the Bagging ensemble of the Random Forest algorithm. In general, Bagging and Random Subspace ensembles of Random Forest yield promising results. The empirical analysis indicates that the utilization of keyword-based representation of text documents in conjunction with ensemble learning can enhance the predictive performance and scalability of text classification schemes, which is of practical importance in the application fields of text classification.
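The pipeline described above (keyword features feeding a Bagging ensemble whose base learner is Random Forest) can be sketched with scikit-learn. The toy corpus, labels, and top-k keyword cutoff below are illustrative stand-ins, not the paper's ACM data or exact settings.

```python
# Sketch: most-frequent keyword extraction + Bagging ensemble of Random Forest.
# Corpus, labels, and parameters are toy placeholders, not the paper's setup.
from collections import Counter

from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "support vector machines for text classification",
    "random forest ensembles for document categorization",
    "convolutional networks for image recognition",
    "deep learning models for image segmentation",
]
labels = [0, 0, 1, 1]  # 0 = text mining, 1 = computer vision

# "Most frequent" keyword extraction: keep the top-k terms of the corpus.
counts = Counter(" ".join(docs).split())
keywords = [word for word, _ in counts.most_common(10)]

# Represent each document only by its extracted keywords.
vec = CountVectorizer(vocabulary=keywords)
X = vec.fit_transform(docs)

# Bagging ensemble whose base learner is itself a Random Forest.
model = BaggingClassifier(
    RandomForestClassifier(n_estimators=10, random_state=0),
    n_estimators=5,
    random_state=0,
)
model.fit(X, labels)
train_acc = model.score(X, labels)
```

Restricting the vectorizer's vocabulary to the extracted keywords is what gives the condensed document representation the abstract emphasizes.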

445 citations

Journal ArticleDOI
TL;DR: An ensemble approach for feature selection is presented, which aggregates the several individual feature lists obtained by the different feature selection methods so that a more robust and efficient feature subset can be obtained.
Abstract: Sentiment analysis is an important research direction of natural language processing, text mining and web mining which aims to extract subjective information from source materials. The main challenge encountered in machine learning method-based sentiment classification is the abundant amount of data available. This amount makes it difficult to train the learning algorithms in a feasible time and degrades the classification accuracy of the built model. Hence, feature selection becomes an essential task in developing robust and efficient classification models whilst reducing the training time. In text mining applications, individual filter-based feature selection methods have been widely utilized owing to their simplicity and relatively high performance. This paper presents an ensemble approach for feature selection, which aggregates the individual feature lists obtained by different feature selection methods so that a more robust and efficient feature subset can be obtained. In order to aggregate the individual feature lists, a genetic algorithm has been utilized. Experimental evaluations indicated that the proposed aggregation model is an efficient method and it outperforms individual filter-based feature selection methods on sentiment classification.
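A minimal sketch of the aggregation idea, assuming scikit-learn filter scorers and a toy genetic algorithm far simpler than the paper's: two filter methods score the features, and the GA evolves aggregation weights whose combined top-k feature subset maximizes cross-validated accuracy. All parameters (population size, generations, k) are illustrative.

```python
# Toy GA over aggregation weights for two filter-based feature scorers.
# Everything here is a simplified stand-in for the paper's method.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import chi2, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=30, n_informative=5,
                           random_state=0)
X = X - X.min()  # chi2 requires non-negative features

# Individual filter-based feature scores, normalized per method.
scores = np.vstack([chi2(X, y)[0], mutual_info_classif(X, y, random_state=0)])
scores = scores / scores.max(axis=1, keepdims=True)

def fitness(w, k=5):
    combined = w @ scores              # weighted aggregation of the two lists
    top = np.argsort(combined)[-k:]    # keep the top-k combined features
    return cross_val_score(GaussianNB(), X[:, top], y, cv=3).mean()

# Tiny GA: population of weight vectors, truncation selection + mutation.
pop = rng.random((10, 2))
for _ in range(5):                     # a handful of generations
    fit = np.array([fitness(w) for w in pop])
    parents = pop[np.argsort(fit)[-4:]]                 # keep the best four
    children = parents[rng.integers(0, 4, 10)] + rng.normal(0, 0.1, (10, 2))
    pop = np.clip(children, 0, None)                    # weights stay >= 0
best = pop[np.argmax([fitness(w) for w in pop])]
```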

274 citations

Journal ArticleDOI
TL;DR: Experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed ensemble method can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting.
Abstract: Typically performed by supervised machine learning algorithms, sentiment analysis is highly useful for extracting subjective information from text documents online. Most approaches that use ensemble learning paradigms toward sentiment analysis involve feature engineering in order to enhance the predictive performance. In response, we sought to develop a paradigm of a multiobjective, optimization-based weighted voting scheme to assign appropriate weight values to classifiers and each output class based on the predictive performance of classification algorithms, all to enhance the predictive performance of sentiment classification. The proposed ensemble method is based on static classifier selection involving majority voting error and forward search, as well as a multiobjective differential evolution algorithm. Based on the static classifier selection scheme, our proposed ensemble method incorporates Bayesian logistic regression, naive Bayes, linear discriminant analysis, logistic regression, and support vector machines as base learners, whose performance in terms of precision and recall values determines weight adjustment. Our experimental analysis of classification tasks, including sentiment analysis, software defect prediction, credit risk modeling, spam filtering, and semantic mapping, suggests that the proposed classification scheme can predict better than conventional ensemble learning methods such as AdaBoost, bagging, random subspace, and majority voting. Of all datasets examined, the laptop dataset showed the best classification accuracy (98.86%).
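A simplified sketch of performance-weighted soft voting: each base learner's class probabilities are scaled by its validation F-measure (combining the precision and recall the abstract mentions) before averaging. The paper tunes weights with a multiobjective differential evolution algorithm; using the F-measure directly is a deliberate simplification, and the data here is synthetic.

```python
# Performance-weighted soft voting with three of the paper's base learners.
# Weights come straight from validation F-measure, a simplification of the
# paper's multiobjective differential evolution optimization.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=300, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

learners = [GaussianNB(), LinearDiscriminantAnalysis(),
            LogisticRegression(max_iter=1000)]
weights = []
for clf in learners:
    clf.fit(X_tr, y_tr)
    # F-measure summarizes the precision/recall the paper bases weights on.
    weights.append(f1_score(y_val, clf.predict(X_val)))

# Weighted soft vote: sum of probability outputs scaled by learner weight.
proba = sum(w * clf.predict_proba(X_val) for w, clf in zip(weights, learners))
ensemble_pred = proba.argmax(axis=1)
```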

272 citations

Journal ArticleDOI
TL;DR: The empirical results indicate that the proposed deep learning architecture outperforms the conventional deep learning methods on sentiment analysis on product reviews obtained from Twitter.
Abstract: Sentiment analysis is one of the major tasks of natural language processing, in which attitudes, thoughts, opinions, or judgments toward a particular subject are extracted. The web is an unstructured and rich source of information containing many text documents with opinions and reviews. The recognition of sentiment can be helpful for individual decision makers, business organizations, and governments. In this article, we present a deep learning-based approach to sentiment analysis on product reviews obtained from Twitter. The presented architecture combines TF-IDF weighted GloVe word embeddings with a CNN-LSTM architecture. The CNN-LSTM architecture consists of five layers, that is, a weighted embedding layer, a convolution layer (where 1-gram, 2-gram, and 3-gram convolutions have been employed), a max-pooling layer, followed by an LSTM layer and a dense layer. In the empirical analysis, the predictive performance of different word embedding schemes (i.e., word2vec, fastText, GloVe, LDA2vec, and doc2vec) with several weighting functions (i.e., inverse document frequency, TF-IDF, and the smoothed inverse document frequency function) has been evaluated in conjunction with conventional deep neural network architectures. The empirical results indicate that the proposed deep learning architecture outperforms the conventional deep learning methods.

197 citations

Journal ArticleDOI
TL;DR: An ensemble classification scheme is presented, which integrates a Random Subspace ensemble of Random Forest with four types of features (features used in authorship attribution, character n-grams, part-of-speech n-grams and the frequency of the most discriminative words), and the highest average predictive performance obtained by the proposed scheme is 94.43%.
Abstract: Text genre classification is the process of identifying functional characteristics of text documents. The immense quantity of text documents available on the web can be properly filtered, organised...
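One feature/classifier pairing from the TL;DR can be sketched as follows: character n-gram features with a Random Subspace ensemble of Random Forest, where Random Subspace is emulated via scikit-learn's BaggingClassifier sampling features rather than instances. The toy documents and genre labels are illustrative.

```python
# Character n-gram features + Random Subspace ensemble of Random Forest.
# Random Subspace = sample feature subsets, keep all training instances.
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "once upon a time in a distant land",
    "the quarterly revenue grew by ten percent",
    "she walked slowly through the ancient forest",
    "the committee approved the annual budget report",
]
labels = ["fiction", "news", "fiction", "news"]

# Character n-grams (2-3 grams here) are one of the four feature types.
vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
X = vec.fit_transform(docs)

subspace = BaggingClassifier(
    RandomForestClassifier(n_estimators=10, random_state=0),
    n_estimators=5,
    max_features=0.5,   # each member sees half the features...
    bootstrap=False,
    max_samples=1.0,    # ...but all instances -> Random Subspace behavior
    random_state=0,
)
subspace.fit(X, labels)
```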

193 citations


Cited by


Journal ArticleDOI
01 Feb 2017-Catena
TL;DR: Analysis of results indicates that landslide models using machine learning ensemble frameworks are promising methods which can be used as alternatives to individual base classifiers for landslide susceptibility assessment of other prone areas.
Abstract: The main objective of this study is to evaluate and compare the performance of landslide models using a machine learning ensemble technique for landslide susceptibility assessment. This technique combines ensemble methods (AdaBoost, Bagging, Dagging, MultiBoost, Rotation Forest, and Random SubSpace) with the base classifier of Multiple Perceptron Neural Networks (MLP Neural Nets). Ensemble techniques have been widely applied in other fields; however, their application is still rare in the assessment of landslide problems. Meanwhile, MLP Neural Nets, known as an artificial neural network, has been applied widely and efficiently to landslide problems. In the present study, landslide models of part of the Himalayan area (India) have been constructed and validated. For the evaluation and comparison of these models, receiver operating characteristic curve and Chi-square test methods have been applied. Overall, all landslide models performed well in landslide susceptibility assessment, but the performance of the MultiBoost model is the highest (AUC = 0.886), followed by the Dagging model (AUC = 0.885), the Rotation Forest model (AUC = 0.882), the Bagging and Random SubSpace models (AUC = 0.881), and the AdaBoost model (AUC = 0.876). Moreover, the machine learning ensemble models significantly improved the performance of the base classifier of MLP Neural Nets (AUC = 0.874). Analysis of the results indicates that landslide models using machine learning ensemble frameworks are promising methods which can be used as alternatives to individual base classifiers for landslide susceptibility assessment of other prone areas.
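One ensemble/base-classifier pairing from the abstract (Bagging over MLP neural networks, evaluated by AUC) can be sketched with scikit-learn; the synthetic data stands in for real landslide conditioning factors and inventories.

```python
# Bagging ensemble of MLP neural networks scored by AUC, one of the
# ensemble/base-classifier pairings in the abstract. Data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

mlp_bag = BaggingClassifier(
    MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0),
    n_estimators=5,
    random_state=0,
)
mlp_bag.fit(X_tr, y_tr)
# AUC on the held-out split, the criterion used to rank models above.
auc = roc_auc_score(y_te, mlp_bag.predict_proba(X_te)[:, 1])
```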

436 citations

Journal ArticleDOI
TL;DR: Results indicate that the proposed Bagging-LMT model can be used for sustainable management of flood-prone areas and outperformed all state-of-the-art benchmark soft computing models.
Abstract: A new artificial intelligence (AI) model, called Bagging-LMT, a combination of a bagging ensemble and the Logistic Model Tree (LMT), is introduced for mapping flood susceptibility. A spatial database was generated for the Haraz watershed, northern Iran, that included a flood inventory map and eleven flood conditioning factors selected based on the Information Gain Ratio (IGR). The model was evaluated using precision, sensitivity, specificity, accuracy, Root Mean Square Error, Mean Absolute Error, Kappa and area under the receiver operating characteristic curve criteria. The model was also compared with four state-of-the-art benchmark soft computing models: LMT, logistic regression, Bayesian logistic regression, and random forest. Results revealed that the proposed model outperformed all of these models, indicating that it can be used for sustainable management of flood-prone areas.
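The evaluation criteria listed above can be computed directly from a set of predictions. The toy labels and probabilities below are illustrative, and the Bagging-LMT model itself is omitted since LMT has no widely used Python implementation.

```python
# Computing the paper's evaluation criteria from toy predictions:
# precision, sensitivity, specificity, accuracy, RMSE, MAE, Kappa, AUC.
import numpy as np
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, mean_absolute_error,
                             mean_squared_error, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0])          # toy flood labels
y_prob = np.array([0.2, 0.4, 0.9, 0.7, 0.6, 0.3, 0.8, 0.6])
y_pred = (y_prob >= 0.5).astype(int)                 # 0.5 decision threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)    # true positive rate
specificity = tn / (tn + fp)    # true negative rate
precision = tp / (tp + fp)
accuracy = accuracy_score(y_true, y_pred)
rmse = mean_squared_error(y_true, y_prob) ** 0.5
mae = mean_absolute_error(y_true, y_prob)
kappa = cohen_kappa_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_prob)
```

RMSE and MAE are computed against the predicted probabilities rather than the hard labels, which is the usual convention for susceptibility maps.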

372 citations