Author

Alp Kut

Bio: Alp Kut is an academic researcher from Dokuz Eylül University. The author has contributed to research in topics: Data warehouse & Cluster analysis. The author has an h-index of 7 and has co-authored 31 publications receiving 1,085 citations.

Papers
Journal ArticleDOI
01 Jan 2007
TL;DR: A new density-based clustering algorithm based on DBSCAN is presented that can discover clusters according to the non-spatial, spatial, and temporal values of objects; an implementation of the algorithm using a spatial-temporal data warehouse is shown, and the data mining results are presented.
Abstract: This paper presents a new density-based clustering algorithm, ST-DBSCAN, which is based on DBSCAN. We propose three marginal extensions to DBSCAN related to the identification of (i) core objects, (ii) noise objects, and (iii) adjacent clusters. In contrast to the existing density-based clustering algorithms, our algorithm can discover clusters according to the non-spatial, spatial, and temporal values of the objects. In this paper, we also present a spatial-temporal data warehouse system designed for storing and clustering a wide range of spatial-temporal data. We show an implementation of our algorithm using this data warehouse and present the data mining results.

1,081 citations
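The extension is easiest to see next to plain DBSCAN: instead of a single distance threshold, a point's neighborhood is constrained by a spatial radius and a temporal radius at once. The sketch below is a minimal illustration of that idea, not the authors' implementation; the function names, the `(x, y, t)` record layout, and the parameter names are assumptions.

```python
import math

def st_neighbors(points, i, eps_spatial, eps_temporal):
    """Indices of points within BOTH a spatial and a temporal radius of point i."""
    x, y, t = points[i]
    out = []
    for j, (xj, yj, tj) in enumerate(points):
        if j == i:
            continue
        if math.hypot(x - xj, y - yj) <= eps_spatial and abs(t - tj) <= eps_temporal:
            out.append(j)
    return out

def st_dbscan(points, eps_spatial, eps_temporal, min_pts):
    """Simplified density-based clustering over (x, y, t) tuples.
    Returns one label per point: a cluster id >= 0, or -1 for noise."""
    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = st_neighbors(points, i, eps_spatial, eps_temporal)
        if len(seeds) < min_pts:
            labels[i] = -1  # provisionally noise; may be claimed as a border point later
            continue
        labels[i] = cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise point becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            more = st_neighbors(points, j, eps_spatial, eps_temporal)
            if len(more) >= min_pts:  # j is itself a core point: keep expanding
                queue.extend(more)
        cluster += 1
    return labels
```

Because the neighborhood test uses two thresholds, two events that are close in space but far apart in time end up in different clusters, which is the behavior the abstract describes.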

Journal ArticleDOI
09 Oct 2006
TL;DR: A new outlier detection algorithm is introduced to detect spatio-temporal outliers in large databases by finding small groups of data objects that are exceptional when compared with the rest of the data.
Abstract: Outlier detection is one of the major data mining methods. This paper proposes a three-step approach to detect spatio-temporal outliers in large databases. These steps are clustering, checking spatial neighbors, and checking temporal neighbors. In this paper, we introduce a new outlier detection algorithm to find small groups of data objects that are exceptional when compared with the rest of the data. In contrast to the existing outlier detection algorithms, the new algorithm can discover outliers according to the non-spatial, spatial, and temporal values of the objects. In order to demonstrate the new algorithm, this paper also presents an example application using a data warehouse.

108 citations
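The three steps in the abstract (cluster, check spatial neighbors, check temporal neighbors) can be sketched as a candidate-then-confirm pipeline. The snippet below is a loose illustration only: it stands in for the clustering step with a simple global deviation test, and the record layout, thresholds, and the k-sigma rule are assumptions rather than the paper's method.

```python
import math
from statistics import mean, pstdev

def _deviates(value, pool, k):
    """True if value lies more than k population standard deviations from the pool mean."""
    if len(pool) < 2:
        return False
    return abs(value - mean(pool)) > k * pstdev(pool)

def detect_st_outliers(records, spatial_radius, time_radius, k=2.0):
    """Sketch of a three-step scheme over (x, y, t, value) records:
    1) candidates: values that deviate from the whole dataset
       (a stand-in for the clustering step in the paper),
    2) confirm against spatially near records,
    3) confirm against temporally near records.
    Returns the indices of confirmed outliers."""
    values = [v for (_, _, _, v) in records]
    out = []
    for i, (x, y, t, v) in enumerate(records):
        if not _deviates(v, values, k):
            continue  # step 1: not even a global candidate
        spatial = [vj for j, (xj, yj, _, vj) in enumerate(records)
                   if j != i and math.hypot(x - xj, y - yj) <= spatial_radius]
        temporal = [vj for j, (_, _, tj, vj) in enumerate(records)
                    if j != i and abs(t - tj) <= time_radius]
        if _deviates(v, spatial, k) and _deviates(v, temporal, k):
            out.append(i)  # steps 2 and 3: exceptional among both neighbor sets
    return out
```

The point of the two confirmation steps is to keep only objects that are exceptional relative to their local spatial and temporal context, not just globally unusual values.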

Journal ArticleDOI
TL;DR: Experimental results show that the incremental genetic algorithm considerably decreases the time needed to train a new classifier on an updated dataset; because of the very large volumes of data involved, performing these updates incrementally is highly desirable.
Abstract: Traditionally, data mining tasks such as classification and clustering are performed on data warehouses. Usually, updates are collected and applied to the data warehouse at frequent time intervals. For this reason, all patterns derived from the data warehouse have to be updated frequently as well. Due to the very large volumes of data, it is highly desirable to perform these updates incrementally. This study proposes a new incremental genetic algorithm for classification that efficiently handles new transactions. It presents comparison results for the traditional and incremental genetic algorithms for classification. Experimental results show that our incremental genetic algorithm considerably decreases the time needed for training to construct a new classifier with the new dataset. This study also includes a sensitivity analysis of the incremental genetic algorithm's parameters, such as crossover probability, mutation probability, elitism, and population size. In this analysis, many specific models were created using the same training dataset but with different parameter values, and then the performances of the models were compared.

25 citations
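The core idea of the incremental variant is that when new transactions arrive, the GA is re-run with its population seeded from the previously evolved classifier instead of from random individuals, so it needs fewer generations to converge. The toy sketch below evolves a linear decision rule with a mutation-only GA; the representation, operators, and parameter values are illustrative assumptions, not the paper's design.

```python
import random

def accuracy(ind, data):
    """Fraction of (x1, x2, label) rows the linear rule classifies correctly."""
    w1, w2, b = ind
    return sum((w1 * x1 + w2 * x2 + b > 0) == y for x1, x2, y in data) / len(data)

def random_population(n, rng):
    """n random (w1, w2, b) individuals."""
    return [tuple(rng.uniform(-1, 1) for _ in range(3)) for _ in range(n)]

def evolve(data, population, generations=30, elite=2, mut=0.3, rng=None):
    """One GA run: truncation selection plus Gaussian mutation, with elitism
    so the best individual is never lost (a deliberately tiny operator set)."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        population.sort(key=lambda ind: accuracy(ind, data), reverse=True)
        parents = population[:max(elite, len(population) // 2)]
        children = []
        while len(children) < len(population) - elite:
            p = list(rng.choice(parents))
            for i in range(3):
                if rng.random() < mut:
                    p[i] += rng.gauss(0, 0.5)
            children.append(tuple(p))
        population = population[:elite] + children
    return max(population, key=lambda ind: accuracy(ind, data))

def incremental_evolve(new_data, previous_best, pop_size=20, rng=None):
    """Incremental run: seed the population with the previously evolved
    classifier instead of starting from scratch."""
    rng = rng or random.Random(1)
    population = [previous_best] + random_population(pop_size - 1, rng)
    return evolve(new_data, population, rng=rng)
```

Because the old champion enters the new population and elitism preserves it, the incremental run can never end up worse than the previous classifier on the new data, and in practice it starts much closer to a good solution.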

Book ChapterDOI
19 Jul 2013
TL;DR: A new clustering algorithm, SOM++, is introduced, which first uses the K-Means++ method to determine the initial weight values and starting points, and then uses a Self-Organizing Map (SOM) to find the final clustering solution.
Abstract: Data clustering is an important and widely used task of data mining that groups similar items together into subsets. This paper introduces a new clustering algorithm, SOM++, which first uses the K-Means++ method to determine the initial weight values and starting points, and then uses a Self-Organizing Map (SOM) to find the final clustering solution. The purpose of this algorithm is to provide a useful technique for improving the solution of data clustering and data mining in terms of runtime, the rate of unstable data points, and internal error. This paper also presents a comparison of our algorithm with simple SOM and K-Means + SOM using real-world data. The results show that SOM++ performs well in terms of stability and significantly outperforms the other methods in training time.

20 citations
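SOM++'s recipe, as the abstract describes it, is K-Means++ seeding followed by ordinary SOM training. A compact sketch of both stages for a 1-D map is given below; the decay schedules, neighborhood function, and function names are assumptions made for illustration, not the paper's exact formulation.

```python
import math
import random

def kmeanspp_init(data, k, rng):
    """K-Means++ seeding: the first centre is uniform, later centres are sampled
    with probability proportional to squared distance to the nearest chosen centre."""
    centres = [list(rng.choice(data))]
    while len(centres) < k:
        d2 = [min(sum((a - c) ** 2 for a, c in zip(x, cen)) for cen in centres)
              for x in data]
        r = rng.uniform(0, sum(d2))
        acc = 0.0
        for x, w in zip(data, d2):
            acc += w
            if acc >= r:
                centres.append(list(x))
                break
    return centres

def som_train(data, weights, epochs=20, lr0=0.5, rng=None):
    """1-D SOM: pull the best-matching unit (BMU) and its line neighbors toward
    each sample, with learning rate and neighborhood radius decaying over epochs."""
    rng = rng or random.Random(0)
    k = len(weights)
    for epoch in range(epochs):
        lr = lr0 * (1 - epoch / epochs)
        radius = max(0.5, (k / 2) * (1 - epoch / epochs))
        for x in rng.sample(data, len(data)):  # one shuffled pass per epoch
            bmu = min(range(k),
                      key=lambda i: sum((a - w) ** 2 for a, w in zip(x, weights[i])))
            for i in range(k):
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(x)):
                    weights[i][d] += lr * h * (x[d] - weights[i][d])
    return weights
```

The motivation for the combination is that K-Means++ spreads the initial weights across the data, so the SOM stage starts near a good configuration instead of from random weights.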

Proceedings ArticleDOI
02 May 2018
TL;DR: The application of four fundamental ensemble learning methods with five different classification algorithms, using optimal parameter values, on signal datasets; the best classification performance was obtained with the Random Forest algorithm, a Bagging-based method.
Abstract: In recent years, machine learning algorithms have come to be used widely in signal classification, as in many other areas. Ensemble learning has become one of the most popular machine learning approaches due to the high classification performance it provides. In this study, the application of four fundamental ensemble learning methods (Bagging, Boosting, Stacking, and Voting) with five different classification algorithms (Neural Network, Support Vector Machines, k-Nearest Neighbor, Naive Bayes, and C4.5), using optimal parameter values, on signal datasets is presented. In the experimental studies, the ensemble learning methods were applied to 14 different signal datasets and the results were compared in terms of classification accuracy. According to the results, the best classification performance was obtained with the Random Forest algorithm, which is a Bagging-based method.

12 citations
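Of the four ensemble methods compared, Bagging (the family the winning Random Forest belongs to) is the simplest to illustrate: train each base learner on a bootstrap resample of the training set and combine predictions by majority vote. The sketch below uses decision stumps as base learners purely for brevity; it is not the study's experimental setup, and all names are illustrative.

```python
import random
from collections import Counter

def train_stump(data):
    """Best single-feature threshold classifier (a decision stump) for
    rows of (features, label) with boolean labels."""
    best = None
    for f in range(len(data[0][0])):
        for xs, _ in data:
            t = xs[f]
            for polarity in (True, False):
                acc = sum(((x[f] > t) == polarity) == y for x, y in data) / len(data)
                if best is None or acc > best[0]:
                    best = (acc, f, t, polarity)
    _, f, t, polarity = best
    return lambda x: (x[f] > t) == polarity

def bagging(data, n_models=25, rng=None):
    """Bagging sketch: train each stump on a bootstrap resample of the data,
    then predict by majority vote over all stumps."""
    rng = rng or random.Random(0)
    models = [train_stump([rng.choice(data) for _ in data]) for _ in range(n_models)]
    def predict(x):
        votes = Counter(m(x) for m in models)
        return votes.most_common(1)[0][0]
    return predict
```

The vote averages away the variance of the individual resample-trained stumps, which is the usual argument for why Bagging-based methods such as Random Forest perform well.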


Cited by
01 Jan 2002

9,314 citations

Book
11 Jan 2013
TL;DR: Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians, and computer scientists, with emphasis placed on simplifying the content so that students and practitioners can also benefit.
Abstract: With the increasing advances in hardware technology for data collection, and advances in software technology (databases) for data organization, computer scientists have increasingly participated in the latest advancements of the outlier analysis field. Computer scientists, specifically, approach this field based on their practical experience in managing large amounts of data, and with far fewer assumptions: the data can be of any type, structured or unstructured, and may be extremely large. Outlier Analysis is a comprehensive exposition, as understood by data mining experts, statisticians, and computer scientists. The book has been organized carefully, and emphasis was placed on simplifying the content so that students and practitioners can also benefit. Chapters typically cover one of three areas: methods and techniques commonly used in outlier analysis, such as linear methods, proximity-based methods, subspace methods, and supervised methods; data domains, such as text, categorical, mixed-attribute, time-series, streaming, discrete-sequence, spatial, and network data; and key applications of these methods to diverse domains such as credit card fraud detection, intrusion detection, medical diagnosis, earth science, web log analytics, and social network analysis.

1,278 citations

Journal ArticleDOI
TL;DR: This review paper begins with the definition of clustering, takes into consideration the basic elements involved in the clustering process, such as distance or similarity measurements and evaluation indicators, and analyzes clustering algorithms from two perspectives: the traditional ones and the modern ones.
Abstract: Data analysis is used as a common method in modern science research, spanning communication science, computer science, and biology. Clustering, as a basic component of data analysis, plays a significant role. On one hand, many tools for cluster analysis have been created, along with the increase in information and the intersection of subjects. On the other hand, each clustering algorithm has its own strengths and weaknesses, due to the complexity of information. In this review paper, we begin with the definition of clustering, take into consideration the basic elements involved in the clustering process, such as the distance or similarity measurement and evaluation indicators, and analyze the clustering algorithms from two perspectives, the traditional ones and the modern ones. All the discussed clustering algorithms are compared in detail and comprehensively shown in Appendix Table 22.

1,234 citations

Journal ArticleDOI
TL;DR: Deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process.
Abstract: Machine learning is a technique for recognizing patterns that can be applied to medical images. Although it is a powerful tool that can help in rendering medical diagnoses, it can be misapplied. Machine learning typically begins with the machine learning algorithm system computing the image features that are believed to be of importance in making the prediction or diagnosis of interest. The machine learning algorithm system then identifies the best combination of these image features for classifying the image or computing some metric for the given image region. There are several methods that can be used, each with different strengths and weaknesses. There are open-source versions of most of these machine learning methods that make them easy to try and apply to images. Several metrics for measuring the performance of an algorithm exist; however, one must be aware of the possible associated pitfalls that can result in misleading metrics. More recently, deep learning has started to be used; this method has the benefit that it does not require image feature identification and calculation as a first step; rather, features are identified as part of the learning process. Machine learning has been used in medical imaging and will have a greater influence in the future. Those working in medical imaging must be aware of how machine learning works. ©RSNA, 2017.

870 citations