TL;DR: An improvement to the k-means clustering algorithm is proposed that determines the number of clusters automatically and assigns un-clustered points to a cluster, which is expected to improve accuracy and reduce clustering time when cluster membership is used to predict cancer.
Abstract: Clustering is a technique for analyzing data efficiently and extracting the required information. One clustering technique, k-means, is based on central point selection and the calculation of Euclidean distance. In k-means, the dataset is loaded, central points are selected from it using the Euclidean distance formula, and points are assigned to clusters on the basis of Euclidean distance. The main disadvantage of k-means is accuracy, because the user needs to define the number of clusters. Because the number of clusters is user-defined, some points of the dataset remain un-clustered. In this work, an improvement to the k-means clustering algorithm is proposed that determines the number of clusters automatically and assigns un-clustered points to a cluster. The proposed improvement is expected to increase accuracy and reduce clustering time when cluster membership is used to predict cancer.
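The baseline procedure the abstract describes (pick central points, assign each point to the nearest one by Euclidean distance, recompute the central points) can be sketched as follows. This is a minimal illustration of plain k-means with a user-supplied `k`, not the improved algorithm the paper proposes; the function names, the random initialisation and the toy data are assumptions made for the example.

```python
import math
import random


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points]


def update(points, labels, centroids):
    """Move each centroid to the mean of its members (unchanged if empty)."""
    new_centroids = []
    for i, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == i]
        if members:
            new_centroids.append(
                tuple(sum(coord) / len(members) for coord in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means; the caller must supply the number of clusters k."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed initialisation: random points
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids


# Toy data (hypothetical): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(points, k=2)
print(labels, centroids)
```

Note that `k` is a required argument here, which is exactly the limitation the abstract points to.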
TL;DR: A real-time miscarriage prediction system is built using wearable healthcare sensors, mobile tools, data mining algorithms and big data technologies to help pregnant women make quick decisions in case of miscarriage or probable miscarriage.
Abstract: Mobile phones and sensors have become very useful for understanding and analyzing human lifestyle because of the huge amount of data they can collect every second. This triggered the idea of combining the benefits of reality mining, machine learning and big data predictive analytics tools, applied to smartphone and sensor data in real time. The main goal of our study is to build a system that interacts with mobile phones and wearable healthcare sensors to predict patterns. Wearable healthcare sensors (heart rate sensor, temperature sensor and activity sensor) and a mobile phone are used for gathering real-time data. All sensors are managed using IoT systems; we used Arduino for collecting data from health sensors and Raspberry Pi 3 for programming and processing. The K-means clustering algorithm is used for pattern prediction, and the predicted clusters (partitions) are transmitted to the user through the front-end interface of the mobile application. Real-world data and clustering validation statistics (Elbow method and Silhouette method) are used to validate the proposed system and assess its performance and effectiveness. All data management and processing tasks are conducted over Apache Spark Databricks. This system relies on data gathered in real time and can be applied to any prediction case that makes use of sensor and mobile generated data. As a proof of concept, we worked on predicting miscarriages to help pregnant women make quick decisions in case of miscarriage or probable miscarriage by creating a real-time miscarriage prediction system using wearable healthcare sensors, mobile tools, data mining algorithms and big data technologies. Nine risk factors contribute substantially to the prediction, the Elbow method indicates that the optimal number of clusters is 2, and we achieve a high Silhouette width (0.95), which validates the good match between clusters and observations. The K-means algorithm gives good results in clustering the data.
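The Silhouette width mentioned above compares, for each observation, its mean distance to its own cluster with its mean distance to the nearest other cluster. The sketch below is a minimal standard-library illustration of that statistic, not the Spark pipeline used in the study; the function name `mean_silhouette` and the toy points are assumptions, and the code assumes at least two clusters and that not all points coincide.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_silhouette(points, labels):
    """Average silhouette width over all points (needs >= 2 clusters)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)

    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        # (the point's zero distance to itself is in the sum, hence len - 1).
        a = sum(euclidean(p, q) for q in own) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster.
        b = min(sum(euclidean(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy data (hypothetical): two tight, well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 10), (10, 11)]
print(mean_silhouette(points, [0, 0, 1, 1]))  # sensible labelling
print(mean_silhouette(points, [0, 1, 0, 1]))  # deliberately poor labelling
```

Values close to 1 indicate that observations sit much closer to their own cluster than to any other, which is how a width such as 0.95 is read as a good match.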
TL;DR: The genetic algorithm model has the best performance, and effective regional segmentation based on the auction appraisal price improves the predictive accuracy.
Abstract: The real estate auction market has become increasingly important in the financial, economic and investment fields, but few artificial intelligence-based studies have attempted to forecast the auction prices of real estate. The purpose of this study is to develop forecasting models of real estate auction prices using artificial intelligence and statistical methodologies. The forecasting models are developed through a regression model, an artificial neural network and a genetic algorithm. For empirical analysis, we use Seoul apartment auction data from 2013 to 2017 to predict the auction prices and compare the forecasting accuracy of the models. The genetic algorithm model has the best performance, and effective regional segmentation based on the auction appraisal price improves the predictive accuracy.
TL;DR: In this article, the authors compared the Naive Bayes and C4.5 algorithms on four case studies, including credit card submission cases at a bank, where C4.5 performed better than Naive Bayes, and concluded that neither algorithm is best in every case.
Abstract: In this paper, the Naive Bayes and C4.5 methods were applied to four case studies, namely acceptance of the “Kartu Indonesia Sehat”, determination of credit card applications at a bank, determination of birth age, and determination of the eligibility of prospective credit members of a cooperative (koperasi), in order to find the best algorithm in each case. Precision, recall and accuracy were then compared for each set of training data and testing data provided. As a result of the implementation, an application was built that can apply the Naive Bayes and C4.5 algorithms to the four cases. The application was tested with black-box and algorithm tests, with valid results, and implements both algorithms correctly. Based on the test results, the more training data is used, the higher the precision, recall and accuracy. In addition, the classification results of the Naive Bayes and C4.5 algorithms do not give an absolute answer in every case. In the case of determining acceptance of the Kartu Indonesia Sehat, the two algorithms are equally effective. For credit card submission cases at a bank, C4.5 is better than Naive Bayes. In the case of determining birth age, Naive Bayes is better than C4.5. In the case of determining the eligibility of prospective credit members of the cooperative, Naive Bayes gives better precision, but C4.5 gives better recall and accuracy. Therefore, to determine the best algorithm for a given case, one must look at the criteria, variables and amount of data in that case.
18 citations
Cites background from "Improved K-mean Clustering Algorith..."
...Bansar, Sharma & Goel (2017) state that classification is a technique for determining group membership based on existing data....
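The comparison in the abstract above rests on precision, recall and accuracy, and a classifier can lead on one of them while trailing on the others. A minimal sketch of the three metrics for a binary classification task follows; the function name `binary_metrics` and the label vectors are hypothetical and are not data from the paper.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (precision, recall, accuracy) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of true positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all predictions that are correct.
    accuracy = (tp + tn) / len(pairs)
    return precision, recall, accuracy


# Hypothetical labels: 1 = application approved, 0 = rejected.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
precision, recall, accuracy = binary_metrics(y_true, y_pred)
print(precision, recall, accuracy)
```

In this toy case the three numbers differ (about 0.67, 0.5 and 0.7), which is why the study reports them separately for each case.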
TL;DR: Data mining, a technology developed in recent years, is used to discover the valuable and potential knowledge hidden behind data and to provide strong support for scientific research.
Abstract: Data mining is a new technology developed in recent years. Through data mining, people can discover the valuable and potential knowledge hidden behind the data and provide strong support for scient...
TL;DR: This study has used a large-scale real-world data set to identify the efficiency of clustering technique to improve the classification model and found that applying K-means clustering prior to KNN model helps in reducing the computation time.
Abstract: Product classification is a key issue in e-commerce domains. Many products are released to the market rapidly, and selecting the correct category in the taxonomy for each product has become a challenging task. A classification model is useful for classifying products precisely. This study proposes applying clustering prior to classification and uses a large-scale real-world data set to assess how well the clustering technique improves the classification model. Conventional text classification procedures such as preprocessing, feature extraction and feature selection are applied before the clustering technique. Results show that the clustering technique improves the accuracy of the classification model. The best classification model for all three approaches (classification model only, classification with hierarchical clustering, and classification with K-means clustering) is the K-Nearest Neighbor (KNN) model. Although the accuracy of the KNN models is the same across the different approaches, the KNN model with K-means clustering had the shortest execution time. Hence, applying K-means clustering prior to the KNN model helps reduce computation time.
17 citations
Cites background or methods from "Improved K-mean Clustering Algorith..."
...Previous studies often utilized these techniques to enhance the performance of their classification models [13], [14],[20]-[22]....
[...]
...Recent studies have shown that the combination of classification and clustering models can provide better classification result [13],[14]....
TL;DR: This paper presents a simple validity measure based on the intra-cluster and inter-cluster distance measures which allows the number of clusters to be determined automatically. It is tested on synthetic images for which the number of clusters is known, and is also implemented for natural images.
Abstract: The main disadvantage of the k-means algorithm is that the number of clusters, K, must be supplied as a parameter. In this paper we present a simple validity measure based on the intra-cluster and inter-cluster distance measures which allows the number of clusters to be determined automatically. The basic procedure involves producing all the segmented images for 2 clusters up to Kmax clusters, where Kmax represents an upper limit on the number of clusters. Then our validity measure is calculated to determine which is the best clustering by finding the minimum value for our measure. The validity measure is tested on synthetic images for which the number of clusters is known, and is also implemented for natural images.
649 citations
"Improved K-mean Clustering Algorith..." refers methods in this paper
...These clustering methods work well for finding spherical-shaped clusters in small to medium size databases [5]....
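The selection procedure in the abstract above (cluster for every K from 2 to Kmax, then keep the K that minimises a validity measure built from intra-cluster and inter-cluster distances) can be sketched as follows. The abstract does not give the formula, so the ratio used here, mean squared distance to the assigned centroid divided by the smallest squared distance between two centroids, is an assumption, as are the names `validity` and `best_k` and the hand-made candidate clusterings. In practice each candidate would come from a k-means run on image pixels.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def validity(points, labels, centroids):
    """Assumed measure: intra-cluster spread / inter-cluster separation."""
    # intra: average squared distance from each point to its own centroid.
    intra = sum(sq_dist(p, centroids[lab])
                for p, lab in zip(points, labels)) / len(points)
    # inter: smallest squared distance between any two centroids.
    inter = min(sq_dist(centroids[i], centroids[j])
                for i in range(len(centroids))
                for j in range(i + 1, len(centroids)))
    return intra / inter


def best_k(points, candidates):
    """Pick the K whose clustering has the smallest validity value."""
    return min(candidates, key=lambda k: validity(points, *candidates[k]))


# Hypothetical data with three obvious groups, and hand-made clusterings
# (labels, centroids) for K = 2, 3 and 4 standing in for k-means output.
points = [(0, 0), (1, 0), (10, 0), (11, 0), (0, 10), (1, 10)]
candidates = {
    2: ([0, 0, 0, 0, 1, 1], [(5.5, 0), (0.5, 10)]),
    3: ([0, 0, 1, 1, 2, 2], [(0.5, 0), (10.5, 0), (0.5, 10)]),
    4: ([0, 1, 2, 2, 3, 3], [(0, 0), (1, 0), (10.5, 0), (0.5, 10)]),
}
print(best_k(points, candidates))
```

Compact, well-separated clusters give a small ratio, so under-splitting (large intra) and over-splitting (small inter) are both penalised.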
TL;DR: A method is proposed for making the k-means clustering algorithm more effective and efficient, so as to get better clustering with reduced complexity.
Abstract: Emergence of modern techniques for scientific data collection has resulted in large scale accumulation of data pertaining to diverse fields. Conventional database querying methods are inadequate to extract useful information from huge data banks. Cluster analysis is one of the major data analysis methods and the k-means clustering algorithm is widely used for many practical applications. But the original k-means algorithm is computationally expensive and the quality of the resulting clusters heavily depends on the selection of initial centroids. Several methods have been proposed in the literature for improving the performance of the k-means clustering algorithm. This paper proposes a method for making the algorithm more effective and efficient, so as to get better clustering with reduced complexity.
TL;DR: In this paper, appropriate and efficient networks for breast cancer knowledge discovery from clinically collected data sets are investigated and principal component techniques are used to reduce the dimension of data and find appropriate networks.
Abstract: In this paper, appropriate and efficient networks for breast cancer knowledge discovery from clinically collected data sets are investigated. Invoking various data mining techniques, it is desired to find out the percentage of disease development using the developed network. The results help in choosing a reasonable treatment for the patient. Several neural network structures are evaluated for this investigation. The performance of the statistical neural network structures, self-organizing map (SOM), radial basis function network (RBF), general regression neural network (GRNN) and probabilistic neural network (PNN), is tested both on the Wisconsin breast cancer data (WBCD) and on the Shiraz Namazi Hospital breast cancer data (NHBCD). To overcome the problem of the high dimension of the data set, and recognizing the correlated nature of the data, principal component techniques are used to reduce the dimension of the data and find appropriate networks. The results are quite satisfactory, and a comparison of the effectiveness of each proposed network for such problems is presented.
234 citations
"Improved K-mean Clustering Algorith..." refers methods in this paper
...In this paper [12] they presented an analysis of the prediction of survivability rate of breast cancer patients using data mining techniques....
TL;DR: A new method is proposed for finding better initial centroids and for providing an efficient way of assigning the data points to suitable clusters with reduced time complexity.
Abstract: Cluster analysis is one of the primary data analysis methods and k-means is one of the best-known clustering algorithms. The k-means algorithm is frequently used in data mining because of its performance in clustering massive data sets. The final clustering result of the k-means algorithm greatly depends on the correctness of the initial centroids, which are selected randomly. The original k-means algorithm converges to a local minimum, not the global optimum. Many improvements have already been proposed to improve the performance of k-means, but most of these require additional inputs such as threshold values for the number of data points in a set. In this paper a new method is proposed for finding better initial centroids and for providing an efficient way of assigning the data points to suitable clusters with reduced time complexity. According to our experimental results, the proposed algorithm is more accurate and takes less computational time than the original k-means clustering algorithm.
167 citations
"Improved K-mean Clustering Algorith..." refers methods in this paper
...It is a fast method and is independent of the number of data objects and depends only on the number of cells in each dimension in the quantized space [7]....
TL;DR: The k-means clustering algorithm was combined with the deterministic model to analyze the students’ results of a private institution in Nigeria, which is a good benchmark for monitoring the progression of academic performance of students in higher institutions so that academic planners can make effective decisions.
Abstract: The ability to monitor the progress of students' academic performance is a critical issue for the academic community of higher learning. A system is described that analyzes students' results based on cluster analysis and uses standard statistical algorithms to arrange their score data according to their level of performance. In this paper, we also implemented the k-means clustering algorithm for analyzing students' result data. The model was combined with the deterministic model to analyze the students' results of a private institution in Nigeria, which is a good benchmark for monitoring the progression of academic performance of students in higher institutions so that academic planners can make effective decisions.
In k-means clustering for functional data, the most relevant function is determined using probability and the Euclidean distance formula, and functions are clustered based on majority voting.