Journal ISSN: 2005-4270

International Journal of Database Theory and Application

NADIA
About: International Journal of Database Theory and Application is an academic journal. It publishes mainly in the areas of cluster analysis and cloud computing. Its ISSN is 2005-4270. Over its lifetime, the journal has published 568 papers, which have received 2530 citations.

Papers
Journal Article
TL;DR: This paper first categorizes documents using a KNN-based machine learning approach and then returns the most relevant documents, addressing the text categorization problem.
Abstract: Text Categorization (TC), also known as Text Classification, is the task of automatically classifying a set of text documents into categories from a predefined set. If a document belongs to exactly one of the categories, it is a single-label classification task; otherwise, it is a multi-label classification task. TC uses several tools from Information Retrieval (IR) and Machine Learning (ML) and has received much attention in recent years from both academic researchers and industry developers. In this paper, we first categorize the documents using a KNN-based machine learning approach and then return the most relevant documents.

197 citations
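
The categorize-then-retrieve idea in the abstract above can be illustrated with a minimal sketch. The abstract does not specify the document representation or distance measure, so the bag-of-words vectors, the cosine similarity, the toy training set `TRAIN`, and the helper names below are all assumptions made for illustration, not the authors' implementation.

```python
import math
from collections import Counter

def bag_of_words(text):
    """Lowercase, split on whitespace, and count term frequencies."""
    return Counter(text.lower().split())

def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def knn_classify(query, labeled_docs, k=3):
    """Return the majority label of the k most similar documents,
    together with those neighbors as (similarity, text, label) tuples."""
    q = bag_of_words(query)
    scored = [
        (cosine_similarity(q, bag_of_words(text)), text, label)
        for text, label in labeled_docs
    ]
    # Most similar documents first; these double as the "most relevant" hits.
    scored.sort(key=lambda item: item[0], reverse=True)
    neighbors = scored[:k]
    votes = Counter(label for _, _, label in neighbors)
    return votes.most_common(1)[0][0], neighbors

# Hypothetical toy training set (not from the paper).
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the match ended in a draw", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock markets fell as rates rose", "finance"),
    ("investors sold bank shares", "finance"),
]

label, neighbors = knn_classify("the goalkeeper saved the match", TRAIN, k=3)
print(label)
for similarity, text, doc_label in neighbors:
    print(f"{similarity:.3f}  [{doc_label}]  {text}")
```

The same ranked neighbor list serves both purposes named in the abstract: the majority vote gives the category, and the neighbors themselves are the most relevant documents.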

Journal Article
TL;DR: A new student performance prediction model based on data mining techniques uses new attributes called student behavioral features; the results show a strong relationship between learners' behaviors and their academic achievement and demonstrate the reliability of the proposed model.
Abstract: Educational data mining has received considerable attention in the last few years. Many data mining techniques have been proposed to extract hidden knowledge from educational data. The extracted knowledge helps institutions improve their teaching methods and learning process. These improvements enhance student performance and overall educational outcomes. In this paper, we propose a new student performance prediction model based on data mining techniques with new data attributes/features, which are called student's behavioral features. These features are related to the learner's interactivity with the e-learning management system. The performance of the predictive model is evaluated using a set of classifiers, namely Artificial Neural Network, Naive Bayes, and Decision Tree. In addition, we applied ensemble methods to improve the performance of these classifiers. We used Bagging, Boosting, and Random Forest (RF), which are the common ensemble methods used in the literature. The obtained results reveal that there is a strong relationship between learners' behaviors and their academic achievement. The accuracy of the proposed model using behavioral features achieved up to 22.1% improvement compared with the results obtained when such features are removed, and it achieved up to 25.8% accuracy improvement using ensemble methods. When the model was tested on newcomer students, the achieved accuracy was more than 80%. This result demonstrates the reliability of the proposed model.

195 citations
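
Of the ensemble methods named in the abstract above, Bagging is the simplest to sketch: train several models on bootstrap resamples and take a majority vote. The example below is a rough illustration only. It swaps the paper's base classifiers for a one-feature decision stump, and the single behavioral feature (a count of interactions with the learning system), the labels, and the data in `SAMPLES` are all hypothetical.

```python
import random
from collections import Counter

def train_stump(samples):
    """Fit a one-feature threshold classifier by exhaustive search.
    samples is a list of (feature_value, label) pairs."""
    best = None
    for threshold, _ in samples:
        # Try both orientations of the split at this threshold.
        for low, high in (("fail", "pass"), ("pass", "fail")):
            errors = sum(
                (high if x >= threshold else low) != y for x, y in samples
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, low, high)
    _, threshold, low, high = best
    return lambda x: high if x >= threshold else low

def bagging(samples, n_models=25, seed=0):
    """Train n_models stumps on bootstrap resamples; predict by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Bootstrap: draw len(samples) items with replacement.
        bootstrap = [rng.choice(samples) for _ in samples]
        models.append(train_stump(bootstrap))
    return lambda x: Counter(m(x) for m in models).most_common(1)[0][0]

# Hypothetical data: (interaction count, outcome). Not from the paper.
SAMPLES = [
    (5, "fail"), (10, "fail"), (15, "fail"), (20, "fail"),
    (60, "pass"), (70, "pass"), (80, "pass"), (90, "pass"),
]

predict = bagging(SAMPLES)
print(predict(95), predict(2))
```

Individual stumps trained on an unlucky resample can be wrong, which is the point of voting: the ensemble prediction is more stable than any single model.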

Journal Article
TL;DR: This work proposes implementing a typical decision tree algorithm, C4.5, using the MapReduce programming model, and transforms the traditional algorithm into a series of Map and Reduce procedures, showing both time efficiency and scalability.
Abstract: Recent years have witnessed the development of cloud computing and the big data era, which bring challenges to traditional decision tree algorithms. First, as datasets become extremely large, the process of building a decision tree can be quite time consuming. Second, because the data can no longer fit in memory, some computation must be moved to external storage, which increases the I/O cost. To this end, we propose to implement a typical decision tree algorithm, C4.5, using the MapReduce programming model. Specifically, we transform the traditional algorithm into a series of Map and Reduce procedures. In addition, we design some data structures to minimize the communication cost. We also conduct extensive experiments on a massive dataset. The results indicate that our algorithm exhibits both time efficiency and scalability.

145 citations
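
One way to picture how a C4.5 split decision decomposes into Map and Reduce steps is sketched below. The abstract does not describe the authors' procedures or data structures, so this is only an assumed decomposition: the map step emits a count of 1 per (attribute, value, class) triple, the reduce step sums them, and the standard C4.5 gain ratio is then computed from the aggregated counts. The toy `RECORDS` and all function names are hypothetical, and both phases run in-process rather than on a cluster.

```python
import math
from collections import defaultdict

def map_phase(records, attributes):
    """Map: emit ((attribute, value, class_label), 1) for every record."""
    for rec in records:
        for attr in attributes:
            yield (attr, rec[attr], rec["label"]), 1

def reduce_phase(pairs):
    """Reduce: sum the counts that share the same key."""
    counts = defaultdict(int)
    for key, one in pairs:
        counts[key] += one
    return counts

def entropy(class_counts):
    """Shannon entropy (base 2) of a collection of counts."""
    class_counts = list(class_counts)
    total = sum(class_counts)
    return -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)

def gain_ratio(counts, attr):
    """C4.5 gain ratio for one attribute, using only the reduced counts."""
    by_value = defaultdict(lambda: defaultdict(int))
    by_class = defaultdict(int)
    for (a, value, label), n in counts.items():
        if a == attr:
            by_value[value][label] += n
            by_class[label] += n
    total = sum(by_class.values())
    base = entropy(by_class.values())
    # Weighted entropy of the class labels after splitting on attr.
    conditional = sum(
        sum(v.values()) / total * entropy(v.values()) for v in by_value.values()
    )
    split_info = entropy(sum(v.values()) for v in by_value.values())
    gain = base - conditional
    return gain / split_info if split_info > 0 else 0.0

# Hypothetical toy dataset (not from the paper).
RECORDS = [
    {"outlook": "sunny", "windy": "no", "label": "play"},
    {"outlook": "sunny", "windy": "yes", "label": "play"},
    {"outlook": "rainy", "windy": "no", "label": "stay"},
    {"outlook": "rainy", "windy": "yes", "label": "stay"},
]
ATTRIBUTES = ["outlook", "windy"]

counts = reduce_phase(map_phase(RECORDS, ATTRIBUTES))
best = max(ATTRIBUTES, key=lambda a: gain_ratio(counts, a))
print(best)
```

Because the gain ratio depends only on the summed counts, the raw records never need to be held in memory at once, which is the property that makes this kind of computation a natural fit for MapReduce.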

Journal Article
TL;DR: The main aim of this paper is to survey the various application areas of SVM, grounded in an understanding of the technique, while offering researchers an up-to-date picture of the depth and breadth of both the theory and applications.
Abstract: During the last two decades, a substantial amount of research effort has been devoted to applying support vector machines to various data mining tasks. Data mining is a pioneering and attractive research area due to its wide range of application areas and task primitives. Support Vector Machine (SVM) plays a decisive role because it provides techniques that are especially well suited to obtaining results efficiently and with a good level of quality. In this paper, we survey the role of SVM in various data mining tasks such as classification, clustering, prediction, forecasting, and other applications. More broadly, we review a number of research publications on data mining applications that have appeared in internationally reputed journals, and we also suggest a number of open issues for SVM. The main aim of this paper is to survey the various application areas of SVM, grounded in an understanding of the technique, while offering researchers an up-to-date picture of the depth and breadth of both the theory and applications.

107 citations

Journal Article
TL;DR: An analysis of 10% of the KDD Cup '99 training dataset for intrusion detection establishes a relationship between the attack types and the protocols used by the hackers, using clustered data.
Abstract: The KDD Cup 99 dataset has attracted many researchers in the field of intrusion detection over the last decade. Many researchers have analyzed the dataset using different techniques. Analysis can be used in any type of industry that produces and consumes data, and that of course includes security. This paper is an analysis of 10% of the KDD Cup '99 training dataset for intrusion detection. We have focused on establishing a relationship between the attack types and the protocols used by the hackers, using clustered data. The data is analyzed using k-means clustering; we used the Oracle 10g data miner as the analysis tool and built 1000 clusters to segment the 494,020 records. The investigation revealed many interesting results about the protocols and attack types preferred by the hackers for intruding into networks. Keywords: KDD 99 dataset, clustering, k-means, intrusion detection

93 citations
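
The cluster-then-profile analysis described in the abstract above can be sketched in a few lines. The paper used the Oracle 10g data miner, so the plain k-means below is a stand-in and not the authors' tooling. The two numeric features, the six toy connection records in `RECORDS`, and the choice of k = 2 are hypothetical values chosen only to keep the example small.

```python
from collections import Counter

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def k_means(points, k, iterations=20):
    """Plain Lloyd's k-means. For determinism, the initial centroids
    are simply the first k points."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical records: (numeric features, protocol, attack type).
RECORDS = [
    ((0.0, 1.0), "icmp", "smurf"),
    ((9.0, 8.0), "tcp", "normal"),
    ((0.5, 1.5), "icmp", "smurf"),
    ((1.0, 0.5), "icmp", "smurf"),
    ((8.0, 9.0), "tcp", "normal"),
    ((9.5, 9.5), "tcp", "neptune"),
]

points = [features for features, _, _ in RECORDS]
assignment, centroids = k_means(points, k=2)

# Profile each cluster by the protocols and attack types it contains.
profile = Counter(
    (cluster, protocol, attack)
    for cluster, (_, protocol, attack) in zip(assignment, RECORDS)
)
for (cluster, protocol, attack), n in sorted(profile.items()):
    print(cluster, protocol, attack, n)
```

Counting (cluster, protocol, attack) triples is the step that relates attack types to protocols: clusters dominated by one protocol and one attack type suggest an association worth inspecting.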

Network Information
Related Journals (5)
Multimedia Tools and Applications
16K papers, 185.7K citations
79% related
Journal of Software
6.7K papers, 42.4K citations
75% related
International Journal of Computer Applications
26.6K papers, 157.4K citations
75% related
KSII Transactions on Internet and Information Systems
2.8K papers, 23.5K citations
75% related
Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2017    26
2016    240
2015    159
2014    105
2013    31
2012    2