Journal, ISSN: 0974-9683

Data mining and knowledge engineering 

About: Data mining and knowledge engineering is an academic journal. It publishes mainly in the areas of association rule learning and cluster analysis. Its ISSN is 0974-9683. Over its lifetime, the journal has published 192 papers, which have received 5137 citations.


Papers
Journal Article
TL;DR: Data mining is the search for new, valuable, and nontrivial information in large volumes of data, carried out as a cooperative effort of humans and computers. Data-mining activities fall into one of two categories: predictive data mining, which produces a model of the system described by the given data set, and descriptive data mining, which produces new, nontrivial information based on the available data set.
Abstract: Understand the need for analyses of large, complex, information-rich data sets. Identify the goals and primary tasks of the data-mining process. Describe the roots of data-mining technology. Recognize the iterative character of a data-mining process and specify its basic steps. Explain the influence of data quality on a data-mining process. Establish the relation between data warehousing and data mining. Data mining is an iterative process within which progress is defined by discovery, through either automatic or manual methods. Data mining is most useful in an exploratory analysis scenario in which there are no predetermined notions about what will constitute an "interesting" outcome. Data mining is the search for new, valuable, and nontrivial information in large volumes of data. It is a cooperative effort of humans and computers. Best results are achieved by balancing the knowledge of human experts in describing problems and goals with the search capabilities of computers. In practice, the two primary goals of data mining tend to be prediction and description. Prediction involves using some variables or fields in the data set to predict unknown or future values of other variables of interest. Description, on the other hand, focuses on finding patterns describing the data that can be interpreted by humans. Therefore, it is possible to put data-mining activities into one of two categories: Predictive data mining, which produces the model of the system described by the given data set, or Descriptive data mining, which produces new, nontrivial information based on the available data set.

4,646 citations

Journal Article
TL;DR: In this article, a generic methodology for weather forecasting is proposed using an incremental K-means clustering algorithm, based on the incremental air pollution database of West Bengal for the years 2009 and 2010.
Abstract: Clustering has wide application areas in several research fields. Clustering is a powerful tool which has been used in several forecasting works, such as time series forecasting, real-time storm detection, flood forecasting and so on. In this paper, a generic methodology for weather forecasting is proposed using an incremental K-means clustering algorithm. Weather forecasting plays an important role in day-to-day applications. The weather forecasting in this paper is based on the incremental air pollution database of West Bengal for the years 2009 and 2010. The paper applies typical K-means clustering to the main air pollution database, and a list of weather categories is developed based on the maximum mean values of the clusters. When new data arrive, incremental K-means is used to group those data into the clusters whose weather category has already been defined. This builds up a strategy to predict the weather from the data of the upcoming days. The forecasting database is based entirely on the weather of West Bengal, and the forecasting methodology is developed to mitigate the impacts of air pollution and to launch focused modeling computations for the prediction and forecasting of weather events. The accuracy of the approach is also measured.

24 citations
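
The two-stage idea in the abstract above (batch K-means on the main database, then incremental assignment of newly arriving data to the existing clusters) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names, the two-feature sample readings and the running-mean centroid update are all assumptions, and the mapping from clusters to weather categories is not modeled here.

```python
import math

def nearest(centroids, point):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], point))

def kmeans(points, centroids, iterations=10):
    """Plain batch K-means starting from the given initial centroids."""
    centroids = [list(c) for c in centroids]
    counts = [0] * len(centroids)
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for p in points:
            groups[nearest(centroids, p)].append(p)
        for i, g in enumerate(groups):
            if g:  # an empty cluster keeps its previous centroid
                centroids[i] = [sum(col) / len(g) for col in zip(*g)]
        counts = [len(g) for g in groups]
    return centroids, counts

def add_point(centroids, counts, point):
    """Incremental step: assign one new point and update that centroid's running mean."""
    i = nearest(centroids, point)
    counts[i] += 1
    centroids[i] = [c + (x - c) / counts[i] for c, x in zip(centroids[i], point)]
    return i

# Hypothetical two-feature readings; not data from the paper.
history = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (11.0, 9.0)]
centroids, counts = kmeans(history, [(0.0, 0.0), (10.0, 10.0)])

# A new reading arrives and is folded into the closest existing cluster.
assigned = add_point(centroids, counts, (1.0, 5.0))
print(assigned, centroids[assigned], counts)
```

Only the centroid that receives the new point changes, so earlier data never need to be re-clustered; that is the property that makes the incremental step cheap.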

Journal Article
TL;DR: This review concentrates on improving the performance of Apriori, generating interesting association rules from large databases, quantitative association rule mining and optimizing association rules.
Abstract: Association rule mining is a popular and well-researched method for discovering interesting relations between itemsets in large databases. Association rules show attribute-value conditions that occur frequently together in a given dataset. Mining association rules from databases carries overhead in generating interesting rules, which includes handling rare itemsets, mining interesting rules from large databases and generating strong associations. This review concentrates on improving the performance of Apriori, generating interesting association rules from large databases, quantitative association rule mining and optimizing association rules. It also describes various techniques used in the association rule generation process.

16 citations
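
To make the notion of items that "occur frequently together" concrete, the sketch below computes the two standard measures used to filter association rules: support (the fraction of transactions containing an itemset) and confidence (the support of the whole rule divided by the support of its left-hand side). It is an illustration only: the transactions and thresholds are made up, and the code enumerates item pairs exhaustively rather than using Apriori's candidate pruning, which the review discusses but this sketch does not implement.

```python
from itertools import combinations

def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules_for_pairs(transactions, min_support, min_confidence):
    """Enumerate single-item rules lhs -> rhs that pass both thresholds."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue  # the pair is not frequent enough to yield a rule
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                found.append((lhs, rhs, pair_support, confidence))
    return found

# Hypothetical market-basket transactions, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

rules = rules_for_pairs(transactions, min_support=0.5, min_confidence=0.9)
for lhs, rhs, s, c in rules:
    print(f"{lhs} -> {rhs}: support={s:.2f}, confidence={c:.2f}")
```

With these sample transactions, only the rule butter -> bread survives: every basket containing butter also contains bread, whereas bread appears without butter often enough that the reverse rule falls below the confidence threshold.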

Journal Article
TL;DR: A model to evaluate collaborative inference based on the query sequences of collaborators and their task-sensitive collaboration levels is developed, together with a technique to prevent multiple collaborative users from deriving sensitive information via inference.
Abstract: Malicious users can exploit the correlation among data to infer sensitive information from a series of seemingly innocuous data accesses. Thus, we develop an inference violation detection system to protect sensitive data content. Based on data dependency, database schema and semantic knowledge, we constructed a semantic inference model (SIM) that represents the possible inference channels from any attribute to the pre-assigned sensitive attributes. The SIM is then instantiated to a semantic inference graph (SIG) for query-time inference violation detection. In the single-user case, when a user poses a query, the detection system examines his/her past query log and calculates the probability of inferring sensitive information. The query request is denied if the inference probability exceeds the pre-specified threshold. In multi-user cases, the users may share their query answers to increase the inference probability. Therefore, we develop a model to evaluate collaborative inference based on the query sequences of collaborators and their task-sensitive collaboration levels. Experimental studies reveal that information authoritativeness, communication fidelity and honesty in collaboration are three key factors that affect the level of achievable collaboration. An example is given to illustrate the use of the proposed technique to prevent multiple collaborative users from deriving sensitive information via inference.

15 citations

Journal Article
TL;DR: This paper introduces some pre-processing techniques for Gujarati text and notes that, because Gujarati is very rich in morphology, it gives rise to a very large number of word forms and feature spaces.
Abstract: Text mining is the process of obtaining interesting patterns or knowledge from text documents. The most frequently used type of data on the WWW is text. Text mining is used to extract interesting knowledge from unstructured text data. Pre-processing is a very important phase in the text mining process. The text mining framework includes two components, text refining and knowledge distillation. This paper is about pre-processing for text mining in the English and Gujarati languages. Very little work has been done on text mining in Gujarati. It is a very challenging task because Gujarati is very rich in morphology, which gives rise to a very large number of word forms and feature spaces. Some pre-processing techniques for Gujarati are introduced in this paper.

15 citations

Network Information
Related Journals (5)
International Journal of Computer Applications: 26.6K papers, 157.4K citations, 80% related
Journal of Computer Science: 2.7K papers, 23.5K citations, 77% related
International journal of engineering research and technology: 22.3K papers, 32.6K citations, 75% related
International Journal of Computer Theory and Engineering: 1.2K papers, 10.2K citations, 74% related
Indian journal of science and technology: 11.7K papers, 57.4K citations, 74% related
Performance
Metrics
No. of papers from the Journal in previous years
Year    Papers
2021    1
2020    1
2019    3
2018    3
2017    9
2016    16