Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.

Papers

Journal Article

Handling class imbalance in customer churn prediction

Jonathan Burez¹, D Van den Poel¹

Ghent University¹

01 Apr 2009 - Expert Systems with Applications

TL;DR: Under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC, but there is no need to under-sample so far that the training set contains as many churners as non-churners.
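The TL;DR's point is that the churner share of the training set is a tunable choice. It does not have to be pushed all the way to 50/50. Below is a minimal sketch of random under-sampling, assuming binary labels where `1` marks a churner. The function name `undersample` and the `target_share` parameter are illustrative and do not come from the paper. The advanced CUBE sampling method the paper also tests is not shown.

```python
import random


def undersample(rows, labels, target_share, seed=0):
    """Randomly drop non-churners (label 0) so that churners (label 1) make up
    roughly `target_share` of the returned training set.

    Hypothetical helper for illustration; target_share must lie in (0, 1).
    """
    if not 0 < target_share < 1:
        raise ValueError("target_share must be between 0 and 1 (exclusive)")
    rng = random.Random(seed)
    churners = [i for i, y in enumerate(labels) if y == 1]
    others = [i for i, y in enumerate(labels) if y == 0]
    # Non-churners to keep: churners * (1 - share) / share, capped at what exists.
    n_keep = round(len(churners) * (1 - target_share) / target_share)
    n_keep = min(n_keep, len(others))
    kept = sorted(churners + rng.sample(others, n_keep))  # keep original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# 1,000 customers with a 5% churn rate (every 20th customer churns).
customers = list(range(1000))
churn = [1 if c % 20 == 0 else 0 for c in customers]

for share in (0.5, 0.2):
    X, y = undersample(customers, churn, share)
    # 0.5 keeps 100 rows, 0.2 keeps 250 rows; both keep all 50 churners
    print(f"target share {share}: {len(X)} rows, {sum(y)} churners")
```

One way to act on the paper's finding is to try several target shares and compare AUC on a held-out set that is left untouched by sampling.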


Abstract: Customer churn is often a rare event in service industries, but one of great interest and great value. Until recently, however, class imbalance had not received much attention in the context of data mining [Weiss, G. M. (2004). Mining with rarity: A unifying framework. SIGKDD Explorations, 6(1), 7-19]. In this study, we investigate how we can better handle class imbalance in churn prediction. Using more appropriate evaluation metrics (AUC, lift), we investigated the increase in performance of sampling (both random and advanced under-sampling) and two specific modelling techniques (gradient boosting and weighted random forests) compared to some standard modelling techniques. AUC and lift prove to be good evaluation metrics. AUC does not depend on a threshold, and is therefore a better overall evaluation metric than accuracy. Lift is closely related to accuracy, but has the advantage of being widely used in marketing practice [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press]. Results show that under-sampling can lead to improved prediction accuracy, especially when evaluated with AUC. Unlike Ling and Li [Ling, C., & Li, C. (1998). Data mining for direct marketing problems and solutions. In Proceedings of the fourth international conference on knowledge discovery and data mining (KDD-98). New York, NY: AAAI Press], we find that there is no need to under-sample so that there are as many churners in your training set as non-churners. Results show no increase in predictive performance when using the advanced sampling technique CUBE in this study. This is in line with findings of Japkowicz [Japkowicz, N. (2000). The class imbalance problem: significance and strategies. In Proceedings of the 2000 international conference on artificial intelligence (IC-AI'2000): Special track on inductive learning, Las Vegas, Nevada], who noted that using sophisticated sampling techniques did not give any clear advantage. Weighted random forests, as a cost-sensitive learner, perform significantly better than random forests, and are therefore advised. They should, however, always be compared to logistic regression. Boosting is a very robust classifier, but never outperforms any other technique.
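The abstract contrasts threshold-free AUC with lift. Below is a minimal sketch of both, assuming that a higher model score means a customer is more likely to churn. AUC is computed from its pairwise (Mann-Whitney) definition. Lift is the churn rate in the top-scored fraction of customers divided by the overall churn rate. The function names and the 10% default cut-off are assumptions for illustration, and the quadratic loop is only meant for small examples.

```python
def auc(scores, labels):
    """Threshold-free AUC: the probability that a randomly chosen churner
    (label 1) is scored higher than a randomly chosen non-churner (label 0),
    counting ties as one half. Quadratic pairwise loop, for small examples."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one churner and one non-churner")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_lift(scores, labels, fraction=0.1):
    """Churn rate among the top `fraction` of customers ranked by score,
    divided by the overall churn rate."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(label for _, label in ranked[:k]) / k
    base_rate = sum(labels) / len(labels)
    return top_rate / base_rate


scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(auc(scores, labels))             # 0.9375
print(top_lift(scores, labels, 0.1))   # 5.0
print(top_lift(scores, labels, 0.2))   # 2.5
```

AUC compares every churner against every non-churner, so no threshold is involved. Lift, by contrast, changes with the chosen cut-off, as the two calls above show.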


462 citations

Journal Article

Understanding, building and using ontologies

Nicola Guarino¹

National Research Council¹

01 Mar 1997 - International Journal of Human-Computer Studies / International Journal of Man-Machine Studies

TL;DR: The paper defends the thesis that domain knowledge is independent of problem-solving knowledge. It argues against the dominance of the so-called "interaction problem", which a recent paper invoked to dispute the feasibility of a single domain ontology shared by a number of different applications.


Abstract: I defend here the thesis of the independence between domain knowledge and problem-solving knowledge, arguing against the dominance of the so-called "interaction problem" mentioned in a recent paper by Van Heijst, Schreiber and Wielinga to dispute the feasibility of a single domain ontology shared by a number of different applications. The main point is that reusability across multiple tasks or methods can and should be systematically pursued even when modelling knowledge related to a single task or method. Under this view, I discuss how the principles of formal ontology and ontological engineering can be used in the practice of knowledge engineering, focusing in particular on the interplay between general ontologies, method ontologies and application ontologies, and on the role of ontologies in the knowledge engineering process. I will then stress the role of domain analysis, often absent in current methodologies for the development of knowledge-based systems.
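Guarino argues that domain knowledge can be modelled independently of any single task or method and then reused. Below is a toy sketch of that layering, assuming an ontology is nothing more than an is-a hierarchy. An application ontology imports a domain ontology and a method ontology, and both of those build on a general ontology. The `Ontology` class and every concept name are hypothetical. Real ontologies carry far richer axioms than parent links.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Ontology:
    """A toy ontology layer: maps each concept to its parent (is-a) concept and
    may import more general layers it builds on. Assumes the hierarchy is acyclic."""
    name: str
    parents: dict = field(default_factory=dict)
    imports: list = field(default_factory=list)

    def parent_of(self, concept: str) -> Optional[str]:
        # Look locally first, then in imported (more general) layers.
        if concept in self.parents:
            return self.parents[concept]
        for layer in self.imports:
            parent = layer.parent_of(concept)
            if parent is not None:
                return parent
        return None

    def is_a(self, concept: str, ancestor: str) -> bool:
        # Walk up the is-a chain, crossing layer boundaries as needed.
        current: Optional[str] = concept
        while current is not None:
            if current == ancestor:
                return True
            current = self.parent_of(current)
        return False


general = Ontology("general", {"Artifact": "Entity", "Process": "Entity"})
# Domain layer: mentions no task, so any application can reuse it.
medicine = Ontology("medicine", {"Drug": "Artifact", "Therapy": "Process"}, [general])
diagnosis = Ontology("diagnosis method", {"Hypothesis": "Entity"}, [general])
advisor = Ontology("antibiotic advisor",
                   {"Antibiotic": "Drug", "CandidateInfection": "Hypothesis"},
                   [medicine, diagnosis])

print(advisor.is_a("Antibiotic", "Artifact"))  # True
print(advisor.is_a("Antibiotic", "Process"))   # False
```

The `medicine` layer never imports `diagnosis`. A second application built on a different method could therefore import `medicine` unchanged, which mirrors the reuse the abstract argues for.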


458 citations

Journal Article

Self-labeled techniques for semi-supervised learning: taxonomy, software and empirical study

Isaac Triguero¹, Salvador García², Francisco Herrera¹

University of Granada¹, University of Jaén²

01 Feb 2015 - Knowledge and Information Systems

TL;DR: This paper surveys self-labeled methods for semi-supervised classification and proposes a taxonomy based on their main characteristics. It also measures their performance in terms of transductive and inductive classification capabilities.


Abstract: Semi-supervised classification methods are suitable tools to tackle training sets with large amounts of unlabeled data and a small quantity of labeled data. This problem has been addressed by several approaches with different assumptions about the characteristics of the input data. Among them, self-labeled techniques follow an iterative procedure that aims to obtain an enlarged labeled data set, assuming that their own predictions tend to be correct. In this paper, we provide a survey of self-labeled methods for semi-supervised classification. From a theoretical point of view, we propose a taxonomy based on their main characteristics. Empirically, we conduct an exhaustive study that involves a large number of data sets, with different ratios of labeled data, aiming to measure their performance in terms of transductive and inductive classification capabilities. The results are contrasted with nonparametric statistical tests, and the best-performing self-labeled models are identified. Moreover, a semi-supervised learning module has been developed for the Knowledge Extraction based on Evolutionary Learning (KEEL) software, integrating the analyzed methods and data sets.
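Self-training is the simplest form of the iterative self-labeling procedure described above. Below is a minimal sketch, assuming 2-D points, a nearest-centroid base classifier and a distance-margin confidence score. Those three choices, the `per_round` parameter and all function names are illustrative. They do not correspond to any specific method evaluated in the paper or implemented in KEEL.

```python
import math


def centroids(points, labels):
    """Mean point of each class (a deliberately simple base classifier)."""
    sums = {}
    for (x, y), c in zip(points, labels):
        sx, sy, n = sums.get(c, (0.0, 0.0, 0))
        sums[c] = (sx + x, sy + y, n + 1)
    return {c: (sx / n, sy / n) for c, (sx, sy, n) in sums.items()}


def classify(cents, point):
    """Nearest-centroid label plus a confidence: the gap between the nearest
    and second-nearest centroid distances."""
    dists = sorted((math.dist(point, m), c) for c, m in cents.items())
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else math.inf
    return dists[0][1], margin


def self_train(labeled, labels, unlabeled, per_round=2, max_rounds=10):
    """Self-training: repeatedly label the most confident unlabeled points with
    the current model's own predictions and add them to the training set."""
    labeled, labels, pool = list(labeled), list(labels), list(unlabeled)
    for _ in range(max_rounds):
        if not pool:
            break
        cents = centroids(labeled, labels)  # retrain on the enlarged labeled set
        scored = [(classify(cents, p), p) for p in pool]
        scored.sort(key=lambda item: item[0][1], reverse=True)  # most confident first
        for (label, _), p in scored[:per_round]:
            labeled.append(p)
            labels.append(label)
        pool = [p for _, p in scored[per_round:]]
    return labeled, labels


pts, lbls = self_train(
    labeled=[(0, 0), (10, 10)], labels=["a", "b"],
    unlabeled=[(1, 0), (0, 1), (9, 10), (10, 9), (2, 2), (8, 8)],
)
print(dict(zip(pts, lbls)))
```

The loop trusts its own predictions. A confident but wrong label is therefore never revisited, which is the weakness many self-labeled variants try to address.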


457 citations

Book Chapter

A survey of evolutionary algorithms for data mining and knowledge discovery

Alex A. Freitas¹

Pontifícia Universidade Católica do Paraná¹

01 Jan 2003

TL;DR: This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. It also covers preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and on pruning an ensemble of classifiers.


Abstract: This chapter discusses the use of evolutionary algorithms, particularly genetic algorithms and genetic programming, in data mining and knowledge discovery. We focus on the data mining task of classification. In addition, we discuss some preprocessing and postprocessing steps of the knowledge discovery process, focusing on attribute selection and pruning of an ensemble of classifiers. We show how the requirements of data mining and knowledge discovery influence the design of evolutionary algorithms. In particular, we discuss how individual representation, genetic operators and fitness functions have to be adapted for extracting high-level knowledge from data.
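The chapter stresses that individual representation, genetic operators and fitness functions must be adapted to the data mining task. Below is a minimal sketch of one common setup for attribute selection. It assumes each individual is a bit string with one bit per attribute, and uses tournament selection, one-point crossover, bit-flip mutation and elitism. These operator choices are assumptions and not the chapter's specific design. The fitness function is a hypothetical stand-in. In practice it would typically score a classifier trained on the selected attributes, for example by validation accuracy, possibly with a penalty on subset size.

```python
import random


def evolve_attribute_subset(n_attrs, fitness, pop_size=20, generations=30,
                            mutation_rate=0.05, seed=0):
    """Genetic algorithm over bit strings: bit i == 1 means attribute i is selected.
    Returns the best individual found and the best fitness per generation."""
    if n_attrs < 2:
        raise ValueError("one-point crossover needs at least two attributes")
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_attrs)] for _ in range(pop_size)]
    history = []

    def tournament(population):
        # Binary tournament: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        best = max(pop, key=fitness)
        history.append(fitness(best))
        next_pop = [best[:]]  # elitism: the best individual survives unchanged
        while len(next_pop) < pop_size:
            p1, p2 = tournament(pop), tournament(pop)
            cut = rng.randrange(1, n_attrs)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene flips with probability mutation_rate.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_pop.append(child)
        pop = next_pop

    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


# Hypothetical fitness: attributes 0, 2 and 5 are "relevant", and each selected
# attribute costs 0.1 to favour small subsets.
RELEVANT = {0, 2, 5}


def toy_fitness(bits):
    return sum(bits[i] for i in RELEVANT) - 0.1 * sum(bits)


best, history = evolve_attribute_subset(8, toy_fitness)
print(best, round(history[-1], 2))
```

Because of elitism, the best fitness in `history` never decreases from one generation to the next.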


452 citations

Journal Article

Data mining in manufacturing: a review based on the kind of knowledge

Alok K. Choudhary¹, Jenny A. Harding¹, Manoj Kumar Tiwari²

Loughborough University¹, Indian Institutes of Technology²

01 Jan 2009 - Journal of Intelligent Manufacturing

TL;DR: The application of data mining to manufacturing processes and enterprises has grown rapidly over the last 3 years. A review of the literature reveals both progressive applications and existing gaps.


Abstract: In modern manufacturing environments, vast amounts of data are collected in database management systems and data warehouses from all involved areas, including product and process design, assembly, materials planning, quality control, scheduling, maintenance, fault detection, etc. Data mining has emerged as an important tool for knowledge acquisition from manufacturing databases. This paper reviews the literature dealing with knowledge discovery and data mining applications in the broad domain of manufacturing, with special emphasis on the type of functions to be performed on the data. The major data mining functions to be performed include characterization and description, association, classification, prediction, clustering and evolution analysis. The papers reviewed have therefore been categorized into these five categories. It has been shown that the application of data mining in the context of manufacturing processes and enterprises has grown rapidly in the last 3 years. This review reveals the progressive applications and the existing gaps in the context of data mining in manufacturing. A novel text mining approach has also been used on the abstracts and keywords of 150 papers to identify the research gaps and find the linkages between knowledge area, knowledge type and the applied data mining tools and techniques.
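The abstract does not describe how the authors' text mining approach works. As a generic illustration only, the sketch below counts how often a knowledge-area term and a data mining technique term appear in the same abstract. The term lists, the sample abstracts and the plain substring matching are all hypothetical simplifications. This is not the method used in the paper.

```python
from collections import Counter
from itertools import product


def cooccurrence(docs, area_terms, technique_terms):
    """Count, for each (area, technique) pair, the documents mentioning both.
    Uses plain lowercase substring matching, so "decision tree" also matches
    "decision trees"."""
    counts = Counter()
    for doc in docs:
        text = doc.lower()
        areas = [t for t in area_terms if t in text]
        techniques = [t for t in technique_terms if t in text]
        counts.update(product(areas, techniques))
    return counts


abstracts = [
    "Fault detection using decision trees and clustering.",
    "Quality control with neural networks.",
    "Decision trees for fault detection in assembly lines.",
]
links = cooccurrence(
    abstracts,
    area_terms=["fault detection", "quality control"],
    technique_terms=["decision tree", "neural network", "clustering"],
)
for (area, technique), n in links.most_common():
    print(f"{area} <-> {technique}: {n}")
```

Pairs that never co-occur read as zero from the `Counter`, such as quality control with decision trees here. Empty cells of this kind are one simple way to surface candidate research gaps.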


450 citations

Metrics

Papers: 20,644

Citations: 453,302

No. of papers in the topic in previous years
Year	Papers
2023	120
2022	285
2021	506
2020	660
2019	740
2018	683
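The yearly counts above can be turned into year-over-year changes with a short calculation. Below is a minimal sketch using the figures from the table. The `year_over_year` helper is illustrative. The table does not say whether 2023 covers a full year, so read the last figure with care.

```python
papers_per_year = {2018: 683, 2019: 740, 2020: 660, 2021: 506, 2022: 285, 2023: 120}


def year_over_year(counts):
    """Percentage change in paper count relative to the previous year."""
    years = sorted(counts)
    return {
        year: (counts[year] - counts[prev]) / counts[prev] * 100
        for prev, year in zip(years, years[1:])
    }


for year, pct in year_over_year(papers_per_year).items():
    print(f"{year}: {pct:+.1f}%")
# 2019: +8.3%
# 2020: -10.8%
# 2021: -23.3%
# 2022: -43.7%
# 2023: -57.9%
```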
