Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: A rule induction system based on rough sets and attribute-oriented generalization is introduced and applied to a database of congenital malformations to extract diagnostic rules, and an expert system that makes differential diagnoses of congenital disorders is developed.

123 citations

Journal Article
TL;DR: A case study of two insurance problems is presented: understanding customer retention patterns by classifying policy holders as likely to renew or terminate their policies, and understanding claim patterns to identify policy holders who are more at risk. Both problems are solved using a variety of techniques within the methodology of data mining.
Abstract: The insurance industry is concerned with many problems of interest to the operational research community. This paper presents a case study involving two such problems and solves them using a variety of techniques within the methodology of data mining. The first of these problems is the understanding of customer retention patterns by classifying policy holders as likely to renew or terminate their policies. The second is better understanding claim patterns, and identifying types of policy holders who are more at risk. Each of these problems impacts on the decisions relating to premium pricing, which directly affects profitability. A data mining methodology is used which views the knowledge discovery process within an holistic framework utilising hypothesis testing, statistics, clustering, decision trees, and neural networks at various stages. The impacts of the case study on the insurance company are discussed.

123 citations

Journal Article
TL;DR: The results show that the integrated feature extraction approach, which is based on rough set theory and genetic algorithms, can remarkably reduce the cost and time consumed on product quality evaluation without compromising the overall specifications of the acceptance tests.

123 citations

Journal Article
TL;DR: A time series data mining methodology is presented for temporal knowledge discovery in big building automation system (BAS) data. It is used to identify dynamics, patterns and anomalies in building operations, derive temporal association rules within and between subsystems, assess building system performance and spot opportunities for energy conservation.

123 citations

Journal Article
TL;DR: This article develops the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction. It then integrates the framework with a novel “ensemble” approach, which creates an ensemble of DCM matchers by randomizing the schema data into many trials and aggregating their ranked results by majority voting.
Abstract: To enable information integration, schema matching is a critical step for discovering semantic correspondences of attributes across heterogeneous sources. While complex matchings are common, because of their far more complex search space, most existing techniques focus on simple 1:1 matchings. To tackle this challenge, this article takes a conceptually novel approach by viewing schema matching as correlation mining, for our task of matching Web query interfaces to integrate the myriad databases on the Internet. On this “deep Web,” query interfaces generally form complex matchings between attribute groups (e.g., {author} corresponds to {first name, last name} in the Books domain). We observe that the co-occurrence patterns across query interfaces often reveal such complex semantic relationships: grouping attributes (e.g., {first name, last name}) tend to be co-present in query interfaces and thus positively correlated. In contrast, synonym attributes are negatively correlated because they rarely co-occur. This insight enables us to discover complex matchings by a correlation mining approach. In particular, we develop the DCM framework, which consists of data preprocessing, dual mining of positive and negative correlations, and finally matching construction. We evaluate the DCM framework on manually extracted interfaces and the results show good accuracy for discovering complex matchings. Further, to automate the entire matching process, we incorporate automatic techniques for interface extraction. Executing the DCM framework on automatically extracted interfaces, we find that the inevitable errors in automatic interface extraction may significantly affect the matching result. To make the DCM framework robust against such “noisy” schemas, we integrate it with a novel “ensemble” approach, which creates an ensemble of DCM matchers by randomizing the schema data into many trials and aggregating their ranked results by majority voting.
As a principled basis, we provide analytic justification of the robustness of the ensemble approach. Empirically, our experiments show that the “ensemblization” indeed significantly boosts the matching accuracy over automatically extracted, and thus noisy, schema data. By employing the DCM framework with the ensemble approach, we thus complete an automatic process of matching Web query interfaces.
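The co-occurrence intuition in this abstract can be illustrated with a small sketch. This is not the DCM framework itself: the correlation measure used here is the standard phi coefficient, swapped in as an assumption because the abstract does not name the measure, and the `interfaces` data and the `phi` helper are hypothetical. On the toy data, the grouping attributes {first name, last name} come out positively correlated, while the synonym candidates author and first name come out negatively correlated.

```python
from itertools import combinations
from math import sqrt

# Hypothetical toy data: each set holds the attributes of one query interface.
interfaces = [
    {"author", "title"},
    {"first name", "last name", "title"},
    {"author", "title", "isbn"},
    {"first name", "last name", "isbn"},
]

def phi(a, b, schemas):
    """Phi coefficient of the presence of attributes a and b across schemas.

    Assumption: phi stands in for the correlation measure, which the
    abstract does not specify.
    """
    n11 = sum(1 for s in schemas if a in s and b in s)          # both present
    n10 = sum(1 for s in schemas if a in s and b not in s)      # only a
    n01 = sum(1 for s in schemas if a not in s and b in s)      # only b
    n00 = len(schemas) - n11 - n10 - n01                        # neither
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    return 0.0 if denom == 0 else (n11 * n00 - n10 * n01) / denom

# Score every attribute pair; keys are alphabetically ordered pairs.
attributes = sorted(set().union(*interfaces))
scores = {(a, b): phi(a, b, interfaces) for a, b in combinations(attributes, 2)}

# Positive scores suggest grouping attributes, negative scores suggest synonyms.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A full matcher would still need the preprocessing, matching construction and ensemble voting steps that the abstract describes; this sketch covers only the pairwise correlation signal.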

123 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
90% related
Artificial neural network
207K papers, 4.5M citations
87% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683