Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: The outcome of this study favors the usage of effective yet computationally cheap feature engineering methods such as EDA; for other building energy data mining problems, the method proposed in this study still holds important implications since it provides a starting point where efficient feature engineering and machine learning models could be further developed.

89 citations

Journal Article
TL;DR: The steps in a data mining project include: integrating and cleaning or modifying the data sources, mining the data, examining and pruning the mining results, and reporting the final results.
Abstract: Data mining is such a hot topic that it has become an obscured buzzword. Data mining can be a powerful tool for extracting useful information from tons of data. But it can just as easily extract erroneous and useless information if it's not used correctly. Key to avoiding the pitfalls is a basic understanding of what data mining is and what things to consider in planning a data mining project. The steps in a data mining project include: integrating and cleaning or modifying the data sources, mining the data, examining and pruning the mining results, and reporting the final results.

89 citations

Proceedings Article
27 Jun 2006
TL;DR: This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text), and shows how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools.
Abstract: This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey research on information extraction in the database, AI, NLP, IR, and Web communities in recent years. Then we discuss why this is the right time for the database community to actively participate and address the problem of managing information extraction (including in particular the challenges of maintaining and querying the extracted information, and accounting for the imprecision and uncertainty inherent in the extraction process). Finally, we show how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.

89 citations

Proceedings Article
10 Mar 2009
TL;DR: This paper models social networks as undirected graphs and formally defines privacy models and attack models for the anonymization problem, in particular an i-hop degree-based anonymization problem, and presents two new and efficient clustering methods for undirected graphs: bounded t-means clustering and union-split clustering algorithms that group similar graph nodes into clusters with a minimum size constraint.
Abstract: Knowledge discovery on social network data can uncover latent social trends and produce valuable findings that benefit the welfare of the general public. A growing amount of research finds that social networks play a surprisingly powerful role in people's behaviors. Before the social network data can be released for research purposes, the data needs to be anonymized to prevent potential re-identification attacks. Most of the existing anonymization approaches were developed for relational data, and cannot be used to handle social network data directly. In this paper, we model social networks as undirected graphs and formally define privacy models, attack models for the anonymization problem, in particular an i-hop degree-based anonymization problem, i.e., the adversary's prior knowledge includes the target's degree and the degrees of neighbors within i hops from the target. We present two new and efficient clustering methods for undirected graphs: bounded t-means clustering and union-split clustering algorithms that group similar graph nodes into clusters with a minimum size constraint. These clustering algorithms are contributions beyond the specific social network problems studied and can be used to cluster general data types besides graph vertices. We also develop a simple-yet-effective inter-cluster matching method for anonymizing social networks by strategically adding and removing edges based on nodes' social roles. We carry out a series of experiments to evaluate the graph utilities of the anonymized social networks produced by our algorithms.

89 citations

Journal Article
TL;DR: A framework is presented that uses a few novel noise addition techniques to protect individual privacy while maintaining high data quality, together with a security analysis for measuring the security level of a data set.
Abstract: During the whole process of data mining (from data collection to knowledge discovery) various sensitive data get exposed to several parties including data collectors, cleaners, preprocessors, miners and decision makers. The exposure of sensitive data can potentially lead to breach of individual privacy. Therefore, many privacy preserving techniques have been proposed recently. In this paper we present a framework that uses a few novel noise addition techniques for protecting individual privacy while maintaining a high data quality. We add noise to all attributes, both numerical and categorical. We present a novel technique for clustering categorical values and use it for noise addition purpose. A security analysis is also presented for measuring the security level of a data set.

89 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year  Papers
2023  120
2022  285
2021  506
2020  660
2019  740
2018  683