Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: This paper proposes incremental approaches for updating approximations dynamically in set-valued ordered decision systems under the attribute generalization, which involves several modifications to relevant matrices without having to retrain from the start on all accumulated training data.

85 citations

Journal Article
TL;DR: This work proposes a novel relational clustering algorithm that uses both attribute and relational information for determining the underlying domain entities, and gives an efficient implementation and investigates the impact that different relational similarity measures have on entity resolution quality.
Abstract: Many databases contain uncertain and imprecise references to real-world entities. The absence of identifiers for the underlying entities often results in a database which contains multiple references to the same entity. This can lead not only to data redundancy, but also inaccuracies in query processing and knowledge extraction. These problems can be alleviated through the use of entity resolution. Entity resolution involves discovering the underlying entities and mapping each database reference to these entities. Traditionally, entities are resolved using pairwise similarity over the attributes of references. However, there is often additional relational information in the data. Specifically, references to different entities may cooccur. In these cases, collective entity resolution, in which entities for cooccurring references are determined jointly rather than independently, can improve entity resolution accuracy. We propose a novel relational clustering algorithm that uses both attribute and relational information for determining the underlying domain entities, and we give an efficient implementation. We investigate the impact that different relational similarity measures have on entity resolution quality. We evaluate our collective entity resolution algorithm on multiple real-world databases. We show that it improves entity resolution performance over both attribute-based baselines and over algorithms that consider relational information but do not resolve entities collectively. In addition, we perform detailed experiments on synthetically generated data to identify data characteristics that favor collective relational resolution over purely attribute-based algorithms.

85 citations
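
The collective idea in the abstract above can be sketched in a few lines. This is only an illustrative toy, not the authors' algorithm: the abstract does not specify the similarity measures, so the string similarity (`difflib.SequenceMatcher`), the Jaccard overlap of neighbouring clusters, the weight `alpha`, the merge `threshold`, and all function names here are assumptions. Clusters are merged greedily, and each merge changes the relational evidence available for later merges.

```python
from difflib import SequenceMatcher
from itertools import combinations


def attr_sim(names_a, names_b):
    """Best string similarity between any two names of two clusters (assumed measure)."""
    return max(SequenceMatcher(None, a, b).ratio() for a in names_a for b in names_b)


def jaccard(s, t):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = s | t
    return len(s & t) / len(union) if union else 0.0


def collective_resolve(names, hyperedges, alpha=0.5, threshold=0.45):
    """Greedy relational clustering over references.

    names:      dict mapping reference id -> name string
    hyperedges: list of sets of reference ids that co-occur (e.g. one paper each)
    alpha:      hypothetical weight of relational vs. attribute similarity
    """
    cluster_of = {ref: ref for ref in names}   # reference -> cluster label
    members = {ref: {ref} for ref in names}    # cluster label -> references

    def neighbours(c):
        # Labels of clusters that co-occur with any reference in cluster c.
        out = set()
        for edge in hyperedges:
            if edge & members[c]:
                out |= {cluster_of[r] for r in edge}
        out.discard(c)
        return out

    def sim(c1, c2):
        a = attr_sim([names[r] for r in members[c1]],
                     [names[r] for r in members[c2]])
        r = jaccard(neighbours(c1), neighbours(c2))
        return (1 - alpha) * a + alpha * r

    while len(members) > 1:
        score, c1, c2 = max((sim(a, b), a, b)
                            for a, b in combinations(sorted(members), 2))
        if score < threshold:
            break
        members[c1] |= members.pop(c2)         # merge c2 into c1
        for r in members[c1]:
            cluster_of[r] = c1
    return {frozenset(m) for m in members.values()}


# Toy data: three papers, each listing two author references.
names = {0: "J. Smith", 1: "Maria Jones", 2: "John Smith",
         3: "Maria Jones", 4: "Jane Smith", 5: "Kevin Lee"}
papers = [{0, 1}, {2, 3}, {4, 5}]
clusters = collective_resolve(names, papers)
```

On this toy data, "John Smith" is closer as a string to "Jane Smith" than to "J. Smith", yet the sketch groups references 0 and 2 because both co-occur with the already merged "Maria Jones" cluster, while reference 4 stays separate.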

01 Jan 2000
TL;DR: This work discusses new privacy threats posed by KDDM, which include massive data collection, data warehouses, statistical analysis and deductive learning techniques; KDDM uses vast amounts of data to generate hypotheses and discover general patterns.
Abstract: Recent developments in information technology have enabled collection and processing of vast amounts of personal data, such as criminal records, shopping habits, credit and medical history, and driving records. This information is undoubtedly very useful in many areas, including medical research, law enforcement and national security. However, there is an increasing public concern about individuals' privacy. Privacy is commonly seen as the right of individuals to control information about themselves. The appearance of technology for Knowledge Discovery and Data Mining (KDDM) has revitalized concern about the following general privacy issues: secondary use of personal information, handling misinformation, and granulated access to personal information. They demonstrate that existing privacy laws and policies are well behind the developments in technology, and no longer offer adequate protection. We also discuss new privacy threats posed by KDDM, which include massive data collection, data warehouses, statistical analysis and deductive learning techniques. KDDM uses vast amounts of data to generate hypotheses and discover general patterns. KDDM poses the following new challenges to privacy.

85 citations

Journal Article
TL;DR: Experimental results show that Cloud-Trust converges more rapidly and accurately than do existing approaches, thereby verifying that it can effectively take on trust measurement tasks in cloud computing.
Abstract: In cloud computing, trust management is more important than ever before in the use of information and communication technologies. Owing to the dynamic nature of the cloud, continuous monitoring of trust attributes is necessary to enforce service-level agreements. This study presents Cloud-Trust, an adaptive trust management model for efficiently evaluating the competence of a cloud service based on its multiple trust attributes. In Cloud-Trust, two kinds of adaptive modelling tools (rough set and induced ordered weighted averaging (IOWA) operator) are organically integrated and successfully applied to trust data mining and knowledge discovery. Using rough set to discover knowledge from trust attributes makes the model surpass the limitations of traditional models, in which weights are assigned subjectively. Moreover, Cloud-Trust uses the IOWA operator to aggregate the global trust degree based on time series, thereby enabling better real-time performance. Experimental results show that Cloud-Trust converges more rapidly and accurately than do existing approaches, thereby verifying that it can effectively take on trust measurement tasks in cloud computing.

85 citations
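
As a rough illustration of the IOWA operator mentioned in the abstract above: an induced ordered weighted average reorders the values by a separate order-inducing variable and then applies position weights. The sketch below assumes the inducing variable is a timestamp (most recent first) and that the weights are supplied by the caller; how Cloud-Trust actually derives its weights is not stated in the abstract, and the function name `iowa` is hypothetical.

```python
from typing import Sequence


def iowa(inducing: Sequence[float], values: Sequence[float],
         weights: Sequence[float]) -> float:
    """Induced ordered weighted average.

    Values are reordered by decreasing inducing variable, then the i-th
    weight is applied to the i-th reordered value. Weights must sum to 1.
    """
    if not (len(inducing) == len(values) == len(weights)):
        raise ValueError("inducing, values and weights must have equal length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Indices of the values, ordered by decreasing inducing variable.
    order = sorted(range(len(values)), key=lambda i: inducing[i], reverse=True)
    return sum(w * values[i] for w, i in zip(weights, order))


# Assumed example: trust observations at times 1, 2, 3; the newest gets weight 0.5.
timestamps = [1, 2, 3]
trust = [0.2, 0.5, 0.9]
global_trust = iowa(timestamps, trust, [0.5, 0.3, 0.2])
```

Here the most recent observation (0.9) receives the largest weight, so the aggregate tracks recent behaviour more closely than a plain mean would.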

Proceedings Article
Tsuyoshi Ide, Keisuke Inoue
01 Jan 2005
TL;DR: This paper introduces a new nonlinear transformation, singular spectrum transformation (SST), to address the problem of knowledge discovery of causal relationships from a set of time series and demonstrates that SST enables us to discover a hidden and useful dependency between variables.
Abstract: Most of the stream mining techniques presented so far have primarily paid attention to discovering association rules by direct comparison between time-series data sets. However, their utility is very limited for heterogeneous systems, where time series of various types (discrete, continuous, oscillatory, noisy, etc.) act dynamically in a strongly correlated manner. In this paper, we introduce a new nonlinear transformation, singular spectrum transformation (SST), to address the problem of knowledge discovery of causal relationships from a set of time series. SST is a transformation that transforms a time series into the probability density function that represents a chance to observe some particular change. For an automobile data set, we demonstrate that SST enables us to discover a hidden and useful dependency between variables.

85 citations
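
The abstract above gives no formulas, so the following is only a sketch of one common way a singular-spectrum change score is computed, not necessarily the paper's exact transformation. It assumes NumPy is available; the window length `w`, number of windows `n`, rank `r`, the `lag`, and the function names are all assumptions. At each time `t`, the dominant directions of recent past windows are compared with the dominant direction of upcoming windows, and the score is how much of the latter falls outside the former.

```python
import numpy as np


def _trajectory(x, end, w, n):
    """Matrix whose n columns are consecutive length-w windows; the last ends at `end`."""
    start = end - w - n + 1
    return np.column_stack([x[start + j: start + j + w] for j in range(n)])


def sst_scores(x, w=10, n=10, r=2, lag=None):
    """Change score in [0, 1] for each time index (0 where windows do not fit)."""
    x = np.asarray(x, dtype=float)
    if lag is None:
        lag = w + n - 1            # future windows start exactly at t
    scores = np.zeros(len(x))
    for t in range(w + n - 1, len(x) - lag + 1):
        past = _trajectory(x, t, w, n)
        future = _trajectory(x, t + lag, w, n)
        # Top-r left singular vectors of the past, top-1 of the future.
        u_past = np.linalg.svd(past, full_matrices=False)[0][:, :r]
        beta = np.linalg.svd(future, full_matrices=False)[0][:, 0]
        # 1 minus the squared length of beta's projection onto the past subspace.
        scores[t] = np.clip(1.0 - np.sum((u_past.T @ beta) ** 2), 0.0, 1.0)
    return scores


# Assumed test signal: a sine wave whose frequency changes at index 100.
t = np.arange(200)
signal = np.where(t < 100, np.sin(0.2 * t), np.sin(0.6 * t))
scores = sst_scores(signal)
```

Because the score depends on subspaces rather than raw values, the same procedure can be applied to series of different types, which is the property the abstract emphasises for heterogeneous systems.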


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683