SciSpace (formerly Typeset)
Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Book Chapter
19 Apr 2009
TL;DR: Uses an efficient tree-based data structure, called the Periodic-frequent pattern tree (PF-tree for short), that captures the database contents in a highly compact manner and enables a pattern-growth mining technique to generate the complete set of periodic-frequent patterns in a database for user-given periodicity and support thresholds.
Abstract: Since mining frequent patterns from transactional databases involves an exponential mining space and generates a huge number of patterns, efficiently discovering a user-interest-based set of frequent patterns becomes the first priority for a mining algorithm. In many real-world scenarios, it is often sufficient to mine a small, interesting, representative subset of frequent patterns. The temporal periodicity of pattern appearance can be regarded as an important criterion for measuring the interestingness of frequent patterns in several applications. A frequent pattern is said to be periodic-frequent if it appears in the database at a regular interval given by the user. In this paper, we introduce a novel concept of mining periodic-frequent patterns from transactional databases. We use an efficient tree-based data structure, called the Periodic-frequent pattern tree (PF-tree for short), that captures the database contents in a highly compact manner and enables a pattern-growth mining technique to generate the complete set of periodic-frequent patterns in a database for user-given periodicity and support thresholds. The performance study shows that mining periodic-frequent patterns with the PF-tree is time- and memory-efficient, and highly scalable as well.
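The periodic-frequent definition in this abstract can be made concrete with a small brute-force check. This is a minimal sketch, not the PF-tree pattern-growth algorithm. It assumes transactions are numbered 1..n, that a pattern's periodicity is the largest gap between consecutive occurrences (counting the gaps from the start of the database and to its end), and that a pattern qualifies when its support is at least `min_sup` and its periodicity is at most `max_per`. The function names and parameters are hypothetical.

```python
from itertools import combinations


def periodicity(ts_list, db_size):
    """Largest gap between consecutive occurrences of a pattern.

    Assumption: timestamps run 1..db_size, and the gap from the database
    start (0) and the gap to its end (db_size) are both counted.
    """
    points = [0] + sorted(ts_list) + [db_size]
    return max(b - a for a, b in zip(points, points[1:]))


def periodic_frequent(db, min_sup, max_per):
    """Brute-force enumeration of periodic-frequent itemsets (hypothetical helper).

    db: list of transactions (sets); transaction i has timestamp i + 1.
    Returns {itemset: (support, periodicity)} for every itemset with
    support >= min_sup and periodicity <= max_per.
    """
    n = len(db)
    items = sorted(set().union(*db))
    result = {}
    for k in range(1, len(items) + 1):
        found_any = False
        for itemset in combinations(items, k):
            # timestamps of the transactions that contain the whole itemset
            ts = [i + 1 for i, t in enumerate(db) if set(itemset) <= t]
            if len(ts) < min_sup:
                continue
            per = periodicity(ts, n)
            if per <= max_per:
                result[frozenset(itemset)] = (len(ts), per)
                found_any = True
        if not found_any:
            break  # both constraints are anti-monotone, so no larger itemset qualifies
    return result
```

Because a superset occurs in a subset of its subsets' transactions, its support can only shrink and its periodicity can only grow, which is why the loop may stop at the first itemset size with no qualifying pattern. A compact structure such as the PF-tree is what lets the paper avoid this exhaustive enumeration on real databases.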

149 citations

Journal Article
TL;DR: Proposes a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database, orders tree nodes according to a canonical order, and can be easily maintained when database transactions are inserted, deleted, and/or modified.
Abstract: Since its introduction, frequent-pattern mining has been the subject of numerous studies, including incremental updating. Many existing incremental mining algorithms are Apriori-based and are not easily adaptable to FP-tree-based frequent-pattern mining. In this paper, we propose a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database and orders tree nodes according to some canonical order. By exploiting its useful properties, the CanTree can be easily maintained when database transactions are inserted, deleted, and/or modified. For example, the CanTree does not require adjustment, merging, and/or splitting of tree nodes during maintenance. No rescan of the entire updated database or reconstruction of a new tree is needed for incremental updating. Experimental results show the effectiveness of our CanTree in the incremental mining of frequent patterns. Moreover, the applicability of CanTrees is not confined to incremental mining; CanTrees can also be applied to other frequent-pattern mining tasks, including constrained mining and interactive mining.
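The key CanTree property, that a frequency-independent item order lets the tree absorb insertions and deletions without restructuring, can be sketched as follows. This sketch assumes the canonical order is the items' natural sort order (for example, alphabetical). The class and method names are hypothetical, and mining over the tree is omitted in favor of a simple support query.

```python
class Node:
    def __init__(self, item=None, parent=None):
        self.item = item
        self.parent = parent
        self.count = 0          # transactions whose path passes through this node
        self.children = {}      # item -> Node


class CanTree:
    """Prefix tree whose paths follow a fixed canonical item order.

    Assumption: the canonical order is the items' natural sort order. Since it
    does not depend on item frequencies, updates never reorder, merge, or
    split nodes; they only adjust counts and add or prune leaves.
    """

    def __init__(self):
        self.root = Node()      # root.count = number of stored transactions

    def insert(self, transaction):
        node = self.root
        node.count += 1
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
            child.count += 1
            node = child

    def delete(self, transaction):
        """Remove one previously inserted transaction (assumed present)."""
        self.root.count -= 1
        path, node = [], self.root
        for item in sorted(set(transaction)):
            node = node.children[item]
            path.append(node)
        for node in path:
            node.count -= 1
        # prune nodes whose count dropped to zero, deepest first
        for node in reversed(path):
            if node.count == 0:
                del node.parent.children[node.item]

    def support(self, itemset):
        """Number of stored transactions containing every item in itemset."""
        target = sorted(set(itemset))

        def walk(node, i):
            # i is the index of the next target item still to be matched
            if i == len(target):
                return node.count
            total = 0
            for item, child in node.children.items():
                if item == target[i]:
                    total += walk(child, i + 1)
                elif item < target[i]:
                    # paths are sorted, so target[i] may still appear deeper
                    total += walk(child, i)
            return total

        return walk(self.root, 0)
```

For instance, after inserting `['b', 'a']`, `['a', 'c']`, `['a', 'b', 'c']` and `['c']`, deleting `['c', 'a', 'b']` simply decrements counts along the path a, b, c and prunes the emptied c leaf; no other node moves.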

149 citations

01 Jan 1993
TL;DR: The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.
Abstract: Extraction of interesting and general knowledge from large spatial databases is an important task in the development of spatial data- and knowledge-base systems. In this paper, we investigate knowledge discovery in spatial databases and develop a generalization-based knowledge discovery mechanism which integrates attribute-oriented induction on nonspatial data and spatial merge and generalization on spatial data. The study shows that knowledge discovery has wide applications in spatial databases, and relatively efficient algorithms can be developed for discovery of general knowledge in large spatial databases.
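For the nonspatial half of this mechanism, attribute-oriented induction can be sketched in a few lines: each attribute value is replaced by its parent concept until the attribute has at most `threshold` distinct values, and identical generalized tuples are then merged with a count. This is a simplified illustration under stated assumptions (one concept hierarchy per attribute, given as a child-to-parent dictionary, and a single hypothetical `threshold` for all attributes). It does not cover the spatial merge and generalization step described in the paper.

```python
from collections import Counter


def attribute_oriented_induction(tuples, hierarchies, threshold):
    """Hypothetical sketch of attribute-oriented induction.

    tuples: list of records (tuples of attribute values)
    hierarchies: one dict per attribute mapping a concept to its parent
    threshold: maximum number of distinct values allowed per attribute
    Returns a Counter mapping each generalized tuple to its count.
    """
    rows = [list(t) for t in tuples]
    for col, hierarchy in enumerate(hierarchies):
        while len({r[col] for r in rows}) > threshold:
            # climb one level in the concept hierarchy (top concepts stay put)
            new_values = [hierarchy.get(r[col], r[col]) for r in rows]
            if new_values == [r[col] for r in rows]:
                break  # no higher concept is available for this attribute
            for r, v in zip(rows, new_values):
                r[col] = v
    # merge identical generalized tuples, keeping how many originals each covers
    return Counter(tuple(r) for r in rows)
```

With city values generalized through a hierarchy such as Vancouver → BC → Canada and Seattle → WA → USA, four city-level records collapse into a handful of country-level rules with counts, which is the kind of "general knowledge" the abstract refers to.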

148 citations

Journal Article
TL;DR: It is believed that the evidence presented here shows that knowledge engineering has much to offer KM and can be the basis on which to move towards a Knowledge Technology.
Abstract: Knowledge Management (KM) is crucial to organizational survival, yet is a difficult task requiring a large expenditure of resources. Information Technology solutions, such as email, document management and intranets, are proving very useful in certain areas. However, many important problems still exist, providing opportunities for new techniques and tools more oriented towards knowledge. We refer to this as Knowledge Technology. A framework has been developed which has allowed opportunities for Knowledge Technology to be identified in support of five key KM activities: personalization, creation/innovation, codification, discovery and capture/monitor. In developing Knowledge Technology for these areas, methods from knowledge engineering are being explored. Our main work in this area has involved the application and evaluation of existing knowledge for a large intranet system. This, and other case studies, have provided important lessons and insights which have led to ongoing research in ontologies, generic models and process modelling methods. We believe that the evidence presented here shows that knowledge engineering has much to offer KM and can be the basis on which to move towards a Knowledge Technology.

148 citations

Journal Article
TL;DR: This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM, which constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others.
Abstract: Traditional Data Mining and Knowledge Discovery algorithms assume free access to data, either at a centralized location or in federated form. Increasingly, privacy and security concerns restrict this access, thus derailing data mining projects. What is required is distributed knowledge discovery that is sensitive to this problem. The key is to obtain valid results, while providing guarantees on the nondisclosure of data. Support vector machine classification is one of the most widely used classification methodologies in data mining and machine learning. It is based on solid theoretical foundations and has wide practical application. This paper proposes a privacy-preserving solution for support vector machine (SVM) classification, PP-SVM for short. Our solution constructs the global SVM classification model from data distributed at multiple parties, without disclosing the data of each party to others. Solutions are sketched out for data that is vertically, horizontally, or even arbitrarily partitioned. We quantify the security and efficiency of the proposed method, and highlight future challenges.
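One ingredient commonly used for privacy-preserving SVMs on vertically partitioned data can be sketched as follows: with a linear kernel, the global Gram matrix K = X·X^T is the sum of the Gram matrices each party computes on its own feature columns, so the parties only need a secure sum. The sketch simulates additive secret sharing modulo a large prime in a single process. It assumes integer features and semi-honest parties, and the function names and `MOD` value are hypothetical; it is not presented as the exact PP-SVM protocol.

```python
import random

MOD = 2**61 - 1  # prime modulus for additive secret sharing (assumption)


def local_gram(rows):
    """Linear-kernel Gram matrix over one party's feature columns."""
    n = len(rows)
    return [[sum(a * b for a, b in zip(rows[i], rows[j])) for j in range(n)]
            for i in range(n)]


def share(matrix, parties, rng):
    """Split an integer matrix into `parties` random additive shares mod MOD."""
    n = len(matrix)
    shares = [[[rng.randrange(MOD) for _ in range(n)] for _ in range(n)]
              for _ in range(parties - 1)]
    last = [[(matrix[i][j] - sum(s[i][j] for s in shares)) % MOD
             for j in range(n)] for i in range(n)]
    return shares + [last]


def secure_gram(partitions, seed=0):
    """Aggregate the global Gram matrix for vertically partitioned data.

    partitions[p][i] holds party p's features for record i (same record order
    at every party). Party q receives only the q-th share from each party, so
    no single party sees another party's local Gram matrix.
    """
    rng = random.Random(seed)
    k, n = len(partitions), len(partitions[0])
    all_shares = [share(local_gram(part), k, rng) for part in partitions]
    # party q adds up the q-th shares it received
    partial = [[[sum(all_shares[p][q][i][j] for p in range(k)) % MOD
                 for j in range(n)] for i in range(n)] for q in range(k)]
    total = [[sum(partial[q][i][j] for q in range(k)) % MOD for j in range(n)]
             for i in range(n)]
    # map values back from the modular ring to signed integers
    return [[v - MOD if v > MOD // 2 else v for v in row] for row in total]
```

The aggregated matrix could then be handed to any SVM solver that accepts a precomputed kernel (for example, scikit-learn's `SVC(kernel="precomputed")`), so no party reveals its raw feature columns, although the Gram matrix itself may still leak some information about the data.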

148 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations (90% related)
Support vector machine: 73.6K papers, 1.7M citations (90% related)
Artificial neural network: 207K papers, 4.5M citations (87% related)
Fuzzy logic: 151.2K papers, 2.3M citations (86% related)
Feature extraction: 111.8K papers, 2.1M citations (86% related)
Performance Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683