scispace - formally typeset
Search or ask a question
Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over the lifetime, 20251 publications have been published within this topic receiving 413401 citations.


Papers
More filters
Journal ArticleDOI
01 Nov 2010
TL;DR: The most relevant studies carried out in educational data mining to date are surveyed and the different groups of user, types of educational environments, and the data they provide are described.
Abstract: Educational data mining (EDM) is an emerging interdisciplinary research area that deals with the development of methods to explore data originating in an educational context. EDM uses computational approaches to analyze educational data in order to study educational questions. This paper surveys the most relevant studies carried out in this field to date. First, it introduces EDM and describes the different groups of user, types of educational environments, and the data they provide. It then goes on to list the most typical/common tasks in the educational environment that have been resolved through data-mining techniques, and finally, some of the most promising future lines of research are discussed.

1,723 citations

Book
17 Dec 1999
TL;DR: The CommonKADS methodology, developed over the last decade by an industry-university consortium led by the authors, is used and makes as much use as possible of the new UML notation standard.
Abstract: The book covers in an integrated fashion the complete route from corporate knowledge management, through knowledge analysis andengineering, to the design and implementation of knowledge-intensiveinformation systems. The disciplines of knowledge engineering and knowledge management are closely tied. Knowledge engineering deals with the development of information systems in which knowledge and reasoning play pivotal roles. Knowledge management, a newly developed field at the intersection of computer science and management, deals with knowledge as a key resource in modern organizations. Managing knowledge within an organization is inconceivable without the use of advanced information systems; the design and implementation of such systems pose great organization as well as technical challenges. The book covers in an integrated fashion the complete route from corporate knowledge management, through knowledge analysis and engineering, to the design and implementation of knowledge-intensive information systems. The CommonKADS methodology, developed over the last decade by an industry-university consortium led by the authors, is used throughout the book. CommonKADS makes as much use as possible of the new UML notation standard. Beyond information systems applications, all software engineering and computer systems projects in which knowledge plays an important role stand to benefit from the CommonKADS methodology.

1,720 citations

Proceedings ArticleDOI
24 Aug 2014
TL;DR: The Knowledge Vault is a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories that computes calibrated probabilities of fact correctness.
Abstract: Recent years have witnessed a proliferation of large-scale knowledge bases, including Wikipedia, Freebase, YAGO, Microsoft's Satori, and Google's Knowledge Graph. To increase the scale even further, we need to explore automatic methods for constructing knowledge bases. Previous approaches have primarily focused on text-based extraction, which can be very noisy. Here we introduce Knowledge Vault, a Web-scale probabilistic knowledge base that combines extractions from Web content (obtained via analysis of text, tabular data, page structure, and human annotations) with prior knowledge derived from existing knowledge repositories. We employ supervised machine learning methods for fusing these distinct information sources. The Knowledge Vault is substantially bigger than any previously published structured knowledge repository, and features a probabilistic inference system that computes calibrated probabilities of fact correctness. We report the results of multiple studies that explore the relative utility of the different information sources and extraction methods.

1,657 citations

Book
18 Nov 2004
TL;DR: The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.
Abstract: The second edition of a highly praised, successful reference on data mining, with thorough coverage of big data applications, predictive analytics, and statistical analysis.Includes new chapters on Multivariate Statistics, Preparing to Model the Data, and Imputation of Missing Data, and an Appendix on Data Summarization and VisualizationOffers extensive coverage of the R statistical programming languageContains 280 end-of-chapter exercisesIncludes a companion website with further resources for all readers, and Powerpoint slides, a solutions manual, and suggested projects for instructors who adopt the book

1,637 citations

Journal ArticleDOI
TL;DR: Efficient algorithms for the discovery of frequent itemsets which forms the compute intensive phase of the association mining task are presented and the effect of using different database layout schemes combined with the proposed decomposition and traverse techniques are presented.
Abstract: Association rule discovery has emerged as an important problem in knowledge discovery and data mining. The association mining task consists of identifying the frequent itemsets, and then forming conditional implication rules among them. We present efficient algorithms for the discovery of frequent itemsets which forms the compute intensive phase of the task. The algorithms utilize the structural properties of frequent itemsets to facilitate fast discovery. The items are organized into a subset lattice search space, which is decomposed into small independent chunks or sublattices, which can be solved in memory. Efficient lattice traversal techniques are presented which quickly identify all the long frequent itemsets and their subsets if required. We also present the effect of using different database layout schemes combined with the proposed decomposition and traversal techniques. We experimentally compare the new algorithms against the previous approaches, obtaining improvements of more than an order of magnitude for our test databases.

1,637 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
90% related
Artificial neural network
207K papers, 4.5M citations
87% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
YearPapers
2023120
2022285
2021506
2020660
2019740
2018683