Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Journal Article
TL;DR: WARMR is presented, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem.
Abstract: Discovery of frequent patterns has been studied in a variety of data mining settings. In its simplest form, known from association rule mining, the task is to discover all frequent itemsets, i.e., all combinations of items that are found in a sufficient number of examples. The fundamental task of association rule and frequent set discovery has been extended in various directions, allowing more useful patterns to be discovered with special purpose algorithms. We present WARMR, a general purpose inductive logic programming algorithm that addresses frequent query discovery: a very general DATALOG formulation of the frequent pattern discovery problem. The motivation for this novel approach is twofold. First, exploratory data mining is well supported: WARMR offers the flexibility required to experiment with standard and in particular novel settings not supported by special purpose algorithms. Also, application prototypes based on WARMR can be used as benchmarks in the comparison and evaluation of new special purpose algorithms. Second, the unified representation gives insight to the blurred picture of the frequent pattern discovery domain. Within the DATALOG formulation a number of dimensions appear that relink diverged settings. We demonstrate the frequent query approach and its use on two applications, one in alarm analysis, and one in a chemical toxicology domain.

330 citations
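
The simplest setting mentioned in the WARMR abstract above, finding all itemsets that occur in a sufficient number of examples, can be illustrated with a minimal level-wise sketch. This is not WARMR itself (which works on DATALOG queries rather than flat itemsets); the function name `frequent_itemsets`, the `min_support` parameter, and the toy transactions are assumptions made for illustration only.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions.

    Hypothetical helper: a plain level-wise (Apriori-style) search over flat itemsets.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data.
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Next level: join frequent sets, keep those whose subsets are all frequent.
        size += 1
        unions = {a | b for a in level for b in level if len(a | b) == size}
        candidates = [
            c for c in unions
            if all(frozenset(s) in level for s in combinations(c, size - 1))
        ]
    return frequent


# Toy data (assumed, not from any cited paper).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = frequent_itemsets(transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The pruning step relies on the fact that an itemset can only be frequent if all of its subsets are frequent, so larger candidates are generated only from itemsets that already passed the support threshold.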

Journal Article
Qi Wu, Chunhua Shen, Peng Wang, Anthony Dick, Anton van den Hengel
TL;DR: A visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions and allows questions to be asked where the image alone does not contain the information required to select the appropriate answer.
Abstract: Much of the recent progress in Vision-to-Language problems has been achieved through a combination of Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). This approach does not explicitly represent high-level semantic concepts, but rather seeks to progress directly from image features to text. In this paper we first propose a method of incorporating high-level concepts into the successful CNN-RNN approach, and show that it achieves a significant improvement on the state-of-the-art in both image captioning and visual question answering. We further show that the same mechanism can be used to incorporate external knowledge, which is critically important for answering high level visual questions. Specifically, we design a visual question answering model that combines an internal representation of the content of an image with information extracted from a general knowledge base to answer a broad range of image-based questions. It particularly allows questions to be asked where the image alone does not contain the information required to select the appropriate answer. Our final model achieves the best reported results for both image captioning and visual question answering on several of the major benchmark datasets.

329 citations

Book
06 Oct 2002
TL;DR: This book covers data mining fundamentals, tools for knowledge discovery, advanced data mining techniques, and intelligent systems, including managing uncertainty in rule-based systems and integrating data mining, expert systems, and intelligent agents.
Abstract: (Each Chapter concludes with a Chapter Summary, Key Terms, and Exercises.) Preface. I. DATA MINING FUNDAMENTALS. 1. Data Mining: A First View. Data Mining: A Definition. What Can Computers Learn? Is Data Mining Appropriate for my Problem? Expert Systems or Data Mining? A Simple Data Mining Process Model. Why not Simple Search? Data Mining Applications. 2. Data Mining: A Closer Look. Data Mining Strategies. Supervised Data Mining Techniques. Association Rules. Clustering Techniques. Evaluating Performance. 3. Basic Data Mining Techniques. Decision Trees. Generating Association Rules. The K-Means Algorithm. Genetic Learning. Choosing a Data Mining Technique. 4. An Excel-Based Data Mining Tool. The iData Analyzer. ESX: A Multipurpose Tool for Data Mining. iDAV Format for Data Mining. A Five-Step Approach for Unsupervised Clustering. A Six-Step Approach for Supervised Learning. Techniques for Generating Rules. Instance Typicality. Special Considerations and Features. II. TOOLS FOR KNOWLEDGE DISCOVERY. 5. Knowledge Discovery in Databases. A KDD Process Model. Step 1: Goal Identification. Step 2: Creating a Target Data Set. Step 3: Data Preprocessing. Step 4: Data Transformation. Step 5: Data Mining. Step 6: Interpretation and Evaluation. Step 7: Taking Action. The CRISP-DM Process Model. Experimenting with ESX. 6. The Data Warehouse. Operational Databases. Data Warehouse Design. On-line Analytical Processing (OLAP). Excel Pivot Tables for Data Analysis. 7. Formal Evaluation Techniques. What Should be Evaluated? Tools for Evaluation. Computing Test Set Confidence Intervals. Comparing Supervised Learner Models. Attribute Evaluation. Unsupervised Evaluation Techniques. Evaluating Supervised Models with Numeric Output. III. ADVANCED DATA MINING TECHNIQUES. 8. Neural Networks. Feed-Forward Neural Networks. Neural Network Training: A Conceptual View. Neural Network Explanation. General Considerations. Neural Network Learning: A Detailed View. 9. Building Neural Networks with iDA. A Four-Step Approach for Backpropagation Learning. A Four-Step Approach for Neural Network Clustering. ESX for Neural Network Cluster Analysis. 10. Statistical Techniques. Linear Regression Analysis. Logistic Regression. Bayes Classifier. Clustering Algorithms. Heuristics or Statistics? 11. Specialized Techniques. Time-Series Analysis. Mining the Web. Mining Textual Data. Improving Performance. IV. INTELLIGENT SYSTEMS. 12. Rule-Based Systems. Exploring Artificial Intelligence. Problem Solving as a State Space Search. Expert Systems. Structuring a Rule-Based System. 13. Managing Uncertainty in Rule-Based Systems. Uncertainty: Sources and Solutions. Fuzzy Rule-Based Systems. A Probability-Based Approach to Uncertainty. 14. Intelligent Agents. Characteristics of Intelligent Agents. Types of Agents. Integrating Data Mining, Expert Systems, and Intelligent Agents. Appendix. Appendix A: Software Installation. Appendix B: Datasets for Data Mining. Appendix C: Decision Tree Attribute Selection. Appendix D: Statistics for Performance Evaluation. Appendix E: Excel 97 Pivot Tables. Bibliography.

326 citations

Journal Article
TL;DR: The results show that the evolutionary instance selection algorithms consistently outperform the nonevolutionary ones, the main advantages being: better instance reduction rates, higher classification accuracy, and models that are easier to interpret.
Abstract: Evolutionary algorithms are adaptive methods based on natural evolution that may be used for search and optimization. As data reduction in knowledge discovery in databases (KDD) can be viewed as a search problem, it could be solved using evolutionary algorithms (EAs). In this paper, we have carried out an empirical study of the performance of four representative EA models in which we have taken into account two different instance selection perspectives, the prototype selection and the training set selection for data reduction in KDD. This paper includes a comparison between these algorithms and other nonevolutionary instance selection algorithms. The results show that the evolutionary instance selection algorithms consistently outperform the nonevolutionary ones, the main advantages being: better instance reduction rates, higher classification accuracy, and models that are easier to interpret.

325 citations

Journal Article
TL;DR: The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery and the Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.
Abstract: Reactome is a free, open-source, open-data, curated and peer-reviewed knowledgebase of biomolecular pathways. One of its main priorities is to provide easy and efficient access to its high quality curated data. At present, biological pathway databases typically store their contents in relational databases. This limits access efficiency because there are performance issues associated with queries traversing highly interconnected data. The same data in a graph database can be queried more efficiently. Here we present the rationale behind the adoption of a graph database (Neo4j) as well as the new ContentService (REST API) that provides access to these data. The Neo4j graph database and its query language, Cypher, provide efficient access to the complex Reactome data model, facilitating easy traversal and knowledge discovery. The adoption of this technology greatly improved query efficiency, reducing the average query time by 93%. The web service built on top of the graph database provides programmatic access to Reactome data by object oriented queries, but also supports more complex queries that take advantage of the new underlying graph-based data storage. By adopting graph database technology we are providing a high performance pathway data resource to the community. The Reactome graph database use case shows the power of NoSQL database engines for complex biological data types.

324 citations


Network Information
Related Topics (5)
Cluster analysis
146.5K papers, 2.9M citations
90% related
Support vector machine
73.6K papers, 1.7M citations
90% related
Artificial neural network
207K papers, 4.5M citations
87% related
Fuzzy logic
151.2K papers, 2.3M citations
86% related
Feature extraction
111.8K papers, 2.1M citations
86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683