Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Book ChapterDOI
01 Jan 2004
TL;DR: This paper discusses modeling, representation and computation or validation of three types of complex semantic relationships: using predefined multi-ontology relationships for query processing and virtual relationships based on a set of patterns and paths between entities of interest.
Abstract: The primary goal of today's search and browsing techniques is to find relevant documents. As the current web evolves into the next generation termed the Semantic Web, the emphasis will shift from finding documents to finding facts, actionable information, and insights. Improving the ability to extract facts, mainly in the form of entities, embedded within documents leads to the fundamental challenge of discovering relevant and interesting relationships amongst the entities that these documents describe. Relationships are fundamental to semantics: they associate meanings with words, terms and entities. They are a key to new insights. Knowledge discovery is also about discovery of heretofore new relationships. The Semantic Web seeks to associate annotations (i.e., metadata), primarily based on concepts (often representing entities) from one or more ontologies/vocabularies, with all Web-accessible resources such that programs can associate "meaning with data". Not only does it support the goal of automatic interpretation and processing (access, invoke, utilize, and analyze), it also enables improvements in scalability compared to approaches that are not semantics-based. Identification, discovery, validation and utilization of relationships (such as during query evaluation) will be a critical computation on the Semantic Web. Based on our research over the last decade, this paper takes an empirical look at various types of simple and complex relationships, what is captured and how they are represented, and how they are identified, discovered or validated, and exploited. These relationships may be based only on what is contained in or directly derived from data (direct content based relationships), or may be based on information extraction, external and prior knowledge and user defined computations (content descriptive relationships).
We also present some recent techniques for discovering indirect (i.e., transitive) and virtual (i.e., user-defined) yet meaningful (i.e., contextually relevant) relationships based on a set of patterns and paths between entities of interest. In particular, we will discuss modeling, representation and computation or validation of three types of complex semantic relationships: (a) using predefined multi-ontology relationships for query processing and

281 citations

Patent
Ronald M. Swartz1, Jeffrey L. Winkler1, Evelyn A. Janos1, Igor Markidan1, Qun Dou1 
29 Jun 1998
TL;DR: This patent presents a method and apparatus for integrating the operation of various independent software applications directed to the management of information within an enterprise, using an expandable architecture with built-in knowledge integration features that facilitate the monitoring of information flow into, out of, and between the integrated information management applications.
Abstract: The present invention is a method and apparatus for first integrating the operation of various independent software applications directed to the management of information within an enterprise. The system architecture is, however, an expandable architecture, with built-in knowledge integration features that facilitate the monitoring of information flow into, out of, and between the integrated information management applications so as to assimilate knowledge information and facilitate the control of such information. Also included are additional tools which, using the knowledge information, enable the more efficient use of the knowledge within an enterprise, including the ability to develop a context for and visualization of such knowledge.

280 citations

Proceedings Article
01 Jan 2012
TL;DR: This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, with an overview of the context of wider research on interpretability of machine learning models.
Abstract: Data of different levels of complexity and of ever growing diversity of characteristics are the raw materials that machine learning practitioners try to model using their wide palette of methods and tools. The obtained models are meant to be a synthetic representation of the available, observed data that captures some of their intrinsic regularities or patterns. Therefore, the use of machine learning techniques for data analysis can be understood as a problem of pattern recognition or, more informally, of knowledge discovery and data mining. There exists a gap, though, between data modeling and knowledge extraction. Models, depending on the machine learning techniques employed, can be described in diverse ways but, in order to consider that some knowledge has been achieved from their description, we must take into account the human cognitive factor that any knowledge extraction process entails. These models as such can be rendered powerless unless they can be interpreted, and the process of human interpretation follows rules that go well beyond technical prowess. For this reason, interpretability is a paramount quality that machine learning methods should aim to achieve if they are to be applied in practice. This paper is a brief introduction to the special session on interpretable models in machine learning, organized as part of the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning. It includes a discussion on the several works accepted for the session, with an overview of the context of wider research on interpretability of machine learning models.

280 citations

01 Jan 2010
TL;DR: The potential use of classification-based data mining techniques such as rule-based classifiers, decision trees, naive Bayes and artificial neural networks on massive volumes of healthcare data is examined.
Abstract: The healthcare environment is generally perceived as being 'information rich' yet 'knowledge poor'. There is a wealth of data available within healthcare systems. However, there is a lack of effective analysis tools to discover hidden relationships and trends in data. Knowledge discovery and data mining have found numerous applications in business and scientific domains. Valuable knowledge can be discovered by applying data mining techniques in healthcare systems. In this study, we briefly examine the potential use of classification-based data mining techniques such as rule-based classifiers, decision trees, naive Bayes and artificial neural networks on massive volumes of healthcare data. The healthcare industry collects huge amounts of healthcare data which, unfortunately, are not "mined" to discover hidden information. For data preprocessing and effective decision making, the One Dependency Augmented Naive Bayes classifier (ODANB) and the naive credal classifier 2 (NCC2) are used. NCC2 is an extension of naive Bayes to imprecise probabilities that aims at delivering robust classifications even when dealing with small or incomplete data sets. Discovery of hidden patterns and relationships often goes unexploited. Using medical profiles such as age, sex, blood pressure and blood sugar, such a classifier can predict the likelihood of patients getting heart disease. It enables significant knowledge, e.g. patterns and relationships between medical factors related to heart disease, to be established.
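
The classification step mentioned in this abstract can be illustrated with a minimal sketch of a plain categorical naive Bayes classifier with Laplace smoothing. This is not the ODANB or NCC2 method the paper uses; the function names `train_naive_bayes` and `predict`, the smoothing parameter `alpha`, and the feature encoding (age band, blood pressure, blood sugar as strings) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class and per-feature value frequencies from categorical rows."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)  # (feature index, label) -> value counts
    values = defaultdict(set)              # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for k, v in enumerate(row):
            feature_counts[(k, label)][v] += 1
            values[k].add(v)
    return class_counts, feature_counts, values


def predict(model, row, alpha=1.0):
    """Return the label with the highest smoothed log-posterior for one row."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for k, v in enumerate(row):
            # Laplace-smoothed estimate of P(value | label)
            num = feature_counts[(k, label)][v] + alpha
            den = n + alpha * len(values[k])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = label, score
    return best
```

Calling `train_naive_bayes` on a handful of labeled records and then `predict` on a new record returns the most probable label under the independence assumption; any records used to try it out would be synthetic, not patient data from the study.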

279 citations

Journal ArticleDOI
01 Feb 2000
TL;DR: WaveCluster is proposed, a novel clustering approach based on wavelet transforms that is efficient, detects clusters of arbitrary shape, is insensitive to noise and input order, and can effectively identify clusters at different degrees of detail.
Abstract: Many applications require the management of spatial data in a multidimensional feature space. Clustering large spatial databases is an important problem, which tries to find the densely populated regions in the feature space to be used in data mining, knowledge discovery, or efficient information retrieval. A good clustering approach should be efficient and detect clusters of arbitrary shape. It must be insensitive to the noise (outliers) and the order of input data. We propose WaveCluster, a novel clustering approach based on wavelet transforms, which satisfies all the above requirements. Using the multiresolution property of wavelet transforms, we can effectively identify arbitrarily shaped clusters at different degrees of detail. We also demonstrate that WaveCluster is highly efficient in terms of time complexity. Experimental results on very large datasets are presented, which show the efficiency and effectiveness of the proposed approach compared to the other recent clustering methods.
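
The general idea described in this abstract (quantize the feature space, apply a wavelet transform, then find connected dense regions) can be sketched as follows. This is a toy 2-D illustration under stated assumptions, not the authors' implementation: it uses a single unnormalized Haar low-pass step (a 2x2 block average) in place of whatever filter the paper uses, and the function name `wavecluster_sketch` and the parameters `grid` and `threshold` are hypothetical.

```python
from collections import deque


def wavecluster_sketch(points, grid=8, threshold=1.0):
    """Toy grid clustering with one Haar low-pass step.

    points: iterable of (x, y), assumed normalized to 0 <= x, y <= 1.
    grid: cells per axis before the transform (assumed even).
    threshold: minimum block average for a coarse cell to count as dense.
    Returns one label per point; -1 marks noise.
    """
    # 1. Quantize: count points per fine grid cell.
    counts = [[0] * grid for _ in range(grid)]
    cells = []
    for x, y in points:
        i = min(int(x * grid), grid - 1)
        j = min(int(y * grid), grid - 1)
        counts[i][j] += 1
        cells.append((i, j))

    # 2. Low-pass band: average each 2x2 block (unnormalized Haar approximation).
    half = grid // 2
    approx = [
        [
            (counts[2 * i][2 * j] + counts[2 * i + 1][2 * j]
             + counts[2 * i][2 * j + 1] + counts[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(half)
        ]
        for i in range(half)
    ]

    # 3. Label 4-connected components of dense coarse cells (breadth-first).
    labels = [[-1] * half for _ in range(half)]
    next_label = 0
    for i in range(half):
        for j in range(half):
            if approx[i][j] < threshold or labels[i][j] != -1:
                continue
            labels[i][j] = next_label
            queue = deque([(i, j)])
            while queue:
                a, b = queue.popleft()
                for da, db in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    na, nb = a + da, b + db
                    if (0 <= na < half and 0 <= nb < half
                            and labels[na][nb] == -1
                            and approx[na][nb] >= threshold):
                        labels[na][nb] = next_label
                        queue.append((na, nb))
            next_label += 1

    # 4. Map every point back to the label of its coarse cell.
    return [labels[i // 2][j // 2] for i, j in cells]
```

Because the labeling works on grid cells rather than on individual points, the result does not depend on the order of the input, and isolated points fall below the density threshold and are reported as noise.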

279 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683