Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
BookDOI
01 Jan 2001
TL;DR: This volume brings researchers and practitioners together to report new developments and applications in instance selection, to share hard-learned experiences in order to avoid similar pitfalls, and to shed light on the future development of the field; it serves as a comprehensive reference for graduate students, practitioners and researchers in KDD.
Abstract: The ability to analyze and understand massive data sets lags far behind the ability to gather and store the data. To meet this challenge, knowledge discovery and data mining (KDD) is growing rapidly as an emerging field. However, no matter how powerful computers are now or will be in the future, KDD researchers and practitioners must consider how to manage ever-growing data which is, ironically, due to the extensive use of computers and ease of data collection with computers. Many different approaches have been used to address the data explosion issue, such as algorithm scale-up and data reduction. Instance, example, or tuple selection pertains to methods or algorithms that select or search for a representative portion of data that can fulfill a KDD task as if the whole data is used. Instance selection is directly related to data reduction and becomes increasingly important in many KDD applications due to the need for processing efficiency and/or storage efficiency. One of the major means of instance selection is sampling whereby a sample is selected for testing and analysis, and randomness is a key element in the process. Instance selection also covers methods that require search. Examples can be found in density estimation (finding the representative instances -- data points -- for a cluster); boundary hunting (finding the critical instances to form boundaries to differentiate data points of different classes); and data squashing (producing weighted new data with equivalent sufficient statistics). Other important issues related to instance selection extend to unwanted precision, focusing, concept drifts, noise/outlier removal, data smoothing, etc. Instance Selection and Construction for Data Mining brings researchers and practitioners together to report new developments and applications, to share hard-learned experiences in order to avoid similar pitfalls, and to shed light on the future development of instance selection. 
This volume serves as a comprehensive reference for graduate students, practitioners and researchers in KDD.

228 citations
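
The abstract above names sampling as one of the major means of instance selection. As a rough illustration only, and not the book's own method, the following minimal sketch draws a stratified random sample so that each class keeps roughly the same share of instances. The function name `stratified_sample` and its parameters are hypothetical, chosen for this example.

```python
import random
from collections import defaultdict


def stratified_sample(instances, labels, fraction, seed=0):
    """Select roughly `fraction` of the instances from each class.

    Hypothetical helper for illustration: every class keeps at least one
    instance, and the original order of the data is preserved.
    """
    if len(instances) != len(labels):
        raise ValueError("instances and labels must have the same length")
    rng = random.Random(seed)  # seeded so the selection is reproducible

    # Group instance indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Draw a simple random sample (without replacement) inside each class.
    selected = []
    for indices in by_class.values():
        k = max(1, round(fraction * len(indices)))
        selected.extend(rng.sample(indices, k))

    selected.sort()
    return [instances[i] for i in selected], [labels[i] for i in selected]
```

Calling `stratified_sample(data, labels, 0.1)` on a data set with 80 instances of one class and 20 of another would keep 8 and 2 instances respectively. Search-based approaches mentioned in the abstract, such as boundary hunting or data squashing, are not covered by this sketch.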

Patent
Kelly Wical
31 May 1995
TL;DR: A knowledge catalog includes a plurality of independent and parallel static ontologies to accurately represent a broad coverage of concepts that define knowledge; a knowledge classification system that includes the knowledge catalog is also disclosed.
Abstract: A knowledge catalog includes a plurality of independent and parallel static ontologies to accurately represent a broad coverage of concepts that define knowledge. The actual configuration, structure and orientation of a particular static ontology is dependent upon the subject matter or field of the ontology in that each ontology contains a different point of view. The static ontologies store all senses for each word and concept. A knowledge classification system that includes the knowledge catalog is also disclosed. A knowledge catalog processor accesses the knowledge catalog to classify input terminology based on the knowledge concepts in the knowledge catalog. Furthermore, the knowledge catalog processor processes the input terminology prior to attachment in the knowledge catalog. The knowledge catalog further includes a dynamic level that includes dynamic hierarchies. The dynamic level adds details for the knowledge catalog by including additional words and terminology, arranged in a hierarchy, to permit a detailed and in-depth coverage of specific concepts contained in a particular discourse. The static and dynamic ontologies are relational such that the linking of one or more ontologies, or portions thereof, results in a very detailed organization of knowledge concepts.

228 citations

Proceedings ArticleDOI
29 Oct 2012
TL;DR: A novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases is developed, which improves the quality of prior link-based models, and also eliminates the need for explicit interlinkage between entities.
Abstract: Measuring the semantic relatedness between two entities is the basis for numerous tasks in IR, NLP, and Web-based knowledge extraction. This paper focuses on disambiguating names in a Web or text document by jointly mapping all names onto semantically related entities registered in a knowledge base. To this end, we have developed a novel notion of semantic relatedness between two entities represented as sets of weighted (multi-word) keyphrases, with consideration of partially overlapping phrases. This measure improves the quality of prior link-based models, and also eliminates the need for (usually Wikipedia-centric) explicit interlinkage between entities. Thus, our method is more versatile and can cope with long-tail and newly emerging entities that have few or no links associated with them. For efficiency, we have developed approximation techniques based on min-hash sketches and locality-sensitive hashing. Our experiments on semantic relatedness and on named entity disambiguation demonstrate the superiority of our method compared to state-of-the-art baselines.

224 citations
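
The abstract above mentions min-hash sketches as an approximation technique for comparing entities by their keyphrase sets. The sketch below is a simplified illustration, not the paper's method: it assumes plain unweighted sets of tokens and estimates their Jaccard similarity, whereas the paper works with weighted multi-word keyphrases and partial overlaps. The function names and the choice of BLAKE2b as the hash family are assumptions made for this example.

```python
import hashlib


def _hash(token, seed):
    """Deterministic 64-bit hash of a token under a given seed."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(tokens, num_hashes=128):
    """Return the min-hash signature of a non-empty collection of tokens.

    For each of `num_hashes` seeded hash functions, keep the smallest
    hash value observed over the token set.
    """
    tokens = set(tokens)
    if not tokens:
        raise ValueError("cannot sketch an empty token set")
    return [min(_hash(t, seed) for t in tokens) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

Two signatures agree at a given position with probability equal to the Jaccard similarity of the underlying sets, so more hash functions give a tighter estimate at the cost of a longer signature. Locality-sensitive hashing, also named in the abstract, would typically group such signatures into bands to find candidate pairs without comparing every pair; that step is omitted here.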

Journal ArticleDOI
01 Feb 1992
TL;DR: In this paper, the authors study the problem of combining the knowledge bases of different experts when the expert systems are considered to be first-order theories, present techniques for resolving inconsistencies in such knowledge bases, and provide algorithms for implementing these techniques.
Abstract: Consider the construction of an expert system by encoding the knowledge of different experts. Suppose the knowledge provided by each expert is encoded into a knowledge base. Then the process of combining the knowledge of these different experts is an important and nontrivial problem. We study this problem here when the expert systems are considered to be first-order theories. We present techniques for resolving inconsistencies in such knowledge bases. We also provide algorithms for implementing these techniques.

224 citations

Journal ArticleDOI
01 Oct 1995
TL;DR: It is shown that a specific problem solving episode, or case, may be viewed as data, information, or knowledge, depending on its role in decision making and learning from experience, and a conceptual framework for integration is suggested by focusing on their different roles and frames of reference within a decision-making process.
Abstract: The unclear distinction between data, information, and knowledge has impaired their combination and utilization for the development of integrated systems. There is need for a unified definitional model of data, information, and knowledge based on their roles in computational and cognitive information processing. An attempt to clarify these basic notions is made, and a conceptual framework for integration is suggested by focusing on their different roles and frames of reference within a decision-making process. On this basis, ways of integrating the functionalities of databases, information systems and knowledge-based systems are discussed by taking a knowledge level perspective to the analysis and modeling of systems behaviour. Motivated by recent work in the area of case-based reasoning related to decision support systems, it is further shown that a specific problem solving episode, or case, may be viewed as data, information, or knowledge, depending on its role in decision making and learning from experience. An outline of a case-based system architecture is presented, and used to show that a focus on the retaining and reuse of past cases facilitates a gradual and evolutionary transition from an information system to a knowledge-based system.

223 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683