Topic
Knowledge extraction
About: Knowledge extraction is a research topic. To date, 20,251 publications on this topic have received 413,401 citations.
Papers
02 Aug 1996
TL;DR: FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process.
Abstract: This paper describes the FACT system for knowledge discovery from text. It discovers associations (patterns of co-occurrence) amongst keywords labeling the items in a collection of textual documents. In addition, FACT is able to use background knowledge about the keywords labeling the documents in its discovery process. FACT takes a query-centered view of knowledge discovery, in which a discovery request is viewed as a query over the implicit set of possible results supported by a collection of documents, and where background knowledge is used to specify constraints on the desired results of this query process. Execution of a knowledge-discovery query is structured so that these background-knowledge constraints can be exploited in the search for possible results. Finally, rather than requiring a user to specify an explicit query expression in the knowledge-discovery query language, FACT presents the user with a simple-to-use graphical interface to the query language, with the language providing a well-defined semantics for the discovery actions performed by a user through the interface.
131 citations
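The kind of keyword association described in the abstract above can be illustrated with a minimal sketch. This is not FACT's algorithm or its query language: the function name `keyword_associations`, the support and confidence thresholds, and the `allowed_left` parameter (a stand-in for a background-knowledge constraint such as "the left-hand keyword must be a country") are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def keyword_associations(docs, min_support, min_confidence, allowed_left=None):
    """Find pairwise keyword associations in keyword-labeled documents.

    docs           -- list of sets of keywords, one set per document
    min_support    -- minimum fraction of documents containing both keywords
    min_confidence -- minimum P(right keyword | left keyword)
    allowed_left   -- hypothetical background-knowledge constraint: if given,
                      only keywords in this set may appear on the left side
    """
    n = len(docs)
    single = Counter()  # how many documents carry each keyword
    pair = Counter()    # how many documents carry each keyword pair
    for keywords in docs:
        single.update(keywords)
        pair.update(combinations(sorted(keywords), 2))

    rules = []
    for (a, b), count in pair.items():
        support = count / n
        if support < min_support:
            continue
        # Each co-occurring pair yields two candidate rules: a => b and b => a.
        for left, right in ((a, b), (b, a)):
            if allowed_left is not None and left not in allowed_left:
                continue  # pruned by the background-knowledge constraint
            confidence = count / single[left]
            if confidence >= min_confidence:
                rules.append((left, right, support, confidence))
    return sorted(rules)
```

Applying the constraint inside the loop, rather than filtering afterwards, loosely mirrors the abstract's point that background-knowledge constraints can be exploited during the search for results.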
TL;DR: This paper describes an approach to protein functional annotation through case studies, examines common identification errors, and illustrates that data integration in PIR supports exploration of protein relationships and may reveal protein functional associations beyond sequence homology.
131 citations
30 Sep 2008
TL;DR: This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.
Abstract: The massive daily overflow of electronic data to information seekers creates the need for better ways to digest and organize this information to make it understandable and useful. Text mining, a variation of data mining, extracts desired information from large, unstructured text collections stored in electronic forms. The Handbook of Research on Text and Web Mining Technologies is the first comprehensive reference to the state of research in the field of text mining, serving a pivotal role in educating practitioners in the field. This compendium of pioneering studies from leading experts is essential to academic reference collections and introduces researchers and students to cutting-edge techniques for gaining knowledge discovery from unstructured text.
130 citations
IBM
TL;DR: This paper describes in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge.
Abstract: Access to a large amount of knowledge is critical for success at answering open-domain questions for DeepQA systems such as IBM Watson™. Formal representation of knowledge has the advantage of being easy to reason with, but acquisition of structured knowledge in open domains from unstructured data is often difficult and expensive. Our central hypothesis is that shallow syntactic knowledge and its implied semantics can be easily acquired and can be used in many areas of a question-answering system. We take a two-stage approach to extract the syntactic knowledge and implied semantics. First, shallow knowledge from large collections of documents is automatically extracted. Second, additional semantics are inferred from aggregate statistics of the automatically extracted shallow knowledge. In this paper, we describe in detail what kind of shallow knowledge is extracted, how it is automatically done from a large corpus, and how additional semantics are inferred from aggregate statistics. We also briefly discuss the various ways extracted knowledge is used throughout the IBM DeepQA system.
130 citations
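The second stage described in the abstract above, inferring additional semantics from aggregate statistics over extracted shallow knowledge, can be sketched as follows. This is only an illustration under stated assumptions: the (subject, verb, object) frames are assumed to have been extracted already by some parser, and the function names `aggregate_frames` and `typical_objects` are hypothetical, not part of the IBM DeepQA system.

```python
from collections import Counter, defaultdict


def aggregate_frames(frames):
    """Aggregate shallow (subject, verb, object) frames into slot statistics.

    frames -- iterable of (subject, verb, object) tuples, assumed to come
              from an upstream extraction step that is not shown here
    Returns a mapping (subject, verb) -> Counter of observed objects.
    """
    by_slot = defaultdict(Counter)
    for subj, verb, obj in frames:
        by_slot[(subj, verb)][obj] += 1
    return by_slot


def typical_objects(by_slot, subj, verb, top=2):
    """Return the most frequent objects for (subj, verb) with relative frequency."""
    dist = by_slot.get((subj, verb), Counter())
    total = sum(dist.values())
    return [(obj, count / total) for obj, count in dist.most_common(top)]
```

Relative frequencies over many frames are one simple aggregate statistic; the paper may use different measures.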
01 Jul 2008
TL;DR: By shifting the concept of k-anonymity from the source data to the extracted patterns, this paper formally characterizes the notion of a threat to anonymity in the context of pattern discovery, and provides a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns.
Abstract: It is generally believed that data mining results do not violate the anonymity of the individuals recorded in the source database. In fact, data mining models and patterns, in order to ensure a required statistical significance, represent a large number of individuals and thus conceal individual identities: this is the case of the minimum support threshold in frequent pattern mining. In this paper we show that this belief is ill-founded. By shifting the concept of k-anonymity from the source data to the extracted patterns, we formally characterize the notion of a threat to anonymity in the context of pattern discovery, and provide a methodology to efficiently and effectively identify all such possible threats that arise from the disclosure of the set of extracted patterns. On this basis, we obtain a formal notion of privacy protection that allows the disclosure of the extracted knowledge while protecting the anonymity of the individuals in the source database. Moreover, in order to handle the cases where the threats to anonymity cannot be avoided, we study how to eliminate such threats by means of pattern (not data!) distortion performed in a controlled way.
130 citations
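To see why published patterns can threaten anonymity even when every pattern meets a minimum support threshold, consider a simplified sketch. It covers only one narrow case, chosen for illustration and not taken from the paper's formal definitions: from the supports of an itemset I and its one-item extension I ∪ {x}, anyone can derive the number of records that contain I but not x, and that derived count may fall below k. The function name `anonymity_threats` and the data layout are assumptions.

```python
def anonymity_threats(supports, k):
    """Flag derived patterns that match fewer than k (but more than 0) records.

    supports -- dict mapping frozenset(itemset) -> support count, e.g. the
                published output of a frequent-itemset miner
    k        -- anonymity threshold
    Returns sorted tuples (itemset, negated_item, derived_support), meaning
    "records with every item in itemset but without negated_item".
    """
    threats = []
    for small, s_small in supports.items():
        for large, s_large in supports.items():
            # Only consider pairs where large = small plus exactly one item.
            if len(large) == len(small) + 1 and small < large:
                derived = s_small - s_large  # supp(I and not x)
                if 0 < derived < k:
                    (negated,) = large - small
                    threats.append((tuple(sorted(small)), negated, derived))
    return sorted(threats)
```

For example, with supports of 100 for {a} and 98 for {a, b}, both far above a threshold of k = 5, the derived pattern "a and not b" matches only 2 records.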