Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
BookDOI
01 Sep 2004
Abstract: We show how carefully crafted random matrices can achieve distance-preserving dimensionality reduction, accelerate spectral computations, and reduce the sample complexity of certain kernel methods.

184 citations

Patent
03 Aug 2001
TL;DR: A system and method for searching documents in a data source, and more particularly for analyzing and clustering documents for a search engine.
Abstract: A system and method for searching documents in a data source and more particularly, to a system and method for analyzing and clustering of documents for a search engine. The system and method includes analyzing and processing documents to secure the infrastructure and standards for optimal document processing. By incorporating Computational Intelligence (CI) and statistical methods, the document information is analyzed and clustered using novel techniques for knowledge extraction. A comprehensive dictionary is built based on the keywords identified by these techniques from the entire text of the document. The text is parsed for keywords, the number of their occurrences, and the context in which each word appears in the documents. The whole document is identified by the knowledge that is represented in its contents. Based on such knowledge extracted from all the documents, the documents are clustered into meaningful groups in a catalog tree. The results of document analysis and clustering information are stored in a database.

184 citations

Patent
26 Aug 1999
TL;DR: A computer-based method and apparatus for knowledge discovery from databases, in which the user creates a project plan comprising a plurality of operational components adapted to cooperatively extract desired information from a database.
Abstract: A computer-based method and apparatus for knowledge discovery from databases. The disclosed method involves the user creation of a project plan comprising a plurality of operational components adapted to cooperatively extract desired information from a database. In one embodiment, the project plan is created within a graphical user interface and consists of objects representing the various functional components of the overall plan interconnected by links representing the flow of data from the data source to a data sink. Data visualization components may be inserted essentially anywhere in the project plan. One or more data links in the project plan may be designated as caching links which maintain copies of the data flowing across them, such that the cached data is available to other components in the project plan. In one embodiment, compression technology is applied to reduce the overall size of the database.

184 citations

Book ChapterDOI
11 Oct 2001
TL;DR: The penetration of data warehouses into the management and exploitation of spatial databases is a major trend as it is for non-spatial databases.
Abstract: Recent years have witnessed major changes in the Geographic Information System (GIS) market, from technological offerings to user requests. For example, spatial databases used to be implemented in GISs or in Computer-Assisted Design (CAD) systems coupled with a Relational Data Base Management System (RDBMS). Today, spatial databases are also implemented in spatial extensions of universal servers, in spatial engine software components, in GIS web servers, in analytical packages using so-called 'data cubes' and in spatial data warehouses. Such databases are structured according to either a relational, object-oriented, multi-dimensional or hybrid paradigm. In addition, these offerings are integrated as a piece of the overall technological framework of the organization and they are implemented according to very diverse architectures responding to differing users' contexts: centralized vs distributed, thin-clients vs thick-clients, Local Area Network (LAN) vs intranets, spatial data warehouses vs legacy systems, etc. As one may say, 'Gone are the days of a spatial database implemented solely on a stand-alone GIS' (Bédard 1999). In fact, this evolution of the GIS market follows the general trends of mainstream Information Technologies (IT). Among all these possibilities, the penetration of data warehouses into the management and exploitation of spatial databases is a major trend as it is for non-spatial databases. According to Rawling and Kucera (1997), 'the term Data Warehouse has become the hottest industry buzzword of the decade just behind Internet and information highway'. More specifically, this penetration of data warehouses allows developers to build new solutions geared towards one major need which has never been solved efficiently so far: to provide a unified view of dispersed heterogeneous databases in order to efficiently feed the decision-support tools used for strategic decision making.
In fact, the data warehouse emerged as the unifying solution to a series of individual circumstances related to providing the necessary basis for global knowledge discovery. First, large organizations often have several departmental or application-oriented independent databases which may overlap in content. Usually, such systems work properly for day-to-day operational-level decisions. However, when one needs to obtain aggregated or summarized information integrating data from these different

183 citations

Book ChapterDOI
03 Oct 2005
TL;DR: This paper systematically analyzes the problem of mining hidden communities on heterogeneous social networks and proposes a new method for learning an optimal linear combination of these relations which can best meet the user's expectation.
Abstract: Social network analysis has attracted much attention in recent years. Community mining is one of the major directions in social network analysis. Most of the existing methods on community mining assume that there is only one kind of relation in the network, and moreover, the mining results are independent of the users' needs or preferences. However, in reality, there exist multiple, heterogeneous social networks, each representing a particular kind of relationship, and each kind of relationship may play a distinct role in a particular task. In this paper, we systematically analyze the problem of mining hidden communities on heterogeneous social networks. Based on the observation that different relations have different importance with respect to a certain query, we propose a new method for learning an optimal linear combination of these relations which can best meet the user's expectation. With the obtained relation, better performance can be achieved for community mining.

183 citations


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance
Metrics
No. of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683