Topic

Knowledge extraction

About: Knowledge extraction is a research topic. Over its lifetime, 20,251 publications have been published within this topic, receiving 413,401 citations.


Papers
Book Chapter
Bo Xu, Yong Xu, Jiaqing Liang, Chenhao Xie, Bin Liang, Wanyun Cui, Yanghua Xiao
27 Jun 2017
TL;DR: A never-ending Chinese knowledge extraction system, CN-DBpedia, which automatically generates a knowledge base that is ever-increasing in size and constantly updated, and which reduces human costs by reusing the ontology of existing knowledge bases and building an end-to-end fact extraction model.
Abstract: Great efforts have been dedicated to harvesting knowledge bases from online encyclopedias. These knowledge bases play important roles in enabling machines to understand texts. However, most current knowledge bases are in English, and non-English knowledge bases, especially Chinese ones, are still very rare. Many previous systems that extract knowledge from online encyclopedias, although applicable to building a Chinese knowledge base, still suffer from two challenges. The first is that they require great human effort to construct an ontology and build a supervised knowledge extraction model. The second is that the knowledge bases are updated very slowly. To solve these challenges, we propose a never-ending Chinese knowledge extraction system, CN-DBpedia, which can automatically generate a knowledge base that is ever-increasing in size and constantly updated. Specifically, we reduce human costs by reusing the ontology of existing knowledge bases and building an end-to-end fact extraction model. We further propose a smart active update strategy to keep our knowledge base fresh with little human cost. The 164 million API calls to the published services justify the success of our system.

197 citations
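The abstract does not spell out how the smart active update strategy chooses what to refresh. As a purely hypothetical illustration of the general idea (spend a small refresh budget where it matters most), the sketch below ranks entities by an assumed priority of popularity times staleness and picks the top few. The Entity fields, the scoring formula, and the function names are assumptions for illustration, not CN-DBpedia's actual strategy.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    last_updated: float  # hypothetical field: day of the last extraction run
    recent_hits: int     # hypothetical field: recent lookups, a popularity signal


def update_priority(e: Entity, now: float) -> float:
    """Hypothetical score: entities that are both popular and stale rank highest."""
    staleness = now - e.last_updated
    return e.recent_hits * staleness


def pick_entities_to_refresh(entities, now: float, budget: int) -> list:
    """Return the names of at most `budget` entities with the highest priority."""
    # heapq.nlargest avoids sorting the full list when the budget is small
    top = heapq.nlargest(budget, entities, key=lambda e: update_priority(e, now))
    return [e.name for e in top]


entities = [
    Entity("A", last_updated=90.0, recent_hits=5),   # priority 5 * 10 = 50
    Entity("B", last_updated=10.0, recent_hits=1),   # priority 1 * 90 = 90
    Entity("C", last_updated=99.0, recent_hits=40),  # priority 40 * 1 = 40
]
print(pick_entities_to_refresh(entities, now=100.0, budget=2))  # ['B', 'A']
```

A fixed budget keeps the human and crawling cost bounded per cycle; any real system would substitute its own signals for the assumed priority function.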

Book Chapter
06 Aug 1995
TL;DR: This paper addresses the task of class identification in spatial databases with clustering techniques, integrated with the database through a well-known spatial access method, the R*-tree, and presents several strategies for focusing: selecting representatives from a spatial database, focusing on the relevant clusters, and retrieving all objects of a given cluster.
Abstract: Both the number and the size of spatial databases are rapidly growing because of the large amount of data obtained from satellite images, X-ray crystallography, or other scientific equipment. Therefore, automated knowledge discovery is becoming more and more important in spatial databases. So far, most methods for knowledge discovery in databases (KDD) have been based on relational database systems. In this paper, we address the task of class identification in spatial databases using clustering techniques. We put special emphasis on the integration of the discovery methods with the DB interface, which is crucial for the efficiency of KDD on large databases. The key to this integration is the use of a well-known spatial access method, the R*-tree. The focusing component of a KDD system determines which parts of the database are relevant for the knowledge discovery task. We present several strategies for focusing: selecting representatives from a spatial database, focusing on the relevant clusters, and retrieving all objects of a given cluster. We have applied the proposed techniques to real data from a large protein database used for predicting protein-protein docking. A performance evaluation on this database indicates that clustering on large spatial databases can be performed both efficiently and effectively.

197 citations
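The first focusing strategy, selecting representatives, can be sketched as follows. This is a minimal illustration under a stated assumption: a uniform grid stands in for the data pages of an R*-tree, and each occupied cell contributes the member point closest to the cell's centroid. A clustering algorithm would then run on the much smaller set of representatives. The function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def select_representatives(points, cell_size):
    """Return one representative per occupied grid cell.

    Assumption: a uniform grid stands in for the data pages of an R*-tree.
    The representative is the cell member closest to the cell's centroid.
    """
    cells = defaultdict(list)
    for x, y in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    reps = []
    for members in cells.values():
        cx = sum(x for x, _ in members) / len(members)
        cy = sum(y for _, y in members) / len(members)
        reps.append(min(members, key=lambda q: math.dist(q, (cx, cy))))
    return reps


points = [(0.1, 0.1), (0.2, 0.2), (0.3, 0.3), (5.1, 5.1), (5.2, 5.2), (5.3, 5.3)]
print(select_representatives(points, cell_size=1.0))  # [(0.2, 0.2), (5.2, 5.2)]
```

Picking an actual member point (rather than the centroid itself) keeps every representative a real database object, which matters if later steps retrieve objects by key.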

Journal Article
01 Mar 2017
TL;DR: The five Vs of big data (volume, velocity, variety, veracity, and value) are reviewed, as well as new technologies, including NoSQL databases, that have emerged to accommodate the needs of big data initiatives.
Abstract: The era of big data has resulted in the development and applications of technologies and methods aimed at effectively using massive amounts of data to support decision-making and knowledge discovery activities. In this paper, the five Vs of big data, volume, velocity, variety, veracity, and value, are reviewed, as well as new technologies, including NoSQL databases that have emerged to accommodate the needs of big data initiatives. The role of conceptual modeling for big data is then analyzed and suggestions made for effective conceptual modeling efforts with respect to big data.

197 citations

Journal Article
TL;DR: A comprehensive literature review of major scientific contributions made so far in this research area is undertaken and a holistic overview of the main applications of Data Sciences to digital marketing is presented to generate insights related to the creation of innovative Data Mining and knowledge discovery techniques.

196 citations

Proceedings Article
01 Oct 2014
TL;DR: A highly scalable tensor decomposition approach for knowledge base embedding that is especially suitable for relation extraction; by leveraging relational domain knowledge about entity types, it is significantly faster than previous approaches and better able to discover new relations missing from the database.
Abstract: While relation extraction has traditionally been viewed as a task relying solely on textual data, recent work has shown that by taking as input existing facts in the form of entity-relation triples from both knowledge bases and textual data, the performance of relation extraction can be improved significantly. Following this new paradigm, we propose a tensor decomposition approach for knowledge base embedding that is highly scalable, and is especially suitable for relation extraction. By leveraging relational domain knowledge about entity type information, our learning algorithm is significantly faster than previous approaches and is better able to discover new relations missing from the database. In addition, when applied to a relation extraction task, our approach alone is comparable to several existing systems, and improves the weighted mean average precision of a state-of-the-art method by 10 points when used as a subcomponent.

196 citations
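Why entity type information saves work can be shown with a small sketch, assuming a RESCAL-style bilinear model in which each entity e_i has a vector a_i and each relation r_k has a matrix R_k, so a triple scores as a_i^T R_k a_j. Type constraints restrict scoring to (subject, object) pairs whose types match the relation's domain and range. The embeddings, the "works_for" relation, and all helper names are hypothetical toy values; this covers scoring only, not the paper's learning algorithm.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def bilinear_score(a_i: Vector, r_k: Matrix, a_j: Vector) -> float:
    """RESCAL-style score a_i^T R_k a_j for the triple (e_i, r_k, e_j)."""
    return sum(a_i[p] * r_k[p][q] * a_j[q]
               for p in range(len(a_i)) for q in range(len(a_j)))


def candidate_pairs(types: Dict[str, str], domain: str, range_: str) -> List[Tuple[str, str]]:
    """Keep only (subject, object) pairs whose types fit the relation's domain and range."""
    subjects = [e for e, t in types.items() if t == domain]
    objects = [e for e, t in types.items() if t == range_]
    return [(s, o) for s in subjects for o in objects if s != o]


def rank_new_facts(emb, r_k, types, domain, range_):
    """Score the type-consistent pairs only, highest score first."""
    scored = [(s, o, bilinear_score(emb[s], r_k, emb[o]))
              for s, o in candidate_pairs(types, domain, range_)]
    return sorted(scored, key=lambda t: t[2], reverse=True)


# Toy 2-dimensional embeddings and a hypothetical "works_for" relation (person -> company)
emb = {"alice": [1.0, 0.0], "bob": [0.0, 1.0], "acme": [1.0, 1.0], "paris": [0.5, 0.0]}
types = {"alice": "person", "bob": "person", "acme": "company", "paris": "city"}
works_for = [[1.0, 0.0], [0.0, 2.0]]
print(rank_new_facts(emb, works_for, types, "person", "company"))
# [('bob', 'acme', 2.0), ('alice', 'acme', 1.0)]
```

Here only 2 of the 12 possible ordered pairs of distinct entities are scored; on a large knowledge base the same filtering removes most of the candidate space.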


Network Information
Related Topics (5)
Cluster analysis: 146.5K papers, 2.9M citations, 90% related
Support vector machine: 73.6K papers, 1.7M citations, 90% related
Artificial neural network: 207K papers, 4.5M citations, 87% related
Fuzzy logic: 151.2K papers, 2.3M citations, 86% related
Feature extraction: 111.8K papers, 2.1M citations, 86% related
Performance Metrics
Number of papers in the topic in previous years
Year    Papers
2023    120
2022    285
2021    506
2020    660
2019    740
2018    683