Proceedings ArticleDOI
Managing information extraction: state of the art and research directions
AnHai Doan, Raghu Ramakrishnan, Shivakumar Vaithyanathan +2 more
- pp 799-800
TLDR
This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text), and shows how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools.
Abstract:
This tutorial makes the case for developing a unified framework that manages information extraction from unstructured data (focusing in particular on text). We first survey research on information extraction in the database, AI, NLP, IR, and Web communities in recent years. Then we discuss why this is the right time for the database community to actively participate and address the problem of managing information extraction (including in particular the challenges of maintaining and querying the extracted information, and accounting for the imprecision and uncertainty inherent in the extraction process). Finally, we show how interested researchers can take the next step, by pointing to open problems, available datasets, applicable standards, and software tools. We do not assume prior knowledge of text management, NLP, extraction techniques, or machine learning.
Citations
Journal ArticleDOI
YAGO: A Large Ontology from Wikipedia and WordNet
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.
Proceedings ArticleDOI
NAGA: Searching and Ranking Knowledge
TL;DR: This paper proposes NAGA, a new semantic search engine that builds on a knowledge base, which is organized as a graph with typed edges, and consists of millions of entities and relationships extracted from Web-based corpora.
Proceedings Article
Declarative information extraction using datalog with embedded extraction predicates
TL;DR: This paper argues that developing information extraction programs using Datalog with embedded procedural extraction predicates is a good way to proceed, and shows how optimizing such programs raises challenges specific to text data that cannot be accommodated in the current relational optimization framework.
Proceedings Article
EntityRank: searching entities directly and holistically
TL;DR: This work focuses on the core challenge of ranking entities by distilling its underlying conceptual model, the Impression Model, and developing a probabilistic ranking framework, EntityRank, that seamlessly integrates both local and global information in ranking.
Journal ArticleDOI
On the provenance of non-answers to queries over extracted data
TL;DR: This work develops a mechanism for providing provenance-style explanations for non-answers, and suggests that this approach can provide effective provenance information to help users resolve their doubts about non-answers to a query.