Open AccessProceedings Article
EntityRank: searching entities directly and holistically
Tao Cheng,Xifeng Yan,Kevin Chen-Chuan Chang +2 more
- pp 387-398
Reads0
Chats0
TLDR
This work focuses on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking.Abstract:
As the Web has evolved into a data-rich repository, with the standard "page view," current search engines are becoming increasingly inadequate for a wide range of query tasks. While we often search for various data "entities" (e.g., phone number, paper PDF, date), today's engines only take us indirectly to pages. While entities appear in many pages, current engines only find each page individually. Toward searching directly and holistically for finding information of finer granularity, we study the problem of entity search, a significant departure from traditional document retrieval. We focus on the core challenge of ranking entities, by distilling its underlying conceptual model Impression Model and developing a probabilistic ranking framework, EntityRank, that is able to seamlessly integrate both local and global information in ranking. We evaluate our online prototype over a 2TB Web corpus, and show that EntityRank performs effectively.read more
Citations
More filters
Journal ArticleDOI
YAGO: A Large Ontology from Wikipedia and WordNet
TL;DR: YAGO is a large ontology with high coverage and precision, based on a clean logical model with a decidable consistency that allows representing n-ary relations in a natural way while maintaining compatibility with RDFS.
Journal ArticleDOI
Entity Linking with a Knowledge Base: Issues, Techniques, and Solutions
TL;DR: A thorough overview and analysis of the main approaches to entity linking is presented, and various applications, the evaluation of entity linking systems, and future directions are discussed.
Journal ArticleDOI
Annotating and searching web tables using entities, types and relationships
TL;DR: This paper proposes new machine learning techniques to annotate table cells with entities that they likely mention, table columns with types from which entities are drawn for cells in the column, and relations that pairs of table columns seek to express, and a new graphical model for making all these labeling decisions for each table simultaneously.
Book ChapterDOI
New Regularized Algorithms for Transductive Learning
TL;DR: This work proposes a new graph-based label propagation algorithm for transductive learning that can be extended to incorporate additional prior information, and demonstrates it with classifying data where the labels are not mutually exclusive.
Proceedings ArticleDOI
From information to knowledge: harvesting entities and relationships from web sources
Gerhard Weikum,Martin Theobald +1 more
TL;DR: This tutorial discusses state-of-the-art methods, research opportunities, and open challenges along this avenue of knowledge harvesting, to automatically construct and maintain a comprehensive knowledge base of facts about named entities, their semantic classes, and their mutual relations as well as temporal contexts, with high precision and high recall.
References
More filters
Journal ArticleDOI
The anatomy of a large-scale hypertextual Web search engine
Sergey Brin,Lawrence Page +1 more
TL;DR: This paper provides an in-depth description of Google, a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and looks at the problem of how to effectively deal with uncontrolled hypertext collections where anyone can publish anything they want.
Journal Article
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Sergey Brin,Lawrence Page +1 more
TL;DR: Google as discussed by the authors is a prototype of a large-scale search engine which makes heavy use of the structure present in hypertext and is designed to crawl and index the Web efficiently and produce much more satisfying search results than existing systems.
Journal Article
Accurate methods for the statistics of surprise and coincidence
TL;DR: The basis of a measure based on likelihood ratios that can be applied to the analysis of text is described, and in cases where traditional contingency table methods work well, the likelihood ratio tests described here are nearly identical.
Proceedings ArticleDOI
Web-scale information extraction in knowitall: (preliminary results)
Oren Etzioni,Michael Cafarella,Doug Downey,Stanley Kok,Ana-Maria Popescu,Tal Shaked,Stephen Soderland,Daniel S. Weld,Alexander Yates +8 more
TL;DR: KnowItAll, a system that aims to automate the tedious process of extracting large collections of facts from the web in an autonomous, domain-independent, and scalable manner, is introduced.
Proceedings ArticleDOI
GATE - a General Architecture for Text Engineering
TL;DR: GATE lies at the intersection of human language computation and software engineering, and constitutes aninfrastructural system supporting research and development of languageprocessing software.