Fuzzy matching of Web queries to structured data
Tao Cheng, Hady W. Lauw, Stelios Paparizos
- pp 713-716
TLDR
This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.
Abstract:
Recognizing the alternative ways people reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the number of queries that can be matched.
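The core idea in the abstract, that strings leading to the same Web pages are candidate equivalents, can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `expand_entity_strings`, the `min_overlap` threshold, the click-overlap score, and the sample log are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict


def expand_entity_strings(click_log, entity_strings, min_overlap=0.5):
    """Hypothetical helper: map each entity string to other query strings
    whose clicks mostly land on the same URLs as the entity string's clicks.

    click_log is an iterable of (query, clicked_url) pairs.
    """
    urls_by_query = defaultdict(Counter)  # query -> click count per URL
    queries_by_url = defaultdict(set)     # URL -> queries that clicked it
    for query, url in click_log:
        q = query.strip().lower()
        urls_by_query[q][url] += 1
        queries_by_url[url].add(q)

    expanded = {}
    for entity in entity_strings:
        key = entity.strip().lower()
        entity_urls = set(urls_by_query.get(key, ()))
        # Candidates: every other query that clicked at least one entity URL.
        candidates = set()
        for url in entity_urls:
            candidates |= queries_by_url[url]
        candidates.discard(key)

        kept = set()
        for cand in candidates:
            clicks = urls_by_query[cand]
            shared = sum(n for u, n in clicks.items() if u in entity_urls)
            # Assumed score: fraction of the candidate's clicks on entity URLs.
            if shared / sum(clicks.values()) >= min_overlap:
                kept.add(cand)
        expanded[entity] = kept
    return expanded


# Made-up click log, for illustration only.
log = [
    ("canon eos 350d", "example.com/350d"),
    ("digital rebel xt", "example.com/350d"),
    ("digital rebel xt", "example.com/350d"),
    ("canon camera", "example.com/350d"),
    ("canon camera", "example.com/canon"),
    ("canon camera", "example.com/cameras"),
]
result = expand_entity_strings(log, ["Canon EOS 350D"])
print(result)
```

The overlap threshold is one simple way to keep a broad query such as "canon camera" from being treated as equivalent to a specific product name; a real system would likely need a more careful similarity measure.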
Citations
Patent
Word detection and domain dictionary recommendation
Hao Sun, Chi-Ho Li, Jing Li
TL;DR: A new word detection and domain dictionary recommendation system is proposed: text content is received in a given language (for example, Chinese), and words are extracted from the content by analyzing it according to various rules.
Proceedings ArticleDOI
Structured annotations of web queries
TL;DR: This paper proposes a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and defines a dynamic threshold for filtering out misinterpreted query annotations.
Journal ArticleDOI
Keyword++: a framework to improve keyword search over entity databases
Venkatesh Ganti, Yeye He, Dong Xin
TL;DR: A general framework that can improve an existing search interface by translating a keyword query to a structured query that leverages the keyword to attribute value associations discovered in the results returned by the original search interface is proposed.
Proceedings ArticleDOI
Keyword-based search and exploration on databases
Yi Chen, Wei Wang, Ziyang Liu
TL;DR: This tutorial gives an overview of the state-of-the-art techniques for supporting keyword-based search and exploration on databases and identifies the challenges and opportunities for future research to advance the field.
Proceedings ArticleDOI
A framework for robust discovery of entity synonyms
TL;DR: A general framework for robustly discovering entity synonyms is proposed, with two novel similarity functions that overcome the limitations of prior techniques; efficient and scalable techniques leveraging the MapReduce framework are developed to discover synonyms at large scale.
References
Journal ArticleDOI
WordNet: a lexical database for English
TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.
Proceedings ArticleDOI
Generating query substitutions
TL;DR: A model for selecting between candidates is built, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions, which improves the quality of the candidates generated.
Proceedings ArticleDOI
Reference reconciliation in complex information spaces
TL;DR: This work considers complex information spaces, in which references belong to multiple related classes and each reference may have very few attribute values, and gradually enriches references by merging attribute values.
Journal ArticleDOI
Swoosh: a generic approach to entity resolution
Omar Benjelloun, Hector Garcia-Molina, David Menestrina, Qi Su, Steven Euijong Whang, Jennifer Widom
TL;DR: This work formalizes the generic ER problem, treating the functions for comparing and merging records as black-boxes, and identifies four important properties that, if satisfied by the match and merge functions, enable much more efficient ER algorithms.
Proceedings ArticleDOI
Random walks on the click graph
Nick Craswell, Martin Szummer
TL;DR: A Markov random walk model is applied to a large click log, producing a probabilistic ranking of documents for a given query, demonstrating its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively.