Open Access · Proceedings Article

Fuzzy matching of Web queries to structured data

TLDR
This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.
Abstract
Recognizing the alternative ways people refer to an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. In this way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets, which show that it can dramatically increase the number of queries that can be matched.
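
The idea of mining a query log for strings that lead to the same Web pages can be illustrated with a minimal sketch. This is not the authors' algorithm, only one simple reading of the abstract: two strings are treated as equivalent when users who typed them reached enough of the same pages. The log format (pairs of query and URL), the function name `expand_entity_strings`, the `min_shared_urls` threshold, and the sample data are all assumptions made for illustration.

```python
from collections import defaultdict


def expand_entity_strings(click_log, entity_strings, min_shared_urls=2):
    """Return, for each entity string, other query strings that reach the same pages.

    click_log: iterable of (query, url) pairs (hypothetical log format).
    entity_strings: strings already known to reference entities.
    min_shared_urls: hypothetical threshold on how many pages two strings
        must have in common before they are treated as equivalent.
    """
    urls_by_query = defaultdict(set)
    queries_by_url = defaultdict(set)
    for query, url in click_log:
        normalized = query.strip().lower()
        urls_by_query[normalized].add(url)
        queries_by_url[url].add(normalized)

    expanded = {}
    for entity in entity_strings:
        key = entity.strip().lower()
        # Count, for every other query, how many pages it shares with the entity string.
        shared = defaultdict(int)
        for url in urls_by_query.get(key, ()):
            for other in queries_by_url[url]:
                if other != key:
                    shared[other] += 1
        expanded[entity] = sorted(
            q for q, count in shared.items() if count >= min_shared_urls
        )
    return expanded


# Invented sample log; the URLs are placeholders.
click_log = [
    ("canon eos 350d", "http://example.org/p1"),
    ("canon eos 350d", "http://example.org/p2"),
    ("digital rebel xt", "http://example.org/p1"),
    ("digital rebel xt", "http://example.org/p2"),
    ("canon camera", "http://example.org/p1"),
    ("rebel xt review", "http://example.org/p3"),
]

expanded = expand_entity_strings(click_log, ["Canon EOS 350D"])
print(expanded)
```

With the invented log above, only "digital rebel xt" shares two pages with the entity string, so it is the only string added; lowering `min_shared_urls` to 1 would also admit the broader query "canon camera", which shows why some threshold or scoring step is needed to keep overly general strings out of the expanded set.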
