Fuzzy matching of Web queries to structured data

doi:10.1109/ICDE.2010.5447817

Open Access, Proceedings Article

Tao Cheng, +2 more

- pp 713-716

TLDR

This paper proposes an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages and generates an expanded set of equivalent strings for each entity.

Abstract:

Recognizing the alternative ways people reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismatch between how content creators describe entities and how different users try to retrieve them. In this paper, we consider the problem of determining whether a candidate query approximately matches an entity. We propose an off-line, data-driven, bottom-up approach that mines query logs for instances where Web content creators and Web users apply a variety of strings to refer to the same Web pages. This way, given a set of strings that reference entities, we generate an expanded set of equivalent strings for each entity. The proposed method is verified with experiments on real-life data sets showing that we can dramatically increase the number of queries that can be matched.
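The idea of treating strings as equivalent when they lead to the same Web pages can be illustrated with a small sketch. This is not the paper's algorithm, whose details are not given in the abstract; it is a simplified stand-in that assumes a click log of (query, clicked URL) pairs and uses Jaccard overlap of clicked-URL sets as the equivalence test. The function names, the `min_overlap` threshold, and the sample log are all hypothetical.

```python
from collections import defaultdict


def clicked_pages(click_log):
    """Map each normalized query string to the set of URLs clicked for it."""
    pages = defaultdict(set)
    for query, url in click_log:
        pages[query.lower().strip()].add(url)
    return pages


def expand_entity_strings(entity_strings, click_log, min_overlap=0.5):
    """For each entity string, collect queries whose clicked pages overlap its own.

    Assumption: overlap is measured with Jaccard similarity and compared
    against a fixed threshold; the paper may use a different criterion.
    """
    pages = clicked_pages(click_log)
    expanded = {}
    for entity in entity_strings:
        key = entity.lower().strip()
        target = pages.get(key, set())
        equivalents = set()
        for query, urls in pages.items():
            if query == key or not target:
                continue
            jaccard = len(urls & target) / len(urls | target)
            if jaccard >= min_overlap:
                equivalents.add(query)
        expanded[entity] = equivalents
    return expanded


# Hypothetical click log: (query, clicked URL) pairs.
SAMPLE_LOG = [
    ("canon eos 350d", "https://example.com/350d"),
    ("canon eos 350d", "https://example.org/rebel-xt"),
    ("rebel xt", "https://example.com/350d"),
    ("rebel xt", "https://example.org/rebel-xt"),
    ("digital rebel xt", "https://example.org/rebel-xt"),
    ("nikon d40", "https://example.net/d40"),
]

if __name__ == "__main__":
    print(expand_entity_strings(["Canon EOS 350D"], SAMPLE_LOG))
```

Because the expansion is computed off-line, a query can later be matched against an entity by a simple lookup in the expanded string set rather than by string-similarity computation at query time.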

Citations

Patent

Word detection and domain dictionary recommendation

Hao Sun, +2 more

TL;DR: A new word detection and domain dictionary recommendation system is proposed, in which text content in a given language (for example, Chinese) is received and words are extracted from it by analyzing the content according to various rules.

Proceedings Article

Structured annotations of web queries

Nikos Sarkas, +2 more

TL;DR: This paper proposes a principled probabilistic scoring mechanism, using a generative model, for assessing the likelihood of a structured annotation, and defines a dynamic threshold for filtering out misinterpreted query annotations.

Journal Article

Keyword++: a framework to improve keyword search over entity databases

Venkatesh Ganti, +2 more

TL;DR: A general framework is proposed that can improve an existing search interface by translating a keyword query to a structured query, leveraging the keyword-to-attribute-value associations discovered in the results returned by the original search interface.

Proceedings Article

Keyword-based search and exploration on databases

Yi Chen, +2 more

TL;DR: This tutorial gives an overview of the state-of-the-art techniques for supporting keyword-based search and exploration on databases and identifies the challenges and opportunities for future research to advance the field.

Proceedings Article

A framework for robust discovery of entity synonyms

Kaushik Chakrabarti, +3 more

TL;DR: A general framework for robustly discovering entity synonyms, with two novel similarity functions that overcome the limitations of prior techniques, is proposed, and efficient and scalable techniques leveraging the MapReduce framework are developed to discover synonyms at large scale.

References

Journal Article

WordNet: a lexical database for English

George A. Miller

- 01 Nov 1995 -

Communications of the ACM

TL;DR: WordNet provides a more effective combination of traditional lexicographic information and modern computing, and is an online lexical database designed for use under program control.

Proceedings Article

Generating query substitutions

Rosie Jones, +3 more

TL;DR: A model for selecting between candidates is built, by using a number of features relating the query-candidate pair, and by fitting the model to human judgments of relevance of query suggestions, which improves the quality of the candidates generated.

Proceedings Article

Reference reconciliation in complex information spaces

Xin Dong, +2 more

TL;DR: This work considers complex information spaces, in which references belong to multiple related classes and each reference may have very few attribute values, and gradually enriches references by merging attribute values.

Journal Article

Swoosh: a generic approach to entity resolution

Omar Benjelloun, +5 more

TL;DR: This work formalizes the generic ER problem, treating the functions for comparing and merging records as black-boxes, and identifies four important properties that, if satisfied by the match and merge functions, enable much more efficient ER algorithms.

Proceedings Article

Random walks on the click graph

Nick Craswell, +1 more

TL;DR: A Markov random walk model is applied to a large click log, producing a probabilistic ranking of documents for a given query, demonstrating its ability to retrieve relevant documents that have not yet been clicked for that query and rank those effectively.

Related Papers (5)

Exploiting web search to generate synonyms for entities

Structured annotations of web queries

Mining the web for synonyms: PMI-IR versus LSA on TOEFL

Entity Synonyms for Structured Web Search

Concept-based interactive query expansion