scispace - formally typeset
Open AccessBook

Semantic Matching in Search

Hang Li, +1 more
Reads0
Chats0
TLDR
This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query document matching (semantic matching) in search, particularly web search, and focuses on the fundamental problems, as well as the state-of-the-art solutions.
Abstract
Relevance is the most important factor to assure users' satisfaction in search and the success of a search engine heavily depends on its performance on relevance. It has been observed that most of the dissatisfaction cases in relevance are due to term mismatch between queries and documents (e.g., query "NY times" does not match well with a document only containing "New York Times"), because term matching, i.e., the bag-of-words approach, still functions as the main mechanism of modern search engines. It is not exaggerated to say, therefore, that mismatch between query and document poses the most critical challenge in search. Ideally, one would like to see query and document match with each other, if they are topically relevant. Recently, researchers have expended significant effort to address the problem. The major approach is to conduct semantic matching, i.e., to perform more query and document understanding to represent the meanings of them, and perform better matching between the enriched query and document representations. With the availability of large amounts of log data and advanced machine learning techniques, this becomes more feasible and significant progress has been made recently. This survey gives a systematic and detailed introduction to newly developed machine learning technologies for query document matching (semantic matching) in search, particularly web search. It focuses on the fundamental problems, as well as the state-of-the-art solutions of query document matching on form aspect, phrase aspect, word sense aspect, topic aspect, and structure aspect. The ideas and solutions explained may motivate industrial practitioners to turn the research results into products. The methods introduced and the discussions made may also stimulate academic researchers to find new research directions and approaches. Matching between query and document is not limited to search and similar problems can be found in question answering, online advertising, cross-language information retrieval, machine translation, recommender systems, link prediction, image annotation, drug design, and other applications, as the general task of matching between objects from two different spaces. The technologies introduced can be generalized into more general machine learning techniques, which is referred to as learning to match in this survey.

read more

Content maybe subject to copyright    Report

Citations
More filters
Proceedings ArticleDOI

Short Text Similarity with Word Embeddings

TL;DR: This work proposes to go from word-level to text-level semantics by combining insights from methods based on external sources of semantic knowledge with word embeddings, and derives multiple types of meta-features from the comparison of the word vectors for short text pairs, and from the vector means of their respective word embedDings.
Posted Content

Text Matching as Image Recognition

TL;DR: In this article, a convolutional neural network is utilized to capture rich matching patterns in a layer-by-layer way, which can successfully identify salient signals such as n-gram and n-term matchings.
Proceedings ArticleDOI

Learning to Respond with Deep Neural Networks for Retrieval-Based Human-Computer Conversation System

TL;DR: This paper proposes a retrieval-based conversation system with the deep learning-to-respond schema through a deep neural network framework driven by web data and demonstrates significant performance improvement against a series of standard and state-of-art baselines for conversational purposes.
Posted Content

Pretrained Transformers for Text Ranking: BERT and Beyond

TL;DR: This tutorial provides an overview of text ranking with neural network architectures known as transformers, of which BERT (Bidirectional Encoder Representations from Transformers) is the best-known example, and covers a wide range of techniques.
Posted Content

An Information Retrieval Approach to Short Text Conversation

TL;DR: This paper proposes formalizing short text conversation as a search problem at the first step, and employing state-of-the-art information retrieval techniques to carry out the task, investigating the significance as well as the limitation of the IR approach.
References
More filters
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Proceedings Article

The PageRank Citation Ranking : Bringing Order to the Web

TL;DR: This paper describes PageRank, a mathod for rating Web pages objectively and mechanically, effectively measuring the human interest and attention devoted to them, and shows how to efficiently compute PageRank for large numbers of pages.
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Introduction to Information Retrieval

TL;DR: In this article, the authors present an up-to-date treatment of all aspects of the design and implementation of systems for gathering, indexing, and searching documents; methods for evaluating systems; and an introduction to the use of machine learning methods on text collections.
Related Papers (5)
Trending Questions (1)
How does the matching process in search engines work?

The paper discusses the importance of relevance in search and the challenges of term mismatch between queries and documents. It introduces machine learning technologies for semantic matching in search. However, it does not provide a detailed explanation of how the matching process in search engines works.