scispace - formally typeset
Proceedings ArticleDOI

Clickthrough-based latent semantic models for web search

Reads0
Chats0
TLDR
Two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR) are presented.
Abstract
This paper presents two new document ranking models for Web search based upon the methods of semantic representation and the statistical translation-based approach to information retrieval (IR). Assuming that a query is parallel to the titles of the documents clicked on for that query, large amounts of query-title pairs are constructed from clickthrough data; two latent semantic models are learned from this data. One is a bilingual topic model within the language modeling framework. It ranks documents for a query by the likelihood of the query being a semantics-based translation of the documents. The semantic representation is language independent and learned from query-title pairs, with the assumption that a query and its paired titles share the same distribution over semantic topics. The other is a discriminative projection model within the vector space modeling framework. Unlike Latent Semantic Analysis and its variants, the projection matrix in our model, which is used to map from term vectors into sematic space, is learned discriminatively such that the distance between a query and its paired title, both represented as vectors in the projected semantic space, is smaller than that between the query and the titles of other documents which have no clicks for that query. These models are evaluated on the Web search task using a real world data set. Results show that they significantly outperform their corresponding baseline models, which are state-of-the-art.

read more

Content maybe subject to copyright    Report

Citations
More filters
Book

Deep Learning: Methods and Applications

Li Deng, +1 more
TL;DR: This monograph provides an overview of general deep learning methodology and its applications to a variety of signal and information processing tasks, including natural language and text processing, information retrieval, and multimodal information processing empowered by multi-task deep learning.
Proceedings ArticleDOI

Learning deep structured semantic models for web search using clickthrough data

TL;DR: A series of new latent semantic models with a deep structure that project queries and documents into a common low-dimensional space where the relevance of a document given a query is readily computed as the distance between them are developed.
Proceedings ArticleDOI

A Latent Semantic Model with Convolutional-Pooling Structure for Information Retrieval

TL;DR: A new latent semantic model that incorporates a convolutional-pooling structure over word sequences to learn low-dimensional, semantic vector representations for search queries and Web documents is proposed.
Journal ArticleDOI

Deep sentence embedding using long short-term memory networks: analysis and application to information retrieval

TL;DR: In this article, the LSTM-RNN model was used for sentence embedding in a web search engine and the results showed that the proposed method significantly outperformed the Paragraph Vector method for web document retrieval task.
Proceedings ArticleDOI

Representation Learning Using Multi-Task Deep Neural Networks for Semantic Classification and Information Retrieval

TL;DR: This work develops a multi-task DNN for learning representations across multiple tasks, not only leveraging large amounts of cross-task data, but also benefiting from a regularization effect that leads to more general representations to help tasks in new domains.
References
More filters
Journal ArticleDOI

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hof-mann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal ArticleDOI

Indexing by Latent Semantic Analysis

TL;DR: A new method for automatic indexing and retrieval to take advantage of implicit higher-order structure in the association of terms with documents (“semantic structure”) in order to improve the detection of relevant documents on the basis of terms found in queries.
Book

Foundations of Statistical Natural Language Processing

TL;DR: This foundational text is the first comprehensive introduction to statistical natural language processing (NLP) to appear and provides broad but rigorous coverage of mathematical and linguistic foundations, as well as detailed discussion of statistical methods, allowing students and researchers to construct their own implementations.
Related Papers (5)