Open Access Proceedings Article

Learning the Latent Topics for Question Retrieval in Community QA

TLDR
This paper proposes a topic model that incorporates category information into the discovery of latent topics in question content, and combines latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval.
Abstract
Community-based Question Answering (cQA) is a popular online service where users can ask and answer questions on any topic. This paper is concerned with the problem of question retrieval, which in cQA aims to find historical questions that are semantically equivalent or relevant to the queried question. Although the translation-based language model (Xue et al., 2008) has achieved state-of-the-art performance for question retrieval, it ignores latent topic information when calculating the semantic similarity between questions. In this paper, we propose a topic model that incorporates category information into the process of discovering the latent topics in the content of questions. We then combine the latent-topic-based semantic similarity with the translation-based language model in a unified framework for question retrieval. Experiments are carried out on a real-world cQA data set from Yahoo! Answers. The results show that our proposed method significantly improves the question retrieval performance of the translation-based language model.
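
The abstract does not say how the topic-based similarity and the translation-based language model are combined, so the following is only a rough sketch of one plausible reading: a linear interpolation of the two scores. Everything in it is an assumption made for illustration, including the function names, the weights `lam`, `beta`, and `gamma`, the cosine similarity over topic distributions, and the use of a per-word geometric mean to put the language-model score on a 0 to 1 scale. It assumes non-empty token lists and precomputed translation probabilities and topic distributions.

```python
import math


def translation_lm_score(query, question, collection_prob, trans_prob,
                         lam=0.2, beta=0.5):
    """Log P(query | question) under a simplified translation-based LM.

    trans_prob maps (query_word, question_word) -> P(query_word | question_word).
    lam and beta are hypothetical smoothing and mixing weights.
    """
    n = len(question)
    log_p = 0.0
    for w in query:
        # Maximum-likelihood estimate of w in the historical question.
        p_ml = question.count(w) / n
        # Translation estimate: sum over question terms t of
        # P(w | t) * P_ml(t | question).
        p_tr = sum(trans_prob.get((w, t), 0.0) for t in question) / n
        p_mix = (1 - beta) * p_ml + beta * p_tr
        # Smooth with the collection model so unseen words do not zero the score.
        p = (1 - lam) * p_mix + lam * collection_prob.get(w, 1e-9)
        log_p += math.log(p)
    return log_p


def topic_similarity(theta_q, theta_d):
    """Cosine similarity between two topic distributions (an assumed measure)."""
    dot = sum(a * b for a, b in zip(theta_q, theta_d))
    norm = math.sqrt(sum(a * a for a in theta_q)) * math.sqrt(sum(b * b for b in theta_d))
    return dot / norm if norm else 0.0


def combined_score(query, question, theta_q, theta_d,
                   collection_prob, trans_prob, gamma=0.5):
    """Linearly interpolate the two signals (assumed combination form)."""
    log_p = translation_lm_score(query, question, collection_prob, trans_prob)
    # Per-word geometric mean keeps the LM score in [0, 1], like the cosine.
    lm = math.exp(log_p / len(query))
    return gamma * lm + (1 - gamma) * topic_similarity(theta_q, theta_d)


# Toy data, invented purely for illustration.
collection_prob = {"cheap": 0.01, "flight": 0.01}
trans_prob = {("cheap", "low"): 0.4, ("flight", "airfare"): 0.5}
query = ["cheap", "flight"]
related = ["low", "cost", "airfare"]
unrelated = ["bake", "bread"]

score_related = combined_score(query, related, [0.9, 0.1], [0.8, 0.2],
                               collection_prob, trans_prob)
score_unrelated = combined_score(query, unrelated, [0.9, 0.1], [0.1, 0.9],
                                 collection_prob, trans_prob)
```

In this toy setup the related question shares no surface words with the query, yet it still ranks above the unrelated one because both the translation probabilities and the topic distributions point the same way. A real system would estimate the translation table and topic distributions from data and tune `gamma` on held-out queries.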


Citations
Proceedings Article

CQArank: jointly model topics and expertise in community question answering

TL;DR: This work proposes the Topic Expertise Model (TEM), a novel probabilistic generative model with a GMM hybrid, to jointly model topics and expertise by integrating textual content modeling and link structure analysis, and proposes CQARank to measure user interests and expertise scores under different topics.
Proceedings Article

Learning Continuous Word Embedding with Metadata for Question Retrieval in Community Question Answering

TL;DR: This paper proposes to learn continuous word embeddings with category metadata from cQA pages for question retrieval, using the Fisher kernel framework to deal with the variable size of word embedding vectors.
Proceedings Article

Question-answer topic model for question retrieval in community question answering

TL;DR: A novel Question-Answer Topic Model (QATM) is proposed to learn the latent topics aligned across the question-answer pairs to alleviate the lexical gap problem, with the assumption that a question and its paired answer share the same topic distribution.
Proceedings Article

Question Retrieval with High Quality Answers in Community Question Answering

TL;DR: A topic-based language model is proposed that matches questions not only on a term level but also on a topic level, and that can significantly outperform state-of-the-art retrieval models in CQA.
Proceedings Article

Learning Hybrid Representations to Retrieve Semantically Equivalent Questions

TL;DR: Retrieving similar questions in online QA (2) BOW-CNN is more robust than the pure CNN for long texts.
References
Journal Article

Latent dirichlet allocation

TL;DR: This work proposes a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model.
Proceedings Article

Latent Dirichlet Allocation

TL;DR: This paper proposed a generative model for text and other collections of discrete data that generalizes or improves on several previous models including naive Bayes/unigram, mixture of unigrams, and Hofmann's aspect model, also known as probabilistic latent semantic indexing (pLSI).
Journal Article

A vector space model for automatic indexing

TL;DR: An approach based on space density computations is used to choose an optimum indexing vocabulary for a collection of documents, demonstrating the usefulness of the model.
Journal Article

Finding scientific topics

TL;DR: A generative model for documents is described, introduced by Blei, Ng, and Jordan, and a Markov chain Monte Carlo algorithm is presented for inference in this model, which is used to analyze abstracts from PNAS by using Bayesian model selection to establish the number of topics.
Journal Article

The mathematics of statistical machine translation: parameter estimation

TL;DR: The authors describe a series of five statistical models of the translation process and give algorithms for estimating the parameters of these models given a set of pairs of sentences that are translations of one another.